What’s the Difference Between Machine Learning Training and Inference?




At a high level, the goal of artificial intelligence is to replace algorithms that are explicitly programmed by people with algorithms that learn on their own. Hand-coding routines is highly inefficient, error-prone and not adaptable to new information. Instead, deep learning is used to automatically learn the features and patterns that best represent the data. The process by which these systems learn is called Training. Training is the phase in which the computer tries to learn from your data.

Let’s use an example: Think of the almost infinite number of rules required to describe the nearly infinite ways to compose a handwritten letter A. With deep learning, we can simply “show” the computer thousands (or millions) of letter A’s until the computer “learns” how to best describe what an A looks like. After the computer learns how to identify an A, we can then ask it to do exactly that. This phase is called Inference.
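
To make the two phases concrete, here is a minimal sketch in plain Python. It is not a deep neural network: it assumes a tiny logistic-regression model and a made-up dataset in which each sample is reduced to two hypothetical numeric features, with label 1 standing in for "this is an A". The names `train`, `infer` and `TRAINING_DATA` are illustrative only. The point is the shape of the workflow: `train` adjusts the weights by looking at labeled examples, and `infer` only applies the finished weights to new input.

```python
import math

# Hypothetical toy data: (features, label). Label 1 stands in for "is an A",
# label 0 for "is not an A". Real systems would use image pixels instead.
TRAINING_DATA = [
    ([1.0, 1.0], 1),
    ([0.9, 0.8], 1),
    ([0.8, 1.0], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.0], 0),
    ([0.0, 0.1], 0),
]


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Forward pass: weighted sum of the features, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def train(data, epochs=500, learning_rate=0.5):
    """Training: repeatedly nudge the weights to reduce the prediction error."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):  # one epoch = one full pass over the data
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def infer(weights, bias, features):
    """Inference: apply the already-learned weights; nothing is updated."""
    return 1 if predict_proba(weights, bias, features) >= 0.5 else 0


weights, bias = train(TRAINING_DATA)     # training phase (done once)
print(infer(weights, bias, [0.95, 0.9]))  # inference on unseen input
print(infer(weights, bias, [0.05, 0.1]))
```

Note that `train` loops over the whole dataset many times, while `infer` is a single forward pass per input.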


Inference happens after training (and can’t happen without it). Just as with a human education, the goal of training is to learn to do a job. Doing that job is called inference, regardless of whether the trained neural network learned to recognize images, text, or cancer cells. Inference essentially takes real-world data and quickly comes back with a prediction.

Before a trained model is deployed for inference, it is important to minimize its size by removing any parts that are not necessary to make a prediction. Other layers are combined if the impact on accuracy is negligible. These two processes are similar to image or video compression, where the goal is to minimize file size while having the smallest impact on quality.
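
As a rough illustration of the first of these two processes, the sketch below zeroes out weights whose magnitude falls under a threshold (often called magnitude pruning) and checks how little the output moves. The weights, inputs, threshold and function names are all hypothetical, and real toolchains apply this to whole tensors rather than a single list. Combining layers is not shown here.

```python
def prune_weights(weights, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def linear_output(weights, inputs):
    """Output of a single linear unit: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical trained weights; two of them contribute almost nothing.
weights = [0.80, -0.003, 0.45, 0.001, -0.60]
inputs = [1.0, 2.0, -1.0, 0.5, 3.0]

pruned = prune_weights(weights, threshold=0.01)

original_out = linear_output(weights, inputs)
pruned_out = linear_output(pruned, inputs)
sparsity = pruned.count(0.0) / len(pruned)  # fraction of weights removed

print("removed fraction:", sparsity)
print("output change:", abs(original_out - pruned_out))
```

The removed weights no longer need to be stored or multiplied, which is the "smaller file, nearly the same quality" trade-off the compression analogy describes.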

Which is more computationally intensive?

Training is computationally very expensive and is best accelerated with GPUs. Even with a small dataset, the time taken per epoch (one complete pass through all of the training samples) can be reduced from 3-4 minutes on a CPU to just a few seconds on a GPU. This is especially true of deep learning, the fastest-growing field in machine learning, which uses many-layered and highly complex Deep Neural Networks (DNNs). For example, training a 152-layer ResNet network took Microsoft 3 weeks on a four-GPU system.

Inference is less computationally intense but typically still benefits from a GPU due to the massive amount of data GPUs can process concurrently.
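
A back-of-the-envelope sketch shows why the two workloads differ so much. It assumes that training a sample costs one forward pass plus a backward pass of roughly twice that size (a common rule of thumb, not a measured figure), and that inference costs a single forward pass. All of the numbers below are hypothetical placeholders.

```python
def training_cost(forward_ops, num_samples, num_epochs, backward_factor=2.0):
    """Estimated operations for a full training run.

    backward_factor is an assumed rule of thumb: the backward pass is
    taken to cost about twice the forward pass.
    """
    per_sample = forward_ops * (1.0 + backward_factor)
    return per_sample * num_samples * num_epochs


def inference_cost(forward_ops, num_requests=1):
    """Estimated operations for inference: one forward pass per request."""
    return forward_ops * num_requests


forward_ops = 1_000_000  # hypothetical cost of one forward pass

train_total = training_cost(forward_ops, num_samples=10_000, num_epochs=20)
infer_single = inference_cost(forward_ops)
ratio = train_total / infer_single

print("training run vs. one prediction:", ratio)
```

Under these assumptions a single training run costs as much as several hundred thousand predictions, although a heavily used deployed model can still accumulate a large inference bill over time.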

Wrapping up

Training is the process of developing an algorithm that is then deployed where it can infer a result. Both steps are necessary for everything from your phone’s voice assistant, to Google’s spam detection algorithm, to Netflix’s recommendation engine. In a typical scenario today, data scientists work on the model itself, which is then handed off to a DevOps team that is responsible for making the model ready and available to process information. This is typically an iterative back-and-forth in which the model is tuned (further optimized) and then redeployed.