Intro to optimization in deep learning: Gradient Descent


Image Credits: O'Reilly Media

Deep Learning, to a large extent, is really about solving massive, nasty optimization problems. A neural network is merely a very complicated function, consisting of millions of parameters, that represents a mathematical solution to a problem. Consider the task of image classification: AlexNet is a mathematical function that takes an array representing the RGB values of an image and produces a set of class scores as output.
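Since the post is about optimization via gradient descent, here is a minimal sketch of the idea on a one-dimensional function (a hypothetical example for illustration, not code from the original post):

```python
# Minimal gradient descent sketch: minimize f(x) = (x - 3)^2.
# The function, learning rate, and step count are illustrative choices.

def grad(x):
    # Analytic gradient of f(x) = (x - 3)^2
    return 2.0 * (x - 3.0)

def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # step in the direction opposite the gradient
    return x

x_min = gradient_descent(x0=0.0)  # converges toward the minimum at x = 3
```

A real network does the same thing over millions of parameters, with the gradient computed by backpropagation instead of analytically by hand.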

This is a companion discussion topic for the original entry at


Hi, I’d like to translate this optimization series to Chinese. Can you give me permission to translate it?

The translated text will be published at and related Chinese social network accounts.



@weakish Definitely. As long as the post receives credit (a link to the blog mentioning it was originally posted there), that’s fine.


Chinese translation:

Attribution to author is given at the beginning (in translator’s note) and there is a backlink at the end of the translated text.

BTW, some possible typos encountered during translation:

“classifying images of images of cats as humans” → “classifying images of cats as humans”

“we can infinite directions on this plane” → “we can have infinite directions on this plane”

“or convergence has has taken place” (duplicated “has”)

“with only one minima we can converge too”

“out earlier approach processed all examples in one single batch”

“while trying to converge to a global maximum”



@weakish Thanks for the corrections, the post has been updated.