Intro to optimization in deep learning: Gradient Descent

Image Credits: O'Reilly Media

Deep Learning, to a large extent, is really about solving massive, nasty optimization problems. A neural network is merely a very complicated function, consisting of millions of parameters, that represents a mathematical solution to a problem. Consider the task of image classification. AlexNet is a mathematical function that takes an array representing the RGB values of an image and produces a set of class scores as output.
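To make the "optimization" framing concrete, here is a minimal sketch of gradient descent on a one-dimensional quadratic loss. The loss function, learning rate, and iteration count are illustrative stand-ins, not values from the article; a real network's loss has millions of parameters, but the update rule is the same idea:

```python
# Toy loss surface: f(w) = (w - 3)^2, a simple bowl standing in for the
# high-dimensional loss of a real network. Its minimum is at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Analytic gradient of the quadratic loss: d/dw (w - 3)^2 = 2 (w - 3)
    return 2.0 * (w - 3.0)

# Plain gradient descent: repeatedly step against the gradient.
w = 0.0    # initial parameter guess
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)

print(w)  # ends up very close to the minimum at w = 3
```

Each update moves `w` a fraction of the way toward the minimum; with a well-chosen learning rate the iterates converge, which is the behavior the article's pictures of minima are describing.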

This is a companion discussion topic for the original entry at

Hi, I’d like to translate this optimization series to Chinese. Can you give me the permission to translate it?

The translated text will be published at and related Chinese social network accounts.


@weakish Definitely. As long as the post receives credit (link to blog mentioning it was originally posted there), then that’s fine.

Chinese translation:

Attribution to author is given at the beginning (in translator’s note) and there is a backlink at the end of the translated text.

BTW, some possible typos encountered during translation:

"classifying images of images of cats as humans" → "classifying images of cats as humans"

"we can infinite directions on this plane" → "we can have infinite directions on this plane"

"or convergence has has taken place" → "or convergence has taken place"

"with only one minima we can converge too" → "with only one minimum we can converge to"

"out earlier approach processed all examples in one single batch" → "our earlier approach processed all examples in one single batch"

"while trying to converge to a global maximum" → "while trying to converge to a global minimum"
@weakish Thanks for the corrections, the post has been updated.


Would it be alright if I were to use the image of the local and global minima in my undergraduate research thesis on training using multiple loss functions? I would definitely cite this article, of course.

@Jeffrey_Cordero Not a problem at all if it’s cited. Thanks!


I find it funny that so much emphasis is given to citing this article when none of the images is original or carries a citation. In Chrome, you can right-click an image and click "Search Google for image" to find the original sources.

I really like your explanations of these areas, and I would like to assign them as class reading in a Deep Learning course I teach. But, could you perhaps review the English a bit? For instance, “minimum” is the singular, and “minima” is the plural (Latin-root words). I don’t mean to criticize unfairly – you really do have a gift for clear explanation. It would just make it even clearer if a few of these items were cleared up.

A few more typos:

"ot the steepness" (likely "of the steepness")

"The the size" → "The size"

"the the projection" → "the projection"

Very nice, thanks for posting it.

Do you have any Python code tutorial for the article?