This is the third post in the optimization series, which aims to give the reader a comprehensive review of optimization in deep learning. So far, we have looked at how:

This is a companion discussion topic for the original entry at https://blog.paperspace.com/vanishing-gradients-activation-function/