Activation Functions: Pros and Cons

1. Sigmoid

Pros:

- The output of the sigmoid function always ranges between 0 and 1.
- Sigmoid is an S-shaped, monotonic, and differentiable function.
- The derivative of the sigmoid function, f'(x), lies between 0 and 0.25.
- The derivative of the sigmoid function is itself not monotonic.
Cons:

- The derivative of the sigmoid function suffers from the vanishing gradient problem: since f'(x) never exceeds 0.25, gradients shrink as they propagate back through many layers.
- The sigmoid function is not zero-centric. Since 0 < output < 1, the weight gradients in a layer all share the same sign, causing zig-zagging updates and making optimization harder.
- Slow convergence, since it is computationally heavy (it relies on the exponential function).
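A minimal sketch in plain Python of the two facts above (the function and helper names `sigmoid` / `sigmoid_grad` are illustrative, not from any particular library): the output stays inside (0, 1), and the derivative never exceeds 0.25, which is the root of the vanishing gradient issue.

```python
import math

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^-x); output always lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # f'(x) = f(x) * (1 - f(x)); maximum value is 0.25, reached at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

xs = [i / 10 for i in range(-100, 101)]        # sample points in [-10, 10]
print(max(sigmoid_grad(x) for x in xs))        # 0.25, at x = 0
print(min(sigmoid(x) for x in xs) > 0)         # True: output never reaches 0
```

Multiplying several such derivatives together (one per layer during backpropagation) gives at most 0.25^n, which is why deep sigmoid networks see vanishing gradients.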


2. TanH

Pros:

- The tanh function is monotonic (its derivative, like sigmoid's, is not).
- Output is zero-centric (it ranges between -1 and 1).
- Optimization is easier than with sigmoid.
- The derivative of the tanh function, f'(x), lies between 0 and 1.
Cons:

- The derivative of the tanh function suffers from the vanishing gradient problem.
- Slow convergence, since it is computationally heavy (it relies on the exponential function).
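The same kind of sketch for tanh (the helper name `tanh_grad` is illustrative): the output is zero-centred, the derivative peaks at 1 rather than 0.25, but it still vanishes for large |x|.

```python
import math

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; lies between 0 and 1, peaks at x = 0
    return 1.0 - math.tanh(x) ** 2

print(math.tanh(0.0))   # 0.0 -> output is centred on zero
print(tanh_grad(0.0))   # 1.0, the maximum of the derivative
print(tanh_grad(5.0))   # tiny (~1.8e-4): the gradient vanishes for large |x|
```

Because the derivative reaches 1 instead of 0.25, gradients shrink more slowly through tanh layers than sigmoid layers, but saturation at the tails still causes vanishing gradients in deep networks.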


3. ReLU

Pros:

- The function and its derivative are both monotonic.
- The main advantage of ReLU is that it does not activate all neurons at the same time: negative inputs are mapped to zero, giving sparse activations.
- Computationally efficient (just a threshold at zero, no exponentials).
- The derivative of the ReLU function, f'(x), is 1 if x > 0, else 0.
- Converges very fast.
Cons:

- The ReLU function is not zero-centric: its output ranges from 0 to infinity, so weight gradients share the same sign and optimization is harder.
- Dead neurons are the biggest problem: any input below zero gets a gradient of exactly 0, so a neuron stuck in the negative region stops learning. (ReLU is also non-differentiable at exactly zero.)
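A short sketch of ReLU and its derivative in plain Python (the names `relu` / `relu_grad` are illustrative; setting the derivative to 0 at exactly x = 0 is one common convention for the non-differentiable point):

```python
def relu(x):
    # ReLU(x) = max(0, x): negative inputs are zeroed, so not all neurons fire
    return max(0.0, x)

def relu_grad(x):
    # derivative is 1 for x > 0, else 0 (undefined at x = 0; 0 is a common choice)
    return 1.0 if x > 0 else 0.0

inputs = [-3.0, -0.5, 0.0, 0.5, 3.0]
print([relu(x) for x in inputs])       # [0.0, 0.0, 0.0, 0.5, 3.0]
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

The zero gradient on the entire negative side is what makes dead neurons possible: once a neuron's pre-activation stays negative for all inputs, no gradient ever flows back to revive it, which is the motivation for the leaky variant below.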



4. LReLU (Leaky ReLU)
 