hyperparameter (i.e., as a trainable parameter), which seems to work better than Leaky ReLU. This extension of Leaky ReLU is known as Parametric ReLU (PReLU).
The parameter α is generally a number between 0 and 1, and it is usually relatively small (a short PReLU sketch appears after the points below).
Has a slight advantage over Leaky ReLU because α is trainable rather than fixed.
Handles the dying-neuron problem.
Otherwise behaves the same as Leaky ReLU.
f(x) is monotonic when α ≥ 0, and f′(x) is monotonic when α = 1.
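To make the trainable-α idea concrete, here is a minimal NumPy sketch of PReLU; the function names, the sample inputs, and the initial value α = 0.25 are illustrative assumptions, not taken from any of the papers listed later.

```python
import numpy as np

def prelu(x, alpha):
    """Parametric ReLU: identity for positive inputs, alpha * x otherwise.
    Unlike Leaky ReLU, alpha is a learned parameter, not a fixed constant."""
    return np.where(x > 0, x, alpha * x)

def prelu_grad_alpha(x, alpha):
    """Gradient of PReLU with respect to alpha, used to update alpha during
    training: d f / d alpha = x for x <= 0, and 0 for x > 0."""
    return np.where(x > 0, 0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
alpha = 0.25  # hypothetical starting value; learned during training
print(prelu(x, alpha))             # -> [-0.5  -0.125  0.  1.  3.]
print(prelu_grad_alpha(x, alpha))  # -> [-2.   -0.5    0.  0.  0.]
```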
The Google Brain team has proposed a new activation function, named Swish, which is simply f(x) = x · sigmoid(x).
Their experiments show that Swish tends to work better than ReLU on deeper models across a number of challenging data sets.
The curve of the Swish function is smooth and the function is differentiable at all points. This is helpful during model optimization and is considered one of the reasons Swish outperforms ReLU.
The Swish function is not monotonic: its value may decrease even when the input values are increasing.
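As an illustration of the formula f(x) = x · sigmoid(x), here is a minimal NumPy sketch; the function name and the sample inputs are my own, only the formula comes from the text above.

```python
import numpy as np

def swish(x):
    """Swish activation: f(x) = x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
y = swish(x)
# For negative inputs the curve dips slightly below zero before rising again,
# which is why Swish is non-monotonic; it is smooth and differentiable everywhere.
print(np.round(y, 4))
```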
ALReLU: A different approach on Leaky ReLU activation function to improve Neural Networks Performance
Parametric Deformable Exponential Linear Units for deep neural networks
Elastic exponential linear units for convolutional neural networks
QReLU and m-QReLU: Two novel quantum activation functions to aid medical diagnostics
QReLU and m-QReLU
A two-step quantum approach was applied to ReLU: the ReLU solution for positive values (R(z) = z, ∀ z > 0) and the Leaky ReLU solution for negative values (R(z) = α·z, ∀ z ≤ 0, where α = 0.01) were taken as a starting point to improve quantistically.
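The starting point described above can be written as a single piecewise function. The sketch below implements only that starting point (ReLU branch for z > 0, Leaky ReLU branch with α = 0.01 for z ≤ 0); it does not reproduce the quantum modification that defines QReLU and m-QReLU themselves, and the function name is illustrative.

```python
import numpy as np

def qrelu_starting_point(z, alpha=0.01):
    """Piecewise starting point used before the quantum modification:
    R(z) = z for z > 0 (ReLU branch) and R(z) = alpha * z for z <= 0
    (Leaky ReLU branch with alpha = 0.01)."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-3.0, -1.0, 0.0, 2.0])
print(qrelu_starting_point(z))  # -> [-0.03 -0.01  0.    2.  ]
```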