ReLU (Rectified Linear Unit), or rectifier, is an activation function defined as the positive part of its argument: $\mathrm{ReLU}(x) = \max(0,x)$.
In the context of deep learning, a major advantage of ReLU is that it is more computationally efficient and has shown better convergence performance in deep neural networks than other popular activation functions such as the sigmoid[1].
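As a concrete illustration of the definition above, here is a minimal NumPy sketch (the function name `relu` is chosen for this example only):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: element-wise max(0, x)."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# -> [0.  0.  0.  1.5 3. ]  (negative inputs clamped to zero)
```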
Major benefits of using ReLU include:
- Reduced likelihood of the vanishing gradient problem for large inputs. In the $x > 0$ regime the gradient has a constant value of $1$; in contrast, the gradient of the sigmoid tends to $0$ as the absolute value of the input increases (see the sketch after this list).
- Sparsity. In the $x \leq 0$ regime the ReLU output is zero, resulting in a sparser representation. Sigmoids, on the other hand, generate small non-zero values, resulting in dense representations.
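The sketch below, using hypothetical helper names `relu_grad` and `sigmoid_grad`, contrasts the two gradient behaviours described in the first bullet:

```python
import numpy as np

def relu_grad(x):
    # Gradient of ReLU: 1 for x > 0, 0 otherwise (the value at x = 0 is a convention).
    return (x > 0).astype(float)

def sigmoid_grad(x):
    # Gradient of the sigmoid: sigma(x) * (1 - sigma(x)), which vanishes for large |x|.
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 1.0, 10.0])
print(relu_grad(x))     # [0. 0. 1. 1.]          -> constant 1 for positive inputs
print(sigmoid_grad(x))  # [~4.5e-05 0.197 0.197 ~4.5e-05] -> shrinks toward 0 as |x| grows
```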
Disadvantages of using ReLU include exploding activations (since $\mathrm{ReLU}(x) = x$ for $x > 0$, and the inputs to a layer are in turn sums of outputs of the previous layer, the output values can grow very large), as well as the "dying ReLU" problem: when too many activations fall in the $x < 0$ regime, the network outputs zero, preventing learning. This can be handled by using Leaky ReLU instead, as sketched below.
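A minimal sketch of the Leaky ReLU variant mentioned above; the negative slope `alpha` shown here is a commonly used default, not a value taken from the text:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope alpha for negative inputs,
    so gradients in the x < 0 regime are small but non-zero."""
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))
# -> [-0.02  0.    3.  ]
```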