Understanding ReLU as a Piecewise Function

ReLU is an activation function defined as
$$
\mathrm{ReLU}(x)=\begin{cases}
x, & x > 0 \\
0, & \text{otherwise}
\end{cases}
$$

This function has an interesting property: it can act like an open/close switch. To see why, suppose $\vec{x}$ is a vector with entries $x_i$. When a ReLU layer is applied to $\vec{x}$, the check is entry-wise. Only the $x_i$ that are greater than $0$ pass through, just as electricity passes through a closed switch. All non-positive entries are zeroed out, just as electricity is blocked by an open switch.
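As a minimal illustration (the array values below are made up), the switch behavior is just an entry-wise `max(x, 0)`:

```python
import numpy as np

def relu(x):
    # Entry-wise ReLU: positive entries pass through, the rest are zeroed out.
    return np.maximum(x, 0)

x = np.array([1.5, -0.3, 0.0, 2.0, -4.1])
print(relu(x))  # 1.5 and 2.0 pass the "switch"; all other entries become 0
```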

When a neural network uses ReLU layers, each ReLU layer can be thought of as selecting some of the outputs of its preceding layer, so it behaves as a form of feature selection. Training the network is then also training a feature-selection model. When the model has multiple layers, ReLU is in fact selecting computed features rather than raw inputs.

So one can try to design a feature-selection process using ReLU, for example: $$\text{Input} \rightarrow \text{D} \rightarrow \text{ReLU} \rightarrow \text{More DNN}$$ Here, $\text{D}$ is a layer equivalent to multiplying by a diagonal matrix. After several epochs of training, the input entries that correspond to non-positive diagonal entries of $\text{D}$ can be discarded, since the following ReLU blocks them.
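Below is a minimal PyTorch sketch of this design, under a few assumptions: the `DiagonalGate` module name, the layer sizes, and the final inspection step are illustrative, not a prescribed implementation. The gate multiplies the input entry-wise by a learnable vector, which is equivalent to multiplying by a diagonal matrix.

```python
import torch
import torch.nn as nn

class DiagonalGate(nn.Module):
    """Equivalent to multiplying by a diagonal matrix: one learnable weight per input feature."""
    def __init__(self, num_features):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))

    def forward(self, x):
        return x * self.weight  # broadcasts over the batch dimension

num_features = 10  # illustrative size
model = nn.Sequential(
    DiagonalGate(num_features),
    nn.ReLU(),                    # blocks features whose gated value is non-positive
    nn.Linear(num_features, 32),  # "More DNN"
    nn.ReLU(),
    nn.Linear(32, 1),
)

# ... train the model for several epochs ...

# For non-negative input features, a non-positive gate weight means the feature
# never survives the ReLU, so it is a candidate to discard.
gate = model[0].weight.detach()
dropped = (gate <= 0).nonzero(as_tuple=True)[0]
print("candidate features to discard:", dropped.tolist())
```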
