Analytical Example

Single-Layer Network with Sigmoid Activation Function

Let’s consider the simplest case of a neural network, consisting of a single output neuron and several inputs, to analytically derive the weight update formula.

Network Structure:

  • Input layer with $m$ neurons. Input vector: $\boldsymbol{x} = [x_1, x_2, \ldots, x_m]^T$.
  • Output layer with one neuron $j$.
  • Weights connecting the inputs to the output neuron: $\boldsymbol{w} = [w_{j1}, w_{j2}, \ldots, w_{jm}]^T$.
  • Bias for the output neuron: $b_j$.

Activation Function: As the non-linear activation function $\varphi$, we will choose the logistic function (sigmoid):

$$\varphi(v) = \frac{1}{1 + e^{-v}}$$

Its derivative, which will be needed for the backward pass, has a simple expression in terms of the function itself:

$$\varphi'(v) = \varphi(v) \cdot (1 - \varphi(v))$$
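This identity is easy to verify numerically. A minimal sketch (the function names `sigmoid` and `sigmoid_derivative` are illustrative, not from the text):

```python
import numpy as np

def sigmoid(v):
    """Logistic (sigmoid) activation: phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_derivative(v):
    """Derivative via the identity phi'(v) = phi(v) * (1 - phi(v))."""
    s = sigmoid(v)
    return s * (1.0 - s)

# Check the identity against a central finite difference at an arbitrary point:
v, h = 0.7, 1e-6
numeric = (sigmoid(v + h) - sigmoid(v - h)) / (2 * h)
print(abs(sigmoid_derivative(v) - numeric) < 1e-8)  # prints: True
```

Because the derivative reuses the already-computed output $\varphi(v)$, the backward pass never needs to re-evaluate the exponential.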

Forward Pass:

  1. Induced Local Field (Weighted Sum of Inputs):

    $$v_j = \sum_{i=1}^{m} w_{ji} x_i + b_j$$
  2. Neuron Output Signal: The output $y_j$ is obtained by applying the sigmoid activation function:

    $$y_j = \varphi(v_j) = \frac{1}{1 + e^{-v_j}}$$
  3. Output Error: The error is calculated as the difference between the desired output $d_j$ and the actual output $y_j$:

    $$e_j = d_j - y_j$$
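The three steps above can be sketched in a few lines. The concrete input, weight, and target values here are illustrative assumptions, not taken from the text:

```python
import numpy as np

def forward(x, w, b):
    """Forward pass of the single sigmoid neuron."""
    v = np.dot(w, x) + b          # v_j = sum_i w_ji * x_i + b_j
    y = 1.0 / (1.0 + np.exp(-v))  # y_j = sigmoid(v_j)
    return v, y

x = np.array([0.5, -1.0, 2.0])   # illustrative input vector (m = 3)
w = np.array([0.1, 0.2, -0.3])   # illustrative weights w_j1..w_j3
b = 0.05                         # illustrative bias b_j
d = 1.0                          # desired output d_j

v, y = forward(x, w, b)
e = d - y                        # e_j = d_j - y_j
```

Note that $v_j$ is just a dot product plus the bias, so the whole forward pass is two array operations.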

Backward Pass and Explicit Weight Update Formula:

  1. Local Gradient ($\delta_j$): For an output neuron, the local gradient is the product of the error and the derivative of the activation function:

    $$\delta_j = e_j \cdot \varphi'(v_j)$$

    Since $y_j = \varphi(v_j)$, we can substitute the expression for the derivative of the sigmoid:

    $$\delta_j = (d_j - y_j) \cdot y_j \cdot (1 - y_j)$$
  2. Weight Change ($\Delta w_{ji}$): The correction for each weight $w_{ji}$ is calculated using the "delta rule":

    $$\Delta w_{ji} = \eta \cdot \delta_j \cdot x_i$$

    where $\eta$ is the learning rate.

  3. Explicit Expression for the Updated Weight: By substituting the expression for $\delta_j$ into the update formula, we get the final analytical expression for the new weight value:

    $$w_{ji}^{\text{new}} = w_{ji}^{\text{old}} + \Delta w_{ji}$$
    $$w_{ji}^{\text{new}} = w_{ji}^{\text{old}} + \eta \cdot (d_j - y_j) \cdot y_j \cdot (1 - y_j) \cdot x_i$$
  4. Bias Update ($\Delta b_j$): The bias is updated similarly (the input associated with the bias is a constant $+1$):

    $$b_j^{\text{new}} = b_j^{\text{old}} + \eta \cdot (d_j - y_j) \cdot y_j \cdot (1 - y_j)$$
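Putting the forward and backward passes together gives one complete training step. A minimal sketch, with illustrative input, weight, target, and learning-rate values (none of the concrete numbers come from the text):

```python
import numpy as np

def train_step(x, w, b, d, eta):
    """One delta-rule update for the single sigmoid neuron."""
    v = np.dot(w, x) + b                 # induced local field v_j
    y = 1.0 / (1.0 + np.exp(-v))         # output y_j = sigmoid(v_j)
    delta = (d - y) * y * (1.0 - y)      # local gradient delta_j
    w_new = w + eta * delta * x          # w_ji += eta * delta_j * x_i
    b_new = b + eta * delta              # bias update: its "input" is +1
    return w_new, b_new

# Repeated updates on a fixed example drive the output toward the target.
x = np.array([0.5, -1.0, 2.0])           # illustrative input
w = np.array([0.1, 0.2, -0.3])           # illustrative initial weights
b, d, eta = 0.05, 1.0, 0.5               # illustrative bias, target, rate
for _ in range(100):
    w, b = train_step(x, w, b, d, eta)
```

After the loop, the error $|d_j - y_j|$ is much smaller than at initialization, which is exactly what the explicit update formulas predict: each step moves $w_{ji}$ and $b_j$ along the negative gradient of the squared error.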

Thus, we have obtained explicit formulas for updating all trainable parameters of the network by analytically calculating the gradient for a specific activation function.