Analytical Example

Single-Layer Network with Sigmoid Activation Function

Let’s consider the simplest case of a neural network, consisting of a single output neuron and several inputs, to analytically derive the weight update formula.

Network Structure:

  • Input layer with $m$ neurons. Input vector: $\boldsymbol{x} = [x_1, x_2, \ldots, x_m]^T$.
  • Output layer with one neuron $j$.
  • Weights connecting the inputs to the output neuron: $\boldsymbol{w} = [w_{j1}, w_{j2}, \ldots, w_{jm}]^T$.
  • Bias for the output neuron: $b_j$.

Activation Function: As the non-linear activation function $\varphi$, we will choose the logistic function (sigmoid):

$$\varphi(v) = \frac{1}{1 + e^{-v}}$$

Its derivative, which will be needed for the backward pass, has a simple expression in terms of the function itself:

$$\varphi'(v) = \varphi(v) \cdot (1 - \varphi(v))$$
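This identity is easy to verify numerically. A minimal sketch (the function names `sigmoid` and `sigmoid_derivative` are illustrative, not from the text):

```python
import numpy as np

def sigmoid(v):
    """Logistic (sigmoid) activation: phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_derivative(v):
    """Derivative via the identity phi'(v) = phi(v) * (1 - phi(v))."""
    s = sigmoid(v)
    return s * (1.0 - s)

# Check the identity against a central finite difference at an arbitrary point:
v, h = 0.7, 1e-6
numeric = (sigmoid(v + h) - sigmoid(v - h)) / (2 * h)
print(abs(sigmoid_derivative(v) - numeric) < 1e-8)  # prints: True
```

Because the derivative reuses the already-computed output $\varphi(v)$, the backward pass never needs to re-evaluate the exponential.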

Forward Pass:

  1. Induced Local Field (Weighted Sum of Inputs):

    $$v_j = \sum_{i=1}^{m} w_{ji} x_i + b_j$$
  2. Neuron Output Signal: The output $y_j$ is obtained by applying the sigmoid activation function:

    $$y_j = \varphi(v_j) = \frac{1}{1 + e^{-v_j}}$$
  3. Output Error: The error is calculated as the difference between the desired output $d_j$ and the actual output $y_j$:

    $$e_j = d_j - y_j$$
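The three steps above can be sketched in a few lines. The concrete input, weight, and target values here are illustrative assumptions, not taken from the text:

```python
import numpy as np

def forward(x, w, b):
    """Forward pass of the single sigmoid neuron."""
    v = np.dot(w, x) + b          # v_j = sum_i w_ji * x_i + b_j
    y = 1.0 / (1.0 + np.exp(-v))  # y_j = sigmoid(v_j)
    return v, y

x = np.array([0.5, -1.0, 2.0])   # illustrative input vector (m = 3)
w = np.array([0.1, 0.2, -0.3])   # illustrative weights w_j1..w_j3
b = 0.05                         # illustrative bias b_j
d = 1.0                          # desired output d_j

v, y = forward(x, w, b)
e = d - y                        # e_j = d_j - y_j
```

Note that $v_j$ is just a dot product plus the bias, so the whole forward pass is two array operations.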

Backward Pass and Explicit Weight Update Formula:

  1. Local Gradient ($\delta_j$): For an output neuron, the local gradient is the product of the error and the derivative of the activation function:

    $$\delta_j = e_j \cdot \varphi'(v_j)$$

    Since $y_j = \varphi(v_j)$, we can substitute the expression for the derivative of the sigmoid:

    $$\delta_j = (d_j - y_j) \cdot y_j \cdot (1 - y_j)$$
  2. Weight Change ($\Delta w_{ji}$): The correction for each weight $w_{ji}$ is calculated using the "delta rule":

    $$\Delta w_{ji} = \eta \cdot \delta_j \cdot x_i$$

    where $\eta$ is the learning rate.

  3. Explicit Expression for the Updated Weight: By substituting the expression for $\delta_j$ into the update formula, we get the final analytical expression for the new weight value:

    $$w_{ji}^{\text{new}} = w_{ji}^{\text{old}} + \Delta w_{ji}$$
    $$w_{ji}^{\text{new}} = w_{ji}^{\text{old}} + \eta \cdot (d_j - y_j) \cdot y_j \cdot (1 - y_j) \cdot x_i$$
  4. Bias Update ($\Delta b_j$): The bias is updated similarly (the input associated with the bias is a constant $+1$):

    $$b_j^{\text{new}} = b_j^{\text{old}} + \eta \cdot (d_j - y_j) \cdot y_j \cdot (1 - y_j)$$
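Putting the forward and backward passes together gives one complete training step. A minimal sketch, with illustrative input, weight, target, and learning-rate values (none of the concrete numbers come from the text):

```python
import numpy as np

def train_step(x, w, b, d, eta):
    """One delta-rule update for the single sigmoid neuron."""
    v = np.dot(w, x) + b                 # induced local field v_j
    y = 1.0 / (1.0 + np.exp(-v))         # output y_j = sigmoid(v_j)
    delta = (d - y) * y * (1.0 - y)      # local gradient delta_j
    w_new = w + eta * delta * x          # w_ji += eta * delta_j * x_i
    b_new = b + eta * delta              # bias update: its "input" is +1
    return w_new, b_new

# Repeated updates on a fixed example drive the output toward the target.
x = np.array([0.5, -1.0, 2.0])           # illustrative input
w = np.array([0.1, 0.2, -0.3])           # illustrative initial weights
b, d, eta = 0.05, 1.0, 0.5               # illustrative bias, target, rate
for _ in range(100):
    w, b = train_step(x, w, b, d, eta)
```

After the loop, the error $|d_j - y_j|$ is much smaller than at initialization, which is exactly what the explicit update formulas predict: each step moves $w_{ji}$ and $b_j$ along the negative gradient of the squared error.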

Thus, we have obtained explicit formulas for updating all trainable parameters of the network by analytically calculating the gradient for a specific activation function.