Backpropagation

Definition

  • Learning algorithm for neural networks: it propagates the error backward through the layers;

  • Updates the weights so the network makes better predictions (see the training-step sketch below).
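A minimal sketch of where backpropagation fits in a training step. compute_loss_gradient, self.layers and layer.backward follow the snippets later in these notes; layer.forward and the exact class layout are assumptions:

# One training step (sketch): forward pass, then backpropagation
output = x
for layer in self.layers:
    output = layer.forward(output)        # assumed forward method

error = compute_loss_gradient(y, output)  # gradient of the loss at the output
for layer in reversed(self.layers):
    error = layer.backward(error, learning_rate)  # each layer updates its weights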

Loss Function

  • Metric that tells how well the model performs by measuring the difference between the predicted and the actual output;

  • A high value means the model is performing poorly.

Loss Derivative (Gradient)

  • Used in backpropagation, starting at the output layer (first step of the backward pass):

# Backward pass
error = compute_loss_gradient(y, output)
for layer in reversed(self.layers):
    error = layer.backward(error, learning_rate, self.optimizer, epoch + 1)

  • Don't confuse the "normal" loss, used as a model metric, with the loss derivative, which is what backpropagation actually uses (see the sketch below).
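The compute_loss_gradient helper used above is not defined in these notes; a minimal sketch, assuming an MSE loss (the real implementation depends on the loss the model uses):

import numpy as np

def compute_loss_gradient(y: np.ndarray, output: np.ndarray) -> np.ndarray:
    # Derivative of MSE w.r.t. the predictions: one value per output,
    # fed into the backward pass (not a single metric number)
    return 2 * (output - y) / y.size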

Mean Squared Error (MSE)
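  • Average of the squared differences between predicted and actual values; common for regression.

A minimal sketch, assuming NumPy arrays of the same shape; its derivative, 2 * (output - y) / n, is what compute_loss_gradient above returns for this loss:

import numpy as np

def mse(y: np.ndarray, output: np.ndarray) -> float:
    # Scalar metric: mean of the squared errors
    return float(np.mean((output - y) ** 2))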

Cross Entropy
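  • Measures the difference between the predicted probability distribution and the actual class labels; common for classification.

A minimal sketch, assuming softmax outputs and one-hot targets:

import numpy as np

def cross_entropy(y: np.ndarray, output: np.ndarray, eps: float = 1e-12) -> float:
    # y: one-hot targets, output: softmax probabilities
    return float(-np.mean(np.sum(y * np.log(output + eps), axis=1)))

def cross_entropy_gradient(y: np.ndarray, output: np.ndarray) -> np.ndarray:
    # Gradient w.r.t. the softmax input (logits) simplifies to output - y
    return (output - y) / y.shape[0]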

Optimization Algorithms

  • Minimize the loss function by updating the weights of the network.

SGD

Snippet of a backpropagation method in a layer class (SGD update):

def backward(
    self,
    output_error: np.ndarray,
    learning_rate: float
) -> np.ndarray:
    # self.inputs and self.output are cached during the forward pass
    delta = output_error * self.activation_derivative(self.output)
    # Gradient w.r.t. this layer's inputs, returned so it becomes
    # the previous layer's output_error in the training loop
    input_error = np.dot(delta, self.weights.T)
    # Gradients w.r.t. the layer's parameters
    weights_error = np.dot(self.inputs.T, delta)
    bias_error = np.sum(delta, axis=0, keepdims=True)

    # SGD: step opposite the gradient, scaled by the learning rate
    self.weights -= learning_rate * weights_error
    self.bias -= learning_rate * bias_error

    return input_error

RMSProp
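  • Keeps a moving average of the squared gradients and divides the learning rate by its square root, giving each weight an adaptive step size (see the sketch below).

Snippet of a backpropagation method in a layer class (a sketch following the same pattern as the SGD and Adam snippets, assuming self.v_w and self.v_b are initialized to zero arrays in __init__):

def backward(
    self,
    output_error: np.ndarray,
    learning_rate: float
) -> np.ndarray:
    delta = output_error * self.activation_derivative(self.output)
    input_error = np.dot(delta, self.weights.T)
    weights_error = np.dot(self.inputs.T, delta)
    bias_error = np.sum(delta, axis=0, keepdims=True)

    # RMSProp: moving average of the squared gradients
    beta, epsilon = 0.9, 1e-8
    self.v_w = beta * self.v_w + (1 - beta) * (weights_error**2)
    self.v_b = beta * self.v_b + (1 - beta) * (bias_error**2)
    # Scale each step by the root of that average
    self.weights -= learning_rate * weights_error / (np.sqrt(self.v_w) + epsilon)
    self.bias -= learning_rate * bias_error / (np.sqrt(self.v_b) + epsilon)

    return input_error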

Adam

Snippet of a backpropagation method in a layer class (Adam update):

def backward(
    self,
    output_error: np.ndarray,
    learning_rate: float,
    t: int  # timestep (e.g. epoch + 1 passed by the training loop), needed for bias correction
) -> np.ndarray:
    # self.m_w, self.m_b, self.v_w, self.v_b are initialized to zero arrays in __init__
    delta = output_error * self.activation_derivative(self.output)
    input_error = np.dot(delta, self.weights.T)
    weights_error = np.dot(self.inputs.T, delta)
    bias_error = np.sum(delta, axis=0, keepdims=True)

    # Adam
    beta1, beta2, epsilon = 0.9, 0.999, 1e-8
    # Momentum: moving average of the gradients
    self.m_w = beta1 * self.m_w + (1 - beta1) * weights_error
    self.m_b = beta1 * self.m_b + (1 - beta1) * bias_error
    # RMSprop: moving average of the squared gradients
    self.v_w = beta2 * self.v_w + (1 - beta2) * (weights_error**2)
    self.v_b = beta2 * self.v_b + (1 - beta2) * (bias_error**2)
    # Bias correction
    m_w_hat = self.m_w / (1 - beta1**t)
    m_b_hat = self.m_b / (1 - beta1**t)
    v_w_hat = self.v_w / (1 - beta2**t)
    v_b_hat = self.v_b / (1 - beta2**t)
    # Update weights and biases
    self.weights -= learning_rate * m_w_hat / (np.sqrt(v_w_hat) + epsilon)
    self.bias -= learning_rate * m_b_hat / (np.sqrt(v_b_hat) + epsilon)

    return input_error
