Multilayer Perceptron (MLP)

Structure

  • Multiple layers of linear models (perceptrons), each followed by a nonlinear activation;

  • Prefer TanH activation in hidden layers over sigmoid (logistic), since its output is zero-centered;

  • Linear (identity) activation in the output layer for regression (see the forward-pass sketch after this list);

  • TanH activation in the output layer for binary classification, with targets in {-1, +1} (sigmoid with {0, 1} targets is the common alternative);

  • A small learning rate can leave the optimizer stuck in a local minimum;

  • The higher the dimensionality of the loss surface, the lower the chance of getting trapped in a local minimum (in high dimensions, most critical points tend to be saddle points rather than minima);

  • The noise in SGD helps escape local minima, but convergence to the global minimum of the loss is not guaranteed;

  • Initialize weights uniformly between -0.01 and 0.01; small weights keep TanH in its near-linear region, where gradients are large, so training requires fewer iterations (see the training sketch after this list);

  • High learning rates can lead to NaN values (values that overflow the floating-point range);
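
As a concrete illustration of the structure above, here is a minimal NumPy sketch of a one-hidden-layer MLP forward pass. The function name forward, the task argument, and the layer shapes are hypothetical choices for this example, not part of the original notes.

    import numpy as np

    def forward(x, W1, b1, W2, b2, task="regression"):
        # Hidden layer: TanH activation (preferred over sigmoid here).
        h = np.tanh(W1 @ x + b1)
        # Output pre-activation.
        z = W2 @ h + b2
        if task == "regression":
            return z  # linear (identity) output
        # Binary classification: TanH output with {-1, +1} targets
        # (sigmoid with {0, 1} targets is the common alternative).
        return np.tanh(z)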
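
The training sketch below, under the same assumptions, ties the remaining points together: weights drawn uniformly from [-0.01, 0.01], single-sample SGD updates, and a learning rate knob. The toy regression data, layer sizes, and all hyperparameter values are illustrative, not from the original notes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Small uniform init in [-0.01, 0.01]: TanH is nearly linear around
    # zero, so gradients stay large and training needs fewer iterations.
    W1 = rng.uniform(-0.01, 0.01, size=(16, 2))
    b1 = np.zeros(16)
    W2 = rng.uniform(-0.01, 0.01, size=(1, 16))
    b2 = np.zeros(1)

    lr = 0.05  # too small: slow, may stall in a local minimum;
               # too large: weights overflow and become NaN

    X = rng.normal(size=(100, 2))      # toy inputs (hypothetical data)
    y = X[:, :1] - 2 * X[:, 1:]        # toy regression targets

    for step in range(500):
        i = rng.integers(len(X))       # SGD: one random sample per step
        x, t = X[i:i+1].T, y[i:i+1].T
        h = np.tanh(W1 @ x + b1[:, None])
        pred = W2 @ h + b2[:, None]    # identity output (regression)
        err = pred - t                 # squared-error gradient, up to a factor of 2
        # Backpropagate through the identity output and the TanH hidden layer.
        gW2 = err @ h.T
        gb2 = err.sum(axis=1)
        dh = (W2.T @ err) * (1 - h**2)  # TanH derivative: 1 - tanh(z)^2
        gW1 = dh @ x.T
        gb1 = dh.sum(axis=1)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

Setting lr much higher in this sketch (say, 10.0) typically makes the weights explode within a few updates, after which the predictions overflow and every parameter turns to NaN, matching the last bullet above.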
