Multilayer Perceptron (MLP)
Structure
Multiple layers of perceptrons (linear models), each followed by a nonlinear activation;
TanH is preferred over sigmoid (logistic) in hidden layers because its output is zero-centered;
Linear (identity) activation for the output layer in regression;
Sigmoid (logistic) activation for the output layer in binary classification, since it outputs a probability in (0, 1);
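The structure above can be sketched as a one-hidden-layer forward pass; the layer sizes here (3 inputs, 5 hidden units, 1 output) are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2, task="regression"):
    """One-hidden-layer MLP: tanh hidden layer; identity output for
    regression, sigmoid output for binary classification."""
    h = np.tanh(x @ W1 + b1)              # hidden layer: tanh activation
    z = h @ W2 + b2                       # output pre-activation
    if task == "classification":
        return 1.0 / (1.0 + np.exp(-z))   # sigmoid squashes to (0, 1)
    return z                              # identity (linear) output

# Hypothetical sizes: 3 inputs, 5 hidden units, 1 output
W1 = rng.uniform(-0.01, 0.01, size=(3, 5))
b1 = np.zeros(5)
W2 = rng.uniform(-0.01, 0.01, size=(5, 1))
b2 = np.zeros(1)

x = rng.normal(size=(4, 3))   # batch of 4 examples
y_reg = mlp_forward(x, W1, b1, W2, b2)                          # unbounded reals
y_clf = mlp_forward(x, W1, b1, W2, b2, task="classification")   # probabilities
```

Note how only the output activation changes between the two tasks; the hidden layer stays the same.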
Training
A small learning rate can leave the optimizer stuck in a local minimum (and slows convergence);
The higher the dimension of the loss surface, the lower the chance of getting trapped in a poor local minimum; most critical points in high dimensions are saddle points;
SGD is not guaranteed to reach the global minimum of a non-convex loss; its gradient noise can, however, help escape shallow local minima;
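A minimal illustration of why gradient descent is not guaranteed to find the global minimum: on the non-convex function f(x) = x⁴ − 3x² + x (chosen here only as a toy example), which minimum you reach depends entirely on where you start.

```python
def f(x):
    # Non-convex toy loss with a local minimum near x ≈ 1.13
    # and a global minimum near x ≈ -1.30
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    # Plain (deterministic) gradient descent
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_local = descend(1.0)     # initialized near the local minimum: stays there
x_global = descend(-1.0)   # initialized near the global minimum: finds it
```

The stochastic noise in SGD can sometimes kick the iterate out of a shallow basin like the one near x ≈ 1.13, but there is no guarantee.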
Initialize weights to small random values, e.g. between -0.01 and 0.01. Small weights keep TanH in its near-linear region, where gradients are close to 1, so training requires fewer iterations;
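A quick check of the near-linear regime (assuming NumPy): with small pre-activations, the tanh derivative is close to 1, while large pre-activations saturate the unit and collapse the gradient.

```python
import numpy as np

z_small = np.array([-0.01, 0.005, 0.01])   # pre-activations from small weights
z_large = np.array([-5.0, 5.0])            # pre-activations from large weights

# Near zero, tanh(z) ≈ z and its derivative 1 - tanh(z)**2 ≈ 1,
# so error gradients pass through the hidden layer almost unattenuated.
small_grad = 1 - np.tanh(z_small) ** 2
# Far from zero, tanh saturates and the derivative collapses toward 0,
# which slows learning (vanishing gradients).
large_grad = 1 - np.tanh(z_large) ** 2
```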
High learning rates can make the weights diverge: the updates overshoot and grow geometrically until the values overflow the floating-point range to infinity, after which further arithmetic produces NaN;