Convolutional Neural Network (CNN)
Definition
Type of feed-forward neural network;
Extracts features from grid-like data (images, 2D data, etc.) via filters (aka kernels).
Network Structure
Base
Extracts features from the input.
Bases are generally chosen from existing, well-known models, such as VGG16 or InceptionV1. Using pretrained models or parts of models is called "transfer learning". Their weights are generally frozen (the base is set to "not trainable") so that training the head does not alter the pretrained weights. If the base's weights are left trainable so that they can be improved further, we call that "fine-tuning".
Steps:
Filter (using kernels);
Detect (using ReLU);
Condense (using pooling).
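
The three steps above can be sketched in plain Python. This is a minimal illustration, not how a real framework implements them: the function names, the toy 5x5 image and the 2x2 edge kernel are all assumptions made for the example, and the "convolution" is the unflipped sliding dot product that deep learning libraries use.

```python
def convolve2d(image, kernel):
    """Filter: slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Detect: keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Condense: keep the largest value in each non-overlapping window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical toy image: a bright vertical stripe on a dark background.
image = [[0, 1, 1, 0, 0] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

filtered = convolve2d(image, kernel)   # each row is [2, 0, -2, 0]
detected = relu(filtered)              # each row is [2, 0, 0, 0]
condensed = max_pool(detected)         # [[2, 0], [2, 0]]
print(condensed)
```

The negative response at the bright-to-dark edge is removed by ReLU, and pooling halves each spatial dimension while keeping the strongest activation.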
Head
Classifies the input using the extracted features.
Layers
Convolutional Layers
Most of the computation is done here;
Extracts attributes from the input data;
The number of neurons is equal to the number of filters (kernels);
Each neuron has a different kernel;
Each neuron processes one part of the input data (its receptive field) at a time. When it is done with a receptive field, it moves on to the next one;
Once a neuron has processed the whole input, the result is a feature map.
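
As a rough sketch of the points above, the following pure-Python code walks every receptive field and produces one feature map per kernel. The helper names (`receptive_fields`, `conv_layer`), the `stride` parameter and the example kernels are assumptions for illustration; square kernels and no padding are assumed.

```python
def receptive_fields(image, size, stride=1):
    """Yield (out_row, out_col, patch) for every size x size window."""
    for i in range(0, len(image) - size + 1, stride):
        for j in range(0, len(image[0]) - size + 1, stride):
            patch = [row[j:j + size] for row in image[i:i + size]]
            yield i // stride, j // stride, patch


def conv_layer(image, kernels, stride=1):
    """Apply each kernel to every receptive field; one feature map per kernel."""
    size = len(kernels[0])
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    feature_maps = []
    for kernel in kernels:
        fmap = [[0] * out_w for _ in range(out_h)]
        for r, c, patch in receptive_fields(image, size, stride):
            # Dot product between the patch and the kernel.
            fmap[r][c] = sum(
                p * k
                for patch_row, kernel_row in zip(patch, kernel)
                for p, k in zip(patch_row, kernel_row)
            )
        feature_maps.append(fmap)
    return feature_maps


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernels = [
    [[1, 0], [0, 0]],    # hypothetical kernel: copies the top-left pixel
    [[-1, 1], [0, 0]],   # hypothetical kernel: horizontal difference
]

maps = conv_layer(image, kernels)
print(len(maps))   # 2 feature maps, one per kernel
print(maps[0])     # [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
print(maps[1])     # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

With a 4x4 input, a 2x2 kernel and stride 1, each feature map is 3x3, following (input size - kernel size) / stride + 1.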

Pooling Layers
Reduces the spatial dimension of the input data;
Helps reduce the number of parameters and computational complexity in the network;
Methods are average, global average and max pooling;
Global average pooling can replace some or even all hidden dense layers and the flatten one in the head of the network;
Creates invariance to small translations (an object is still recognized when its position shifts slightly), because pooling discards the exact position of each activation.
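
The pooling methods listed above can be compared with a small pure-Python sketch. The `pool`, `mean` and `global_average_pool` helpers and the two toy feature maps are assumptions for illustration; windows are non-overlapping, and leftover rows or columns are dropped.

```python
def mean(values):
    return sum(values) / len(values)


def pool(feature_map, size, reduce):
    """Apply `reduce` (e.g. max or mean) to each non-overlapping window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            reduce([feature_map[i * size + di][j * size + dj]
                    for di in range(size) for dj in range(size)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_average_pool(feature_maps):
    """Collapse each feature map to a single number: its overall mean."""
    return [mean([v for row in fmap for v in row]) for fmap in feature_maps]


# The same activation, shifted by one pixel inside the same 2x2 window.
a = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

print(pool(a, 2, max))              # [[9, 0], [0, 0]]
print(pool(b, 2, max))              # [[9, 0], [0, 0]], the shift is no longer visible
print(pool(a, 2, mean))             # [[2.25, 0.0], [0.0, 0.0]]
print(global_average_pool([a, b]))  # [0.5625, 0.5625]
```

Global average pooling returns one value per feature map, so its output is already a 1D vector. That is why it can stand in for the flatten layer in the head.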

Fully Connected Layers
Used to classify the extracted features;
Each neuron is connected to all the neurons in the previous layer;
Needs a flattened input (1D array).
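
A minimal sketch of a fully connected head in pure Python: the feature maps are flattened to a 1D list, and every output neuron takes a weighted sum over all inputs. The weights and biases below are hand-picked, hypothetical values (a real network learns them), and the softmax at the end is an assumption about how the scores are turned into class probabilities.

```python
import math


def flatten(feature_maps):
    """Turn a stack of 2D feature maps into one 1D list."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Each output neuron k uses weights[k], one weight per input value."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2x2 feature maps coming out of the base.
feature_maps = [
    [[2, 0], [2, 0]],
    [[0, 1], [0, 1]],
]
flat = flatten(feature_maps)          # [2, 0, 2, 0, 0, 1, 0, 1]

# Hypothetical weights for a two-class head: 2 neurons x 8 inputs.
weights = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
]
biases = [0, 0]

logits = dense(flat, weights, biases)  # [4, 2]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print(predicted_class)                 # 0
```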
