### Introduction to Convolutional Neural Networks (CNN)

**Introduction**

A Convolutional Neural Network (CNN) is a deep learning technique built on a feed-forward neural network. It is modeled on the animal visual cortex, where individual visual neurons respond to overlapping, tile-shaped regions of the visual field. These tiled regions shift sequentially to cover the whole visual field, which corresponds to the convolution process. A CNN uses multi-layer **perceptrons** to carry out this convolution process.

CNNs became well known after their winning performance in image recognition at the ImageNet challenge in 2012. ReLU (Rectified Linear Unit) is often used as the activation function in these networks. A convolutional layer looks for the same feature everywhere in the input by passing filters, also called kernels, over the input data or image. The layer uses multiple filters, and each filter moves sequentially across the input to produce a 2-dimensional activation map. Feature maps are made from the activation maps of the filters, so the number of learnable filters determines how many feature maps the convolution generates. Sub-sampling then applies a selection operation, called pooling, to the feature maps. It is a down-sampling process, non-linear in the case of max or median pooling, that results in smaller feature maps. Many sub-sampling schemes exist; the most popular include median pooling, average pooling, and max pooling.
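The convolution, ReLU, and pooling steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how a real framework implements it: the function names (`conv2d`, `relu`, `max_pool`, `conv_layer`), the hand-written edge kernels, and the toy 5x5 image are all assumptions made for the example. As in most deep learning libraries, the "convolution" here is a sliding dot product without flipping the kernel.

```python
def conv2d(image, kernel):
    """Slide one kernel over a 2-D image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified Linear Unit: negative activations become zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image, kernels):
    """One feature map per kernel: convolution -> ReLU -> max pooling."""
    return [max_pool(relu(conv2d(image, k))) for k in kernels]


# Toy image: a bright vertical bar on a dark background.
image = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
vertical_edge = [[-1, 1], [-1, 1]]      # responds to dark-to-bright steps, left to right
horizontal_edge = [[-1, -1], [1, 1]]    # responds to dark-to-bright steps, top to bottom

feature_maps = conv_layer(image, [vertical_edge, horizontal_edge])
print(len(feature_maps))   # two kernels give two feature maps
print(feature_maps[0])     # the vertical-edge filter fires on the left edge of the bar
print(feature_maps[1])     # the horizontal-edge filter finds nothing in this image
```

Two kernels produce two feature maps, and each 4x4 activation map shrinks to 2x2 after pooling. In a trained CNN the kernel values are learned rather than written by hand.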

**Motivation and High level considerations**

The most straightforward way to improve the performance of deep neural networks is to increase their size. This includes increasing both the depth (i.e., the number of network levels) and the width (i.e., the number of units at each level). This approach has two drawbacks. First, a bigger network has more parameters, which makes it more prone to overfitting, especially if the number of labelled examples in the training set is limited. Second, it dramatically increases the use of computational resources.
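To make the growth in parameters concrete, the sketch below counts the weights and biases of a plain fully connected network as its width or depth increases. The helper `mlp_param_count` and the layer sizes (784 inputs, 10 outputs, 128 or 256 hidden units) are hypothetical values chosen only for illustration.

```python
def mlp_param_count(n_inputs, width, depth, n_outputs):
    """Weights plus biases of a fully connected net with `depth` hidden layers of `width` units."""
    sizes = [n_inputs] + [width] * depth + [n_outputs]
    # Each layer has fan_in * fan_out weights and fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


base = mlp_param_count(784, width=128, depth=2, n_outputs=10)
wider = mlp_param_count(784, width=256, depth=2, n_outputs=10)
deeper = mlp_param_count(784, width=128, depth=4, n_outputs=10)

print(base, wider, deeper)
```

Under these assumed sizes, doubling the width more than doubles the parameter count, because the weights between two hidden layers grow with the square of the width, while adding layers grows the count linearly.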

Both problems can ultimately be addressed by moving from fully connected to sparsely connected architectures, even inside the convolutions. The main theoretical result behind this idea states that if the probability distribution of the dataset is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs. ConvNets traditionally used random and sparse connection tables in the feature dimensions in order to break symmetry and improve learning, but the trend later changed back to full connections in order to better optimize parallel computing. The uniformity of the structure, a large number of filters, and a greater batch size allow for efficient dense computation.

The Inception architecture started as an attempt to assess the hypothetical output of a sophisticated network topology construction algorithm that tries to approximate a sparse structure. After only two iterations on the exact choice of topology, it already gave modest gains over traditional CNN methods. It was especially useful in the context of localization and object detection. It has become a success for computer vision, though the most convincing proof of its design principles would be if an automated system created network topologies resulting in similar gains in other domains using the same algorithm but with a very different-looking global structure. The advantage of this architecture is that it allows the number of units at each stage to be increased significantly without an uncontrolled blow-up in computational complexity. It also aligns with the intuition that visual information should be processed at various scales and then aggregated so that the next stage can abstract features from different scales simultaneously.

The figure above shows the Inception module (taken from the paper Rethinking the Inception Architecture for Computer Vision).
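The idea of processing the input at several scales and then aggregating the results can be illustrated with a toy sketch. This is not the actual Inception module: real modules use learned multi-channel filters and additional branches, whereas the hypothetical `multi_scale_block` below applies one fixed box filter per kernel size to a single-channel image, purely to show parallel branches whose outputs are stacked as channels.

```python
def conv2d_same(image, kernel):
    """Stride-1 sliding dot product with zero padding, so output size equals input size.

    Assumes a square kernel with an odd side length.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2

    def pixel(r, c):
        # Positions outside the image count as zero (zero padding).
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [
        [
            sum(pixel(i + di - pad, j + dj - pad) * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(w)
        ]
        for i in range(h)
    ]


def box_kernel(k):
    """Illustrative stand-in for a learned k x k filter."""
    return [[1] * k for _ in range(k)]


def multi_scale_block(image, kernel_sizes=(1, 3, 5)):
    """Run one branch per kernel size in parallel and stack the outputs as channels."""
    return [conv2d_same(image, box_kernel(k)) for k in kernel_sizes]


# A single bright pixel in the middle of a 5x5 image.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 1

channels = multi_scale_block(image)
for size, channel in zip((1, 3, 5), channels):
    # Larger kernels spread the response of the pixel over a larger neighbourhood.
    print(size, sum(sum(row) for row in channel))
```

Because every branch pads its input to preserve the spatial size, the three outputs can be stacked channel-wise, and the next stage sees the responses from all scales at the same location.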