Building #ImageClassification models that are accurate and efficient
Posted on April 28th, 2017
04/28/2017 @NYUCourantInstitute, 251 Mercer Street, NYC, room 109
Laurens van der Maaten @Facebook spoke about some of the new technologies Facebook uses to increase accuracy and reduce the processing required for image identification.
He first talked about residual networks, which they are developing to replace standard convolutional neural networks. A residual network can be thought of as a series of blocks, each of which is a tiny #CNN:
- a 1×1 convolution layer that reduces dimensionality, like a PCA
- a 3×3 convolution layer that extracts features
- a 1×1 convolution layer that restores dimensionality, like an inverse PCA
The raw input is then added to the output of this mini-network, followed by a ReLU transformation.
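As a rough illustration, here is a minimal PyTorch sketch of such a bottleneck block (PyTorch is my choice here, not something stated in the talk; batch normalization is omitted for brevity, and the channel widths are placeholders):

```python
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Residual bottleneck: 1x1 reduce -> 3x3 convolve -> 1x1 restore, plus identity."""
    def __init__(self, channels=256, bottleneck_channels=64):
        super().__init__()
        # 1x1 convolution: project down to fewer channels (the "PCA" step)
        self.reduce = nn.Conv2d(channels, bottleneck_channels, kernel_size=1, bias=False)
        # 3x3 convolution: extract spatial features at the reduced width
        self.conv = nn.Conv2d(bottleneck_channels, bottleneck_channels,
                              kernel_size=3, padding=1, bias=False)
        # 1x1 convolution: project back up (the "inverse PCA" step)
        self.restore = nn.Conv2d(bottleneck_channels, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv(out))
        out = self.restore(out)
        # Add the raw input back in, then apply the final ReLU
        return self.relu(out + x)
```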
These transformations extract features while preserving the information fed into the block: the representation changes, but it does not have to be re-learned from scratch. This mitigates problems with vanishing gradients during backpropagation, as well as the unidentifiability problem.
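One way to see the gradient point: a residual block computes y = x + F(x), so by the chain rule ∂L/∂x = ∂L/∂y · (I + ∂F/∂x). The identity term guarantees that gradient signal reaches the earlier layers even when ∂F/∂x becomes small, which is exactly the failure mode plain deep stacks suffer from.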
When executed in sequence, the blocks gradually add features, yet removing a block after training hardly degrades performance (Huang et al., 2016). From this observation they concluded that each block performs two functions: detecting new features and passing through some of the information in the raw input. The structure could therefore be made more efficient by passing that information through directly and letting each block concentrate solely on extracting features.
DenseNet gives each layer access to all of the feature maps produced by the layers before it. Since the number of feature maps increases with every layer, there is in principle the possibility of a combinatorial explosion of units. Fortunately, this does not happen: each layer adds only 32 new feature maps, and the computation is more efficient, so the aggregate amount of computation needed for a given level of accuracy is lower for DenseNet than for ResNet, while accuracy improves.
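A minimal sketch of that dense connectivity pattern, again in PyTorch and simplified (real DenseNet layers use batch normalization and a 1×1 bottleneck before the 3×3 convolution); the growth rate of 32 matches the number above:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One dense layer: sees all earlier feature maps, contributes `growth_rate` new ones."""
    def __init__(self, in_channels, growth_rate=32):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3,
                              padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x))

class DenseBlock(nn.Module):
    def __init__(self, in_channels, num_layers, growth_rate=32):
        super().__init__()
        # Layer i receives the block input plus the i earlier layers' outputs,
        # so its channel count grows linearly, not combinatorially.
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate ALL previous feature maps along the channel axis
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```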
Next, Laurens talked about making image recognition more efficient, so that a larger number of images can be processed at the same level of accuracy in a shorter average time.
He started by noting that some images are easier to identify than others, so the goal is to classify the easy images quickly and spend additional processing time only on the harder, more complex ones.
The key observation is that easy images can be classified using only a coarse grid, but on a coarse grid alone the harder images would not be classifiable. Conversely, using only a fine grid makes it harder to classify the easy images.
Laurens described a hybrid two-dimensional network in which some layers analyze the image on the coarse grid and others on the fine grid, with the fine-grained blocks occasionally feeding into the coarse-grained blocks. At each layer, the outputs are tested to see whether the classifier's confidence for the image exceeds a threshold. Once the threshold is exceeded, processing stops and the prediction is output. In this way, an easy decision is reached quickly, while hard images continue further down the layers and receive more processing.
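The early-exit logic at inference time might look roughly like this; `blocks` and `classifiers` are hypothetical stand-ins for the per-stage modules, and the sketch uses the maximum softmax probability as the confidence measure, handling one image at a time:

```python
import torch.nn.functional as F

def classify_with_early_exit(image, blocks, classifiers, threshold=0.9):
    """Run the stages in order and stop as soon as one is confident enough.

    `blocks` and `classifiers` are hypothetical per-stage modules: each block
    refines the features and each classifier produces class logits from them.
    Handles a single image (batch of one).
    """
    features = image
    for block, classifier in zip(blocks, classifiers):
        features = block(features)
        probs = F.softmax(classifier(features), dim=1)
        confidence, prediction = probs.max(dim=1)
        if confidence.item() >= threshold:   # easy image: exit early
            return prediction.item()
    return prediction.item()                 # hard image: used the whole network
```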
By estimating the percentage of images exiting at each classifier, they can tune the threshold levels so that more images can be processed within a given time budget.
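One simple way to pick such a threshold from a validation set, sketched with a single global threshold (the talk did not spell out the procedure, and `val_confidences` / `exit_costs` are hypothetical inputs):

```python
import numpy as np

def tune_threshold(val_confidences, exit_costs, budget):
    """Return the highest confidence threshold whose expected cost fits the budget.

    val_confidences: (num_images, num_exits) array of each exit's confidence
    on a validation set; exit_costs: cumulative compute cost of reaching each
    exit. A single global threshold is a simplification of per-exit thresholds.
    """
    exit_costs = np.asarray(exit_costs)
    for threshold in np.linspace(1.0, 0.0, num=101):
        # First exit at which each image clears the threshold...
        exits = np.argmax(val_confidences >= threshold, axis=1)
        # ...or the final exit if it never does
        exits[val_confidences.max(axis=1) < threshold] = len(exit_costs) - 1
        if np.mean(exit_costs[exits]) <= budget:
            return threshold
    return 0.0   # even always exiting at the first classifier exceeds the budget
```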
During the Q&A, Laurens said:
- To avoid overfitting, they train the network on both the original images and the same images after small transformations have been applied to each (a standard augmentation pipeline is sketched after this list).
- They are still working to scale up #DenseNet to see its upper limits on accuracy.
- He is not aware of any neurophysiological structures in the human brain that correspond to the block structure of #ResNet / #DenseNet.
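For the augmentation point above, a typical pipeline looks like the following torchvision sketch; the specific transformations are illustrative, since the talk did not name the ones used:

```python
from torchvision import transforms

# Illustrative augmentation pipeline: small, label-preserving transformations
# that expose the network to slightly different versions of each image.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # jitter the framing slightly
    transforms.RandomHorizontalFlip(),     # mirror images at random
    transforms.ToTensor(),
])
```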