Going Deeper with Convolutions

Year: 2014

Going deeper with convolutions

Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke, Rabinovich. 2014.

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

The paper's core contribution is the Inception module, which processes information at multiple scales simultaneously within a single layer using parallel convolutions of different sizes (1x1, 3x3, 5x5), with cheap 1x1 "bottleneck" convolutions reducing channel depth before the expensive branches. This design achieved state-of-the-art ImageNet results with roughly 12x fewer parameters than AlexNet, shifting the field's emphasis from "bigger is better" toward "smarter design matters more" and opening lines of research beyond brute-force scaling.
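
To make the structure concrete, here is a minimal sketch of a single Inception module. PyTorch is my choice of framework here, not the paper's (GoogLeNet was originally built in DistBelief), and names like InceptionModule and ch3x3_red are illustrative; the channel counts follow the paper's inception (3a) layer, which maps 192 input channels to 256 output channels at 28x28 resolution.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an Inception module: parallel 1x1 / 3x3 / 5x5 convolutions
    plus a pooling branch, concatenated along the channel axis. 1x1
    "bottleneck" convs shrink channel depth before the costly 3x3/5x5."""

    def __init__(self, in_ch, ch1x1, ch3x3_red, ch3x3, ch5x5_red, ch5x5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, ch1x1, kernel_size=1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 bottleneck, then 3x3 convolution
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3_red, ch3x3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 bottleneck, then 5x5 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5_red, ch5x5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max-pooling, then 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Padding keeps every branch at the same spatial size,
        # so the outputs can be concatenated channel-wise.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Channel counts from the paper's inception (3a): 192 in, 64+128+32+32 = 256 out.
block = InceptionModule(192, ch1x1=64, ch3x3_red=96, ch3x3=128,
                        ch5x5_red=16, ch5x5=32, pool_proj=32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```

The bottlenecks are where the efficiency comes from. As a rough back-of-the-envelope check at this layer's 28x28 resolution: a direct 5x5 convolution from 192 to 32 channels costs about 28 x 28 x 5 x 5 x 192 x 32 ≈ 120M multiplies, while reducing to 16 channels with a 1x1 first costs about 2.4M + 10M ≈ 12.4M, close to a 10x saving for the same output shape.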