

The advent of the ResNet family at the end of 2015 shattered many computer vision records. Its model ensemble achieved a 3.57% top-5 error on the ImageNet classification challenge (nearly halving GoogLeNet’s 6.67%). It attained a 28% relative improvement on the COCO object detection challenge, simply by replacing the detector’s backbone. It also popularized the skip connection (now it is everywhere!). In fact, ResNet is so effective that Geoffrey Hinton expressed some regret that his team did not develop a similar idea into ResNet.
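The core trick behind the skip connection can be sketched in a few lines. This is a minimal NumPy illustration of the idea (plain matrix multiplies stand in for the convolutions of a real ResNet block; the names `residual_block`, `w1`, `w2` are mine, not from any library):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # two "layers" (matrix multiplies standing in for convolutions)
    out = relu(x @ w1)
    out = out @ w2
    # the skip connection: add the input back before the final activation,
    # so the layers only need to learn a residual on top of the identity
    return relu(out + x)

# with zero weights the block reduces to relu(x): even an "empty" block
# passes the signal (and gradients) straight through the shortcut
x = np.array([[1.0, -2.0, 3.0]])
w = np.zeros((3, 3))
print(residual_block(x, w, w))  # [[1. 0. 3.]]
```

This identity-by-default behavior is what lets very deep networks train: a stack of such blocks can do no worse than a shallower one.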

In this article, we will go through the ideas behind ResNet and its PyTorch…



Two articles ago, we dissected the structures of AlexNet and the VGG family. While the two networks differ greatly in their choice of filters, strides and depths, they both have a straightforward linear architecture. GoogLeNet (and later the whole Inception family), in contrast, has a more complex structure.

At first glance of the well-known GoogLeNet structure diagram and table (see below), we tend to be overwhelmed by the nontrivial sophistication of the design, and baffled by the specific filter choices. …

This article is rather long, with detailed code. If you only want to learn the concepts of data augmentation or (distributed) data parallelism, just skip to the respective sections :)


In the previous two articles, we wrote the evaluation scripts and coded two important but relatively simple models. However, models and evaluation scripts are only half the equation. The right training and data augmentation scheme is also indispensable for achieving SOTA performance.

Here, we will go through all the nitty-gritty details of training, from the popular training data augmentation regime to setting the learning rate and weight decay…
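To preview the augmentation idea: the classic ImageNet training regime randomly crops and horizontally flips each image. Here is a minimal NumPy sketch of those two steps (the function name `augment` and the fixed seed are mine; real training pipelines would use the framework's transform utilities):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=224):
    # random crop: pick a top-left corner uniformly inside the image
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop]
    # random horizontal flip with probability 0.5
    if rng.random() < 0.5:
        out = out[:, ::-1]
    return out

img = np.zeros((256, 256, 3), dtype=np.uint8)
print(augment(img).shape)  # (224, 224, 3)
```

Each epoch thus sees a slightly different view of every image, which acts as a cheap regularizer.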


In the last article, we reviewed how some of the most famous classification networks are evaluated on ImageNet. We also finished building a PyTorch evaluation dataset class, as well as efficient evaluation functions. We will soon see how they come in handy for validating our model structures and training.

We will build the AlexNet and VGG models in this article. Despite their influential contributions to computer vision and deep learning, their structures are straightforward in retrospect. Therefore, in addition to building them, we will also play with “weight porting” and a sliding-window implementation using convolutional layers.
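The sliding-window trick rests on one observation: a fully-connected layer over a K×K×C feature map is exactly a convolution with a K×K kernel, so reshaping the FC weights into a conv kernel lets the classifier slide over larger inputs. A small NumPy check of that equivalence at a single position (the sizes and variable names here are illustrative, not taken from the actual models):

```python
import numpy as np

rng = np.random.default_rng(1)

K, C, D = 7, 8, 3  # spatial size, channels, number of output units
feat = rng.standard_normal((K, K, C))
w_fc = rng.standard_normal((D, K * K * C))

# FC view: flatten the feature map, then matrix-multiply
y_fc = w_fc @ feat.reshape(-1)

# conv view: each output unit becomes one KxKxC filter
# applied at a single spatial position
w_conv = w_fc.reshape(D, K, K, C)
y_conv = np.einsum('dhwc,hwc->d', w_conv, feat)

print(np.allclose(y_fc, y_conv))  # True
```

On a larger input, the same reshaped filters simply produce a spatial grid of class scores instead of a single vector, which is the "sliding window for free" effect.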

I personally find borrowing weights from…

This series of articles is like documentation for the PyTorch code I am writing, with detailed remarks included. It is partly for me to (re-)organize my code and thoughts, and partly for deep learning newbies who want some guidance on building famous networks and training them to their SOTA performance.


Introduction to the Series

As a deep learning practitioner, I constantly apply the newest models to the problems at hand to obtain the desired performance. Given that machine learning is presently a very active field and its practitioners are open to sharing, state-of-the-art models with pretrained weights are easily accessible to everyone. …


Learn, tinker and create.
