ResNet Implementation
The best way to learn a new framework or a deep learning architecture is to implement it or to try to reproduce it. As a newbie in research like me, a friend suggested that I start with reproducing the ResNet architecture.
This is the 34-layer architecture from the official paper: https://arxiv.org/abs/1512.03385
As you can see, there's a beginning layer with a 7x7 kernel that's different from the following layers. Before I begin explaining the architecture, there are a few things I would like to clarify for the newbies like myself:
- 7x7 is the kernel/filter size
- 64 is the number of filters
- /2 means a stride of 2
- pool means that after the conv layer there will be a max-pooling layer
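Putting those pieces together, here's a minimal sketch of that beginning layer in PyTorch (assuming a standard ImageNet-style input of shape (N, 3, 224, 224), which is what the paper uses):

```python
import torch
import torch.nn as nn

# The ResNet "stem": 7x7 conv, 64 filters, stride 2 (the "/2" in the
# diagram), followed by a 3x3 max pool with stride 2 (the "pool").
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 3, 224, 224)
out = stem(x)
print(out.shape)  # two stride-2 steps quarter the spatial size: (1, 64, 56, 56)
```

Note how the two stride-2 operations shrink 224x224 down to 56x56 before any of the residual blocks run.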
So, back to the main architecture: as you can see, there's a starting conv layer, and afterwards there are blocks of conv layers with an equal number of filters. The definition of these blocks can be something like this:
class Block(nn.Module):
    def __init__(self, nums_layers, in_channels, out_channels,
                 identity=None, stride=1):
        super(Block, self).__init__()
        if nums_layers > 34:
            self.expansion = 4
            self.conv1 = conv1x1(in_channels, out_channels, stride)
        else:
            self.expansion = 1
            self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)  # stride 1: only conv1 downsamples
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.identity = identity
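The Block above calls conv3x3 and conv1x1, which aren't shown in the snippet. A minimal sketch of what they might look like (these mirror the small factory helpers used in torchvision's ResNet code):

```python
import torch.nn as nn

def conv3x3(in_channels, out_channels, stride=1):
    """3x3 convolution with padding; keeps the spatial size when stride=1."""
    return nn.Conv2d(in_channels, out_channels, kernel_size=3,
                     stride=stride, padding=1, bias=False)

def conv1x1(in_channels, out_channels, stride=1):
    """1x1 convolution; changes the channel count (and can downsample)."""
    return nn.Conv2d(in_channels, out_channels, kernel_size=1,
                     stride=stride, bias=False)
```

bias=False is the usual choice here because each conv is immediately followed by a BatchNorm layer, which has its own learned shift.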
This block has 2 conv layers, and after each layer there's a batch normalization layer. An interesting variable is self.identity, as it plays an important role in computing the skip/residual connection, the new approach that came with the ResNet architecture.
    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.identity is not None:
            x = self.identity(x)  # 1x1 conv reshapes x to match out
        out += x
        out = self.relu(out)
        return out
This forward method is a part of the block definition. As you can see, it takes the input image/tensor x and passes it to the first layer; the output of the first layer goes to the following layer, and so on. What's interesting, again, is the identity variable: it performs downsampling on the input of the first layer and then adds it to the final output value, and this is what is called a residual connection. The other thing is using a 1x1 convolution to downsample the data, and here's an interesting video explaining the process.
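To see the residual addition concretely, here's a small standalone demonstration (a sketch, not the Block class itself), assuming the transition between stages of the 34-layer network where we go from 64 to 128 channels with stride 2:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# Main path of a basic block: two 3x3 convs, the first with stride 2.
main_path = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(128),
)

# The identity/downsample path: a 1x1 conv with stride 2 reshapes x so
# its channels and spatial size match the main path's output.
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

# The residual connection: add the (downsampled) input to the output.
out = torch.relu(main_path(x) + downsample(x))
print(out.shape)  # (1, 128, 28, 28)
```

Without the 1x1 convolution the addition would fail, since a (1, 64, 56, 56) tensor can't be added to a (1, 128, 28, 28) one.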
https://www.youtube.com/watch?v=vcp0XvDAX68
The full implementation can be found here: https://github.com/omar178/resenet-implementation/blob/main/ResNet_implementation.ipynb