Introduction to Transfer Learning: Effective Machine Learning Without Custom Architecture
<h2><span data-preserver-spaces="true">What is Transfer Learning?</span></h2> <span data-preserver-spaces="true"><strong>Transfer Learning</strong> (TL) is one of the most powerful methods for building high-performance deep learning models in computer vision. TL is based on the knowledge-reusability concept - one can use knowledge from one area and apply it to another. By leveraging previous experience, you don't need to start from scratch with every new model or new situation. So effectively, you can learn how to do new tasks more efficiently by drawing on previous knowledge. </span> <ul><li><a href="#transfer-learning-workflow">Transfer Learning Workflow</a></li><li><a href="#example-1">Example: Classification with a Custom Network</a></li><li><a href="#example-2">Example: Transfer Learning </a></li><li><a href="#conclusion">Conclusion</a></li></ul> <blockquote><span data-preserver-spaces="true">New to deep learning? Check out our </span><a class="editor-rtfLink" href="https://wordpress.appsilon.com/convolutional-neural-networks/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Introduction to Convolutional Neural Networks</span></a><span data-preserver-spaces="true">.</span></blockquote> <h2 id="transfer-learning-workflow"><span data-preserver-spaces="true">Transfer Learning Workflow</span></h2> <span data-preserver-spaces="true">Let's say you are a data scientist proficient in Python, and now you need to perform a new analysis using R. R might be a new programming language for you. Still, since you already know Python, learning R will be much easier for you compared to a version of yourself that doesn't know how to program at all. 
R is a new language, but many of the same principles and fundamentals apply to both Python and R, so you can transfer some of your existing Python knowledge to get a head start in learning R.</span> <span data-preserver-spaces="true">The same principle is used in deep learning with <strong>Transfer Learning</strong>. Instead of starting from scratch (a model with random weights), you can take an existing network that has been trained to do a task <em>X</em> and customize the network to your particular problem or task. If you read our article on Convolutional Neural Networks (<a href="https://appsilon.com/convolutional-neural-networks/" target="_blank" rel="noopener noreferrer">CNNs</a>), then you know that the deeper we go into a network, the more sophisticated the extracted features become.</span> <span data-preserver-spaces="true">Now imagine that you want to detect dogs and cats in a collection of images. Here's a four-step solution for this type of task:</span> <ol><li><span data-preserver-spaces="true">We take a network which is very good at recognizing objects and customize it. In other words, we take a pre-trained model, e.g., </span><em><span data-preserver-spaces="true">ResNet</span></em><span data-preserver-spaces="true"> trained on </span><em><span data-preserver-spaces="true">ImageNet</span></em><span data-preserver-spaces="true">. The network's early layers detect generic features such as edges and textures, while its deeper layers respond to more complex patterns such as ears, eyes, or fur, which will help detect cats and dogs. </span></li><li><span data-preserver-spaces="true">We then cut the last few layers (called the </span><em><span data-preserver-spaces="true">head</span></em><span data-preserver-spaces="true">), which are specialized for the original task, and replace them with one or more new, randomly initialized layers, such as a single fully connected layer. </span></li><li><span data-preserver-spaces="true">Next, we fine-tune the newly added layers by training the network on a set of images relevant to our problem (containing cats and dogs). 
The weights in the initial layers (called the body) are not updated (the layers are </span><em><span data-preserver-spaces="true">frozen</span></em><span data-preserver-spaces="true">).</span></li><li><span data-preserver-spaces="true">Optionally, after fine-tuning the head, we can unfreeze the whole network and train the model a bit more, allowing weight updates through the entire network.</span></li></ol> <span data-preserver-spaces="true">The advantages of Transfer Learning are <strong>faster training</strong> and <strong>better results</strong> with significantly <strong>less data</strong>.</span> <h2 id="example-1"><span data-preserver-spaces="true">Example: Classification With a Custom Network</span></h2> <span data-preserver-spaces="true">To start, let's download the <em>Dogs and Cats</em> dataset (the download link appears in the code below) and untar the file. We're doing this task in Google Colab on a Tesla T4 GPU, so your download and training times may vary. Here are the library imports and device configuration: </span> <figure class="highlight"> <pre><code class="language-python">import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
from torchvision import models, transforms, datasets

# Use the first GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
</code></pre> </figure> <span data-preserver-spaces="true">Now we can download the dataset and unpack it:</span> <figure class="highlight"> <pre><code class="language-python">%mkdir data
%cd /content/data/
!wget http://files.fast.ai/data/examples/dogscats.tgz

!tar zxvf dogscats.tgz
</code></pre> </figure> <span data-preserver-spaces="true">Libraries like </span><code class="language-r">PyTorch</code> allow us to effectively enlarge the training set without acquiring more images by performing 
operations such as rotations and horizontal flips. Further, we'll prepare every image in the same way (resizing is optional): <ul><li><span data-preserver-spaces="true">Resize and center-crop to 224x224 </span></li><li><span data-preserver-spaces="true">Convert the image to a tensor</span></li><li><span data-preserver-spaces="true">Normalize the RGB color channels </span></li></ul> <span data-preserver-spaces="true">These steps are applied to both the training and validation images, as we essentially want them in the same format. The random rotations and flips are applied to the training images only. Here's the code snippet:</span> <figure class="highlight"> <pre><code class="language-python">DIR_DATA = '/content/data/dogscats/'

# Training images: random augmentation, then resize, crop, and normalize
train_transforms = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Resize(224),
    transforms.CenterCrop((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Validation images: same preprocessing, but no random augmentation
valid_transforms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
</code></pre> </figure> <span data-preserver-spaces="true">We can now apply the transformations to the images and load them in batches with the <code class="language-r">DataLoader</code> class from PyTorch. 
Feel free to experiment with the batch size; we've set it to 32 for this case.</span> <figure class="highlight"> <pre><code class="language-python"># ImageFolder infers the class labels from the subfolder names
train_data = datasets.ImageFolder(os.path.join(DIR_DATA, 'train'), transform=train_transforms)
valid_data = datasets.ImageFolder(os.path.join(DIR_DATA, 'valid'), transform=valid_transforms)

torch.manual_seed(42)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
valid_loader = DataLoader(valid_data, batch_size=32, shuffle=False)
class_names = train_data.classes
</code></pre> </figure> <span data-preserver-spaces="true">We can now use the declared <code class="language-r">train_loader</code> to check that the transformations were applied. With the help of the <code class="language-r">matplotlib</code> library, we can visualize the entire batch (32 images):</span> <img class="aligncenter size-large wp-image-5596" src="https://wordpress.appsilon.com/wp-content/uploads/2020/10/001-1024x411.png" alt="First batch" width="1024" height="411" /> <span data-preserver-spaces="true">Judging by the rotation in the images, we can say that everything works as expected up to this point. The next step is to define the neural network class. We decided to go simple, with three convolutional layers, a fully connected layer, and the output layer. Every convolutional layer is followed by a <em>ReLU</em> activation and a <em>max pooling</em> operation. 
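</span> <span data-preserver-spaces="true">As a quick sanity check on the layer sizes, here is a minimal sketch that traces how a 224x224 input shrinks through three unpadded 3x3 convolutions, each followed by 2x2 max pooling. The helper names <code class="language-r">conv_out</code> and <code class="language-r">pool_out</code> are ours and exist only for illustration; the arithmetic assumes a padding of 0 and no dilation, which are the PyTorch defaults.</span>

```python
def conv_out(size, kernel_size=3, stride=1, padding=0):
    """Side length of a square feature map after a convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size=2, stride=2):
    """Side length of a square feature map after max pooling (floor mode)."""
    return (size - kernel_size) // stride + 1


size = 224  # images are cropped to 224x224 by the transforms
for layer in range(1, 4):
    size = pool_out(conv_out(size))
    print(f"after conv{layer} + max pool: {size}x{size}")
# after conv1 + max pool: 111x111
# after conv2 + max pool: 54x54
# after conv3 + max pool: 26x26

# The last convolution has 128 output channels
flattened = size * size * 128
print(flattened)  # 86528
```

<span data-preserver-spaces="true">The final 26x26 feature map with 128 channels explains the <code class="language-r">26*26*128</code> input size of the first fully connected layer. </span> <span data-preserver-spaces="true">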
Here's the model class:</span> <figure class="highlight"> <pre><code class="language-python">class MyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1)
        self.fc1 = nn.Linear(in_features=26*26*128, out_features=128)
        self.out = nn.Linear(in_features=128, out_features=2)

    def forward(self, x):
        # Three blocks of convolution, ReLU, and 2x2 max pooling
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        # Flatten the feature maps for the fully connected layers
        x = x.view(-1, 26*26*128)
        x = F.relu(self.fc1(x))
        # Pass self.training so dropout is switched off in eval mode
        x = F.dropout(x, p=0.2, training=self.training)
        return self.out(x)

torch.manual_seed(42)

model = MyCNN()
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
</code></pre> </figure> <span data-preserver-spaces="true">As you can see, the model is moved to the GPU when one is available. Training took around an hour and ten minutes for 10 epochs and resulted in 84% accuracy:</span> <img class="aligncenter size-large wp-image-5672" src="https://wordpress.appsilon.com/wp-content/uploads/2020/10/Screenshot-2020-10-15-at-15.17.27-1024x243.png" alt="Custom model training time" width="1024" height="243" /> <span data-preserver-spaces="true">Not a bad result for a simple network like this, but can we do better? 
</span><strong><span data-preserver-spaces="true">Transfer learning says yes.</span></strong> <blockquote><a href="https://appsilon.com/pp-yolo-object-detection/" target="_blank" rel="noopener noreferrer">PP-YOLO Object Detection: Why It's Faster Than YOLOv4</a></blockquote> <h2 id="example-2"><span data-preserver-spaces="true">Example: Transfer Learning</span></h2> <span data-preserver-spaces="true">The transfer learning approach will be much more straightforward than the custom one. Here are the steps:</span> <ol><li><span data-preserver-spaces="true">Download a pretrained network; </span><em><span data-preserver-spaces="true">ResNet</span></em><span data-preserver-spaces="true"> with 101 layers will do just fine</span></li><li><span data-preserver-spaces="true">Freeze the parameters of the pretrained network</span></li><li><span data-preserver-spaces="true">Replace the output layer, as it predicts 1000 classes and we only have two (dogs and cats) </span></li><li><span data-preserver-spaces="true">Transfer the model to the GPU (no pun intended) </span></li><li><span data-preserver-spaces="true">Define the criterion and optimizer</span></li></ol> <span data-preserver-spaces="true">We can do that in a couple of lines of code:</span> <figure class="highlight"> <pre><code class="language-python"># Note: newer torchvision releases deprecate pretrained=True in favor of the weights= argument
pretrained_model = models.resnet101(pretrained=True)
# Freeze the body so its weights are not updated during training
for param in pretrained_model.parameters():
    param.requires_grad = False

nb_features = pretrained_model.fc.in_features

# The new output layer is trainable by default and predicts two classes
pretrained_model.fc = nn.Linear(nb_features, 2)
pretrained_model.to(device)

pretrained_criterion = nn.CrossEntropyLoss()
# Only the parameters of the new head are passed to the optimizer
pretrained_optimizer = torch.optim.Adam(pretrained_model.fc.parameters(), lr=0.001)
</code></pre> </figure> <span data-preserver-spaces="true">And that's it! We can start the training process now. 
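</span> <span data-preserver-spaces="true">The training loop itself is not shown in this article, so here is a minimal sketch of what one training epoch and one validation pass might look like. The function names <code class="language-r">train_one_epoch</code> and <code class="language-r">evaluate</code> are hypothetical and are not part of PyTorch or of the original code. The sketch assumes a classification model, a loss such as <code class="language-r">nn.CrossEntropyLoss</code>, and loaders that yield batches of images and labels.</span>

```python
import torch


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over the training data and return the mean loss per sample."""
    model.train()  # switch dropout and batch-norm layers to training behaviour
    running_loss, n_samples = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()                    # clear gradients from the previous step
        loss = criterion(model(images), labels)  # forward pass and loss
        loss.backward()                          # gradients for trainable parameters
        optimizer.step()                         # update the parameters given to the optimizer
        running_loss += loss.item() * labels.size(0)
        n_samples += labels.size(0)
    return running_loss / n_samples


@torch.no_grad()
def evaluate(model, loader, device):
    """Return the fraction of correctly classified samples."""
    model.eval()  # switch dropout and batch-norm layers to inference behaviour
    correct, n_samples = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)      # index of the highest score per image
        correct += (preds == labels).sum().item()
        n_samples += labels.size(0)
    return correct / n_samples
```

<span data-preserver-spaces="true">With the objects defined above, one epoch would be <code class="language-r">train_one_epoch(pretrained_model, train_loader, pretrained_criterion, pretrained_optimizer, device)</code>, followed by <code class="language-r">evaluate(pretrained_model, valid_loader, device)</code> to measure validation accuracy.</span> <span data-preserver-spaces="true">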
Training took only <strong>15 minutes</strong> for a single epoch and yielded far greater accuracy than our custom architecture:</span> <img class="aligncenter size-large wp-image-5598" src="https://wordpress.appsilon.com/wp-content/uploads/2020/10/003-1024x236.png" alt="Transfer learning results" width="1024" height="236" /> <span data-preserver-spaces="true">Now you see how powerful Transfer Learning can be. The existence of Transfer Learning means that custom architectures are obsolete in many cases. </span> <h2 id="conclusion"><span data-preserver-spaces="true">Conclusion</span></h2> <span data-preserver-spaces="true">This article's take-home point is that stressing out about layers in custom neural network architectures is a waste of time in most cases. Pretrained networks are far more powerful than anything you can come up with on your own in any reasonable amount of time. </span> <span data-preserver-spaces="true">The transfer learning approach requires less data and fewer epochs, so it's a win-win situation. To be more precise, transfer learning here took more training time per epoch (about 15 minutes versus about 7 for the custom network) but needed fewer epochs to produce a usable model, so the total training time was shorter. </span> <span data-preserver-spaces="true"><strong>If your company needs help with Transfer Learning or you need help with a custom Machine Learning model, reach out to <a href="https://appsilon.com/computer-vision/" target="_blank" rel="noopener noreferrer">Appsilon</a>.</strong> We are experts in Machine Learning and Computer Vision.</span>