timm with fastai - The Largest Computer Vision Models Library

<h2>What is timm?</h2> Timm stands for py<strong>T</strong>orch <strong>IM</strong>age <strong>M</strong>odels. Ross Wightman created the Python library in 2019 to collect state-of-the-art image classification models from the latest papers, and then implement and train them. The goal is to build the largest library of computer vision models, so that users can quickly check and compare their results in practice. At the time of publishing this post, there are over 450 pre-trained model variations in the timm library, and new ones are added almost every month. You can install <a href="https://github.com/rwightman/pytorch-image-models" target="_blank" rel="noopener noreferrer">timm</a> easily using pip: <pre class="language-r"><code class="language-r">pip install timm</code></pre> Then you can list all available models using the list_models() function: <img class="size-full wp-image-11982 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f25d6bd062d7ff8101d_model_names-img1.webp" alt="" width="390" height="455" />   <h2>How to use timm in practice?</h2> With access to such a large collection of computer vision models, you can be overwhelmed by the vast amount of research and work that went into their creation. But it is also a unique opportunity to take those readily available models and use them for your own purposes! Which model architecture should you choose for your use case? There are often many good answers, and too many peculiarities to take into account. 
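One practical way to narrow down the candidates is to filter the model list by name. The sketch below is only an illustration under a few assumptions: it calls timm.list_models(pretrained=True) when timm is installed, and otherwise falls back to a short hand-written sample list. The variant names regnety_002 and tf_efficientnet_lite0 in that list are assumed for illustration, since this post does not say which variants were used. The filtering itself relies on fnmatch from the Python standard library, which matches shell-style wildcards.

```python
import fnmatch

# A tiny fallback sample so the sketch runs without timm installed.
# These names are assumptions for illustration, not the full timm list.
SAMPLE_NAMES = ["rexnet_100", "regnety_002", "tf_efficientnet_lite0", "resnet50"]


def available_models():
    """Return pre-trained model names from timm, or the sample list."""
    try:
        import timm
    except ImportError:
        return list(SAMPLE_NAMES)
    return timm.list_models(pretrained=True)


def filter_models(names, pattern):
    """Keep the names matching a shell-style wildcard pattern, sorted."""
    return sorted(fnmatch.filter(names, pattern))


if __name__ == "__main__":
    names = available_models()
    print(len(names), "models available")
    print(filter_models(names, "rexnet*"))
```

Note that timm.list_models() also accepts a wildcard filter directly, for example timm.list_models('*regnet*'), so a helper like this is mainly useful when you want to apply the same filter to names coming from several sources.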
Timm offers a helping hand: <ol><li>This <a href="https://rwightman.github.io/pytorch-image-models/results/" target="_blank" rel="noopener noreferrer">table comparing results</a> and basic parameters can help you get a rough idea of what is worth trying.</li><li>Timm contains implemented and pre-trained models and lets you switch quickly between them once you have set any one of them up (even the newest Vision Transformer models!).</li></ol> Another useful resource here is the table from <a href="https://paperswithcode.com/lib/timm" target="_blank" rel="noopener noreferrer">PapersWithCode</a>. <h3>Pre-trained model library</h3> Timm's huge pre-trained model library is a wonderful thing, and it can be easily adopted into fastai training code. In fact, it can be used as a classic transfer learning application: reusing a neural network that was pre-trained for one task on a new task. Typically, this is done by freezing the "body" of such a model and training only its last few layers, called the "head." This way, after a much shorter training time than the time-consuming initial training, the model performs well on the new task. Using fastai, one can quickly prepare the dataset, plan the training stage, and simply swap the timm model name in the customized learner. The learner detaches the original "head" and freezes the rest of the model's "body." In doing so, we don't have to limit ourselves to one model. We can now take a whole collection of models to test and compare on our given task or dataset. In the next section, I'll show you how to do this quickly using fastai. <h2>Transfer learning using timm and fastai</h2> As a transfer learning example, I chose the image classification problem with the 'Flower' dataset from the fastai datasets library. The dataset contains 102 classes, with around 10 images for each class (English flower species). 
With a single line, you can download any dataset from the fastai library: <img class="size-full wp-image-11976 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f264d7961f82670120a_import-all-img-2.webp" alt="" width="272" height="55" />   Here's an example of the dataset images: <img class="size-full wp-image-11974 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f277a92f47f699b2145_flower-dataset-img-3.webp" alt="" width="513" height="512" />   <h3>Adapting and fine-tuning</h3> The next steps are to adapt a given pre-trained model and fine-tune it to your own task/dataset: <h4>Step 1) Define a timm body of a neural network model.</h4> <img class="size-full wp-image-11970 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f29487b2c34e796cdb4_defining-timm-body-img4.webp" alt="" width="674" height="175" />   <h4>Step 2) Define timm with a body and a head.</h4> <img class="alignnone size-full wp-image-11968 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f2ad6bd062d7ff815f0_defining-timm-body-and-head-img5.webp" alt="" width="966" height="194" />   <h4>Step 3) Define a timm learner.</h4> <img class="size-full wp-image-11972 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f2cccf2ef0d95c48457_defining-timm-learner-img6.webp" alt="" width="863" height="192" />   <h4>Step 4) Create the learner.</h4> As an example, here we create a learner based on rexnet_100, with Neptune tracking. Stay tuned to the <a href="https://appsilon.com/blog/" target="_blank" rel="noopener noreferrer">Appsilon blog</a> for an article on Neptune. 
<img class="size-full wp-image-11978 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f2d6c7e8f92760a8cb2_learner-with-Neptune-tracking-img7.webp" alt="" width="1140" height="41" />   <h4>Step 5) Train the model.</h4> <img class="size-full wp-image-11980 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f2e40d120c4e25742d9_model-training-img8.webp" alt="" width="594" height="739" />   <h4>Step 6) Check the model learning process.</h4> You can plot the loss from the training and validation stages: <img class="alignnone size-full wp-image-11984 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f2fc328baac9ee03158_plotting-loss-from-training-and-validation-img9.webp" alt="" width="399" height="308" />   The code used above is inspired by the <a href="https://walkwithfastai.com/" target="_blank" rel="noopener noreferrer">Walk with fastai blog</a>. I recommend checking the site for more useful content. <h3>Testing pytorch image models (timm)</h3> At this stage, we are ready to make use of timm's power. Since we already created and trained a model (ReXNet 100) from the module, we can now easily test others as well! Let's try two other models, also from 2020. I chose these because they are as light as possible (having a small number of parameters) and give decent results on ImageNet. Altogether, the three models are: <ul><li>ReXNet 100 with 5 million parameters, 18.5 MB</li><li>RegNetY with 3 million parameters, 12 MB</li><li>TF EfficientNet Lite with 5 million parameters, 18 MB</li></ul> I trained all three in the same manner: 12 epochs with the same learning rate values. The only thing I had to change was the architecture's name in the learner definition. <pre class="language-r"><code class="language-r">learn_regnety = timm_learner(...)
learn_tf_efficient_lite = timm_learner(...)</code></pre> <h3>Comparing trained models</h3> Depending on the metric we want to optimize, we can now compare and contrast the trained models. Here, I compare their validation loss, error rate, and accuracy, each time highlighting the model with the highest value of the given metric. <img class="alignnone size-full wp-image-11966 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f30487b2c34e796d1ca_compare-models-img10.webp" alt="" width="1068" height="348" />   In the plots above, the largest values are shown in the most intense colors. At first glance, the best results (the lowest loss and error rate, and the highest accuracy) were reached by the smallest yet most efficient architecture: RegNetY. In this way, using timm pre-trained models, we can easily test and compare multiple architectures and find the one best suited for a given computer vision problem. <h2>Conclusion</h2> When you encounter a computer vision task that requires a fast but solid solution, consider using timm pre-trained models. Check out their state-of-the-art implementations and solve more with timm! Gists of code used in the blog: <script src="https://gist.github.com/GajaKlaudel/f78ee74503959190da5958bc78a1bdcb.js"></script>
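The comparison from the "Comparing trained models" section can also be scripted. The following is a minimal sketch and not the code from the gist: the model names model_a, model_b and model_c, the numbers in the results dictionary, and the best_per_metric helper are all hypothetical placeholders, not measurements from this post. The one point it encodes is that validation loss and error rate should be minimized, while accuracy should be maximized.

```python
# Hypothetical placeholder metrics for illustration only; these are NOT
# the results reported in this post.
results = {
    "model_a": {"valid_loss": 0.40, "error_rate": 0.10, "accuracy": 0.90},
    "model_b": {"valid_loss": 0.30, "error_rate": 0.08, "accuracy": 0.92},
    "model_c": {"valid_loss": 0.50, "error_rate": 0.12, "accuracy": 0.88},
}

# Lower is better for loss and error rate; higher is better for accuracy.
LOWER_IS_BETTER = {"valid_loss": True, "error_rate": True, "accuracy": False}


def best_per_metric(results, lower_is_better):
    """Return a dict mapping each metric to the name of its best model."""
    best = {}
    for metric, lower in lower_is_better.items():
        pick = min if lower else max
        best[metric] = pick(results, key=lambda name: results[name][metric])
    return best


if __name__ == "__main__":
    for metric, model in best_per_metric(results, LOWER_IS_BETTER).items():
        print(f"{metric}: {model}")
```

Keeping the direction of each metric explicit avoids the easy mistake of picking the model with the highest loss when looking only at "largest value" highlights.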