Getting Started With Image Classification: fastai, ResNet, MobileNet, and More
<h2><span data-preserver-spaces="true">How Do I Get Started with Image Classification?</span></h2> <span data-preserver-spaces="true">In this article, we'll help you choose the right tools and architectures for your first <a href="https://appsilon.com/computer-vision/" target="_blank" rel="noopener noreferrer">Image Classification</a> project. We'll recommend some of the best programming tools and model architectures available for classification problems in computer vision. Image classification is subject to the same rules as any modeling problem: choosing the right tools for the job is critical to success.</span> <ul><li><a href="#tools">Choosing Image Classification Tools: fastai</a></li><li><a href="#architectures">Choosing Image Classification Architecture</a><ul><li><a href="#accuracy">Maximizing Accuracy: ResNet-50 and ResNet-101</a></li><li><a href="#efficiency">Maximizing Efficiency: MobileNet</a></li></ul> </li> <li><a href="#conclusion">Conclusion</a></li> </ul> <blockquote><span data-preserver-spaces="true">Interested in Object Detection? Check out our </span><a class="editor-rtfLink" href="https://wordpress.appsilon.com/object-detection-yolo-algorithm/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Introduction to YOLO Object Detection</span></a><span data-preserver-spaces="true">. </span></blockquote> <h2 id="tools"><span data-preserver-spaces="true">Choosing Image Classification Tools</span></h2> <span data-preserver-spaces="true">You might be wondering whether to implement your model in </span><a class="editor-rtfLink" href="https://pytorch.org/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">PyTorch</span></a><span data-preserver-spaces="true"> or </span><a class="editor-rtfLink" href="https://www.tensorflow.org/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">TensorFlow</span></a><span data-preserver-spaces="true">. 
In short, it doesn't matter much, as both frameworks are backed by large, credible communities. If you follow </span><a class="editor-rtfLink" href="https://paperswithcode.com/trends" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">trends</span></a><span data-preserver-spaces="true"> from Papers with Code, you might have noticed that PyTorch is gaining popularity within the research community, which usually translates into future industry trends. </span> <span data-preserver-spaces="true">We, however, recommend using the </span><a class="editor-rtfLink" href="https://docs.fast.ai/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">fastai</span></a><span data-preserver-spaces="true"> library. It is one of the most popular packages for adding higher-level functionality on top of PyTorch. The official docs state: </span> <blockquote><em>"fastai simplifies training fast and accurate neural nets using modern best practices."</em></blockquote> <span data-preserver-spaces="true">This is an accurate description. Using fastai saves you a lot of time, because you can build the first baseline model very quickly. </span> <span data-preserver-spaces="true">This approach has at least two upsides. First, a baseline model is critical for later research iterations: without it, you can't tell whether an experiment is an improvement or not. Second, you can invest the time saved on model development in inspecting model results. 
The end goal is to better understand how the model works so that you can improve its performance.</span> <h2 id="architectures"><span data-preserver-spaces="true">Choosing Image Classification Architecture</span></h2> <span data-preserver-spaces="true">To start, ask yourself the following question: </span><strong><span data-preserver-spaces="true">What are my success criteria?</span></strong> <span data-preserver-spaces="true">Defining your success criteria is crucial, whatever problem you are trying to solve. For example, maybe you want to maximize accuracy. Maximizing accuracy is the most common end goal of computer vision projects. On the other hand, perhaps you are limited by hardware, so you are willing to trade accuracy for efficiency. </span><span data-preserver-spaces="true">It's important to know what your primary goal is before you start the project. We'll consider the two most common goals and how they affect the choice of architecture.</span> <span data-preserver-spaces="true">We always suggest starting with an off-the-shelf architecture, adjusting it to your problem, and leveraging </span><strong><span data-preserver-spaces="true">Transfer Learning (TL)</span></strong><span data-preserver-spaces="true">.</span> <blockquote><span data-preserver-spaces="true">Wait, what the heck is transfer learning?</span> <a class="editor-rtfLink" href="https://wordpress.appsilon.com/transfer-learning-introduction/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Here's a concise hands-on introduction to Transfer Learning.</span></a></blockquote> <h3 id="accuracy"><span data-preserver-spaces="true">Maximizing Accuracy</span></h3> <span data-preserver-spaces="true">If your goal is to maximize accuracy, starting with </span><em><span data-preserver-spaces="true">ResNet-50</span></em><span data-preserver-spaces="true"> or </span><em><span data-preserver-spaces="true">ResNet-101</span></em><span data-preserver-spaces="true"> is a 
good choice. They are easier to train and require fewer epochs to reach excellent performance than </span><em><span data-preserver-spaces="true">EfficientNet</span></em><span data-preserver-spaces="true">s. </span><em><span data-preserver-spaces="true">ResNet</span></em><span data-preserver-spaces="true">s with 50 or more layers use </span><strong><span data-preserver-spaces="true">Bottleneck Blocks</span></strong><span data-preserver-spaces="true"> instead of Basic Blocks, which yields higher accuracy at a comparable computational cost.</span> <span data-preserver-spaces="true">Most of the advancements in image models in recent years are tweaks to the original </span><em><span data-preserver-spaces="true">ResNet</span></em><span data-preserver-spaces="true">. Combining these architectures with techniques such as progressive resizing or mixed precision training gives excellent results that are usually satisfactory in business settings. </span> <h3 id="efficiency"><span data-preserver-spaces="true">Maximizing Efficiency</span></h3> <span data-preserver-spaces="true">If you want to build a model running on mobile or edge devices, you are constrained by limited computation, power, and space. In these cases, using a recent version of </span><em><span data-preserver-spaces="true">MobileNet</span></em><span data-preserver-spaces="true"> is the right choice. </span> <span data-preserver-spaces="true">Note: if the mobile device has access to the internet, the model can instead be deployed to a remote server. Inference then happens on the server, which is easier to scale and isn't restricted (or is at least restricted to a lesser extent) by memory or processing capacity. The choice here is project-specific, but it's good to be aware of the alternatives.</span> Are you an R Programmer? 
Learn <a href="https://appsilon.com/fast-ai-in-r/" target="_blank" rel="noopener noreferrer">How to Make a Computer Vision Model Within an R Environment</a> <h2 id="conclusion"><span data-preserver-spaces="true">Conclusion</span></h2> <span data-preserver-spaces="true"><strong>For most image classification projects, we suggest starting with fastai and a pre-trained ResNet-50 or ResNet-101 architecture.</strong> This way, you should be able to create solid baseline models. If your project is limited by computation and storage resources, you should probably look into more efficient networks such as <i>MobileNet</i></span><span data-preserver-spaces="true">, which is optimized to work on mobile or edge devices.</span> <span data-preserver-spaces="true">To summarize:</span> <ul><li><em><span data-preserver-spaces="true">fastai</span></em><span data-preserver-spaces="true"> is an excellent high-level library for model development</span></li><li><span data-preserver-spaces="true">The choice of architecture depends on the project objective: accuracy vs. efficiency</span></li><li><span data-preserver-spaces="true">Models from the </span><em><span data-preserver-spaces="true">ResNet</span></em><span data-preserver-spaces="true"> family are a good starting point for computer vision models</span></li><li><span data-preserver-spaces="true">If efficiency is the top priority, the </span><em><span data-preserver-spaces="true">MobileNet</span></em><span data-preserver-spaces="true"> architecture is the way to go</span></li></ul> <!--more--> <h2>Learn More</h2><ul><li><a href="https://appsilon.com/convolutional-neural-networks/" target="_blank" rel="noopener noreferrer">Convolutional Neural Networks: An Introduction</a></li><li>Appsilon is Hiring R Shiny Developers and Project Leaders! Check out our <a href="https://appsilon.com/careers/" target="_blank" rel="noopener noreferrer">Careers Page</a>.</li><li>Do you need help building an advanced Machine Learning model? 
Reach out to <a href="https://appsilon.com/computer-vision/" target="_blank" rel="noopener noreferrer">Appsilon</a>.</li></ul>
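<span data-preserver-spaces="true">To make the Bottleneck Block point from the Maximizing Accuracy section a little more concrete, here is a minimal sketch in plain Python that counts convolution weights in the two block types. The helper names (<code>conv_weights</code>, <code>basic_block_weights</code>, <code>bottleneck_block_weights</code>) are ours and purely illustrative; they are not part of fastai or PyTorch. The sketch assumes the standard ResNet block shapes: two 3x3 convolutions for a Basic Block, and a 1x1 reduction to a quarter of the width, a 3x3 convolution, and a 1x1 expansion for a Bottleneck Block. Biases, batch normalization, and downsampling shortcuts are ignored.</span>

```python
def conv_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a 2D convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def basic_block_weights(channels: int) -> int:
    """Basic Block: two 3x3 convolutions at constant width."""
    return 2 * conv_weights(3, channels, channels)


def bottleneck_block_weights(channels: int, reduction: int = 4) -> int:
    """Bottleneck Block: 1x1 reduce, 3x3 at reduced width, 1x1 expand."""
    mid = channels // reduction  # assumed reduction factor of 4
    return (
        conv_weights(1, channels, mid)    # squeeze the channel dimension
        + conv_weights(3, mid, mid)       # spatial convolution on fewer channels
        + conv_weights(1, mid, channels)  # restore the channel dimension
    )


if __name__ == "__main__":
    print(basic_block_weights(64))        # 73728
    print(basic_block_weights(256))       # 1179648
    print(bottleneck_block_weights(256))  # 69632
```

<span data-preserver-spaces="true">Under these assumptions, a Bottleneck Block working on 256 channels needs roughly as many weights as a Basic Block working on 64 channels, and about 17 times fewer than a Basic Block at 256 channels. That is one way to see why deeper ResNets can afford much wider feature maps.</span>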