In this article, we’ll help you choose the right tools and architectures for your first Image Classification project. We’ll recommend some of the best programming tools and model architectures available for classification problems in computer vision. Image classification is subject to the same rules as any modeling problem: choosing the right tools for the job is critical to success.
You might be wondering whether to implement your model in PyTorch or TensorFlow. In short, it doesn’t matter: both frameworks are supported by a huge and credible community. If you follow trends on Papers with Code, you might have noticed that PyTorch is gaining popularity in the research community, which usually translates into future industry trends.
We, however, recommend using the fastai library. It is the most popular package for adding higher-level functionality on top of PyTorch. The official docs state:
“fastai simplifies training fast and accurate neural nets using modern best practices.”
This is an accurate description. Using fastai saves you a lot of time, as the first baseline model is built very quickly.
Such an approach has at least two upsides. First, producing a baseline model is critical for the next research iterations: without a baseline, you can’t tell whether an experiment results in an improvement. Second, you can invest the time saved on model development in inspecting model results. The end goal is to better understand how the model works so you can improve its performance.
To start, ask yourself the following question: What are my success criteria?
Defining your success criteria is crucial, whatever problem you are trying to solve. For example, maybe you want to maximize accuracy, which is the most common end goal of any computer vision project. On the other hand, perhaps you are limited by hardware and are willing to trade accuracy for efficiency. It’s important to know your primary goal before you start the project. We’ll consider these two most common goals and how they affect architecture choices.
We always suggest starting with an off-the-shelf architecture, adjusting it to your problem, and leveraging Transfer Learning (TL).
Wait, what the heck is transfer learning? Here’s a concise, hands-on introduction to Transfer Learning.
If your goal is to maximize accuracy, starting with ResNet-50 or ResNet-101 is a good choice. Compared with EfficientNets, they are easier to train and need fewer epochs to reach excellent performance. ResNets with 50 or more layers use Bottleneck Blocks instead of Basic Blocks, which results in higher accuracy with less computation time.
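The bottleneck idea is easiest to see by counting convolution weights. This back-of-the-envelope sketch compares a Basic Block from the first stage of ResNet-34 (two 3x3 convolutions at 64 channels) with a Bottleneck Block from the first stage of ResNet-50 (a 1x1 convolution squeezing 256 channels to 64, a 3x3 convolution at 64 channels, and a 1x1 convolution expanding back to 256). It ignores biases, BatchNorm, and shortcut projections; the helper names are our own.

```python
def conv_weights(in_ch: int, out_ch: int, k: int) -> int:
    """Weights in a k x k convolution without bias (ResNets follow convs with BatchNorm)."""
    return in_ch * out_ch * k * k


def basic_block_weights(channels: int) -> int:
    # Two 3x3 convolutions at full width
    return 2 * conv_weights(channels, channels, 3)


def bottleneck_weights(channels: int, reduction: int = 4) -> int:
    mid = channels // reduction
    return (conv_weights(channels, mid, 1)    # 1x1: squeeze channels
            + conv_weights(mid, mid, 3)       # 3x3: the expensive conv runs at reduced width
            + conv_weights(mid, channels, 1))  # 1x1: expand back


print(basic_block_weights(64))   # 73728
print(bottleneck_weights(256))   # 69632
```

For slightly fewer weights, the bottleneck block outputs four times as many channels and adds a layer of depth. Because these convolutions keep the spatial size fixed, the multiply-accumulate count per output pixel equals the weight count, so the same comparison holds for computation.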
Many of the advances in image models in recent years are tweaks to the original ResNet. Combining these architectures with training tricks such as progressive resizing or mixed-precision training gives excellent results that are usually satisfactory in business settings.
If you want to build a model running on mobile or edge devices, you are constrained by limited computation, power, and space. In these cases, using a recent version of MobileNet is the right choice.
Note: if the mobile device has internet access, the model can instead be deployed to a remote server. Inference then happens on the server, which is easier to scale and isn’t restricted (or is restricted to a lesser extent) by memory or processing capacity. The right choice is project-specific, but it’s good to be aware of the alternatives.
For most image classification projects, we recommend starting with fastai and a pre-trained ResNet-50 or ResNet-101 architecture. This way, you should be able to create solid baseline models quickly. If your project is limited by computation and storage resources, you should probably look into more efficient networks such as MobileNet, which is optimized for mobile and edge devices.