Our model for recognizing specific animals in images is a neural network consisting of multiple layers, and the initial layers are already good at understanding the world in general. So instead of “re-inventing the wheel,” we only need to train the final layers.
I was excited to work on a recent project with one of our partners, Wild Detect, because it aligns with one of our goals at Appsilon: to use data science consulting to aid in the preservation and management of our planet's wildlife and environment. The goal is to build a model for visual recognition of specific kinds of animals. I cannot publish that model, so I'd like to use a different dataset to show you how I built a Proof of Concept deep learning model for this problem.

So, which animals did we choose? Last year I gave a talk at the useR! conference in Brisbane, which was quite a journey for me, and I had a chance to visit the Koala Sanctuary there. Sadly, koalas are severely affected by climate change, and they were recently declared functionally extinct. To highlight that problem, I decided to choose Australian animals, koalas and kangaroos, for the purposes of this article.
Visual recognition is powerful for many industries besides wildlife stewardship, including retail, defense, insurance (claims verification), and manufacturing (quality control). Well-trained modern deep neural networks can give us very accurate results for a wide range of problems.
In our case, we gathered images for three classes: "koala", "kangaroo" and "other" (images of the Australian wilderness without any koalas or kangaroos). 20% of the images were set aside as a validation set.
The model is a convolutional neural network (CNN). CNNs excel at visual recognition. Specifically, we used a ResNet architecture, originally developed by researchers at Microsoft, which has been a state-of-the-art architecture for visual tasks. Behind its exceptional accuracy is the idea of skip connections, shortcuts that let data bypass layers during training, which helps eliminate much of the vanishing gradient problem and ultimately yields a lower training error. It's fascinating that the human brain actually does something similar.
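To make the skip-connection idea concrete, here is a minimal, simplified residual block in PyTorch (not the exact block ResNet uses, just an illustration of the principle): the input takes a shortcut past two convolutional layers and is added back to their output, so gradients can flow through the shortcut even when the convolutions learn little.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the skip connection: add the input back
```

If the two convolutions contribute nothing useful, the block can simply pass its input through, which is what makes very deep networks trainable.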
For the deep learning framework, we used PyTorch. We find it more convenient than TensorFlow for several reasons. What I'd like to highlight here is that PyTorch provides an official set of pre-trained models that can be used in various visual problems.
Instead of training the model from scratch, we used a version of ResNet pre-trained on the ImageNet dataset. ImageNet is a dataset of over 14 million annotated images created for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
This technique of using a pre-trained model for a different task is called transfer learning. It allows for achieving exceptional results quickly. Our model is a neural network consisting of multiple layers, and the initial layers of the pre-trained model are already pretty good at understanding the world in general. We only needed to train the final layers. This step also goes a long way to minimizing the training time, and lets us achieve good results with only several hundred images for each of the three classes: “koala,” “kangaroo,” and “other.”
We started by training the last 2 layers, which gave a 1.98% error. To get this even lower, we then trained a bit more on all the layers, which got the error rate down to 1.58%. Naturally, for a production model we would do more fine-tuning and data augmentation. We would also need to gather a more realistic dataset. That being said, this model already demonstrates what can be achieved.
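The second phase, training a bit more on all the layers, can be sketched like this (the learning rate here is an illustrative assumption, not the value we actually used): unfreeze everything and continue with a small learning rate, so the pre-trained weights are only gently nudged rather than overwritten.

```python
import torch


def unfreeze_for_finetuning(model, lr=1e-4):
    """Unfreeze every layer and return an optimizer with a small
    learning rate for careful fine-tuning of the whole network."""
    for param in model.parameters():
        param.requires_grad = True  # all layers now receive gradients
    return torch.optim.Adam(model.parameters(), lr=lr)
```

In practice one would often also use a lower learning rate for the early layers than for the head, since the early layers need the least adjustment.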
Once the model has been taught to spot, in this case, kangaroos and koalas, the results can be made accessible in a variety of ways — API, Shiny or a Python web application. We believe it is crucial to have a usable interface for a model. We want to expose the findings of our neural network to the people that need the information and may even need to see how the model arrived at a given conclusion. The step in which the network interacts with the human is just as important as the artificial intelligence piece.
In the case of the project with Wild Detect, we're building a device that will eventually live in the wilderness, on a ranch, or in a nature preserve, so it will be a standalone service running on the device that can be regularly queried as new images come in from the camera.
We’re experts in building analytical web apps, so for the POC I couldn’t do anything less than build an app where you can play with the model. Here’s what it looks like.
AI can be very accurate in recognizing objects, animals and people in images. Using transfer learning makes business applications even more feasible, and allows us to work with smaller datasets, which are often all we have. The accuracy of a model, and the effort required to build it, matter. They matter to a non-profit organization that is counting the few remaining koalas left on the planet, and to a company counting inventory on its shelves. They matter to Wild Detect, for whom we made a successful Proof of Concept, and we’re excited to journey with them further.
Organizations that manage large facilities, inventories, and lands can all benefit from accurate visual recognition of objects. You can find other Appsilon articles about object detection here.