Updated: June 2020 by Appsilon Data Science
Our model for recognizing specific animals in images is a neural network consisting of multiple layers. The initial layers are already good at understanding the world in general, so we only need to train the final layers instead of “re-inventing the wheel”.
Visual recognition has been gaining popularity in biodiversity preservation and management. Since launching our AI for Good initiative, we have been working with biodiversity researchers and practitioners to deliver wildlife image recognition machine learning models and tools. Our first foray into this area was our project for Wild Detect, which aligned with one of our goals at Appsilon — to use data science consulting to aid in the preservation and management of our planet’s wildlife and environment.
The goal was to build a model for visual recognition of specific kinds of animals. Since I cannot publish the original model, I will use a different dataset to walk through a Proof of Concept deep learning model that shows how I approached this problem.
I first set out to choose the animal species to use for developing the model. Last year I gave a talk at useR! in Brisbane, which was quite a journey for me, and I had a chance to visit the Koala Sanctuary there. Sadly, koalas are severely affected by climate change, and recently the Australian Koala Foundation declared them functionally extinct. To highlight this problem, I decided to use Australian animals (koalas and kangaroos) for the purposes of this article.
Visual recognition can be a powerful tool in many industries besides wildlife stewardship, including retail, defense, insurance (claims verification), and manufacturing (quality control). Well-trained modern deep neural networks can give us very accurate results for a wide range of problems.
Google Images is a good source of images for building such proof-of-concept models. The gi2ds tool assists in the process of building a dataset in three simple steps:
The resulting list of image URLs can then be used to download the images programmatically.
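The download step could look roughly like the minimal, stdlib-only sketch below. We assume (this is not stated above) that the URL list was saved as a plain text file with one URL per line; the function names `read_urls`, `target_filename`, and `download_images` are hypothetical. Failed downloads are collected rather than raised, because some URLs scraped from search results inevitably turn out to be dead.

```python
import os
import urllib.request
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def read_urls(path):
    """Read one URL per line (assumed format), skipping blanks and duplicates."""
    seen, urls = set(), []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def target_filename(index, url):
    """Numbered local filename; keep the URL's extension if it looks like an image."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in IMAGE_EXTENSIONS:
        ext = ".jpg"  # fallback when the URL has no recognizable extension
    return f"{index:05d}{ext}"


def download_images(url_file, dest_dir, timeout=10):
    """Download every URL listed in url_file into dest_dir; return the URLs that failed."""
    os.makedirs(dest_dir, exist_ok=True)
    failed = []
    for i, url in enumerate(read_urls(url_file)):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
            with open(os.path.join(dest_dir, target_filename(i, url)), "wb") as out:
                out.write(data)
        except (OSError, ValueError):  # network/HTTP errors, or malformed URLs
            failed.append(url)
    return failed
```

One call per class, for example `download_images("koala_urls.txt", "data/koala")` (hypothetical paths), produces a folder-per-class layout that most image-classification tooling can read directly.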
In our case, we gathered images for three classes: “koala”, “kangaroo” and “other” (images of Australian wilderness without any koalas or kangaroos). 20% of the images were set aside as a validation set.
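The text does not say exactly how the 20% validation set was drawn, so the sketch below assumes a per-class (stratified) split. That way each of "koala", "kangaroo", and "other" keeps roughly the same share in both sets. The function name `stratified_split` is hypothetical, and the fixed seed makes the split reproducible between runs.

```python
import random


def stratified_split(files_by_class, valid_pct=0.2, seed=42):
    """Hold out valid_pct of each class for validation.

    files_by_class maps a label (e.g. "koala") to a list of file paths.
    Returns (train, valid) as lists of (path, label) pairs.
    """
    if not 0.0 <= valid_pct <= 1.0:
        raise ValueError("valid_pct must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    train, valid = [], []
    for label in sorted(files_by_class):
        files = sorted(files_by_class[label])  # sorted copy; the input is not mutated
        rng.shuffle(files)
        n_valid = round(len(files) * valid_pct)
        valid += [(f, label) for f in files[:n_valid]]
        train += [(f, label) for f in files[n_valid:]]
    return train, valid
```

Splitting per class matters most when classes are imbalanced. With a purely random split, a small "other" class could end up under-represented in validation by chance.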
The model is a convolutional neural network (CNN), a type of network that excels at visual recognition. Specifically, we used a ResNet architecture, originally developed by a team at Microsoft Research, which is a state-of-the-art architecture for visual tasks. Behind its exceptional accuracy are skip (shortcut) connections: the input of a block of layers is added directly to that block's output, so the signal can bypass those layers. This greatly mitigates the vanishing gradient problem, which makes much deeper networks trainable and ultimately yields a lower training error. Interestingly, the human brain appears to do something similar.
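In the notation of the original ResNet paper (He et al.), a residual block with weights $\{W_i\}$ computes the output $\mathbf{y}$ from the input $\mathbf{x}$ as shown on the left below. The derivative on the right shows why this helps: even when $\partial\mathcal{F}/\partial\mathbf{x}$ is small, the identity term $I$ gives the gradient a direct path back through the block.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{F}(\mathbf{x}, \{W_i\})}{\partial \mathbf{x}} + I
```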
For the deep learning framework, we used PyTorch. We find it more convenient than TensorFlow for several reasons, including that PyTorch provides an official set of pre-trained models (through torchvision) that can be used for a variety of visual problems.
Instead of training the model from scratch, we used a version of ResNet pre-trained on ImageNet. ImageNet is a dataset of over 15 million annotated images, and a 1,000-class subset of it is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
This technique of reusing a pre-trained model for a different task is called transfer learning, and it makes it possible to achieve excellent results quickly. The initial layers of the pre-trained network are already quite effective at understanding the world in general, so we only needed to train the final layers. This also goes a long way toward reducing training time, and it lets us achieve good results with only several hundred images for each of the three classes: “koala,” “kangaroo,” and “other.”
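To make the "frozen early layers, trainable final layer" idea concrete without any deep-learning dependency, here is a deliberately tiny, stdlib-only sketch. It is an illustration of the technique under stated assumptions, not our actual training code. The hypothetical `frozen_features` stands in for the pre-trained backbone: its weights are fixed and never updated. Only a small linear softmax head is trained, using stochastic gradient descent on the cross-entropy loss. In a real PyTorch setup, the frozen part would be the pre-trained ResNet layers (with `requires_grad` set to `False`) and the head would be a new final layer with three outputs.

```python
import math
import random

# Hypothetical stand-in for the pre-trained backbone. These weights are
# "frozen": train_head() below never modifies them.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def frozen_features(x):
    """Map a raw 2-D input to 3 features using the fixed weights and a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_logits(W, b, f):
    """Linear head: one logit per class from the feature vector f."""
    return [sum(wk * fk for wk, fk in zip(W[c], f)) + b[c] for c in range(len(W))]


def train_head(samples, labels, n_classes, epochs=500, lr=0.1, seed=0):
    """Train only the linear softmax head (the "final layer") with SGD on cross-entropy."""
    rng = random.Random(seed)
    feats = [frozen_features(x) for x in samples]  # backbone is frozen, so compute once
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(feats)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            f, y = feats[i], labels[i]
            p = softmax(head_logits(W, b, f))
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit c)
                for k in range(dim):
                    W[c][k] -= lr * g * f[k]
                b[c] -= lr * g
    return W, b


def predict(W, b, x):
    """Index of the class with the highest logit."""
    logits = head_logits(W, b, frozen_features(x))
    return max(range(len(logits)), key=lambda c: logits[c])
```

Because the backbone's features are computed once and never change, each training step only touches the head's few parameters. This is the same reason training only the final layers of a large pre-trained network is fast. The mapping of class indices to "koala", "kangaroo", and "other" would be up to the caller.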
We started by training only the last two layers, which gave an error rate of 1.98%. To lower the error further, we then trained all the layers, which brought the error rate down to 1.58%. Naturally, for a production model we would do more fine-tuning and data augmentation, and we would also need to gather a more realistic dataset. That said, this model already demonstrates that such a solution is achievable.
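For reference, the error rate here is the fraction of validation images whose predicted class differs from the true class, that is, one minus accuracy. The small sketch below (function names are ours, for illustration) computes it and also works out the relative improvement between the two reported numbers.

```python
def error_rate(predictions, targets):
    """Fraction of validation items whose predicted class differs from the true class."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)


def relative_reduction(before, after):
    """Relative drop from before to after; 0.2 means 20% fewer errors."""
    return (before - after) / before


# Going from a 1.98% to a 1.58% error rate:
print(f"{relative_reduction(0.0198, 0.0158):.1%}")  # prints 20.2%
```

With a validation set of only a few hundred images, a 0.4 percentage-point difference corresponds to just a handful of images. Comparisons like this are therefore worth treating with some caution.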
Once the model has been trained to spot, in this case, kangaroos and koalas, its results can be made accessible in a variety of ways: an API, or a Shiny or Python web application. We believe it is crucial for a model to have a usable interface, so that the findings of the neural network are easily accessible to users, who may also wish to see how the model arrived at a given conclusion. The user interface, which enables interaction between the human and the neural network, is just as important as the artificial intelligence part itself.
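As an illustration of the API option, here is a minimal sketch using only Python's standard library. The `POST /predict` endpoint, the JSON shape (a base64-encoded `image` field in, a `label` plus per-class `probabilities` out), and the `classify_image` stub are all hypothetical. A real service would call the trained network and would likely use a proper web framework. The request parsing is kept in a plain function, `handle_predict`, so it can be tested without starting a server.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CLASSES = ["koala", "kangaroo", "other"]


def classify_image(image_bytes):
    """Hypothetical stand-in for the trained model: one probability per class.

    Replace with a real call into the network's inference code.
    """
    return [0.1, 0.1, 0.8]  # placeholder output, not a real prediction


def handle_predict(body):
    """Parse a JSON body {"image": <base64>} and return (status_code, response_dict)."""
    try:
        payload = json.loads(body)
        image_bytes = base64.b64decode(payload["image"], validate=True)
    except (ValueError, KeyError, TypeError):  # bad JSON, missing field, bad base64
        return 400, {"error": "expected JSON with a base64-encoded 'image' field"}
    probs = classify_image(image_bytes)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return 200, {"label": CLASSES[best], "probabilities": dict(zip(CLASSES, probs))}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Start a blocking HTTP server exposing POST /predict."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Returning the full probability vector rather than only the winning label gives a front end (a Shiny or Python web app, for instance) something to show when users want to see how confident the model was.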
In the case of the Wild Detect project, we contributed to building a standalone device that would eventually be installed in the wilderness, on a ranch, or in a nature preserve, and that could be queried regularly as new images come in from its built-in camera.
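One simple way such a device could pick up new camera images is to poll a folder. The sketch below assumes (hypothetically) that the camera writes JPEG/PNG files into a single directory, and that `classify` is whatever callable runs the model on one file. We do not know how the actual Wild Detect device was built.

```python
import os
import time

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_new_images(directory, seen):
    """Return paths of image files in directory whose names are not in seen, sorted by name."""
    new = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (name not in seen
                and os.path.isfile(path)
                and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS):
            new.append(path)
    return new


def watch(directory, classify, interval_s=5.0, max_polls=None):
    """Poll directory and call classify(path) exactly once per new image.

    max_polls=None polls forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_images(directory, seen):
            classify(path)
            seen.add(os.path.basename(path))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Polling is the most portable option and is easy to reason about on a low-power device. Operating-system file notifications would react faster, but they are platform-specific.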
We are experts in building analytical web apps, so for the POC I built an app that lets users experiment with the model. Here is what it looks like:
AI can be very accurate in recognizing objects, animals, and people in images. Transfer learning makes business applications even more feasible, because it allows us to work with smaller datasets, which are often all we have. A model's accuracy, and the effort needed to build it, matter. They matter to a non-profit organization counting the few remaining koalas on the planet, and they matter to a company counting inventory in its warehouse. They matter to Wild Detect, for whom we built a successful Proof of Concept, and we are excited to continue the journey with them.
Organizations that manage large facilities, inventories, and land can all benefit from accurate visual recognition of objects.