Recognizing Animals in Photos: Building an AI Model for Object Recognition

Updated: September 26, 2020.

Our model for recognizing specific animals in images is a neural network consisting of multiple layers. The initial layers are already good at understanding the world in general, so we only need to train the final layers instead of “re-inventing the wheel”.

Object Detection – Transfer Learning

Transfer learning for visual recognition has been gaining popularity in biodiversity preservation and management. Since launching our AI for Good initiative, we have been working with biodiversity researchers and practitioners to deliver machine learning models and tools for wildlife image recognition. Our first foray into this area was our project for Wild Detect, which aligned with one of our goals at Appsilon: to use data science consulting to aid in the preservation and management of our planet’s wildlife and environment.

The goal was to build a model for visual recognition of specific kinds of animals. Since I cannot publish the original model, I will use a different dataset and a proof-of-concept deep learning model to show how I approached the problem.

I first set out to choose the animal species that I would use for developing the model. Last year I gave a talk at useR! in Brisbane, which was quite a journey for me. I had a chance to visit the Koala Sanctuary there. Sadly, koalas are severely affected by climate change, and in 2019 the Australian Koala Foundation described them as functionally extinct, although other researchers dispute that assessment. To highlight this problem, I decided to choose two Australian animals, koalas and kangaroos, for the purposes of this article.

Visual recognition can be a powerful tool in many industries besides wildlife stewardship, including retail, defense, insurance (claims verification), and manufacturing (quality control).  Well-trained modern deep neural networks can give us very accurate results for a wide range of problems.

The Dataset for Object Detection and Transfer Learning

Google Images is a good source of pictures for such proof-of-concept models. The gi2ds tool helps build a dataset in three simple steps:

  • Run a Google Images search for each class to be included in the dataset
  • Run the tool's JavaScript code
  • Review and exclude unwanted images

The resulting list of image URLs can then be downloaded programmatically, for example with a short script.
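The download step could be sketched as follows. This is a minimal illustration that uses only the Python standard library, not the code used in the project; the function names, the one-URL-per-line file format, and the `.jpg` extension are all assumptions made for the example.

```python
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_url_list(path):
    """Read one URL per line, ignoring blank lines (assumed file format)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def download_images(urls, dest_dir, fetch=fetch_bytes):
    """Save each URL as dest_dir/00000.jpg, 00001.jpg, ... and return the saved paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        target = dest / f"{index:05d}.jpg"
        try:
            target.write_bytes(fetch(url))
        except (OSError, ValueError) as error:
            # Dead or malformed links are common in scraped lists; skip them.
            print(f"skipped {url}: {error}")
            continue
        saved.append(target)
    return saved
```

A call such as `download_images(read_url_list("koala_urls.txt"), "data/koala")` would then fill one folder per class; the file and folder names here are hypothetical. Passing the `fetch` function as a parameter keeps the network access replaceable, which makes the helper easy to try out without an internet connection.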

Image 1 – Wildlife dataset for machine learning

In our case, we gathered images for three classes: “koala”, “kangaroo” and “other” (images of Australian wilderness without any koalas or kangaroos). 20% of the images were set aside as a validation set.
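One way to set aside the validation images is a seeded random split done separately for each class, so that all three classes keep the same 80/20 ratio. The sketch below assumes one folder of `.jpg` files per class under a common root; the folder layout, the function names, and the seed are illustrative assumptions, not details from the project.

```python
import random
from pathlib import Path


def split_train_valid(items, valid_fraction=0.2, seed=42):
    """Shuffle items reproducibly and return (train, valid) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def split_by_class(root, classes=("koala", "kangaroo", "other")):
    """Split each class folder on its own so every class keeps the same ratio."""
    splits = {}
    for name in classes:
        files = sorted(Path(root, name).glob("*.jpg"))
        splits[name] = split_train_valid(files)
    return splits
```

Fixing the seed means the same images land in the validation set on every run, so error rates from different training runs remain comparable.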

The Convolutional Model We Used

The model is a convolutional neural network (CNN). CNNs excel at visual recognition. Specifically, we used a ResNet architecture, originally developed by a team at Microsoft Research, which is a state-of-the-art architecture for visual tasks. Behind its exceptional accuracy is the idea of skip connections: the input of a block of layers is added to that block's output, so the signal can bypass the layers in between. This eliminates much of the vanishing gradient problem and ultimately yields a lower training error. Interestingly, the human brain appears to do something similar.
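The effect on gradients can be written down in two short formulas. The notation is chosen here for illustration: x is the input of a residual block, F(x) is what the block's layers compute, y is the block's output, L is the training loss, and I is the identity matrix.

```latex
y = F(x) + x
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( \frac{\partial F}{\partial x} + I \right)
```

Because of the identity term, part of the gradient reaches earlier layers unchanged even when the derivative of F is very small, which is why stacking many such blocks does not starve the early layers of training signal.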

Image 2 – ResNet architecture

For the deep learning framework, we used PyTorch. We find it more convenient than TensorFlow for several reasons, including the fact that PyTorch provides an official set of pre-trained models that can be used for a variety of visual problems.

Training an Object Detection Transfer Learning Model

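Instead of training the model from scratch, we used a version of ResNet pre-trained on the ImageNet dataset. ImageNet is a collection of millions of annotated images. A subset of roughly 1.2 million images in 1,000 categories is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), and publicly available pre-trained models are typically trained on that subset.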

This technique of reusing a pre-trained model for a different task is called transfer learning. It makes it possible to get strong results quickly. Our model is a neural network consisting of multiple layers, and the initial layers of the pre-trained model are already quite effective at understanding the world in general, so we only needed to train the final layers. This goes a long way toward minimizing the training time, and it let us achieve good results with only several hundred images for each of the three classes: “koala,” “kangaroo,” and “other.”

We started by training only the last 2 layers, which gave an error rate of 1.98%. To lower it further, we then trained all the layers, which brought the error rate down to 1.58%. Naturally, for a production model, we would do more fine-tuning and data augmentation. We would also need to gather a more realistic dataset. That being said, this model already shows what kind of solution is achievable.

Image 3 – Activation heatmap on a Kangaroo image

The Interface Around the Machine Learning Model

Once the model has been taught to spot, in this case, kangaroos and koalas, the results can be made accessible in a variety of ways: an API, a Shiny app, or a Python web application. We believe it is crucial to have a usable interface for a model, so that the findings of our neural network are easily accessible to users, who may also wish to see how the model arrived at a given conclusion. The user interface, which enables interaction between the human and the neural network, is just as important as the actual artificial intelligence part.
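As an illustration of the API route, the snippet below turns the raw scores a classifier produces for one image into a small JSON-ready response. It is a sketch only: the field names (`label`, `confidence`, `probabilities`) and the class order are hypothetical and not taken from the Wild Detect project.

```python
import json
import math

CLASSES = ["koala", "kangaroo", "other"]


def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_payload(scores, classes=CLASSES):
    """Build the JSON-serializable body an API endpoint could return."""
    probs = softmax(scores)
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return {
        "label": ranked[0][0],
        "confidence": round(ranked[0][1], 4),
        "probabilities": {name: round(p, 4) for name, p in ranked},
    }


if __name__ == "__main__":
    # Example scores for one image; the values are made up.
    print(json.dumps(prediction_payload([2.0, 0.5, -1.0]), indent=2))
```

Returning the full probability list, not just the top label, lets an interface show how confident the model was and which alternatives it considered.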

In the case of the Wild Detect project, we contributed to building a standalone device that would eventually be installed in the wilderness, on a ranch, or in a nature preserve, and that could be queried regularly as new images come in from its built-in camera.

We are experts in building analytical web apps, so for the POC, I built an app that allows for playing around with the model. Here is what it looks like:

Image 4 – UI around our ML model

Summing up Object Detection and Transfer Learning

AI can be very accurate in recognizing objects, animals, and people in images. Transfer learning makes business applications even more feasible because it lets us work with smaller datasets, which are often all we have. Both the accuracy of a model and the effort needed to build it matter. They matter to a non-profit organization that is counting the few remaining koalas on the planet, and they matter to a company counting inventory in its warehouse. They matter to Wild Detect, for whom we made a successful Proof of Concept, and we are excited to journey with them further.

Organizations that manage large facilities, inventories, and land can all benefit from accurate visual recognition of objects.