Guide to GPU-accelerated Ship Recognition in Satellite Imagery Using Keras and R (part I)

January 16, 2018

<h2 id="problem-overview">Problem overview</h2>
The tutorial we've created is a bit long, so we've divided it into a series of posts. In this first post of the series, I explain the basic concepts behind convolutional neural networks and how to build them using Keras. In <a href="https://appsilon.com/ship-recognition-in-satellite-imagery-part-ii/" target="_blank" rel="noopener noreferrer">part II</a> of the series, I focus on improving the performance of the network.
Artificial intelligence (AI) has exploded in popularity in both business and society. Companies large and small are steering their digital transformation toward deep learning, the technology that best represents what AI currently is. Deep learning is a subset of machine learning, which in turn is part of the broader field of data science. Both machine learning and deep learning sit at the peak of <a href="https://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emerging-technologies-2017/" target="_blank" rel="noopener noreferrer">2017’s Gartner Hype Cycle</a> and are already having a huge impact on the technological status quo. Let’s take a look at one way of creating a basic machine learning model.
<h2 id="what-is-tensorflow-and-keras-">What are TensorFlow and Keras?</h2>
<a href="https://www.tensorflow.org" target="_blank" rel="noopener noreferrer">TensorFlow</a> is an open-source software library for machine intelligence that allows you to deploy computations to one or more CPUs or GPUs. It was developed by researchers and engineers on the Google Brain Team. <a href="https://keras.io" target="_blank" rel="noopener noreferrer">Keras</a> is a high-level neural network API capable of running on top of multiple back-ends, including TensorFlow, CNTK, and Theano. One of its biggest advantages is its user-friendliness.
With Keras, you can easily build advanced models such as convolutional or recurrent neural networks. To install TensorFlow and Keras from R, use the install_keras() function. If you want to use the GPU version, you have to install some prerequisites first (an NVIDIA GPU with the CUDA Toolkit and the cuDNN library). This can be tricky, but it is worth the extra effort when dealing with larger and more elaborate models, so I strongly recommend it! You can read more about the <a href="https://tensorflow.rstudio.com/installation_gpu.html#prerequisites" target="_blank" rel="noopener noreferrer">installation prerequisites</a>.
<figure class="highlight"> <pre><code class="language-r" data-lang="r">install.packages("keras")
library(keras)

# Make sure the required prerequisites are installed, then run only ONE of these:
install_keras()                     # CPU version
install_keras(tensorflow = "gpu")   # GPU version</code></pre> </figure>
<h2 id="data-preparation">Data preparation</h2>
For this task, we will use a <a href="https://www.kaggle.com/rhammell/ships-in-satellite-imagery/data" target="_blank" rel="noopener noreferrer">dataset of 2800 satellite pictures from Kaggle</a>. Each row of the data holds one photo: 80×80 pixels in the RGB color space, stored as 19,200 values (6,400 red values, then 6,400 green, then 6,400 blue, each channel listed row by row). To feed the data into a Keras model, we need to transform it into a 4-dimensional array with dimensions (sample index, height, width, color channel). Every picture has a label equal to 1 for a ship and 0 for a non-ship object. The labels need transforming too: Keras expects a binary (one-hot) matrix with one column per class, which we create with to_categorical().
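To see why the data preparation code uses byrow = TRUE and the index ranges 1:6400, 6401:12800 and 12801:19200, it can help to run the same transformation on a toy example. This sketch assumes, as that code does, that each row stores the whole red plane, then the green plane, then the blue plane, each in row-major order; the 2×2 image and the row_to_image() helper are hypothetical and exist only for illustration.
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Hypothetical toy row: a 2x2 RGB image stored as 12 values
# (4 red, then 4 green, then 4 blue), each plane in row-major order
flat_row <- c(1, 2, 3, 4,       # red plane
              5, 6, 7, 8,       # green plane
              9, 10, 11, 12)    # blue plane

# Hypothetical helper: reshape one flat row into a side x side x 3 array
row_to_image <- function(x, side) {
  n <- side * side
  # byrow = TRUE because each plane is stored row by row
  r <- matrix(x[1:n], side, side, byrow = TRUE)
  g <- matrix(x[(n + 1):(2 * n)], side, side, byrow = TRUE)
  b <- matrix(x[(2 * n + 1):(3 * n)], side, side, byrow = TRUE)
  # array() fills column-major, so r, g and b become the three channel slices
  array(c(r, g, b), dim = c(side, side, 3))
}

img <- row_to_image(flat_row, side = 2)
img[1, 2, ]   # red, green, blue of the pixel in row 1, column 2: 2 6 10</code></pre> </figure>
With side = 80 and the values divided by 255, this is the same reshaping the data preparation code applies to each of the 2800 rows before stacking them into one array.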
<figure class="highlight"> <pre><code class="language-r" data-lang="r">library(keras)
library(tidyverse)
library(jsonlite)
library(abind)
library(gridExtra)

# Keep only the "data" and "labels" elements of the JSON file
ships_json <- fromJSON("ships_images/shipsnet.json")[1:2]

ships_data <- ships_json$data %>%
  apply(., 1, function(x) {
    # Split the row into its red, green and blue planes (6400 values each,
    # stored row by row) and rescale pixel values from 0-255 to 0-1
    r <- matrix(x[1:6400], 80, 80, byrow = TRUE) / 255
    g <- matrix(x[6401:12800], 80, 80, byrow = TRUE) / 255
    b <- matrix(x[12801:19200], 80, 80, byrow = TRUE) / 255
    list(array(c(r, g, b), dim = c(80, 80, 3)))
  }) %>%
  do.call(c, .) %>%          # flatten into a list of 80x80x3 arrays
  abind(., along = 4) %>%    # stack them into an 80x80x3x2800 array
  aperm(c(4, 1, 2, 3))       # reorder to (sample, height, width, channel)

# One-hot encode the 0/1 labels into a two-column binary matrix
ships_labels <- ships_json$labels %>%
  to_categorical(2)

rm(ships_json)
dim(ships_data)</code></pre> </figure>
<figure class="highlight"> <pre><code class="language-r" data-lang="r">[1] 2800   80   80    3</code></pre> </figure>
Now we can take a look at some samples of our data. Note that a picture showing only part of a ship was not labeled as a ship (1).
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Pixel coordinates: x runs left to right, y from the top row (80) down to 1
xy_axis <- data.frame(x = expand.grid(1:80, 80:1)[, 1],
                      y = expand.grid(1:80, 80:1)[, 2])
set.seed(1111)
sample_plots <- sample(1:dim(ships_data)[1], 12) %>%
  map(~ {
    # t() makes as.vector() read each channel row by row, matching xy_axis
    plot_data <- cbind(xy_axis,
                       r = as.vector(t(ships_data[.x, , , 1])),
                       g = as.vector(t(ships_data[.x, , , 2])),
                       b = as.vector(t(ships_data[.x, , , 3])))
    ggplot(plot_data, aes(x, y, fill = rgb(r, g, b))) +
      guides(fill = "none") +
      scale_fill_identity() +
      theme_void() +
      geom_raster(hjust = 0, vjust = 0) +
      ggtitle(ifelse(ships_labels[.x, 2], "Ship", "Non-ship"))
  })

do.call("grid.arrange", c(sample_plots, ncol = 4, nrow = 3))</code></pre> </figure>
<img class="aligncenter size-full wp-image-8852" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b023654a80805de2092eaf_samples.webp" alt="Sample imagery of ships and non-ship objects" width="756" height="533" />
The last thing we have to do is split our data into training and test sets.
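Before splitting, it is worth checking how balanced the two classes are: the share of the most frequent class is the accuracy a trivial model would reach by always predicting that class, so any useful model should beat it. A minimal sketch, where majority_baseline() is a hypothetical helper and the toy label matrix stands in for ships_labels:
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Hypothetical helper: accuracy of always predicting the most frequent class,
# computed from a one-hot label matrix (one column per class, as returned
# by to_categorical())
majority_baseline <- function(labels_onehot) {
  max(colMeans(labels_onehot))
}

# Hypothetical toy labels: 4 non-ships (column 1) and 1 ship (column 2)
toy_labels <- rbind(c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(0, 1))
colSums(toy_labels)             # images per class: 4 1
majority_baseline(toy_labels)   # 0.8</code></pre> </figure>
Applied to the real data, colSums(ships_labels) gives the number of non-ship and ship images, and majority_baseline(test[[2]]), run after the split, gives the test accuracy that a constant prediction would achieve.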
<figure class="highlight"> <pre><code class="language-r" data-lang="r">set.seed(1234)
# Randomly pick 70% of the images for training; the rest form the test set.
# 0.7 * 2800 is slightly below 1960 in floating point, so sample() draws
# 1959 images, leaving 841 for testing
indexes <- sample(1:nrow(ships_labels), 0.7 * nrow(ships_labels))
train <- list(data = ships_data[indexes, , , ], labels = ships_labels[indexes, ])
test <- list(data = ships_data[-indexes, , , ], labels = ships_labels[-indexes, ])</code></pre> </figure>
<h2 id="modeling">Modeling</h2>
In Keras, you can build models in three different ways, using: <ol><li>a sequential model</li><li>the functional API</li><li>pre-trained models</li></ol> For now, we will only use sequential models. Before that, though, we need to understand the basic concepts behind convolutional neural networks.
Convolutional neural networks (CNNs), or ConvNets, are a class of deep, feed-forward artificial neural networks designed for tasks such as image, video, and audio recognition, and object detection. The architecture of a ConvNet differs depending on the problem, but most share some basic building blocks.
<img class="aligncenter size-full wp-image-8853" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0236650b6cc5df6a49958_convnet.webp" alt="Typical CNN architecture" width="1255" height="601" />
The first type of layer in a CNN is the convolutional layer, the core building block of ConvNets. Simply put, we take a small set of filters (also called kernels), place each one over a part of the original image, and compute the dot product between the kernel and the corresponding image patch. Next, we move the filter to the next position and repeat. The number of pixels by which we move the filter is called the stride. After computing the dot products over the whole image, we get a so-called activation map (one per filter).
<img class="aligncenter size-full wp-image-8854" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b02367d1f3d10918dfa87a_convolve.webp" alt="Convolution example" width="750" height="353" />
The second type of layer in CNNs is the pooling layer.
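Before looking at pooling in detail, here is a minimal base-R sketch of the convolution step described above, for a single-channel image and a single square kernel. It is an illustration under simplifying assumptions, not how Keras implements layer_conv_2d(): a real convolutional layer also sums over all input channels, adds a bias per filter, and runs optimized (often GPU) code. conv2d_valid() and the toy inputs are hypothetical.
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Hypothetical single-channel 2D convolution without padding ("valid"):
# slide a square kernel over the image and take the dot product between the
# kernel and the image patch under it at every position.
# Like Keras, this computes cross-correlation (the kernel is not flipped).
conv2d_valid <- function(img, kernel, stride = 1) {
  k <- nrow(kernel)                              # kernel assumed square
  out_h <- (nrow(img) - k) %/% stride + 1
  out_w <- (ncol(img) - k) %/% stride + 1
  activation_map <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      rows <- (i - 1) * stride + seq_len(k)      # rows under the kernel
      cols <- (j - 1) * stride + seq_len(k)      # columns under the kernel
      activation_map[i, j] <- sum(img[rows, cols] * kernel)   # dot product
    }
  }
  activation_map
}

toy_image <- matrix(1:16, nrow = 4, byrow = TRUE)   # 4x4 image, values 1..16
box_kernel <- matrix(1, nrow = 2, ncol = 2)         # 2x2 kernel of ones

conv2d_valid(toy_image, box_kernel)               # stride 1: 3x3 map
# rows: 14 18 22 / 30 34 38 / 46 50 54
conv2d_valid(toy_image, box_kernel, stride = 2)   # stride 2: 2x2 map
# rows: 14 22 / 46 54</code></pre> </figure>
Each filter produces one activation map; with stride = 2 the kernel skips every other position, so the map shrinks from 3×3 to 2×2. Pooling, described next, operates on activation maps like these.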
Pooling layers reduce the dimensionality of activation maps. There are several types of pooling, but max pooling is the most commonly used. As with convolutional layers, we have a filter size and a stride: after placing the filter on a part of the image, we take the maximum value in that region and then move to the next region by the number of pixels specified as the stride.
<img class="aligncenter size-full wp-image-8855" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b02368a9d828bc5e4e3631_maxpool.webp" alt="Maxpooling example" width="787" height="368" />
The third type of layer in CNNs is the activation layer. In this layer, the values of the activation maps are transformed by an activation function. Several functions are available, but the most common one is the rectified linear unit (ReLU), which replaces negative values with zero: f(x) = max(0, x).
<img class="aligncenter size-full wp-image-8856" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0236a1bac8c0ee5247d62_relu.webp" alt="reLU function" width="756" height="533" />
The fourth type of layer is the densely (fully) connected layer, the same kind of layer used in a classical feed-forward neural network. Fully connected layers are placed at the end of a ConvNet, where the last one produces the output. We begin by creating an empty sequential model:
<figure class="highlight"> <pre><code class="language-r" data-lang="r">model <- keras_model_sequential()
summary(model)</code></pre> </figure>
<figure class="highlight"> <pre><code class="language-r" data-lang="r">_______________________________________________________________________________________
Layer (type)                          Output Shape                       Param #
=======================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_______________________________________________________________________________________</code></pre> </figure>
Now we can add some additional layers.
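Before adding them, it can help to work out what summary() should report. The sketch below assumes unpadded ("valid") 3×3 convolutions with stride 1 and 2×2 max pooling with stride 2, as in the model that follows; out_size() and conv_params() are hypothetical helpers written for this illustration.
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Hypothetical helper: spatial output size of an unpadded ("valid")
# convolution or pooling window, floor((n - size) / stride) + 1
out_size <- function(n, size, stride) (n - size) %/% stride + 1

# Hypothetical helper: trainable parameters of a 2D convolution; each filter
# has one weight per kernel cell per input channel, plus one bias
conv_params <- function(kernel, in_channels, filters) {
  (kernel * kernel * in_channels + 1) * filters
}

s1 <- out_size(80, 3, 1)    # 3x3 convolution, stride 1 -> 78
s2 <- out_size(s1, 2, 2)    # 2x2 max pooling, stride 2 -> 39
s3 <- out_size(s2, 3, 1)    # 3x3 convolution, stride 1 -> 37
s4 <- out_size(s3, 2, 2)    # 2x2 max pooling, stride 2 -> 18
n_flat <- s4 * s4 * 64      # flatten 18 x 18 x 64      -> 20736

params <- c(conv1 = conv_params(3, 3, 32),    # 896
            conv2 = conv_params(3, 32, 64),   # 18496
            dense = (n_flat + 1) * 2)         # weights + biases, 2 units: 41474
sum(params)                                   # 60866</code></pre> </figure>
Pooling and flattening layers have no trainable parameters, so the total comes entirely from the two convolutions and the dense layer; these numbers reproduce the shapes and parameter counts in the model summary.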
Note that Keras models are modified in place, so there is no need to reassign the result of each step. In the first layer, we have to specify the shape of our input data.
<figure class="highlight"> <pre><code class="language-r" data-lang="r">model %>%
  # 32 filters, each 3x3 pixels,
  # with ReLU activation after the convolution
  layer_conv_2d(
    input_shape = c(80, 80, 3),
    filters = 32, kernel_size = c(3, 3), strides = c(1, 1),
    activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2), strides = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), strides = c(1, 1),
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2), strides = c(2, 2)) %>%
  layer_flatten() %>%
  # Output layer: one softmax probability per class (non-ship, ship)
  layer_dense(2, activation = "softmax")

summary(model)</code></pre> </figure>
<figure class="highlight"> <pre><code class="language-r" data-lang="r">_______________________________________________________________________________________
Layer (type)                          Output Shape                       Param #
=======================================================================================
conv2d_1 (Conv2D)                     (None, 78, 78, 32)                 896
_______________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)        (None, 39, 39, 32)                 0
_______________________________________________________________________________________
conv2d_2 (Conv2D)                     (None, 37, 37, 64)                 18496
_______________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)        (None, 18, 18, 64)                 0
_______________________________________________________________________________________
flatten_1 (Flatten)                   (None, 20736)                      0
_______________________________________________________________________________________
dense_1 (Dense)                       (None, 2)                          41474
=======================================================================================
Total params: 60,866
Trainable params: 60,866
Non-trainable params: 0
_______________________________________________________________________________________</code></pre>
</figure>
After building the architecture of our CNN, we have to configure it for training. We must specify the loss function, the optimizer, and any additional metrics for evaluation. For example, we can use stochastic gradient descent (SGD) as the optimization method and cross-entropy as the loss function.
<figure class="highlight"> <pre><code class="language-r" data-lang="r">model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_sgd(lr = 0.0001, decay = 1e-6),
  metrics = "accuracy"
)</code></pre> </figure>
Finally, we are ready to fit the model, but there is one more thing we can do first. For a quick, convenient visualization of the training results, we can run TensorFlow's visualization tool, TensorBoard.
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Launch TensorBoard, reading the logs that the callback below writes
tensorboard("logs/ships")

ships_fit <- model %>%
  fit(x = train[[1]], y = train[[2]],
      epochs = 20, batch_size = 32,
      validation_split = 0.2,   # hold out 20% of the training data for validation
      callbacks = callback_tensorboard("logs/ships"))</code></pre> </figure>
<figure class="highlight"> <pre><code class="language-r" data-lang="r">...
Epoch 20/20
  32/1567 [..............................] - ETA: 0s - loss: 0.4627 - acc: 0.7812
 160/1567 [==>...........................] - ETA: 0s - loss: 0.5256 - acc: 0.7500
 288/1567 [====>.........................] - ETA: 0s - loss: 0.5268 - acc: 0.7431
 448/1567 [=======>......................] - ETA: 0s - loss: 0.5401 - acc: 0.7299
 608/1567 [==========>...................] - ETA: 0s - loss: 0.5375 - acc: 0.7319
 768/1567 [=============>................] - ETA: 0s - loss: 0.5389 - acc: 0.7305
 896/1567 [================>.............] - ETA: 0s - loss: 0.5312 - acc: 0.7377
1056/1567 [===================>..........] - ETA: 0s - loss: 0.5259 - acc: 0.7453
1216/1567 [======================>.......] - ETA: 0s - loss: 0.5294 - acc: 0.7401
1376/1567 [=========================>....] - ETA: 0s - loss: 0.5217 - acc: 0.7471
1536/1567 [============================>.] - ETA: 0s - loss: 0.5191 - acc: 0.7507
1567/1567 [==============================] - 1s 484us/step - loss: 0.5188 - acc: 0.7511 - val_loss: 0.5288 - val_acc: 0.7449</code></pre> </figure>
<img class="aligncenter size-full wp-image-8857" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0236af92b3d80176e4c5c_tb.webp" alt="Tensor board" width="1907" height="675" />
The last thing to do is to get evaluation metrics and predictions on the test set.
<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Predicted class probabilities (softmax output), with the true one-hot
# labels appended as columns 3 and 4
predicted_probs <- model %>%
  predict(test[[1]]) %>%
  cbind(test[[2]])
head(predicted_probs)

# Loss and accuracy on the test set
model %>% evaluate(test[[1]], test[[2]])

set.seed(1111)
sample_plots <- sample(1:dim(test[[1]])[1], 12) %>%
  map(~ {
    plot_data <- cbind(xy_axis,
                       r = as.vector(t(test[[1]][.x, , , 1])),
                       g = as.vector(t(test[[1]][.x, , , 2])),
                       b = as.vector(t(test[[1]][.x, , , 3])))
    ggplot(plot_data, aes(x, y, fill = rgb(r, g, b))) +
      guides(fill = "none") +
      scale_fill_identity() +
      theme_void() +
      geom_raster(hjust = 0, vjust = 0) +
      ggtitle(ifelse(test[[2]][.x, 2], "Ship", "Non-ship")) +
      labs(caption = paste("Ship prob:", round(predicted_probs[.x, 2], 6))) +
      theme(plot.title = element_text(hjust = 0.5))
  })

do.call("grid.arrange", c(sample_plots, ncol = 4, nrow = 3))</code></pre> </figure>
<figure class="highlight"> <pre><code class="language-r" data-lang="r">           [,1]       [,2] [,3] [,4]
[1,] 0.04486139 0.95513862    0    1
[2,] 0.92640823 0.07359175    0    1
[3,] 0.26848912 0.73151088    0    1
[4,] 0.51208550 0.48791450    0    1
[5,] 0.15906605 0.84093398    0    1
[6,] 0.66976833 0.33023167    0    1
 32/841 [>.............................] - ETA: 0s
384/841 [============>.................] - ETA: 0s
736/841 [=========================>....] - ETA: 0s
841/841 [==============================] - 0s 162us/step
$loss
[1] 0.5235391

$acc
[1] 0.7502973</code></pre> </figure>
<img class="aligncenter size-full wp-image-8858" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0236b461107d83fd4b8b5_samples_test.webp" alt="Test set probability" width="756" height="533" />
As we can see, the model leaves room for improvement: on the test set it reaches a modest accuracy (0.75) and a relatively high cross-entropy loss (0.52). It is, however, a good introduction to Keras and a solid starting point. We are going to explore ways of improving the network and achieving better results in <a href="https://appsilon.com/ship-recognition-in-satellite-imagery-part-ii/" rel="noopener noreferrer">part two</a>. See you soon!
