Guide to GPU-accelerated Ship Recognition in Satellite Imagery Using Keras and R (part II)

January 23, 2018

<h2 id="before-we-start">Before we start…</h2> We hope you found the <a href="https://appsilon.com/ship-recognition-in-satellite-imagery-part-i/" target="_blank" rel="noopener noreferrer">first half</a> of this post useful and interesting. Before we dive into the code, I want to explain a few important aspects of data science.

First, implementing data science in practice is always a research process. The goals we set have a significant impact on the methods we choose, and chasing even a marginal increase in accuracy or precision can considerably lengthen a project. Development is heavily influenced by the data as well: achieving the same results on different data sets is not always straightforward.

I also want to explain why we use GPUs rather than CPUs to train our models. A CPU has only a few cores, and each core generally works on a single process at a time. A GPU, on the other hand, has hundreds of weaker cores. Training a model consists of thousands of small, largely independent numerical operations. Many of these can run at the same time on a GPU, which greatly reduces the time needed for training. The difference is most apparent in Deep Learning.

<h2 id="the-data">The data</h2> Before we start changing our CNN’s architecture, there are some things we can do when preparing our data. As a reminder, we’ve got 2800 satellite images (80-pixel height, 80-pixel width, 3 colors - RGB color space). This isn’t a huge sample, especially for Deep Learning, but it will do for our needs. In situations like this, a common practice is to enlarge the training set with geometric transformations (rotation, translation, thickening, blurring, etc.). For example, in R we can use the rot90 function from the pracma package to create images rotated by 90, 180, or 270 degrees.
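As a small illustration of what rot90 does, the sketch below rotates a toy matrix instead of a satellite image. The matrix m is made up for this example and is not part of the ships data. One rotation turns a 2 x 3 matrix into a 3 x 2 one, and rotating by 90 and then by 270 degrees gives back the original. A square 80 x 80 color channel keeps its dimensions under every rotation.

<figure class="highlight"> <pre><code class="language-r" data-lang="r">library(pracma)

# Toy 2 x 3 matrix standing in for one color channel (hypothetical data)
m <- matrix(1:6, nrow = 2, byrow = TRUE)

r1 <- rot90(m, 1) # rotated by 90 degrees
r2 <- rot90(m, 2) # rotated by 180 degrees
r3 <- rot90(m, 3) # rotated by 270 degrees

dim(r1) # 3 2: rows and columns swap after a quarter turn
dim(r2) # 2 3: a half turn keeps the shape

# A quarter turn followed by three more quarter turns restores the input
all(rot90(r1, 3) == m)</code></pre> </figure>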
We now have to slightly modify the code: <figure class="highlight"> <pre><code class="language-r" data-lang="r">library(keras)
library(tidyverse)
library(jsonlite)
library(abind)
library(gridExtra)
library(pracma)

ships_json <- fromJSON("ships_images/shipsnet.json")[1:2]

# Each row holds one image: 6400 red, 6400 green, and 6400 blue values
ships_data <- ships_json$data %>%
  apply(., 1, function(x) {
    r <- matrix(x[1:6400], 80, 80, byrow = TRUE) / 255
    g <- matrix(x[6401:12800], 80, 80, byrow = TRUE) / 255
    b <- matrix(x[12801:19200], 80, 80, byrow = TRUE) / 255
    list(array(c(r, g, b), dim = c(80, 80, 3)), # Original
         array(c(rot90(r, 1), rot90(g, 1), rot90(b, 1)), dim = c(80, 80, 3)), # 90 degrees
         array(c(rot90(r, 2), rot90(g, 2), rot90(b, 2)), dim = c(80, 80, 3)), # 180 degrees
         array(c(rot90(r, 3), rot90(g, 3), rot90(b, 3)), dim = c(80, 80, 3))) # 270 degrees
  }) %>%
  do.call(c, .) %>%
  abind(., along = 4) %>% # Combine 3-dimensional arrays into 4-dimensional array
  aperm(c(4, 1, 2, 3)) # Array transposition

# Repeat each label four times, once per rotation
ships_labels <- ships_json$labels %>%
  map(~ rep(.x, 4)) %>%
  unlist() %>%
  to_categorical(2)

set.seed(1234)
# Sample the first index of each 4-image group (1, 5, 9, ...) so that all
# rotations of a ship land on the same side of the train/test split
indexes <- sample(seq(1, dim(ships_data)[1], by = 4), 0.7 * dim(ships_data)[1] / 4) %>%
  map(~ .x + 0:3) %>%
  unlist()

train <- list(data = ships_data[indexes, , , ], labels = ships_labels[indexes, ])
test <- list(data = ships_data[-indexes, , , ], labels = ships_labels[-indexes, ])

# Plot the first image together with its three rotations
xy_axis <- data.frame(x = expand.grid(1:80, 80:1)[ ,1],
                      y = expand.grid(1:80, 80:1)[ ,2])
sample_plots <- 1:4 %>% map(~ {
  plot_data <- cbind(xy_axis,
                     r = as.vector(t(ships_data[.x, , ,1])),
                     g = as.vector(t(ships_data[.x, , ,2])),
                     b = as.vector(t(ships_data[.x, , ,3])))
  ggplot(plot_data, aes(x, y, fill = rgb(r, g, b))) +
    guides(fill = "none") +
    scale_fill_identity() +
    theme_void() +
    geom_raster(hjust = 0, vjust = 0) +
    ggtitle(paste(((.x - 1) * 90) %% 360, "degree rotation"))
})
do.call("grid.arrange", c(sample_plots, ncol = 2, nrow = 2))</code></pre> </figure>   <img class="aligncenter size-full wp-image-8860" 
src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0236045aea1a614239666_rot.webp" alt="Rotated images of ships" width="756" height="533" /> <h2 id="cnns-architecture">CNN’s architecture</h2> We can change the architecture of our ConvNet in many different ways. The first and simplest thing to try is adding more layers. Our initial network looks like this: <img class="aligncenter size-full wp-image-8861" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b023608879deb4ce6480e7_model_old.webp" alt="Old model outline" width="763" height="246" /> We will add more of the layer types mentioned previously (convolutional, pooling, activation), and we can also add some new ones.

As our network gets bigger and more complicated, it becomes more prone to overfitting. To prevent this we can use a regularization method called dropout. During training, each node is temporarily removed from the network with probability 1-p or kept with probability p. To add dropout to a convolutional neural network in Keras we can use the layer_dropout() function. Note that its rate parameter is the fraction of units to drop (that is, 1-p), not the fraction to keep.
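To make the mechanism concrete, here is a minimal base R sketch of dropout applied to a vector of activations. The helper dropout_sketch is hypothetical and written only for this illustration; it is not how Keras implements layer_dropout(). It follows the common "inverted dropout" formulation, in which the surviving values are divided by 1 - rate so that their expected value stays the same.

<figure class="highlight"> <pre><code class="language-r" data-lang="r"># Hypothetical helper for illustration only, not part of Keras
dropout_sketch <- function(x, rate) {
  stopifnot(rate >= 0, rate < 1)
  # 1 = keep the unit (probability 1 - rate), 0 = drop it (probability rate)
  keep <- rbinom(length(x), size = 1, prob = 1 - rate)
  # Rescale the kept units so the expected activation is unchanged
  x * keep / (1 - rate)
}

set.seed(42)
activations <- rep(1, 10000)
dropped <- dropout_sketch(activations, rate = 0.25)

mean(dropped == 0) # share of dropped units, close to 0.25
mean(dropped)      # average activation, close to the original 1</code></pre> </figure>

Dropout is only active during training. At prediction time all units are kept, which is why the rescaling above matters.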
Our example architecture could look like this: <img class="aligncenter size-full wp-image-8862" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b02362386245cb4e127d7a_model_new.webp" alt="New model outline" width="766" height="246" /> <figure class="highlight"> <pre><code class="language-r" data-lang="r">model2 <- keras_model_sequential()
model2 %>%
  # First convolutional block: two 32-filter layers, pooling, dropout
  layer_conv_2d(
    filters = 32, kernel_size = c(3, 3), padding = "same",
    input_shape = c(80, 80, 3), activation = "relu") %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(0.25) %>%
  # Second convolutional block: two 64-filter layers, pooling, dropout
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), padding = "same", activation = "relu") %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(0.25) %>%
  # Classifier: dense layer with dropout, then a 2-class softmax output
  layer_flatten() %>%
  layer_dense(512, activation = "relu") %>%
  layer_dropout(0.5) %>%
  layer_dense(2, activation = "softmax")</code></pre> </figure> <h2 id="optimizer">Optimizer</h2> After preparing our training set and setting up the architecture, we can choose a loss function and an optimization algorithm. Keras offers several algorithms, ranging from simple Stochastic Gradient Descent to adaptive methods such as Adaptive Moment Estimation (Adam). Choosing a good optimizer can be crucial. In Keras, optimizer functions start with optimizer_. Here we use Adamax, a variant of Adam based on the infinity norm: <figure class="highlight"> <pre><code class="language-r" data-lang="r">model2 %>% compile(
  loss = "categorical_crossentropy",
  # Note: newer keras releases name the lr argument learning_rate
  optimizer = optimizer_adamax(lr = 0.0001, decay = 1e-6),
  metrics = "accuracy"
)</code></pre> </figure> <h2 id="results">Results</h2> The figure below shows the accuracy and the loss function (cross-entropy) before (Model 1) and after (Model 2) the modifications. We can see a noticeable increase in validation-set accuracy (from 0.7449 to 0.9828) and a decrease in the loss (from 0.556 to 0.04573).
<img class="aligncenter size-full wp-image-8863" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b02363a9d828bc5e4e2ec6_resu.webp" alt="Values of accuracy and loss function (cross-entropy) before and after improvements of CNN" width="756" height="533" /> I also ran both models on CPU and on GPU. The computation times are below: <img class="aligncenter size-full wp-image-8864" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b023653dff82577ed367c1_gpucpu.webp" alt="Estimation times for GPU and CPU" width="756" height="533" />

Machine specifications: Processor: Intel Core i7-7700HQ; Memory: 32GB DDR4-2133MHz; Graphics: NVIDIA GeForce GTX 1070, 8GB GDDR5 VRAM
