
Monday, 25 July 2016

Image recognition in R using convolutional neural networks with the MXNet package

Among R deep learning packages, MXNet is my favourite. Why, you may ask? I can't really pin it down. It feels relatively simple, maybe because at first sight its workflow looks similar to the one used by Keras, maybe because it was my first deep learning package in R, or maybe because it works very well with little effort, who knows.

MXNet is not available on CRAN, but it can be easily installed either by using precompiled binaries or by building the package from scratch. I decided to install the GPU version, but for some reason my GPU did not want to collaborate, so I installed the CPU one, which should be fine for what I plan on doing.
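For reference, this is roughly the route for the CPU build; the dmlc drat repository was the documented installation method at the time of writing, so take this as a sketch that may need updating:

# Install the precompiled CPU-only MXNet build via the dmlc drat repository
# (the documented route at the time of writing; it may have changed since)
install.packages("drat", repos = "https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")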


As a first experiment with this package, I decided to try some image recognition tasks. The first big problem, however, is where to find images to work on without falling back on the same old boring datasets. ImageNet is the answer! It provides a collection of URLs to publicly available images that can be downloaded easily with a simple R or Python script.
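To give an idea of the download step, a minimal loop in R might look like the sketch below. Here image_urls.txt is a hypothetical file with one ImageNet URL per line, and the destination folder is a placeholder; the tryCatch is there because some of the URLs are dead.

# Sketch: download images from a list of URLs (one per line)
# "image_urls.txt" and the destination folder are hypothetical placeholders
urls <- readLines("image_urls.txt")
for(i in 1:length(urls))
{
  dest <- file.path("C://dogs_images", paste0("img_", i, ".jpg"))
  tryCatch(download.file(urls[i], dest, mode = "wb"),
           error = function(e) print(e))
}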

I decided to build a model to classify images of dogs and plants, therefore I downloaded about 1500 images of dogs and plants (mainly flowers).

Preprocessing the data

Here comes the first problem. Images have different sizes, as expected. R has a nice package for working with images: EBImage. I've been using it a lot lately to manipulate images. Scaling rectangular images to squares is not ideal, but a deep convolutional neural network should be able to cope with it, and since this is just a quick exercise I think this solution is ok.
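Note that EBImage lives on Bioconductor rather than CRAN; at the time of writing, installing it went roughly like this:

# Install EBImage from Bioconductor (the biocLite route used at the time)
source("https://bioconductor.org/biocLite.R")
biocLite("EBImage")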

I decided to resize the images to 28x28 pixels and turn them into greyscale. I could also have kept the RGB channels. I tried 64x64 pixel images, but R refused to run smoothly, so I went back to 28x28. In order to resize all the images at once, I wrote this quick R script. It is easy to customize.

# Resize images and convert to greyscale
rm(list=ls())
require(EBImage)
# Set wd where images are located
setwd("C://dogs_images")
# Set directory where to save the resized images
save_in <- "C://dogs_images_resized"
# Load image names
images <- list.files()
# Set width
w <- 28
# Set height
h <- 28
# Main loop: resize images and convert them to greyscale
for(i in 1:length(images))
{
  # tryCatch is necessary since some images may fail to load
  result <- tryCatch({
    # Image name
    imgname <- images[i]
    # Read image
    img <- readImage(imgname)
    # Resize image to 28x28
    img_resized <- resize(img, w = w, h = h)
    # Convert to greyscale
    grayimg <- channel(img_resized, "gray")
    # Path to the output file (note the "/" separator, otherwise
    # the file name gets glued onto the folder name)
    path <- paste(save_in, imgname, sep = "/")
    # Save image
    writeImage(grayimg, path, quality = 70)
    # Print status
    print(paste("Done", i, sep = " "))},
    # Error function
    error = function(e){print(e)})
}

After preprocessing, the images need to be stored in a proper format in order to use them to train a model. A greyscale image is basically a two-dimensional matrix, so it can easily be stored as a flattened array (or more simply, a vector).

# Generate a train-test dataset
# Clean environment and load required packages
rm(list=ls())
require(EBImage)
# Set wd where resized greyscale images are located
setwd("C://dogs_resized")
# Output file
out_file <- "C://dogs_28.csv"
# List images in path
images <- list.files()
# Set up df
df <- data.frame()
# Set image size. In this case 28x28
img_size <- 28*28
# Set label (1 for this class; use a different value, e.g. 0, for the other)
label <- 1
# Main loop. Loop over each image
for(i in 1:length(images))
{
  # Read image
  img <- readImage(images[i])
  # Get the image as a matrix
  img_matrix <- img@.Data
  # Coerce to a vector
  img_vector <- as.vector(t(img_matrix))
  # Prepend the label
  vec <- c(label, img_vector)
  # Bind rows
  df <- rbind(df, vec)
  # Print status info
  print(paste("Done ", i, sep = ""))
}
# Set names
names(df) <- c("label", paste0("pixel", 1:img_size))
# Write out dataset
write.csv(df, out_file, row.names = FALSE)
#-------------------------------------------------------------------------------
# Train-test split and shuffle
# Load datasets
plants <- read.csv("plants_28.csv")
dogs <- read.csv("dogs_28.csv")
# Bind rows into a single dataset
new <- rbind(plants, dogs)
# Shuffle the new dataset (1512 images in total)
shuffled <- new[sample(1:1512),]
# Train-test split
train_28 <- shuffled[1:1200,]
test_28 <- shuffled[1201:1512,]
# Save train-test datasets
write.csv(train_28, "train_28.csv", row.names = FALSE)
write.csv(test_28, "test_28.csv", row.names = FALSE)

Of course, it would be better to try different train-test splits, but that can be done later when cross-validating the model, as in the sketch below.
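As a sketch, the fold assignments for a k-fold cross-validation could be generated like this (5 folds picked purely for illustration):

# Sketch: assign each of the 1512 shuffled rows to one of 5 folds
set.seed(100)
folds <- sample(rep(1:5, length.out = nrow(shuffled)))
# e.g. use fold 1 as the test set and the rest as the training set
train_cv <- shuffled[folds != 1, ]
test_cv <- shuffled[folds == 1, ]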

Building and testing the model

Now the data is in a usable format. Let's go on and build the model. I'll use 2 convolutional layers and 2 fully connected layers.

rm(list=ls())
# Load MXNet
require(mxnet)
# Load train and test datasets
train <- read.csv("train_28.csv")
test <- read.csv("test_28.csv")
# Fix train and test datasets: MXNet expects arrays of shape
# width x height x channels x samples
train <- data.matrix(train)
train_x <- t(train[,-1])
train_y <- train[,1]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))
test <- data.matrix(test)
test_x <- t(test[,-1])
test_y <- test[,1]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
# Model
data <- mx.symbol.Variable('data')
# 1st convolutional layer: 5x5 kernel and 20 filters
conv_1 <- mx.symbol.Convolution(data = data, kernel = c(5,5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2,2), stride = c(2,2))
# 2nd convolutional layer: 5x5 kernel and 50 filters
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2,2), stride = c(2,2))
# 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
# 2nd fully connected layer (2 outputs, one per class)
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2)
# Output
NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
# Set seed for reproducibility
mx.set.seed(100)
# Device used. Sadly not the GPU :-(
device <- mx.cpu()
# Train on 1200 samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.accuracy,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
# Test on 312 samples
predict_probs <- predict(model, test_array)
# max.col returns the index (1 or 2) of the most likely class;
# subtract 1 to map it back to the 0/1 labels
predicted_labels <- max.col(t(predict_probs)) - 1
table(test[,1], predicted_labels)
sum(diag(table(test[,1], predicted_labels))) / 312
##############################################
# Output
##############################################
#    predicted_labels
#        0    1
#    0  83   47
#    1  34  149
#
# [1] 0.7412141

The best accuracy score I got on the test set was 74%. I tested different parameters, but it was not easy to get them right. Other activation functions did not give good results and caused some training problems. After about 30 iterations, the model starts to overfit badly and test accuracy drops significantly.
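One way to see where the overfitting kicks in is to pass the test set as evaluation data during training, so validation accuracy is logged at every round. A sketch, assuming the eval.data argument of mx.model.FeedForward.create (which should accept a list with data and label elements):

# Sketch: log validation accuracy each round to spot the overfitting point
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.data = list(data = test_array, label = test_y),
                                     eval.metric = mx.metric.accuracy)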

Conclusions

74% is not a great score by image recognition standards, but I believe it is, all things considered, a good result for the following reasons:

- Train and test datasets were very small; 1500 samples is not that much.

- This score is still marginally better than the one I obtained using a random forest model (see the sketch after this list).

- The images were squashed and stretched to 28x28 pixels, and they showed very different subjects in different positions (especially the dog images), not to mention that some had pretty noisy watermarks.

- I got to play around a lot with the hyperparameters of the net.
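For reference, a random forest baseline on the flattened pixel columns can be set up along these lines (a minimal sketch with the randomForest package; the parameters are illustrative, not the exact model I used):

# Sketch: random forest baseline on the flattened pixel columns
# (illustrative parameters; requires the randomForest package)
require(randomForest)
rf <- randomForest(x = train_28[, -1], y = as.factor(train_28[, 1]), ntree = 500)
rf_preds <- predict(rf, test_28[, -1])
# Accuracy on the test set
mean(as.character(rf_preds) == as.character(test_28[, 1]))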

There is certainly room for improvement: using TensorFlow and a slightly different model, I achieved close to 80% accuracy on the same test set. But this model took me less time to build and run than the TensorFlow one.

Comments:

  1. Sorry about the trivial question, but in the below line of code:

    predicted_labels <- max.col(t(predict_probs)) - 1

    why are we subtracting 1? Could you please explain this?

    Replies
    1. Hi, the 1 is subtracted since R, contrary to most programming languages, starts its array/vector indices at 1 instead of 0.

  2. This comment has been removed by the author.

  3. Hi, so in my final table all my predicted labels are marked as 1, why is that? I guess that is what's preventing my final result from being similar to yours.

  4. Hi, thanks a lot for sharing this. Please help with the dataset required to replicate this, i.e. dogs_resized.

  5. Hi Mic,

    thanks a lot, your post is easy to follow and understand... Referring to your last statement, can you share the details of the TensorFlow model you developed in a similar manner, please!

    Replies
    1. Hi kpankaj, thanks for your kind comment! This post is a bit old, a few changes have happened to my hard drive and I am afraid the TensorFlow model has been lost somewhere in the backups I made (unless I have already posted it). However, I've got something better: if you'd like to learn the basics of TensorFlow, just follow Sentdex's videos, he's very good at explaining this stuff, I'd recommend him to everyone who is starting out. Here's the link https://youtu.be/oYbVFhK_olY?list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v

  6. Sir, this helped me a lot... I searched a lot of websites for help, but it's not easy to find such code; yours was a perfect job, especially for students who can't pay to get code...
    Sir, could you kindly give me the code for data in which we have 2 or more objects in one picture and then identify the objects?

    Replies
    1. There are a lot of good tutorials online on the topic you need, try to search on YouTube as well.

  7. Hi, can you please tell me what you do in this step?

    dim(test_array) <- c(28, 28, 1, ncol(test_x))

    Replies
    1. In this step I'm telling R that the data is made of ncol(test_x) samples of 28x28 pixel images with 1 channel (since they are greyscale images; if they were RGB, this 1 would have been a 3).
