The Beginner Programmer: Image recognition tutorial in R using deep convolutional neural networks (MXNet package)

Friday, 5 August 2016

Image recognition tutorial in R using deep convolutional neural networks (MXNet package)

This is a detailed tutorial on image recognition in R using a deep convolutional neural network provided by the MXNet package. After a short post I wrote some times ago I received a lot of requests and emails for a much more detailed explanation, therefore I decided to write this tutorial. This post will show a reproducible example on how to get 97.5% accuracy score on a faces recognition task in R.

Premises

I feel some premises are necessary. I’m writing this tutorial with two main objectives.
The first one is to provide a fully reproducible example to everyone; the second objective is to provide a, hopefully, clear explanation to the most frequent questions I received via email on the old post. Please be aware that this is just how I decided to tackle this particular problem, this is certainly not the only way to solve it, and most certainly it is not the best. If you have any suggestion to improve the model or the process, or should you spot any mistake, please do leave a comment.

Requirements

I’m going to use both Python 3.x (for getting the data and some preprocessing) and R (for the core of the tutorial), so make sure to have both installed. The package requirements for R are the following:

MXNet. This package will provide you with the model we are going to be using in this tutorial, namely a deep convolutional neural network. You don’t need the GPU version, CPU will suffice for this task although it could be slower. If you feel like it, then go for the GPU version.
EBImage. This package provides a wide variety of tools to work with images. It is a pleasure to work on images using this package, the documentation is crystal clear and pretty straightforward.

As for Python 3.x please make sure to install both Numpy and Scikit-learn. I suggest to install the Anaconda distribution which already has installed some commonly used packages for data science and machine learning.

Once you have all these things installed and running, you are set to go.

The dataset

I am going to use the Olivetti faces dataset¹. This dataset is a collection of 64x64 pixel 0-256 greyscale images.

The dataset contains a total of 400 images of 40 subjects. With just 10 samples for each subject it is usually used for unsupervised or semi-supervised algorithms, but I’m going to try my best with the selected supervised method.

First of all, you need to scale the images in the 0-1 range. This is automatically done by the function we are going to use to download the dataset, so you do not need to worry about it that much, just be aware that it has already been done for you. Should you use your own images be sure to scale them down in the 0-1 range (or –1,1 range, although the first seems work better with neural networks in my experience). This below is the Python script you need to run in order to download the dataset. Just change the paths to your selected paths and then run it either from your IDE or the terminal.

What this piece of code does is basically download the data, reshape the X datapoints (the images) and then save the numpy arrays to a .csv file.

As you can see, the x array is a tensor (tensor is a fancy name for a multidimensional matrix) of shape (400, 64, 64): this means that the x array contains 400 samples of 64x64 matrices (read images). When in doubt about these things, just print out the first elements of the tensor and try to figure out the structure of the data with what you know. For instance, we know from the dataset description that we have 400 samples and that each image is 64x64 pixel. We flatten the x tensor into a matrix of shape 400x4096. That is, each 64x64 matrix (image) is now converted (flattened) to a row vector of length 4096.

As for y, y is already a simple column vector of size 400. No need to reshape it.

Take a look at the generated .csv and check that everything makes sense to you.

Some more preprocessing with R

Now we are going to use EBImage to resize each image to 28x28 pixel and generate the training and testing datasets. You may ask why I am resizing the images, well, for some reason my machine does not like 64x64 pixel images and my PC crash whenever I run the model with the data. Not good. But that’s ok since we can get good results with smaller images too (you can try running the model with the 64x64 size images if your PC does not crash though). Let’s go on now:

This part of the tutorial should be pretty self explanatory, if you are not sure on what is the output, I suggest taking a look at the rs_df dataframe. It should be a 400x785 dataframe as follows:

label, pixel1, pixel2, … , pixel784
0, 0.2, 0.3, … ,0.1

Building the model

Now for the fun part, let’s build the model. Below you can find the script I used to train and test the model. After the script you can find my comments and explanations on the code.

After having loaded the training and testing dataset, I convert each dataframe to a numeric matrix using the data.matrix function. Remember that the first column of the data represents the labels associated with each image. Be sure to remove the labels from train_array and test_array. Then, after having separated the predictors from the labels, you need to tell MXNet the shape of the data. This is what I did at line 19 with the following piece of code “dim(train_array) <- c(28, 28, 1, ncol(train_x))” for the training array and at line 24 for the test arrray. By doing this, we are essentially telling the model that the training data is made of ncol(train_x) samples (360 images) of shape 28x28. The number 1 signals that the images are greyscale, ie they have only 1 channel. If the images were RGB then the 1 should have been replaced by a 3 since in that case each image would have 3 channels.

As far as the model structure is concerned, this is a variation of the LeNet model using hyperbolic tangent instead of “Relu” as activation function, 2 convolutional layers, 2 pooling layers, 2 fully connected layer, and a typical softmax output.

Each convolutional layer is using a 5x5 kernel and applying a fixed number of filters, check this awesome video for some qualitative insight into convolutional layers. The pooling layers apply a classical “max pooling” approach.

During my tests tanh seemd to work far better than sigmoid and Relu but you can try and use the other activation functions if you want.

As for the model hyperparameters, the learning rate is a little higher than usual but it turns out that it works fine, while the selected number of epochs is 480. Batch size of 40 seems to work fine. These hyperparameters were found out from many trials and error. As far as I know I could have performed a grid search at best, but I did not want to over complicate the code and therefore I just used the classical trial and error approach.

At the end you should get 0.975 accuracy.

Conclusions

All in all, this model was fairly easy to set up and run. When running on CPU it takes about 4-5 minutes to train which is a bit long if you want to make some experiments but it is reasonable to work with.

Considering the fact that we did not do any feature engineering and just made some simple and ordinary preprocessing steps, I believe the result achieved is quite good. Of course, in case we would like to get a better grasp on the “true” accuracy, we would need to perform some cross validation procedures (that will inevitably eat up a lot of computing time Smile ).

Thank you for reading this far, I hope this tutorial helped you understanding how to set up and run this particular model.

Dataset source and credits:

¹Olivetti faces dataset was made by AT&T Laboratories Cambridge.

27 comments:

Mic5 March 2017 at 09:48
Hi Arno, thanks for reading the post and for your comment.

In general, the model will be able to discriminate only if you've told it how to do that. For instance if you have two faces in the same picture, in the example above it will "chose" to output as an answer either one or the other face, since it has not been trained to do otherwise.

If you have more labels in the same picture, you can either try to separate them and then pass each one to a classifier, use a different model, or even use some more sophisticated tools such as openCV with Python which can detect objects in a picture very accurately. Ultimately the choice will depend on the details of the specific problem you are trying to solve.
ReplyDelete
Replies
Leonid7 March 2017 at 13:07
Hi, thanks for post. Hope you make more posts on neural networks using R.
Why not use Olivetti faces from RnavGraphImageData package and stay fully in R?
ReplyDelete
Replies
nhfgnfgh28 June 2017 at 16:32
Hi,

First of all thanks for this post. Very informative.
I am struggling to install "mxnet" package in my R.

Could you please guide what needs to be done in order to use it.
ReplyDelete
Replies
nhfgnfgh30 June 2017 at 21:03
Hi, I was able to fix "Mxnet" issue mentioned in my previous comment.

It ran perfectly and gave similar results. However, when I changed the seed from 100 to some other number (Just before shuffling itno training and testing), I get an accuracy of 0.325.

Isn't it supposed to be somewhat similar of model is working fine ?
Am I missing on something ?
ReplyDelete
Replies
StatRat2 December 2017 at 10:52
I cannot see there the dependent variable is stored? Somehow the information needs to be transmitted which picture blongs to which person. Where is that done?
ReplyDelete
Replies
Unknown19 April 2018 at 10:39
I had issues with the EBImage library. Hence I did the step "Some more preprocessing with R" using this. Attention to Changes: (1) No downscaling to 28x28. I keep the full resolution of 64 x 64 and at the end. (2) At the end a RData file instead of .CSV is created.

#
# This code is from https://firsttimeprogrammer.blogspot.de/2016/08/image-recognition-tutorial-in-r-using.html
#

# This script is used to resize images from 64x64 to 28x28 pixels

# Clear workspace
rm(list=ls())
setwd("/gpudev/projects/E610046/sandbox/cnn_example1")

# Load data
X <- read.csv("data/olivetti_X.csv", header = F)
labels <- read.csv("data/olivetti_y.csv", header = F)

# Dataframe of resized images
rs_df <- data.frame()

# Main loop: for each image, resize and set it to greyscale
rs_df <- data.frame(labels,X)
rs_df[1:10,1:10]
colnames(rs_df) <- c("labels",paste0("pixel",1:ncol(X)))

# Train-test split
#-------------------------------------------------------------------------------
# Simple train-test split. No crossvalidation is done in this tutorial.

# Set seed for reproducibility purposes
set.seed(100)

# Shuffled df
shuffled <- rs_df[sample(1:nrow(rs_df)),]

# Train-test split
train_64 <- shuffled[1:360, ]
test_64 <- shuffled[361:400, ]

# Save train-test datasets
save(test_64, file="data/test_64.RData")
save(train_64, file="data/train_64.RData")

# Done!
print("Done!")
ReplyDelete
Replies
Unknown19 April 2018 at 10:43
"Building the model" with batch normalization: After 6-8 iterations, test accuracy is same as above (97.5%). Uses an own implementation of early stopping (see comment below!)

#'
#' with batch normalization
#' with early stopping
#'

# Clean workspace
rm(list=ls())

# Load MXNet
require(mxnet)

# load custom loss function (early stopping)
setwd("/gpudev/projects/E610046/sandbox/cnn_example1/fun/early_stopping.R")

# Loading data and set up
#-------------------------------------------------------------------------------

# Load train and test datasets
load("data/test_64.RData")
load("data/train_64.RData")
train <- train_64; rm(train_64)
test <- test_64; rm(test_64)

# Set up train and test datasets
train <- data.matrix(train)
train_x <- t(train[, -1])
train_y <- train[, 1]
train_array <- train_x
dim(train_array) <- c(64, 64, 1, ncol(train_x))

test_x <- t(test[, -1])
test_y <- test[, 1]
test_array <- test_x
dim(test_array) <- c(64, 64, 1, ncol(test_x))

# Set up the symbolic model
#-------------------------------------------------------------------------------

data <- mx.symbol.Variable('data')
# 1st convolutional layer
conv_1 <- mx.symbol.Convolution(data = data, kernel = c(5, 5), num_filter = 1)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
# 2nd convolutional layer
bn_1 <- mx.symbol.BatchNorm(pool_1)
conv_2 <- mx.symbol.Convolution(data = bn_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data=tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
# 1st fully connected layer
bn_2 <- mx.symbol.BatchNorm(pool_2)
flatten <- mx.symbol.Flatten(data = bn_2)
fc_1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fc_1, act_type = "tanh")
# 2nd fully connected layer
bn_3 <- mx.symbol.Flatten(data = tanh_3)
fc_2 <- mx.symbol.FullyConnected(data = bn_3, num_hidden = 40)
# Output. Softmax output since we'd like to get some probabilities.
NN_model <- mx.symbol.SoftmaxOutput(data = fc_2)

# Pre-training set up
#-------------------------------------------------------------------------------

# Set seed for reproducibility
mx.set.seed(100)

# Device used. CPU in my case.
devices <- mx.cpu()

# Training
#-------------------------------------------------------------------------------
logger <- mx.metric.logger$new()
# iterator for eval error
valIter <-
mx.io.arrayiter(
data = test_array,
label = test_y,
batch.size = 40,
shuffle = F
)

# Train the model
model <- mx.model.FeedForward.create(
NN_model,
X = train_array,
eval.data = valIter,
y = train_y,
ctx = devices,
num.round = 130,
array.batch.size = 10,
learning.rate = 0.01,
momentum = 0.9,
eval.metric = mx.metric.accuracy,
epoch.end.callback = mx.callback.custom_early.stop_log.train.metric(
period = 40,
bad.steps = 3,
logger,
verbose = T,
maximize = T
)
)

# Testing
#-------------------------------------------------------------------------------

# Predict labels
predicted <- predict(model, test_array)
# Assign labels
predicted_labels <- max.col(t(predicted)) - 1
# Get accuracy
sum(diag(table(test_y, predicted_labels)))/40

################################################################################
# OUTPUT
################################################################################
#
# 0.975
#

# Author: TG
ReplyDelete
Replies
NullPointer6 January 2019 at 20:28
Hi there,

Thanks for your post. It's a really nice easy to follow tutorial for users like myself to get into this. Just a few things that I wanted to discuss that others might struggle with:

1. The installation of Mxnet varies drastically depending on where you look across the internet. Your link on how to build unfortunately doesn't exist anymore. The simplest option I found was on their page (see below) where you input your platform and it gives you the required code to install Mxnet.

https://mxnet.incubator.apache.org/install/index.html

2. As what was mentioned in the comments, the way that the testing sample is chosen highly influences the performance of the model as favourable testing samples can make the model appear better than what it is.

I was not able to reproduce your results unfortunately. The way random number generation is done from seeding appears to be dependent on the operating system (see https://stackoverflow.com/questions/48626086/same-seed-different-os-different-random-numbers-in-r)

For those who are looking for someone to compare against. I am running a Windows 10 machine (update 1803), Python version 3.7 and R version 3.5.0 (2018-04-23) -- "Joy in Playing" and using the seeding in this article I achieved a 35% test accuracy. Without setting the seeding, my test accuracy varied (depending on the seed chosen by the operating system) between 17.5% and 100%.

3. Another thought that I thought should be mentioned is that Neural nets are notorious for needing huge amounts of training data to acquire good results. I know that this isn't the aim of the tutorial, but perhaps it might be misleading for those starting out to expect a 97.5% accuracy every time when given such a small training set.

Thanks again for the tutorial!
Chris
ReplyDelete
Replies
Tong Zhao27 February 2019 at 05:28
why in your predicted_labels, you have to add -1 to the label number. Is it possible that somewhere you messed up with the labels?
ReplyDelete
Replies
educational blogs28 September 2019 at 10:07
Thanks for sharing this valuable information and we collected some information from this blog.

Machine learning in-house Corporate training in Nigeria
ReplyDelete
Replies
Padminiprwatech25 November 2019 at 06:22
Your blog is very informative and interactive. Thanks for sharing useful information.R Programming training in bangalore. Keep updating valuable content.
ReplyDelete
Replies
Jackie Co Kad11 August 2020 at 06:28
Great Article
Artificial Intelligence Projects

Project Center in Chennai

JavaScript Training in Chennai

JavaScript Training in Chennai
ReplyDelete
Replies
Sohel Rana22 December 2020 at 03:00
Thanks for sharing this super helpful guide. I really appreciate your effort.
Learn How to Resize an Image in Pixlr
ReplyDelete
Replies
markjack15 March 2021 at 16:58
This blog resolved all my queries I had in my mind. Really helpful and supportive subject matter written in all the points. Hard to find such kind of blogs as descriptive and accountable to your doubts.This blog resolved all my queries I had in my mind. Really helpful and supportive subject matter written in all the points. Hard to find such kind of blogs as descriptive and accountable to your doubts.How to get rid of Discord Javascript Error [100% SOLVE ISSUE]
ReplyDelete
Replies
Johnsk5126 March 2021 at 11:09

Amazon.com mytv - you merely unboxed a fresh Amazon device and are excited to research -- just how can you begin? Discover how to connect your devices to Prime so that you can easily see and listen to exclusive Prime manhood content from anywhere. You simply have to make amazon accounts and activate it using amazon my television activation code.
ReplyDelete
Replies
traininginstitute17 May 2021 at 03:58
We are really grateful for your blog post. You will find a lot of approaches after visiting your post. Great work

Best Data Science courses in Hyderabad
ReplyDelete
Replies
traininginstitute12 July 2021 at 06:59
I have read your excellent post. This is a great job. I have enjoyed reading your post first time. I want to say thanks for this post. Thank you...
digital marketing courses in hyderabad with placement
ReplyDelete
Replies
business analytics course in chennai with placement16 November 2021 at 06:10
Extremely overall quite fascinating post. I was searching for this sort of data and delighted in perusing this one. Continue posting. A debt of gratitude is in order for sharing. iot course in lucknow
ReplyDelete
Replies
Ramesh Sampangi27 November 2021 at 18:49
Really awesome blog. Very interesting and knowledgeable content. Keep sharing more blogs with us.
Data Science Training in Hyderabad
ReplyDelete
Replies
generative ai course16 September 2025 at 09:38
Learn from 360DigiTMG’s Data Science and Artificial Intelligence Online Courses, designed for flexible learning. Covering Python, R, AI, ML, and analytics, these courses offer projects, mentorship, and placement support. Ideal for professionals and students, they provide practical exposure to data and AI tools from anywhere, anytime.

data science and artificial intelligence online courses
ReplyDelete
Replies
digitmg24 September 2025 at 12:05
Good information Thanks for sharing 360DigiTMG helps learners prepare for Data Science Fresher Jobs in Bangalore. Gain hands-on experience in Python, R, ML, AI, and analytics through projects, mentorship, and placement guidance to become industry-ready.data science fresher jobs bangalore
ReplyDelete
Replies
datascience chennai20 November 2025 at 05:08
This comment has been removed by the author.
ReplyDelete
Replies
datascience chennai20 November 2025 at 05:43
Thank you for your insights and information. Prepare for Data Science Jobs in Chennai
with 360DigiTMG. Gain hands-on experience in Python, R, ML, AI, and analytics through live projects and mentorship. Placement support equips learners with industry-ready skills to secure competitive roles in top Chennai-based companies.
ReplyDelete
Replies