Classifying user-supplied images using pre-trained CNN models

Build, Train and Evaluate a CNN Model to identify a dog-breed in a user-supplied image

Ayushi Shah
12 min read · Oct 14, 2020
Final Output of the project

Introduction

This article describes a project to classify user-supplied images of dogs into different breeds. The project is inspired by Udacity's Deep Learning Nanodegree program. The dataset contains images of 133 different dog breeds, so we have 133 classes. For image classification tasks, CNNs (Convolutional Neural Networks) are known to give highly accurate results. Here, instead of building a CNN model from scratch, we use a pre-trained CNN model; this technique is called Transfer Learning. The project uses the PyTorch framework for building and training the CNN, as it provides pre-trained models and makes it easy to move parameters to the GPU for faster processing.

Source: Created on draw.io

Import Libraries

We start by importing the following libraries.

NumPy v1.18.1, PyTorch v1.6.0, torchvision v0.7.0, matplotlib v3.3.2

import os
import numpy as np
from glob import glob
from PIL import Image, ImageFile
import torch
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision import datasets
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
%matplotlib inline

Import Data Sets

The next step is importing data sets. The dataset can be downloaded from here.

# Import data set
dog_files = np.array(glob("/data/dog_images/*/*/*"))
# print number of images in the dataset
print('There are %d total dog images.' % len(dog_files))

Output:

There are 8351 total dog images.

Data Loading

We use PyTorch's torchvision package here. It provides datasets.ImageFolder for loading data, which can be used as follows:

dataset = datasets.ImageFolder('path-to-data', transform=transform)

Transforms: When we load data with ImageFolder, we need to define the image transformations, which we do using torchvision.transforms.

  1. The dataset used for the project contains real-world images of different sizes, but the network needs all inputs to be the same size. So, we resize the images to 224x224 (the input size most pre-trained networks expect) using transforms.RandomResizedCrop(224) and transforms.CenterCrop(224). Resizing very large images to a smaller size also saves memory.
  2. We need to convert the images to PyTorch tensors, for which we use transforms.ToTensor().
  3. For an invariant representation, we add randomness to the input data. We use transforms.RandomHorizontalFlip() so the model becomes invariant to horizontal flips, and transforms.RandomRotation(10) so it becomes invariant to small rotations. This process of augmenting the training data with random flips and rotations is called Data Augmentation. It makes the model more invariant to such variations and also helps avoid overfitting, leading to better performance on the test data.
  4. Next, we normalize our data using transforms.Normalize. Data normalization shifts and scales each color channel (here with the ImageNet means and standard deviations) so that pixel values follow roughly a standard distribution.
num_workers = 0
batch_size = 20

# ImageNet normalization statistics
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

data_transform = {}
data_transform['train'] = transforms.Compose([transforms.RandomResizedCrop(224),
                                              transforms.CenterCrop(224),
                                              transforms.RandomHorizontalFlip(),
                                              transforms.RandomRotation(10),
                                              transforms.ToTensor(),
                                              normalize])
data_transform['valid'] = transforms.Compose([transforms.RandomResizedCrop(224),
                                              transforms.CenterCrop(224),
                                              transforms.ToTensor(),
                                              normalize])
data_transform['test'] = transforms.Compose([transforms.RandomResizedCrop(224),
                                             transforms.CenterCrop(224),
                                             transforms.ToTensor(),
                                             normalize])

data_scratch = {}
data_scratch['train'] = datasets.ImageFolder('/data/dog_images/train', transform=data_transform['train'])
data_scratch['valid'] = datasets.ImageFolder('/data/dog_images/valid', transform=data_transform['valid'])
data_scratch['test'] = datasets.ImageFolder('/data/dog_images/test', transform=data_transform['test'])
n_classes = len(data_scratch['train'].classes)

DataLoaders: After creating the ImageFolder datasets, we pass them to a DataLoader, which returns batches of images and corresponding labels. A DataLoader is an iterable: we can loop over it directly, or wrap it with iter() and call next() to pull out a single batch. In the following code, we prepare the DataLoaders.

loaders_scratch = {}
# prepare data loaders
loaders_scratch['train'] = torch.utils.data.DataLoader(data_scratch['train'], batch_size=batch_size, num_workers=num_workers, shuffle=True)
loaders_scratch['valid'] = torch.utils.data.DataLoader(data_scratch['valid'], batch_size=batch_size, num_workers=num_workers)
loaders_scratch['test'] = torch.utils.data.DataLoader(data_scratch['test'], batch_size=batch_size, num_workers=num_workers)
loaders_transfer = loaders_scratch
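As a quick sanity check (a small snippet not in the original notebook), we can pull a single batch out of the training loader with iter() and next() and confirm its shape:

# grab one batch from the training loader
images, labels = next(iter(loaders_transfer['train']))
print(images.shape)  # torch.Size([20, 3, 224, 224]): a batch of 20 images
print(labels.shape)  # torch.Size([20]): one integer label per image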

Obtain pre-trained VGG-16 Model

Transfer Learning: This is a technique that reuses pre-trained models to solve different but similar problems. How we implement Transfer Learning depends on two factors:

  1. How similar is our dataset to the dataset the pre-trained network was trained on?
  2. How transferable is the knowledge the network has learned?

For this project, we use VGGNet, which has learned to distinguish between the 1000 categories in the ImageNet dataset and to extract useful features from input images. It is a simple architecture with low top-1 (28.41%) and top-5 (9.62%) error rates, giving strong performance. Just as important, many of the ImageNet categories are animals, fruits, vegetables, and everyday objects, which makes the VGGNet model a good fit for our problem of dog-breed classification. This answers our first question about dataset similarity.

In this project, we use a very deep network with 16 weight layers. Training it on a CPU would take a very long time, so we use a GPU for the computations. GPUs perform these computations in parallel, which can be orders of magnitude faster than a CPU. PyTorch uses CUDA to efficiently compute forward and backward passes on the GPU, and it lets us move the model parameters and other tensors to GPU memory with model.cuda(). In the code below, we use torchvision.models to download the pre-trained model and move it to the GPU.

# define VGG16 model
VGG16 = models.vgg16(pretrained=True)
# check if CUDA is available
use_cuda = torch.cuda.is_available()
# move model to GPU if CUDA is available
if use_cuda:
    VGG16 = VGG16.cuda()

Output:

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.torch/models/vgg16-397923af.pth
100%|██████████| 553433881/553433881 [00:32<00:00, 17250207.33it/s]

The VGG16 model has two parts: features and classifier.

The features part is a stack of convolutional layers. Convolutional filters in a trained CNN are arranged in a kind of hierarchy: filters in the first layer often detect edges or blocks of color, filters in the second layer detect circles, stripes, and rectangles, and so on. These are general features, useful for analyzing images in almost any dataset, so this part can be used as a fixed feature extractor.

The classifier part has fully connected layers. Its weights are much more specific to the data the model was trained on, so it will not work for our problem as-is. We therefore replace the classifier part while keeping the earlier layers.

In this way, we can use the convolutional and pooling layers in the pre-trained Network and add one or two linear layers at the end according to the problem we are trying to solve. Here we get an answer to our second question about the transferability of the knowledge.

In the code below we print the VGG16 model architecture.

print(VGG16)
print(VGG16.classifier[6].in_features)
print(VGG16.classifier[6].out_features)

Output:

VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace)
(5): Dropout(p=0.5)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
4096
1000

The code below freezes the Feature part of the Net so that it can be used as a fixed feature extractor. It simply means that the parameters in the pre-trained model will not change during training.

for param in VGG16.features.parameters():
    param.requires_grad = False
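To confirm that the freeze worked, we can count how many parameters remain trainable (a quick check, not part of the original notebook):

# count trainable vs. total parameters
n_trainable = sum(p.numel() for p in VGG16.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in VGG16.parameters())
print('Trainable: %d of %d parameters' % (n_trainable, n_total))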

Model Architecture

It is now simple to build the model architecture. All we have to do is replace the last layer with a linear classifier of our own, whose output size equals the number of classes: 133.

For backpropagation, PyTorch provides a module called autograd that automatically calculates the gradients of tensors. It works by keeping track of the operations performed on the tensors. For PyTorch to track these operations, a tensor needs requires_grad=True. Since we are adding new layers to the pre-trained model, their parameters automatically have requires_grad=True.

## Specify model architecture
n_inputs = VGG16.classifier[6].in_features

# add last linear layer
# new layers automatically have requires_grad = True
last_layer = nn.Linear(n_inputs, n_classes)
VGG16.classifier[6] = last_layer
model_transfer = VGG16
print(model_transfer.classifier[6].out_features)

# if GPU is available, move the model to GPU
if use_cuda:
    model_transfer = model_transfer.cuda()

Output:

133

Loss Function and Optimizer

The next step is to define the loss. PyTorch provides various loss functions, one of which is the cross-entropy loss. For classification problems we use the softmax function to predict class probabilities, and with softmax the natural loss is cross-entropy. I have used nn.CrossEntropyLoss, as it combines nn.LogSoftmax() and nn.NLLLoss() in one single class (see the PyTorch documentation). The input to this criterion is the raw score (logit) of each class. This means we pass the network's raw output to the loss, not the output of a softmax; this avoids explicitly computing probabilities and is numerically more stable.
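As a small illustration (with made-up scores, not from the project), nn.CrossEntropyLoss on raw scores gives the same value as nn.NLLLoss applied to log-softmax outputs:

# raw scores (logits) for a batch of 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
targets = torch.tensor([0, 1])
loss_a = nn.CrossEntropyLoss()(logits, targets)
loss_b = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(loss_a.item(), loss_b.item())  # the two values match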

The last thing required for training the CNN is a way to update the weights. We use PyTorch's optim package to update the weights with the gradients; optim.SGD gives us Stochastic Gradient Descent.

criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.SGD(model_transfer.classifier.parameters(), lr=0.001)

Train and Validate the model

Now that we have our training toolbox ready, it's time to write the function to train and validate the network.

  1. The parameters of the function will be:

The number of epochs n_epochs, i.e. the number of training passes through the entire dataset.

The pieces defined in our toolbox: loaders, model, optimizer, criterion.

A flag use_cuda to enable the GPU if it is available.

Last is the path save_path where the best-trained model is saved.

2. We follow the training steps as follows. First of all, we set the model to training mode with model.train(). This turns on dropout (randomly dropping input units) during the training phase to reduce overfitting; PyTorch provides the nn.Dropout module for this.

Step 0: optimizer.zero_grad(). When doing multiple backward passes with the same parameters, the gradients accumulate, so we need to zero the gradients in each training pass or we will end up mixing in gradients from previous training batches too.

Step 1: Forward pass through the network.

Step 2: Use the output of the forward pass to calculate the cross-entropy loss.

Step 3: Backward pass through the network via loss.backward() to calculate the gradients.

Step 4: Update the weights using optimizer.step().

Step 5: Update the training loss and record the average training loss.

3. Validating the model.

Now that the training part is done, we need to check how the model makes predictions, which is also called inference in statistics. Neural networks tend to perform very well on the training dataset, which can lead to overfitting and decreased accuracy on unseen data. So we check for overfitting on data that is not in the training set, called the validation dataset.

Step 1: Set the model to eval mode with model.eval() to turn off dropout during validation and testing.

Step 2: Repeat Steps 1 and 2 from point 2 (forward pass and loss), without the backward pass or weight update.

Step 3: Update the validation loss and record the average validation loss.

Step 4: Save the model if the validation loss has decreased. We save the model so that we do not have to retrain it every time we want to use it.

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf

    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item()*data.size(0)

        ######################
        # validate the model #
        ######################
        model.eval()
        with torch.no_grad():  # no gradients needed during validation
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                # forward pass
                output = model(data)
                # calculate the batch loss
                loss = criterion(output, target)
                # update average validation loss
                valid_loss += loss.item()*data.size(0)

        # calculate average losses
        train_loss = train_loss/len(loaders['train'].dataset)
        valid_loss = valid_loss/len(loaders['valid'].dataset)
        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, train_loss, valid_loss))

        ## save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                valid_loss_min, valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss

    # return trained model
    return model

Now we call the train() function and load the best model. The parameters of a PyTorch model are saved in its state_dict (it contains the weight and bias matrices for each of the layers).

model_transfer = train(40, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')

# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))

Output: The code runs for 40 epochs. This number can be changed to see how it affects performance.

Epoch: 1 	Training Loss: 0.991454 	Validation Loss: 0.846310
Validation loss decreased (inf --> 0.846310). Saving model ...
Epoch: 2 Training Loss: 0.959642 Validation Loss: 0.854284
Epoch: 3 Training Loss: 0.989277 Validation Loss: 0.857747
Epoch: 4 Training Loss: 0.949668 Validation Loss: 0.827528
Validation loss decreased (0.846310 --> 0.827528). Saving model ...
.......................
Epoch: 39 Training Loss: 0.776136 Validation Loss: 0.758932
Epoch: 40 Training Loss: 0.772685 Validation Loss: 0.822122

Test the model

Step 1: Set the model to eval mode.

Step 2: Repeat Steps 1 and 2 from the training procedure (forward pass and loss calculation).

Step 3: Update the test loss.

Step 4: Convert the output scores to the predicted class.

Step 5: Compare the predictions to the true labels.

def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.
    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss (running mean over batches)
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output scores to the predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to the true labels
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)

    # note: test_loss is already a running average over batches, so this
    # extra division scales the reported value down
    test_loss = test_loss/len(loaders['test'].dataset)
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

Call the test() function:

test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)

Output:

Test Loss: 0.000939
Test Accuracy: 78% (653/835)

Predict dog Breed with the model

The code below defines a function predict_breed_transfer(img_path), which takes an image path as its parameter and returns the name of the predicted class.

data_transfer = data_scratch
# list of class names by index, i.e. a name can be accessed like class_names[0]
class_names = [item[4:].replace("_", " ") for item in data_transfer['train'].classes]

def predict_breed_transfer(img_path):
    model_transfer.eval()
    data_transform = transforms.Compose([transforms.RandomResizedCrop(224),
                                         transforms.CenterCrop(224),
                                         transforms.ToTensor(),
                                         transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                              std=[0.229, 0.224, 0.225])])
    image = Image.open(img_path)
    image = data_transform(image)
    # add a batch dimension: [3, 224, 224] -> [1, 3, 224, 224]
    image = image.unsqueeze(0)
    if use_cuda:
        image = image.cuda()
    # return the *name* of the predicted class for that image
    output = model_transfer(image)
    _, pred = torch.max(output, 1)
    return class_names[pred.item()]
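A quick usage sketch (the file path here is hypothetical):

# predict the breed for a single image
print(predict_breed_transfer('./dog-pictures/sample_dog.jpg'))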

Prediction of the class for user-supplied images

The code block below

  1. Defines a function open_img(), which takes the path to an image as a parameter and displays that image.
  2. Defines another function run_application(), which also takes the path to an image as its parameter. It creates the user experience by displaying the image of the dog and printing its breed below the image.
def open_img(img_path):
    image = Image.open(img_path)
    plt.imshow(image)
    plt.show()

def run_application(img_path):
    predicted_breed = predict_breed_transfer(img_path)
    open_img(img_path)
    print('\n you look like', predicted_breed)

Testing our algorithm

Finally, we reach the end of the project, where we supply brand-new (user-supplied) images of dogs, images that are not in the dataset, to our algorithm to test it.

dog_pictures = glob("./dog-pictures/*")
for file in dog_pictures:
    run_application(file)

Output:

In this article, we walked through an image-classification pipeline that uses a Convolutional Neural Network to classify the breed of a dog in an image. Now, for fun, try supplying images with human faces: you will notice that the algorithm still outputs a resembling dog breed. In that sense the project is incomplete; it should first detect whether the image contains a dog at all, and only then classify its breed. Feel free to have a look at the full project here.
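One simple way to add that missing check (a sketch, assuming we reuse the original pre-trained VGG16; in the ImageNet class list, indices 151 through 268 correspond to dog breeds) is to test whether the top ImageNet prediction falls in the dog range before classifying the breed:

# sketch of a dog detector built on the original pre-trained VGG16
# (ImageNet class indices 151-268 correspond to dog breeds)
detector = models.vgg16(pretrained=True)
detector.eval()
if use_cuda:
    detector = detector.cuda()

def dog_detector(img_path):
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                         std=[0.229, 0.224, 0.225])])
    image = transform(Image.open(img_path).convert('RGB')).unsqueeze(0)
    if use_cuda:
        image = image.cuda()
    with torch.no_grad():
        idx = torch.max(detector(image), 1)[1].item()
    return 151 <= idx <= 268  # True only if the top class is a dog breed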
