# Classifying user-supplied images using pre-trained CNN models

## Build, Train and Evaluate a CNN Model to identify a dog-breed in a user-supplied image

## Introduction

This article describes a project that classifies user-supplied images of dogs by breed. The project is inspired by Udacity's Deep Learning Nanodegree program. The dataset contains images of 133 dog breeds, so we have 133 classes. Convolutional Neural Networks (CNNs) are known to perform very well on image classification tasks. Instead of building a CNN from scratch, we start from a pre-trained CNN model, an approach called **Transfer Learning**. The project uses the PyTorch framework to build and train the CNN, because it provides pre-trained models and makes it easy to move parameters to the GPU for faster processing.

## Import Libraries

We start by importing the following libraries. The code was written with NumPy v1.18.1, PyTorch v1.6.0, torchvision v0.7.0, and matplotlib v3.3.2.

    import os
    import numpy as np
    from glob import glob
    from PIL import Image, ImageFile

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.transforms as transforms
    import torchvision.models as models
    from torchvision import datasets

    import matplotlib.pyplot as plt
    %matplotlib inline

## Import Data Sets

The next step is importing data sets. The dataset can be downloaded from here.

    # Import data set
    dog_files = np.array(glob("/data/dog_images/*/*/*"))

    # print number of images in the dataset
    print('There are %d total dog images.' % len(dog_files))

Output:

    There are 8351 total dog images.

## Data Loading

We use PyTorch's `torchvision` package here. It provides `datasets.ImageFolder` for loading data, which can be used as follows:

    dataset = datasets.ImageFolder('path-to-data', transform=transform)

**Transforms:** When we load data with `ImageFolder`, we need to define image transformations, which can be done using `torchvision.transforms`.

- The dataset used for the project contains real-world images of different sizes, but all images in a batch must have the same size so that they can be stacked into a single tensor. So we bring every image to 224x224 (the input size most pre-trained networks expect) using `transforms.RandomResizedCrop(224)` and `transforms.CenterCrop(224)`. Shrinking very large images also saves memory.
- We need to convert images to PyTorch tensors, for which we use `transforms.ToTensor()`.
- To help the model learn invariant representations, we add randomness to the input data. We use `transforms.RandomHorizontalFlip()` to make the model less sensitive to left-right mirroring, and `transforms.RandomRotation(10)` to make it less sensitive to small rotations of up to 10 degrees. Random cropping plays a similar role for position and scale. This process of creating extra variation through random transformations of the training data is called **Data Augmentation**. It makes the model more robust and reduces overfitting, leading to better performance on the test data.
- Next, we normalize our data using `transforms.Normalize`. **Data Normalization** subtracts a per-channel mean and divides by a per-channel standard deviation, so that the inputs have the same scale as the ImageNet images the pre-trained network was originally trained on.
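To make the normalization step concrete, here is a minimal sketch of the per-channel arithmetic `(x - mean) / std` that `transforms.Normalize` applies, written with plain tensor operations. The 2x2 image filled with 0.5 is a made-up input for illustration only; the mean and standard deviation are the ImageNet statistics used in this article.

```python
import torch

# ImageNet channel statistics used throughout this article,
# reshaped to (channels, 1, 1) so they broadcast over height and width
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical input: a 3x2x2 image with values in [0, 1], as ToTensor() would produce
image = torch.full((3, 2, 2), 0.5)

# Same per-channel arithmetic that transforms.Normalize(mean, std) performs
normalized = (image - mean) / std

# One value per channel for the top-left pixel
print(normalized[:, 0, 0])
```

Each channel ends up with a different value even though the input was constant, because every channel is shifted and scaled by its own statistics.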

The transforms and datasets are defined as follows:

    num_workers = 0
    batch_size = 20
    valid_size = 0.2

    # ImageNet channel statistics expected by the pre-trained network
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    data_transform = {}

    # training: random crop, flip and rotation for data augmentation
    data_transform['train'] = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.CenterCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ToTensor(),
        normalize])

    # validation and test: crop to 224x224, no flip or rotation
    data_transform['valid'] = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize])

    data_transform['test'] = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize])

    # NOTE: the 'valid' split reads the test/ folder and the 'test' split reads
    # the valid/ folder; swap the two paths if you want the names to match
    data_scratch = {}
    data_scratch['train'] = datasets.ImageFolder('/data/dog_images/train', transform=data_transform['train'])
    data_scratch['valid'] = datasets.ImageFolder('/data/dog_images/test', transform=data_transform['valid'])
    data_scratch['test'] = datasets.ImageFolder('/data/dog_images/valid', transform=data_transform['test'])

    # number of classes (dog breeds)
    n_classes = len(data_scratch['train'].classes)

**DataLoaders:** After creating the `ImageFolder` datasets, we pass them to a `DataLoader`, which returns batches of images and their corresponding labels. A `DataLoader` is an iterable: we can loop over it directly, or wrap it with `iter()` and call `next()` to pull out a single batch. In the following code, we prepare the DataLoaders.

    loaders_scratch = {}

    # prepare data loaders (only the training data is shuffled)
    loaders_scratch['train'] = torch.utils.data.DataLoader(data_scratch['train'], batch_size=batch_size, num_workers=num_workers, shuffle=True)
    loaders_scratch['valid'] = torch.utils.data.DataLoader(data_scratch['valid'], batch_size=batch_size, num_workers=num_workers)
    loaders_scratch['test'] = torch.utils.data.DataLoader(data_scratch['test'], batch_size=batch_size, num_workers=num_workers)

    # the transfer-learning model uses the same loaders
    loaders_transfer = loaders_scratch
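To show what a `DataLoader` hands back, here is a small sketch that uses a synthetic `TensorDataset` in place of the dog images, so it runs without any files on disk. The tiny 8x8 "images" and the random labels are assumptions made purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 50 tiny RGB "images" with labels in [0, 133)
images = torch.randn(50, 3, 8, 8)
labels = torch.randint(0, 133, (50,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=20, shuffle=True)

# Pull out a single batch with iter() and next()
image_batch, label_batch = next(iter(loader))
print(image_batch.shape, label_batch.shape)

# Or loop over every batch; the last one is smaller when the
# dataset size is not a multiple of the batch size
batch_sizes = [batch_images.size(0) for batch_images, _ in loader]
print(batch_sizes)
```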

## Obtain pre-trained VGG-16 Model

**Transfer Learning:** This is a technique that reuses pre-trained models to solve different but similar problems. The approach for implementing transfer learning depends on 2 factors.

- How similar is our dataset to the dataset that the pre-trained network has seen?
- How transferable is the knowledge that the network has learned?

For this project, we use VGGNet, which has learned to distinguish between the 1000 categories of the ImageNet dataset and to extract features from input images. It has a simple architecture, with reported top-1 and top-5 error rates of 28.41% and 9.62%, which is strong performance. Another important point is that most ImageNet categories are animals, fruits, vegetables, and everyday objects, which makes the VGGNet model a good fit for our problem of dog-breed classification. This answers our first question about dataset similarity.

In this project, we use a deep network that has 16 layers. Training such a model on a CPU would take a very long time, so we use a GPU for the calculations. GPUs perform the computations in parallel, which can speed up training by up to around 100 times compared to a CPU, depending on the hardware. PyTorch uses CUDA to efficiently compute forward and backward passes on the GPU. It also lets us move the model parameters and other tensors to GPU memory using `model.cuda()`. In the code below, we use `torchvision.models` to download the pre-trained model and move it to the GPU.

    # define VGG16 model
    VGG16 = models.vgg16(pretrained=True)

    # check if CUDA is available
    use_cuda = torch.cuda.is_available()

    # move model to GPU if CUDA is available
    if use_cuda:
        VGG16 = VGG16.cuda()

Output:

    Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.torch/models/vgg16-397923af.pth
    100%|██████████| 553433881/553433881 [00:32<00:00, 17250207.33it/s]
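The article moves tensors with `.cuda()` guarded by a `use_cuda` flag. An equivalent pattern, sketched below under the assumption that a small `nn.Linear` layer stands in for the real model, is to pick a `torch.device` once and call `.to(device)` everywhere, so the same code runs with or without a GPU.

```python
import torch
import torch.nn as nn

# Pick the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the real network
model = nn.Linear(4, 2).to(device)

# Inputs must live on the same device as the model's parameters
batch = torch.randn(3, 4).to(device)
output = model(batch)

print(output.shape, output.device.type)
```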

The **VGG16** model has 2 parts: features and classifier.

The **features** part is a stack of convolutional layers. The convolutional filters in a trained CNN form a kind of hierarchy: filters in the first layer often detect edges or blocks of color, filters in the second layer detect circles, stripes, and rectangles, and so on. These are general features that are useful for analyzing almost any image in almost any dataset. Thus this part can be used as a fixed feature extractor.

The **classifier** part consists of fully connected layers. Its weights are much more specific to the data the model was trained on, so it will not work for our problem as is. Thus we need to replace the end of the classifier while keeping the earlier layers.

In this way, we can use the convolutional and pooling layers of the pre-trained network and add one or two linear layers at the end according to the problem we are trying to solve. This answers our second question about the transferability of the knowledge.

In the code below we print the VGG16 model architecture.

    print(VGG16)
    print(VGG16.classifier[6].in_features)
    print(VGG16.classifier[6].out_features)

Output:

    VGG(
      (features): Sequential(
        (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace)
        (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): ReLU(inplace)
        (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (6): ReLU(inplace)
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (8): ReLU(inplace)
        (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (11): ReLU(inplace)
        (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (13): ReLU(inplace)
        (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (15): ReLU(inplace)
        (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (18): ReLU(inplace)
        (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (20): ReLU(inplace)
        (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (22): ReLU(inplace)
        (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (25): ReLU(inplace)
        (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (27): ReLU(inplace)
        (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (29): ReLU(inplace)
        (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (classifier): Sequential(
        (0): Linear(in_features=25088, out_features=4096, bias=True)
        (1): ReLU(inplace)
        (2): Dropout(p=0.5)
        (3): Linear(in_features=4096, out_features=4096, bias=True)
        (4): ReLU(inplace)
        (5): Dropout(p=0.5)
        (6): Linear(in_features=4096, out_features=1000, bias=True)
      )
    )
    4096
    1000

The code below freezes the Feature part of the Net so that it can be used as a fixed feature extractor. It simply means that the parameters in the pre-trained model will not change during training.

    for param in VGG16.features.parameters():
        param.requires_grad = False
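To see what freezing does, the sketch below applies the same loop to a small stand-in network and counts frozen versus trainable parameters. `TinyNet` is a hypothetical model invented for this example (it only mirrors VGG16's `features` / `classifier` split), so the counts are not those of VGG16.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical miniature network with the same two-part layout as VGG16."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(4 * 8 * 8, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


net = TinyNet()

# Freeze the feature extractor, exactly as done for VGG16 above
for param in net.features.parameters():
    param.requires_grad = False

# Count parameters by whether the optimizer would be allowed to change them
frozen = sum(p.numel() for p in net.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print("frozen:", frozen, "trainable:", trainable)
```

Only the classifier's parameters remain trainable, which is why the optimizer later only needs the classifier's parameters.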

## Model Architecture

It is now simple to build the model architecture. All we have to do is replace the last layer with a linear classifier of our own, whose output length equals the number of classes, which is 133.

For **backpropagation**, PyTorch provides a module called `autograd` that automatically calculates the gradients of tensors. It works by keeping track of the operations performed on the tensors. For PyTorch to track these operations, a tensor needs `requires_grad = True`. Since we are adding a new layer to the pre-trained model, its parameters have `requires_grad = True` set automatically.

    import torchvision.models as models
    import torch.nn as nn

    ## Specify model architecture
    # number of inputs to the last layer of the pre-trained classifier
    n_inputs = VGG16.classifier[6].in_features

    # add last linear layer
    # new layers automatically have requires_grad = True
    last_layer = nn.Linear(n_inputs, n_classes)
    VGG16.classifier[6] = last_layer

    model_transfer = VGG16
    print(model_transfer.classifier[6].out_features)

    # if GPU is available, move the model to GPU
    if use_cuda:
        model_transfer = model_transfer.cuda()

Output:

    133
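As a small illustration of how `autograd` tracks operations, the sketch below differentiates `y = x ** 2` at `x = 2` and then checks that a freshly created `nn.Linear` layer has `requires_grad` enabled by default. The scalar example is an assumption chosen for illustration and is not part of the project code.

```python
import torch
import torch.nn as nn

# A tensor that autograd should track
x = torch.tensor(2.0, requires_grad=True)

# autograd records this operation so it can differentiate through it
y = x ** 2

# Backward pass: computes dy/dx = 2 * x and stores it in x.grad
y.backward()
print(x.grad)

# Parameters of a newly created layer are trainable by default
new_layer = nn.Linear(4096, 133)
print(all(p.requires_grad for p in new_layer.parameters()))
```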

## Loss Function and Optimizer

The next step is to calculate the **loss**. PyTorch provides various loss functions, one of which is the **cross-entropy loss**. For classification problems, we use the softmax function to predict the class probabilities, and the cross-entropy loss pairs naturally with it. I have used `nn.CrossEntropyLoss`, as it combines `nn.LogSoftmax()` and `nn.NLLLoss()` in a single class (see its PyTorch documentation for details). Its input is expected to be the raw score of each class. This means we pass the raw output of the network to the loss, not the output of a softmax function. This is useful because working with log-probabilities directly is more numerically stable than computing probabilities first and taking their logarithm afterwards.
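The relationship between `nn.CrossEntropyLoss`, `nn.LogSoftmax`, and `nn.NLLLoss` can be checked numerically. The sketch below uses random scores for a batch of 4 examples and 133 classes; the scores and targets are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw network outputs (one score per class) and true labels
scores = torch.randn(4, 133)
targets = torch.tensor([0, 5, 17, 132])

# One step: cross-entropy on the raw scores
cross_entropy = nn.CrossEntropyLoss()(scores, targets)

# Two steps: log-softmax over the class dimension, then negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(scores)
two_step = nn.NLLLoss()(log_probs, targets)

print(cross_entropy.item(), two_step.item())
print(torch.allclose(cross_entropy, two_step))
```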

The last thing required for training the CNN is updating the weights. We use PyTorch's `optim` package to update the weights using the gradients. `optim.SGD` implements Stochastic Gradient Descent. We pass it only the classifier's parameters, since the feature layers are frozen.

    criterion_transfer = nn.CrossEntropyLoss()
    optimizer_transfer = optim.SGD(model_transfer.classifier.parameters(), lr=0.001)
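A single plain SGD update is just `w = w - lr * grad`. The sketch below runs one step on a two-element weight tensor with a toy loss, using the same learning rate as above; the weights and the loss are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import torch
import torch.optim as optim

# Hypothetical weights to optimize
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.001)

# Toy loss: sum of squares, whose gradient is 2 * w = [2.0, -4.0]
loss = (w ** 2).sum()

optimizer.zero_grad()  # clear any old gradients
loss.backward()        # compute the gradient of the loss w.r.t. w
optimizer.step()       # apply w = w - lr * grad

# Expected by hand: [1 - 0.001 * 2, -2 - 0.001 * (-4)] = [0.998, -1.996]
print(w.detach())
```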

## Train and Validate the model

Now that we have our training toolbox ready, it's time to write the function to train and validate the network.

1. The **parameters** of the function are:

- `n_epochs`: the number of epochs, which means the number of training passes through the entire dataset.
- `loaders`, `model`, `optimizer`, `criterion`: the objects defined in our toolbox.
- `use_cuda`: the parameter that enables using the GPU if available.
- `save_path`: the path where the best-trained model is saved.

2. The **steps for training** are as follows. First of all, we set the model to training mode with `model.train()`. This turns on dropout (randomly dropping input units) for the training phase, to reduce overfitting. PyTorch offers `nn.Dropout` modules for this.

Step 0: Call `optimizer.zero_grad()`. PyTorch accumulates gradients across backward passes, so we need to zero the gradients in each training pass; otherwise the gradients from previous training batches would be added to the current ones.

Step 1: Forward pass through the network.

Step 2: Use the output of the forward pass to calculate the cross-entropy loss.

Step 3: Backward pass through the network with `loss.backward()` to calculate the gradients.

Step 4: Update the weights using `optimizer.step()`.

Step 5: Update the training loss and keep a record of the average training loss.

3. **Validating** the model.

After each training pass, we check how well the model predicts on data it has not been trained on. Making predictions with a trained model is also called **inference**. Neural networks tend to perform very well on the training dataset, which can lead to overfitting and decrease the accuracy of predictions on the test data. So we check for overfitting on data that is not in the training set, called the validation dataset.

Step 1: Set the model to evaluation mode with `model.eval()` to turn off dropout during validation and testing.

Step 2: Repeat the forward pass and the loss calculation (Step 1 and Step 2 of training). No backward pass or weight update is performed here.

Step 3: Update the validation loss and record the average validation loss.

Step 4: Save the model if the validation loss has decreased. We save the model so that we do not have to train it every time we want to use it.

    def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
        """returns trained model"""
        # initialize tracker for minimum validation loss
        valid_loss_min = np.inf

        for epoch in range(1, n_epochs+1):
            # initialize variables to monitor training and validation loss
            train_loss = 0.0
            valid_loss = 0.0

            ###################
            # train the model #
            ###################
            model.train()
            for batch_idx, (data, target) in enumerate(loaders['train']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                # clear the gradients accumulated from the previous batch
                optimizer.zero_grad()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the batch loss
                loss = criterion(output, target)
                # backward pass: compute gradient of the loss with respect to model parameters
                loss.backward()
                # perform a single optimization step (parameter update)
                optimizer.step()
                # update training loss (sum of per-sample losses)
                train_loss += loss.item()*data.size(0)

            ######################
            # validate the model #
            ######################
            model.eval()
            # no gradients are needed for validation, which saves memory and time
            with torch.no_grad():
                for batch_idx, (data, target) in enumerate(loaders['valid']):
                    # move to GPU
                    if use_cuda:
                        data, target = data.cuda(), target.cuda()
                    # forward pass
                    output = model(data)
                    # calculate the batch loss
                    loss = criterion(output, target)
                    # update validation loss (sum of per-sample losses)
                    valid_loss += loss.item()*data.size(0)

            # calculate average losses
            train_loss = train_loss/len(loaders['train'].dataset)
            valid_loss = valid_loss/len(loaders['valid'].dataset)

            # print training/validation statistics
            print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
                epoch,
                train_loss,
                valid_loss
                ))

            ## save the model if validation loss has decreased
            if valid_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                    valid_loss_min,
                    valid_loss))
                torch.save(model.state_dict(), save_path)
                valid_loss_min = valid_loss

        # return trained model
        return model

Now we call the `train()` function and then load the saved model. A PyTorch model's parameters are stored in its `state_dict` (it contains the weight and bias matrices for each of the layers).

    model_transfer = train(40, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')

    # load the model that got the lowest validation loss
    model_transfer.load_state_dict(torch.load('model_transfer.pt'))

Output: The code runs for 40 epochs. It can be changed to different values to check the performance.

    Epoch: 1 	Training Loss: 0.991454 	Validation Loss: 0.846310
    Validation loss decreased (inf --> 0.846310). Saving model ...
    Epoch: 2 	Training Loss: 0.959642 	Validation Loss: 0.854284
    Epoch: 3 	Training Loss: 0.989277 	Validation Loss: 0.857747
    Epoch: 4 	Training Loss: 0.949668 	Validation Loss: 0.827528
    Validation loss decreased (0.846310 --> 0.827528). Saving model ...
    .......................
    Epoch: 39 	Training Loss: 0.776136 	Validation Loss: 0.758932
    Epoch: 40 	Training Loss: 0.772685 	Validation Loss: 0.822122
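The checkpointing used by `train()` can be tried in isolation. The sketch below saves the `state_dict` of a small stand-in layer to a temporary file and loads it into a second, freshly initialized layer; the `nn.Linear(4, 2)` model and the file name `checkpoint.pt` are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")

    # Save only the parameters, not the whole model object
    torch.save(model.state_dict(), path)

    # A new model with the same architecture starts with different random weights...
    restored = nn.Linear(4, 2)
    # ...until the saved parameters are loaded into it
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# The state_dict maps parameter names to tensors
keys = list(model.state_dict().keys())
print(keys)

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Because only parameters are stored, the model class and architecture have to be recreated in code before calling `load_state_dict`.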

## Test the model

Step 1: Set the model to evaluation mode with `model.eval()`.

Step 2: Run the forward pass and calculate the loss, as in validation.

Step 3: Update the test loss.

Step 4: Convert the output scores to the predicted class.

Step 5: Compare the predictions to the true labels.

    def test(loaders, model, criterion, use_cuda):
        # monitor test loss and accuracy
        test_loss = 0.
        correct = 0.
        total = 0.

        model.eval()
        # no gradients are needed for testing
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['test']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the loss
                loss = criterion(output, target)
                # update the running mean of the batch losses
                test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.item() - test_loss))
                # convert output scores to predicted class (index of the largest score)
                pred = output.max(1, keepdim=True)[1]
                # compare predictions to true label
                correct += pred.eq(target.view_as(pred)).sum().item()
                total += data.size(0)

        # NOTE: test_loss is already a mean over batches at this point; this extra
        # division rescales it by the dataset size, which is why the value printed
        # below is so small
        test_loss = test_loss/len(loaders['test'].dataset)

        print('Test Loss: {:.6f}\n'.format(test_loss))
        print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
            100. * correct / total, correct, total))

Call the `test()` function:

    test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)

Output:

    Test Loss: 0.000939

    Test Accuracy: 78% (653/835)
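The update `test_loss + (1 / (batch_idx + 1)) * (loss - test_loss)` used inside `test()` is an incremental mean: after each batch it holds the average of all batch losses seen so far. The sketch below checks this against an ordinary mean, using made-up loss values.

```python
# Hypothetical per-batch losses, chosen only for illustration
batch_losses = [0.9, 0.7, 0.8, 0.6]

running_mean = 0.0
for batch_idx, loss in enumerate(batch_losses):
    # Move the running mean toward the new value by 1/(number of batches so far)
    running_mean = running_mean + (1 / (batch_idx + 1)) * (loss - running_mean)

plain_mean = sum(batch_losses) / len(batch_losses)

print(running_mean, plain_mean)
print(abs(running_mean - plain_mean) < 1e-9)
```

Because the result is already an average over batches, dividing it again by the number of test images (as `test()` does before printing) rescales the value rather than averaging it.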

## Predict dog Breed with the model

The code below defines a function `predict_breed_transfer(img_path)`, which takes an image path as a parameter and returns the name of the predicted class.

    data_transfer = data_scratch

    # list of class names by index, i.e. a name can be accessed like class_names[0]
    # (the first 4 characters of each folder name are dropped, underscores become spaces)
    class_names = [item[4:].replace("_", " ") for item in data_transfer['train'].classes]

    def predict_breed_transfer(img_path):
        model_transfer.eval()

        data_transform = transforms.Compose([transforms.RandomResizedCrop(224),
                                             transforms.CenterCrop(224),
                                             transforms.ToTensor(),
                                             transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

        # convert to RGB so that grayscale or RGBA images also yield 3 channels
        image = Image.open(img_path).convert('RGB')
        image = data_transform(image)
        # add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
        image = image.unsqueeze(0)

        if use_cuda:
            image = image.cuda()

        # forward pass without tracking gradients
        with torch.no_grad():
            output = model_transfer(image)

        # index of the class with the highest score
        _, pred = torch.max(output, 1)

        # return the *name* of the predicted class for that image
        return class_names[pred.item()]
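The last two steps of the prediction function, turning folder names into readable class names and mapping the highest score to a name, can be sketched without the model. The folder names below are assumptions about the dataset's naming scheme (a three-digit index, a dot, then the breed with underscores), and the scores are made up.

```python
import torch

# Assumed folder-name format; the real names come from ImageFolder's .classes
folder_names = ["001.Affenpinscher", "002.Afghan_hound", "003.Airedale_terrier"]

# Drop the 4-character numeric prefix and replace underscores with spaces
class_names = [name[4:].replace("_", " ") for name in folder_names]

# Hypothetical raw scores for one image (batch of size 1, 3 classes)
scores = torch.tensor([[0.1, 2.3, -0.5]])

# torch.max over dim 1 returns (values, indices); the index is the predicted class
_, pred = torch.max(scores, 1)

# .item() converts the one-element tensor into a plain Python int for list indexing
breed = class_names[pred.item()]
print(breed)
```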

## Prediction of the class for user-supplied images

The code block below:

- Defines a function `open_img()` to display the image at the given path.
- Defines another function `run_application()`, which creates a user experience by displaying the image of the dog and showing its breed below the image.

Both functions take the path to an image as their parameter.

    def open_img(img_path):
        image = Image.open(img_path)
        plt.imshow(image)
        plt.show()

    def run_application(img_path):
        predicted_breed = predict_breed_transfer(img_path)
        open_img(img_path)
        print('\n you look like', predicted_breed)

## Testing our algorithm

Finally, we reach the end of the project, where we give our algorithm brand new, user-supplied images of dogs (which are not in the dataset) to test it.

    dog_pictures = glob("./dog-pictures/*")

    for file in dog_pictures:
        run_application(file)


In this article, we focused on the image classification pipeline, using a Convolutional Neural Network to classify dog breeds in images that contain a dog. Now, for fun, try supplying images of human faces. You will notice that the algorithm still outputs the dog breed it finds most similar. This shows that the project is incomplete: it should first detect whether there is a dog in the image and only then classify its breed. Feel free to have a look at the full project here.