13. Image Classification with Deep Learning: CNN in PyTorch Using Python
Design and implement an image classification system that identifies and categorizes fruits and vegetables using deep learning. The system uses a Convolutional Neural Network (CNN) built with PyTorch in Python, and can support applications such as automated inventory management, quality assessment, and dietary analysis. The model is trained on a dataset split into training, validation, and test sets, so that accuracy and generalization can be checked on images the model has not seen during training.
- Device Selection
- DataSet - Structure
- Data Preprocessing
- CNN Model Architecture
- Define Loss Function and Optimizer
- Training Loop
- Validation Loop
- Testing Loop
- Output
- Complete Project - GitHub
1. Device Selection:
The device is selected based on whether a GPU (NVIDIA CUDA or Apple MPS) is available, defaulting to the CPU if neither is found.
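A minimal sketch of this selection logic is shown below. The helper name select_device is hypothetical, and the getattr guard is an assumption added so the snippet also runs on older PyTorch builds that lack the MPS backend.

```python
import torch


def select_device() -> torch.device:
    """Prefer NVIDIA CUDA, then Apple MPS, and fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is missing on older PyTorch builds, so look it up safely
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = select_device()
print(f"Using device: {device}")
```

The model and every batch of images and labels then need to be moved to this device with .to(device) before use.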
2. DataSet - Structure
dataset/
    train/
        banana/
            banana1.jpg
            banana2.jpg
            ...
        pepper/
            pepper1.jpg
            pepper2.jpg
            ...
        ...
    validation/
        banana/
            banana1.jpg
            banana2.jpg
            ...
        pepper/
            pepper1.jpg
            pepper2.jpg
            ...
        ...
    test/
        banana/
            banana1.jpg
            banana2.jpg
            ...
        pepper/
            pepper1.jpg
            pepper2.jpg
            ...
        ...
3. Data Preprocessing
In this project, we employ three distinct datasets to develop and evaluate our image classification model: training, validation, and test datasets.
3.1 Transform the Dataset
- transforms.Compose([...]):
  - What It Is: This utility chains multiple transformations together, applying them sequentially to each image in the dataset.
  - Why It’s Important: Chaining transformations helps in both data augmentation and preparation, ensuring the images are in the right format and condition for training the model.
- transforms.RandomRotation(20):
  - What It Is: This transformation rotates the image by a random angle between -20 and +20 degrees.
  - Purpose: Rotation augmentation helps the model become less sensitive to the orientation of objects, so it can recognize an object even when it is slightly rotated.
- transforms.RandomHorizontalFlip():
  - What It Is: This transformation randomly flips the image horizontally (with a probability of 0.5 by default).
  - Purpose: Horizontal flipping helps the model become robust to variations in the horizontal orientation of objects, like flipping an image of a banana so that it curves in the opposite direction.
- transforms.ToTensor():
  - What It Is: Converts the image from a PIL Image (or numpy array) into a PyTorch tensor.
  - Purpose: PyTorch models require input data in the form of tensors. This transformation also scales the pixel values from the range [0, 255] to [0, 1].
- transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)):
  - What It Is: This transformation normalizes the image tensor by subtracting a mean of 0.5 and dividing by a standard deviation of 0.5 for each color channel (Red, Green, Blue).
  - Purpose: With these values, each channel is rescaled from the range [0, 1] to [-1, 1], so the inputs are centered around 0. (The result has a standard deviation of exactly 1 only when the mean and standard deviation passed in match the dataset's actual statistics.) This helps stabilize and speed up training by ensuring that the inputs to the network have similar scales.
3.2 Load the Dataset
- What It Is: These lines create datasets for training and validation by loading images from directories and applying the defined transformations.
- datasets.ImageFolder(train_dir, transform=transform):
  - What It Is: ImageFolder is a torchvision utility that loads images from a directory structure where each subdirectory corresponds to a different class.
  - Purpose: It organizes your dataset based on the directory structure and automatically assigns labels to the images based on the folder names. The transform parameter ensures that the images are preprocessed (e.g., rotated, flipped, normalized) before being fed into the model.
4. CNN Model Architecture
Convolutional Layers:
nn.Conv2d defines a 2D convolutional layer in PyTorch. Its first four parameters are:
- in_channels: Number of input channels.
- out_channels: Number of output channels (or filters).
- kernel_size: Size of the convolving kernel.
- stride: Stride of the convolution.
self.conv1 = nn.Conv2d(3, 16, 3, 1)
- 3: The number of input channels (e.g., for RGB images, there are 3 channels).
- 16: The number of output channels (or filters). This layer will produce 16 different feature maps.
- 3: The size of the kernel is 3x3.
- 1: The stride of the convolution is 1.
self.conv2 = nn.Conv2d(16, 32, 3, 1)
- 16: The number of input channels, which is the same as the number of output channels from the previous layer.
- 32: The number of output channels. This layer will produce 32 different feature maps.
- 3: The size of the kernel is 3x3.
- 1: The stride of the convolution is 1.
Fully Connected Layer
nn.Linear defines a fully connected (or dense) layer in PyTorch. Its parameters are:
- in_features: Size of each input sample.
- out_features: Size of each output sample.
self.fc1 = nn.Linear(32 * 30 * 30, 128)
- 32 * 30 * 30: The number of input features. This is calculated based on the output from the last convolutional layer.
  - 32: The number of output channels from the last convolutional layer.
  - 30 * 30: The spatial dimensions of each feature map after the convolution and pooling operations. This assumes the input image is resized to 128x128, which the convolutions and pooling reduce to 30x30.
- 128: The number of output features. This layer reduces the high-dimensional feature vector to a 128-dimensional vector.
self.fc2 = nn.Linear(128, data_classes_len)
- Finally, the number of output features equals the total number of categories (in our case, the total number of fruit and vegetable classes).
Convolutional and Pooling Layers
def forward(self, x):
    x = F.relu(self.conv1(x))       # Apply ReLU activation after first convolution
    x = F.max_pool2d(x, 2, 2)       # Apply max pooling with a 2x2 kernel and stride 2
    x = F.relu(self.conv2(x))       # Apply ReLU activation after second convolution
    x = F.max_pool2d(x, 2, 2)       # Apply max pooling with a 2x2 kernel and stride 2
    x = x.view(-1, 32 * 30 * 30)    # Flatten the tensor
    x = F.relu(self.fc1(x))         # Apply ReLU activation after fully connected layer
    x = self.fc2(x)                 # Output one score per class
    return F.log_softmax(x, dim=1)  # Apply log-softmax over the class dimension
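Putting the layer definitions and the forward method together gives a module along the following lines. This is a sketch rather than the project's exact class: the name FruitCNN and the class count of 5 are placeholders, while the layers and data_classes_len follow the definitions above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FruitCNN(nn.Module):  # hypothetical class name
    def __init__(self, data_classes_len: int):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1)
        self.conv2 = nn.Conv2d(16, 32, 3, 1)
        self.fc1 = nn.Linear(32 * 30 * 30, 128)
        self.fc2 = nn.Linear(128, data_classes_len)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2, 2)
        x = x.view(-1, 32 * 30 * 30)  # requires 128x128 inputs
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)


model = FruitCNN(data_classes_len=5)  # 5 classes is a placeholder value
batch = torch.randn(4, 3, 128, 128)   # random stand-in for 4 RGB images
log_probs = model(batch)
print(log_probs.shape)
```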
- First Convolutional Layer (conv1):
  - Input: (batch_size, 3, 128, 128)
  - Output: (batch_size, 16, 126, 126) (with kernel_size=3, stride=1, and no padding, each spatial dimension shrinks by 2)
- First Pooling Layer (max_pool2d):
  - Input: (batch_size, 16, 126, 126)
  - Output: (batch_size, 16, 63, 63) (pooling with kernel_size=2 and stride=2 halves each spatial dimension)
- Second Convolutional Layer (conv2):
  - Input: (batch_size, 16, 63, 63)
  - Output: (batch_size, 32, 61, 61) (again, kernel_size=3 and stride=1 shrink each spatial dimension by 2)
- Second Pooling Layer (max_pool2d):
  - Input: (batch_size, 32, 61, 61)
  - Output: (batch_size, 32, 30, 30) (pooling with kernel_size=2 and stride=2 halves each spatial dimension, rounding 30.5 down to 30)
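The shape walk-through above can be checked with a few lines of plain Python. The helper conv_out is a hypothetical name; it applies the standard output-size formula for a convolution or pooling window, floor((size + 2 * padding - kernel) / stride) + 1, along one dimension.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 128
size = conv_out(size, kernel=3, stride=1)  # conv1: 128 -> 126
size = conv_out(size, kernel=2, stride=2)  # pool1: 126 -> 63
size = conv_out(size, kernel=3, stride=1)  # conv2: 63 -> 61
size = conv_out(size, kernel=2, stride=2)  # pool2: 61 -> 30
flattened = 32 * size * size               # 32 feature maps of 30x30
print(size, flattened)                     # 30 28800
```

Changing the input resolution changes the final size, so the in_features value of the first fully connected layer would have to be recomputed the same way.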
- Forward Pass
  - In the forward method, the connection between the pooling layer and the fully connected layer is made by flattening the tensor using x.view.
- Flattening the Tensor:
  x = x.view(-1, 32 * 30 * 30)
  - Purpose: The goal of this line is to reshape (or "flatten") the multi-dimensional tensor into a 2D tensor, which is required before feeding it into a fully connected (linear) layer.
  - Explanation:
    - x.view(-1, 32 * 30 * 30): view is a PyTorch method that reshapes the tensor without changing its data.
    - -1: This tells PyTorch to infer the size of this dimension based on the other dimensions and the total number of elements. In this context, -1 corresponds to the batch size.
    - 32 * 30 * 30: This is the total number of features that each image (after the convolutional and pooling layers) has. The 32 is the number of feature maps, and 30 * 30 is the spatial dimension of each feature map.
  - Example:
    - Suppose the input tensor x has a shape of (batch_size, 32, 30, 30) before flattening. After applying view(-1, 32 * 30 * 30), the shape of x becomes (batch_size, 28800), where 28800 is the product of 32 * 30 * 30. Each image's feature maps are now flattened into a single vector of size 28800.
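The flattening step can be tried on its own with a dummy tensor, as in the short sketch below. The batch size of 4 is arbitrary, and torch.flatten is shown only as an equivalent alternative, not as something the original code uses.

```python
import torch

x = torch.zeros(4, 32, 30, 30)    # stand-in for a batch of 4 pooled feature maps
flat = x.view(-1, 32 * 30 * 30)   # -1 is inferred as the batch size
print(flat.shape)                 # torch.Size([4, 28800])

# torch.flatten gives the same shape without hard-coding the feature count
same = torch.flatten(x, start_dim=1)
print(same.shape)                 # torch.Size([4, 28800])
```

Because the feature count is hard-coded in view, images that are not 128x128 would produce a different number of elements per image, and the reshape would either fail or yield an unexpected batch dimension.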
- Fully Connected Layer with ReLU:
  x = F.relu(self.fc1(x))
  After flattening, the vector is passed through a fully connected (linear) layer, followed by a ReLU activation function.
- Second Fully Connected Layer:
  x = self.fc2(x)
  This line passes the output from the first fully connected layer through another fully connected layer, producing one raw score per class.
- Applying Log-Softmax:
  return F.log_softmax(x, dim=1)
  This converts the raw scores into log-probabilities across the class dimension (dim=1).
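To see what log-softmax returns, here is a small sketch with made-up scores for three classes. Exponentiating the output recovers ordinary probabilities that sum to 1 for each row, and the largest log-probability marks the predicted class.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # made-up raw scores for 3 classes
log_probs = F.log_softmax(logits, dim=1)  # log-probabilities per class
probs = log_probs.exp()                   # back to ordinary probabilities

print(probs.sum(dim=1))          # each row of probabilities sums to 1
print(log_probs.argmax(dim=1))   # index of the predicted class
```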
5. Define Loss Function and Optimizer
Next, you need to define the loss function and the optimizer.
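The specific choices are not shown here, so the following is only one plausible setup. It assumes nn.NLLLoss, which pairs naturally with a model that returns log-softmax outputs, and the Adam optimizer with a learning rate of 0.001. The small stand-in model exists only so the snippet runs by itself; in the project it would be the CNN defined earlier.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model so the sketch is self-contained (not the project's CNN)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4), nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()                              # assumption: expects log-probabilities
optimizer = optim.Adam(model.parameters(), lr=0.001)  # assumption: Adam with lr=0.001

# One optimization step on random data to show how the two are used
inputs = torch.randn(2, 3, 8, 8)
targets = torch.tensor([0, 3])
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
print(loss.item())
```

nn.CrossEntropyLoss applies log-softmax internally, so it is usually fed raw scores; applied to outputs that are already log-probabilities it yields the same value as nn.NLLLoss, since log-softmax leaves log-probabilities unchanged.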
6. Training Loop
Here's how you would use the train_loader in the training loop.
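A typical training loop might look like the sketch below. The function name train_one_epoch, the synthetic data, the stand-in model, and the choice of NLLLoss with Adam are all assumptions so the example runs by itself; in the project, the model would be the CNN and train_loader would come from the training ImageFolder.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, train_loader, criterion, optimizer, device):
    """Run one pass over train_loader and return the mean loss per sample."""
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # forward pass
        loss = criterion(outputs, labels)  # compare predictions with labels
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights
        running_loss += loss.item() * images.size(0)
    return running_loss / len(train_loader.dataset)


# Synthetic stand-ins for the real model and data
torch.manual_seed(0)
device = torch.device("cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4), nn.LogSoftmax(dim=1)).to(device)
data = TensorDataset(torch.randn(16, 3, 8, 8), torch.randint(0, 4, (16,)))
train_loader = DataLoader(data, batch_size=4, shuffle=True)
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

losses = [train_one_epoch(model, train_loader, criterion, optimizer, device) for _ in range(3)]
print(losses)
```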
7. Validation Loop:
You can use the val_loader to evaluate the model on the validation set after each epoch.
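One way to write this evaluation is sketched below. The function name evaluate and the AlwaysClassZero stand-in model are hypothetical; the stand-in always predicts class 0, which makes the resulting accuracy on the four synthetic samples easy to verify by hand.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Return (mean loss, accuracy) over all samples in loader."""
    model.eval()                # switch off training-only behavior such as dropout
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():       # no gradients are needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item() * images.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / total, correct / total


class AlwaysClassZero(nn.Module):
    """Hypothetical stand-in model that always predicts class 0."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 2, device=x.device)
        logits[:, 0] = 5.0
        return torch.log_softmax(logits, dim=1)


device = torch.device("cpu")
val_data = TensorDataset(torch.randn(4, 3, 8, 8), torch.tensor([0, 0, 1, 1]))
val_loader = DataLoader(val_data, batch_size=2)
val_loss, val_acc = evaluate(AlwaysClassZero(), val_loader, nn.NLLLoss(), device)
print(val_loss, val_acc)  # accuracy is 0.5: two of the four labels are class 0
```

Calling this after each training epoch and comparing the validation loss with the training loss is a common way to spot overfitting.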
8. Testing Loop
Finally, use the test_loader to evaluate the model on the test set after training is complete.
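A possible shape for the final test pass is sketched below, reporting overall accuracy plus accuracy per class. The function name evaluate_on_test_set, the two class names, and the AlwaysClassZero stand-in model are illustrative assumptions; in the project, class_names could come from the classes attribute of the ImageFolder dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_on_test_set(model, test_loader, class_names, device):
    """Return overall accuracy and a per-class accuracy dict."""
    model.eval()
    correct = [0] * len(class_names)
    seen = [0] * len(class_names)
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            for label, pred in zip(labels.tolist(), preds.tolist()):
                seen[label] += 1
                correct[label] += int(label == pred)
    overall = sum(correct) / max(sum(seen), 1)
    per_class = {
        name: correct[i] / seen[i] for i, name in enumerate(class_names) if seen[i] > 0
    }
    return overall, per_class


class AlwaysClassZero(nn.Module):
    """Hypothetical stand-in model that always predicts class 0."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 2, device=x.device)
        logits[:, 0] = 5.0
        return torch.log_softmax(logits, dim=1)


device = torch.device("cpu")
test_data = TensorDataset(torch.randn(4, 3, 8, 8), torch.tensor([0, 0, 1, 1]))
test_loader = DataLoader(test_data, batch_size=2)
overall, per_class = evaluate_on_test_set(
    AlwaysClassZero(), test_loader, ["banana", "pepper"], device
)
print(overall)    # 0.5
print(per_class)  # {'banana': 1.0, 'pepper': 0.0}
```

Per-class accuracy can reveal classes the model confuses even when the overall accuracy looks good.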
9. Output:
10. Complete Project - GitHub
https://github.com/SumitJainUTD/image-classification-pytorch-tensorflow-deep-learning