Playing Cards Image Classification
This is an image classification problem: each image of a playing card must be assigned to one of 53 labels (52 standard playing cards + 1 random). The dataset has 7624 training images, 265 validation images and 265 test images. Each image is a 224x224 RGB image. I used a custom convolutional network as well as the pretrained ResNet18 and ResNet34 networks.
Data Source: https://www.kaggle.com/datasets/gpiosenka/cards-image-datasetclassification
Exploratory Data Analysis
All images in the dataset are high-quality images of size 224x224x3, cropped so that the card occupies more than 50% of the pixels in the image. There are 7624 training images, 265 test images and 265 validation images.
I plotted some of the images as well.
The function which I wrote for plotting:
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def show_img(folder):
    """
    This function plots 53 images (one corresponding to each class)
    on an 11x5 grid of axes.
    Args: folder - train/test/valid
    """
    #you must change this dir path as per the location of data:
    data_dir = "/content/drive/MyDrive/INM702_CARDS/cards_data/" + folder
    i = 0
    #getting the figure and axis grid
    fig, axes = plt.subplots(nrows=11, ncols=5, figsize=(25, 60))
    path_list = [''] * 55  #55 grid slots; the last two stay empty for 53 classes
    #iterating inside the data folder
    for im_folder in os.listdir(data_dir):
        #iterating inside the labeled folder
        for im in os.listdir(data_dir + "/" + im_folder):
            #storing the path of the first image only, then moving to the next folder
            path_list[i] = data_dir + "/" + im_folder + "/" + im
            i += 1
            break
    #reshaping the path_list to the shape of grid of axes:
    path_list = np.array(path_list).reshape((11, 5))
    #iterating on the grid of axes to plot:
    for i in range(11):
        for j in range(5):
            im_path = path_list[i][j]
            if im_path == '':
                #no image for this slot, so hide the empty axes
                axes[i, j].axis('off')
                continue
            axes[i, j].imshow(Image.open(im_path))
            #the label is the name of the folder that contains the image
            axes[i, j].set_title(im_path.split('/')[-2])
The plot below helps us visualise the train/test/validation split.
Now let us plot the count of images in each deck.
Finally, let's check whether the data is balanced across the 53 classes:
Observation:
As we can see, the training data is well balanced across all card classes. The image resolution is good, and the numbers of training, test and validation images are acceptable.
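The balance check can be reproduced with a short count over the folder tree. This is a minimal sketch, assuming the layout used in this project (one subfolder per class inside each of train/test/valid); the function name `count_images_per_class` is illustrative and not part of the original notebook.

```python
import os
from collections import Counter


def count_images_per_class(split_dir):
    """Return a Counter mapping class-folder name -> number of files in it."""
    counts = Counter()
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):  # skip stray files next to the class folders
            counts[class_name] = len(os.listdir(class_dir))
    return counts


if __name__ == "__main__":
    # Same data location as in the plotting function; change it for your setup.
    split_dir = "/content/drive/MyDrive/INM702_CARDS/cards_data/train"
    if os.path.isdir(split_dir):
        counts = count_images_per_class(split_dir)
        print(len(counts), "classes, min", min(counts.values()), "max", max(counts.values()))
```

A small gap between the minimum and maximum count is what "balanced" means here; a large gap would suggest weighting the loss or resampling.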
Data Preprocessing
Before training, I had to define my custom image generator class. At first, the generator object loaded images from disk one by one, which took a long time and made training very slow. To overcome that, I loaded the tensors of all images into a pandas dataframe up front and fed that dataframe to the generator class.
The dataframe looked like this:
Below is the loading function which I wrote to get the data:
import os
from PIL import Image
from torchvision import transforms

def load_data(folder):
    """
    This function reads the filepath where images are stored and creates a
    list of lists containing Tensors and Labels
    Args: folder - train/test/valid
    Returns: [labels_code, labels, tensors] - a list of lists
    """
    #path for data
    data_dir = "/content/drive/MyDrive/INM702_CARDS/cards_data/" + folder
    tensors = []
    labels = []
    labels_code = []
    #transforming PIL image into a uint8 tensor (PILToTensor does not rescale the values)
    transform = transforms.Compose([transforms.PILToTensor()])
    #iterating over the labeled folders:
    for im_folder in os.listdir(data_dir):
        #iterating inside the labeled folder
        for im in os.listdir(data_dir + "/" + im_folder):
            im_path = data_dir + "/" + im_folder + "/" + im
            #reading image using PIL
            img = Image.open(im_path)
            img_tensor = transform(img)
            #append label to label list
            labels.append(im_folder)
            #label_map (folder name -> integer code) must be defined before calling load_data
            labels_code.append(label_map[im_folder])
            #append tensor to tensor list
            tensors.append(img_tensor)
    return [labels_code, labels, tensors]
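`load_data` relies on a `label_map` dictionary that is not shown here. One plausible way to build it, assuming each class has its own subfolder and that codes are assigned in sorted folder order, is sketched below; the helper name `build_label_map` and the sorted ordering are assumptions, not the original notebook's code.

```python
import os


def build_label_map(data_dir):
    """Map each class-folder name to an integer code, in sorted order."""
    class_names = sorted(
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )
    return {name: code for code, name in enumerate(class_names)}


# Example call (path as used in load_data; adjust to your setup):
# label_map = build_label_map("/content/drive/MyDrive/INM702_CARDS/cards_data/train")
```

Building the map once from the train folder and reusing it for test and valid keeps the codes consistent across splits.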
And then finally my custom data generator:
#https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
import torch
from torch.utils.data import Dataset

class CustomImageDataset(Dataset):
    """
    This class will create a custom image dataset object.
    """
    def __init__(self, df, transform=None):
        """
        Args: df - a pandas dataframe containing three columns - labels_code, labels, tensors.
              transform - optional callable applied to each image tensor
        """
        self.df = df
        self.transform = transform

    def __len__(self):
        #returns the length of data
        return len(self.df)

    def __getitem__(self, idx):
        #fetching the image tensor (third column) as float
        image = self.df.iloc[idx, 2].float()
        #fetching the image label code (first column)
        label = torch.tensor(int(self.df.iloc[idx, 0]))
        #applying transformation
        if self.transform:
            image = self.transform(image)
        return (image, label)
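To show how the pieces fit together, here is a minimal sketch that wraps a dataframe of preloaded tensors in the dataset class and batches it with a `DataLoader`. It uses random stand-in tensors instead of the real card images, repeats a condensed copy of the class so that it runs on its own, and introduces a hypothetical `scale_to_unit` transform; the batch size of 4 is arbitrary.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class CustomImageDataset(Dataset):
    """Condensed copy of the dataset class so this snippet runs on its own."""

    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        image = self.df.iloc[idx, 2].float()
        label = torch.tensor(int(self.df.iloc[idx, 0]))
        if self.transform:
            image = self.transform(image)
        return image, label


def scale_to_unit(image):
    """Map pixel values from the 0..255 range to floats in 0..1."""
    return image / 255.0


# Stand-in data: 8 random uint8 "images" in place of the real card tensors.
tensors = [torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8) for _ in range(8)]
df = pd.DataFrame(
    {
        "labels_code": [i % 53 for i in range(8)],
        "labels": ["class_%d" % (i % 53) for i in range(8)],
        "tensors": tensors,
    }
)

dataset = CustomImageDataset(df, transform=scale_to_unit)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels.shape)
```

Because every tensor is already in memory, `__getitem__` only does a dataframe lookup, which is the source of the speed-up over opening image files during training.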
Training
I used two methods to train:
- Customised CNN models
- Pretrained models from PyTorch
Considering the nature of the problem, I trained CNN models that I designed myself and also used transfer learning. In all, I trained the models below.
The custom ConvNet code is as follows:
import torch.nn as nn

class ConvNetClassifier(nn.Module):
    def __init__(self, output_size):
        """The parameters:
        output_size - number of distinct labels in output
        """
        super(ConvNetClassifier, self).__init__()
        self.output_size = output_size
        #The network: six conv units (Conv2d -> optional BatchNorm -> ReLU, with
        #max pooling after the last two) followed by two fully connected layers.
        #first sequence unit of convolution
        self.Conv1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #second sequence unit of convolution
        self.Conv2 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=3, stride=1),
            #nn.BatchNorm2d(32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #third sequence unit of convolution
        self.Conv3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=3, stride=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #fourth sequence unit of convolution
        self.Conv4 = nn.Sequential(
            nn.Conv2d(32, 16, kernel_size=3, stride=1),
            #nn.BatchNorm2d(16),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #fifth sequence unit of convolution
        self.Conv5 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3, stride=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #sixth sequence unit of convolution
        self.Conv6 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, stride=1),
            #nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #2D dropout layer for conv features
        self.conv_drop = nn.Dropout2d()
        #fully connected layers and dropouts
        #for 224x224 inputs the conv stack outputs 32 feature maps of size 52x52
        self.fc1 = nn.Linear(52*52*32, 512)
        self.drop1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, self.output_size)

    def forward(self, x):
        """
        This function connects all the layers/sequences initialized above.
        Args: x (input batch of images)
        returns: x (output logits, one per label)
        """
        #connecting the conv sequence units:
        x = self.Conv1(x)
        x = self.Conv2(x)
        x = self.conv_drop(x)
        x = self.Conv3(x)
        x = self.Conv4(x)
        x = self.conv_drop(x)
        x = self.Conv5(x)
        x = self.Conv6(x)
        #flattening the features:
        x = x.view(x.size(0), -1)
        #connecting with fully connected layers:
        x = self.drop1(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
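The `52*52*32` input size of `fc1` follows from the layer shapes. As a worked check, assuming the 224x224 input described earlier and PyTorch's default `padding=0`, the standard output-size formula floor((n + 2p - k) / s) + 1 can be applied layer by layer; the helper name `out_size` is illustrative.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


size = 224  # input height and width
# (unit name, conv kernel, conv stride, followed by a 2x2 max pool?)
units = [
    ("Conv1", 3, 1, False),
    ("Conv2", 3, 1, False),
    ("Conv3", 3, 1, False),
    ("Conv4", 3, 1, False),
    ("Conv5", 3, 1, True),
    ("Conv6", 3, 1, True),
]
for name, kernel, stride, pooled in units:
    size = out_size(size, kernel, stride)
    if pooled:
        size = out_size(size, kernel=2, stride=2)
    print(name, size)

flat_features = size * size * 32  # Conv6 has 32 output channels
print(flat_features)  # 86528, i.e. 52 * 52 * 32
```

Each unpadded 3x3 convolution trims 2 pixels (224 down to 214 after five of them), the first pool halves 214 to 107, the sixth convolution gives 105, and the second pool floors 105 / 2 to 52. A different input resolution would therefore need a different `fc1` input size.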
The results of my experiments are as follows:
Observations:
- During training, the custom models started overfitting after a certain number of iterations. Adding dropout layers helped them generalise.
- This problem also needed more convolution layers: performance improved after adding more layers.
- The pretrained models generalised well. That was expected, as ResNet is pretrained on the ImageNet dataset.
- I also tried different transformations while loading images, such as resizing, rotation and normalization. A few of them, like normalization, helped a bit.
- Normalization rescaled the pixel values to the range 0 to 1.
- Cropping the images was not a good idea, as the card corners carry the number and deck values.
Conclusion:
After training all the models for 10 epochs, ResNet34 outperformed all the other models, so I trained it for 30 epochs and reached an accuracy of 93.21%. The loss can be seen in the figure below.