Playing Cards Image Classification

Mar 11, 2023

This is an image classification problem: the task is to classify images of playing cards into 53 different labels (52 standard playing cards + 1 random). There are 7624 training images, 265 validation images and 265 test images. Each image is a 224x224 RGB image. I used a custom convolutional network as well as the pretrained ResNet18 and ResNet34 networks.

Data Source: https://www.kaggle.com/datasets/gpiosenka/cards-image-datasetclassification

Exploratory Data Analysis

All images in the dataset are high-quality images of size 224x224x3. All images are cropped so that the card occupies more than 50% of the pixels in the overall image. There are 7624 training images, 265 test images and 265 validation images.

I plotted some of the images as well.

*Training image samples. The quality of the images was quite good.*

The function which I wrote for plotting:

import os

import matplotlib.pyplot as plt
from PIL import Image

def show_img(folder):
  """
  This function plots 53 images (one corresponding to each class)
  on an 11x5 grid of axes.
  Args: folder - train/test/valid
  """
  #you must change this dir path as per the location of data:
  data_dir = "/content/drive/MyDrive/INM702_CARDS/cards_data/" + folder
  #getting the figure and axes grid
  fig, axes = plt.subplots(nrows=11, ncols=5, figsize=(25, 60))
  path_list = [] #list to store the path of one image per class

  #iterating inside the data folder
  for im_folder in sorted(os.listdir(data_dir)):
    class_dir = os.path.join(data_dir, im_folder)
    #keeping only the first image of each labeled folder
    images = sorted(os.listdir(class_dir))
    if images:
      path_list.append(os.path.join(class_dir, images[0]))

  #iterating over the flattened grid of axes to plot:
  for idx, ax in enumerate(axes.flat):
    if idx >= len(path_list):
      #the grid has 55 cells but there are only 53 classes
      ax.axis("off")
      continue
    im_path = path_list[idx]
    ax.imshow(Image.open(im_path))
    #the parent folder name is the class label
    ax.set_title(os.path.basename(os.path.dirname(im_path)))

The plot below helps us visualise the train/test split.

Now let us plot the count of images in each deck.

Finally, let's check whether the data is balanced across all 53 classes:

Observation:

As we can see, the training data is well balanced across all card classes. The image resolution is good, and the number of training, test and validation images is acceptable.
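A balance check of this kind can be sketched with the standard library alone. The helper names `count_images_per_class` and `imbalance_ratio`, and the layout of one sub-folder per class, are assumptions for illustration; this is not the code that produced the plots above.

```python
import os
from collections import Counter


def count_images_per_class(data_dir):
    """Return a Counter mapping each class folder name to its number of files."""
    counts = Counter()
    for class_name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = len(os.listdir(class_dir))
    return counts


def imbalance_ratio(counts):
    """Ratio of the largest class to the smallest; 1.0 means perfectly balanced."""
    smallest = min(counts.values())
    if smallest == 0:
        return float("inf")  # at least one class folder is empty
    return max(counts.values()) / smallest
```

Calling `count_images_per_class` on the train folder and passing the result to `imbalance_ratio` gives a single number to watch: the closer it is to 1.0, the more evenly the classes are represented.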

Data Preprocessing

Before training, I had to define a custom image generator class. At first, the generator object loaded images one by one, which took a long time and made training very slow. To overcome that, I optimized the generator class so that it works from data already held in memory: I loaded the tensors of all images into a pandas dataframe and fed that dataframe to the generator class.

The dataframe looked like this:

Below is the loading function which I wrote to get the data:

import os

from PIL import Image
from torchvision import transforms

def load_data(folder):
  """
  This function reads the folder where images are stored and creates a
  list of lists containing tensors and labels.
  Args: folder - train/test/valid
  Returns: [labels_code, labels, tensors] - a list of lists
  """
  #path for data
  data_dir = "/content/drive/MyDrive/INM702_CARDS/cards_data/" + folder
  tensors = []
  labels = []
  labels_code = []
  #transforming PIL image into a uint8 tensor (PILToTensor does not rescale or normalize)
  transform = transforms.Compose([transforms.PILToTensor()])
  #iterating over the labeled folders:
  for im_folder in os.listdir(data_dir):

    #iterating inside the labeled folder
    for im in os.listdir(data_dir + "/" + im_folder):
      im_path = data_dir + "/" + im_folder + "/" + im
      #reading image using PIL
      img = Image.open(im_path)
      img_tensor = transform(img)

      #append label to label list
      labels.append(im_folder)
      #label_map (folder name -> integer code) is assumed to be defined earlier in the notebook
      labels_code.append(label_map[im_folder])
      #append tensor to tensor list
      tensors.append(img_tensor)

  return [labels_code, labels, tensors]

And then finally my custom data generator:

import torch
from torch.utils.data import Dataset

#https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
class CustomImageDataset(Dataset):
    """
    This class will create a custom image dataset object.
    """
    def __init__(self,df, transform=None):
        """
        Args: df - a pandas dataframe containing three columns - labels_code, labels, tensors.
                transform - optional callable applied to each image tensor
        """
        
        self.df = df
        self.transform = transform

    def __len__(self):
        #returns the length of data
        return len(self.df)

    def __getitem__(self, idx):
        #fetching the image tensor
        image = self.df.iloc[idx, 2].float()
        #fetching the image label
        label = torch.tensor(int(self.df.iloc[idx, 0]))
        #applying transformation
        if self.transform:
            image = self.transform(image)
       
        return (image, label)
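To show how the pieces might fit together, here is a minimal sketch that packs the three lists returned by `load_data` into a dataframe and wraps it in a PyTorch `DataLoader`. The names `FrameDataset` and `make_loader`, the batch size and the shuffling are assumptions for illustration; `FrameDataset` is a stripped-down stand-in for the class above so that the snippet runs on its own.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class FrameDataset(Dataset):
    """Stand-in for CustomImageDataset: column 0 holds the code, column 2 the tensor."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        image = self.df.iloc[idx, 2].float()
        label = torch.tensor(int(self.df.iloc[idx, 0]))
        return image, label


def make_loader(labels_code, labels, tensors, batch_size=32, shuffle=True):
    """Build the three-column dataframe, then a DataLoader that batches it."""
    df = pd.DataFrame(
        {"labels_code": labels_code, "labels": labels, "tensors": tensors}
    )
    return DataLoader(FrameDataset(df), batch_size=batch_size, shuffle=shuffle)
```

Because every tensor already sits in memory, each `__getitem__` call is only a dataframe lookup and a dtype conversion, which is where the speed-up over reading files one by one comes from.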

Training

I have used two methods to train:

Customised CNN models
Pretrained models from PyTorch

Given the nature of the problem, I trained CNN models of my own design and also used transfer learning. In all, I trained the models below.

Please refer to the custom convNet code below:

import torch.nn as nn

class ConvNetClassifier(nn.Module):
    """
    Custom CNN: six convolution units -> flatten -> dropout -> fc1 -> fc2.
    Args: output_size - number of distinct labels in output
    """
    def __init__(self, output_size):
        super(ConvNetClassifier, self).__init__()
        self.output_size = output_size
        #first sequence unit of convolution
        self.Conv1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
            )    

        #second sequence unit of convolution
        self.Conv2 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=3, stride=1),
            #nn.BatchNorm2d(32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
            ) 

        #third sequence unit of convolution
        self.Conv3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=3, stride=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
            ) 

        #fourth sequence unit of convolution
        self.Conv4 = nn.Sequential(
            nn.Conv2d(32, 16, kernel_size=3, stride=1),
            #nn.BatchNorm2d(64),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2)
            ) 

        #fifth sequence unit of convolution
        self.Conv5 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3, stride=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
            )

        #sixth sequence unit of convolution
        self.Conv6 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, stride=1),
            #nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
            )
        
        #2D dropout layer for conv features
        self.conv_drop = nn.Dropout2d()
        
        #fully connected layers and dropouts
        self.fc1 = nn.Linear(52*52*32,512) 
        self.drop1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, output_size) #53 for this dataset

    def forward(self,x):
        """
        This function connects all the layers/sequences initialized above.
        Args: x (input)
        returns: x (output)
        """
        #connecting the conv sequence units:
        x = self.Conv1(x)
        x = self.Conv2(x)
        x = self.conv_drop(x)
        x = self.Conv3(x)
        x = self.Conv4(x)
        x = self.conv_drop(x)
        x = self.Conv5(x)
        x = self.Conv6(x)

        #flattening the features:
        x = x.view(x.size(0), -1)

        #connecting with fully connected layers:
        x = self.drop1(x)
        x = self.fc1(x)
        x = self.fc2(x)
       
        return x
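The `52*52*32` input size of `fc1` follows from the layer shapes above and can be checked with a little arithmetic. This is a minimal sketch assuming 224x224 inputs; the helper names `conv_out` and `flattened_features` are made up for this illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a Conv2d or MaxPool2d layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size=224):
    """Trace the spatial size through the six convolution units of the network."""
    size = input_size
    # Conv1..Conv4: 3x3 convolutions, stride 1, no padding, no pooling
    for _ in range(4):
        size = conv_out(size)
    # Conv5 and Conv6: a 3x3 convolution followed by 2x2 max pooling with stride 2
    for _ in range(2):
        size = conv_out(size)
        size = conv_out(size, kernel=2, stride=2)
    out_channels = 32  # output channels of Conv6
    return size, size * size * out_channels


print(flattened_features())  # (52, 86528)
```

Each unpadded 3x3 convolution trims 2 pixels (224 to 216 after four of them), and the two pooled units take 216 to 107 and then to 52, so the flattened vector has 52 * 52 * 32 = 86528 features. Changing the input resolution or adding padding would require a different `fc1` size.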

The results of my experiments are as follows:

Observations:

While training, the custom models started overfitting after a certain number of iterations. Adding dropout layers helped them generalise.
This problem also needed more convolution layers: performance improved after adding more layers.
The pretrained models generalised pretty well. That was expected, as ResNet is pretrained on the ImageNet dataset.
I also tried different transformations while loading images, such as resizing, rotation and normalization. A few of them, like normalization, helped a bit.
Normalization rescaled the pixel values into the range 0 to 1.
Cropping the images was not a good idea, as the corners of the cards carry the number and deck values.

Conclusion:

After training all the models for 10 epochs, ResNet34 outperformed all the other models, so I trained ResNet34 for 30 epochs and reached an accuracy of 93.21%. The loss can be seen in the figure below.


Written by Saurabh Raj
