Hi everyone. In this article, we will see how to create an image dataset in CSV format. Deep learning models for vision tasks are trained on images, so we need image datasets to develop these applications. If you are a beginner in this area, you can find ready-made image datasets on the internet. But if you want to develop custom models on different data, you have to create your own datasets. In this tutorial, we will create a custom image dataset consisting of cat and dog images. First, let’s import the libraries we need.

import numpy as np
import pandas as pd
import cv2
import os
from tqdm import tqdm
from glob import glob
  • numpy : is used for matrix operations.
  • pandas : is used for dataset operations.
  • cv2 : is used for image manipulation operations.
  • tqdm : is used to show a progress bar while the images are processed.
  • glob : is used to get the file paths of the images.
  • os : is used to list the files or folders in a directory.

Before we start writing the code, we have to download the cat and dog images. You can download them from https://www.microsoft.com/en-us/download/details.aspx?id=54765. There are 12500 cat images and 12500 dog images we can use.
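Before going further, it can help to confirm that the images were extracted where the code expects them. The sketch below assumes the archive was unpacked into a `PetImages/` folder with one subfolder per label (`Cat` and `Dog`), which is the layout the rest of this article uses. `count_images` is a hypothetical helper name, not part of any library.

```python
import os
from glob import glob

def count_images(root="PetImages"):
    """Return a dict mapping each subfolder of root to its number of .jpg files."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):  # ignore stray files next to the label folders
            counts[name] = len(glob(os.path.join(folder, "*.jpg")))
    return counts

# Example call (assumes the PetImages/ folder exists in the working directory):
# print(count_images("PetImages"))
```

If the counts are far from the 12500 images per class mentioned above, the archive was probably extracted to a different location.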

First, we need to create a Python class named CreateDataset for these operations. We use the __init__ method to define attributes that the other methods in the class can access: the image height and width, the dataframe, and a dictionary that maps label names to numbers.

class CreateDataset():
    # Constructor method
    def __init__(self):
        self.height = 64
        self.width = 64
        self.dataframe = pd.DataFrame()
        self.PetImage_dict = {"Cat":0, "Dog":1}

    def create_dataset(self):
        for name in os.listdir("PetImages/"): # get image folders from directory
            print("PetImage-Name : ", name) # print folder name
            for image_name in tqdm(glob(f'PetImages/{name}/*.jpg')): # get all image paths in the folder
                try:                                                 # some files may be unreadable
                    image = cv2.imread(image_name, cv2.IMREAD_GRAYSCALE) # load image as grayscale with opencv
                    image = cv2.resize(image, (self.width, self.height)) # resize the image; cv2.resize expects (width, height)
                    image = image.flatten()                              # flatten the image
                    image = np.append(image, self.PetImage_dict[name])   # add 0 or 1 (the label) to the image array
                    image_frame = pd.DataFrame(image, columns=[self.dataframe.shape[1]]) # convert image array to a one-column dataframe
                    self.dataframe = pd.concat([self.dataframe, image_frame], axis=1) # combine the dataframes
                except Exception as error: # e.g. cv2.imread returns None for a corrupt file and cv2.resize then fails
                    print("Skipped", image_name, ":", error)
        if len(self.dataframe) != 0:
            self.dataframe = self.dataframe.T # transpose so that each row is one image

    def create_csv(self):
        self.dataframe.to_csv("DogCatDataset.csv") # save the dataset to the working directory

dataset = CreateDataset() # create dataset object from CreateDataset class
dataset.create_dataset() # create the image dataset with create_dataset method
dataset.create_csv() # save the dataset in CSV format with create_csv method
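Each row of the resulting dataset holds 64 × 64 = 4096 pixel values followed by one label, so 4097 columns in total. Calling pd.concat once per image copies the growing dataframe every time, which gets slow with 25000 images. As an alternative (not the method used in this article), the rows can be collected in a list and turned into a dataframe in one step. The sketch below uses random arrays as stand-ins for resized grayscale images, and `image_to_row` and `build_dataframe` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

HEIGHT, WIDTH = 64, 64  # same image size as in the class above

def image_to_row(image, label):
    """Flatten a 2-D grayscale image and append its integer label as the last value."""
    return np.append(image.flatten(), label)

def build_dataframe(images_with_labels):
    """Build the dataset in one step: one row per image, label in the last column."""
    rows = [image_to_row(image, label) for image, label in images_with_labels]
    return pd.DataFrame(np.vstack(rows))

# Synthetic stand-ins for resized grayscale images (not real photos).
rng = np.random.default_rng(0)
fake_cat = rng.integers(0, 256, size=(HEIGHT, WIDTH), dtype=np.uint8)
fake_dog = rng.integers(0, 256, size=(HEIGHT, WIDTH), dtype=np.uint8)

frame = build_dataframe([(fake_cat, 0), (fake_dog, 1)])
print(frame.shape)  # (2, 4097): 64 * 64 pixels plus one label column
```

With real data, the list passed to `build_dataframe` would hold the arrays returned by cv2.resize together with their 0 or 1 labels.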

Finally, we have saved our image dataset of cat and dog images. Let’s load the dataset and see what it looks like. We can use the pandas library to load it, as in the following code.

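cat_dog_dataset = pd.read_csv("DogCatDataset.csv", index_col=0) # load the saved dataset; the first CSV column is the saved index
cat_dog_dataset.head() # first five images
(Figure: dataset head)
cat_dog_dataset.tail() # last five images
(Figure: dataset tail)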
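To use the loaded dataset for training, the pixel columns and the label column usually have to be separated again. The sketch below is one way to do that, assuming the CSV was read with pd.read_csv("DogCatDataset.csv", index_col=0) so that the saved index does not show up as an extra column, and that the label is the last column. `split_features_labels` is a hypothetical helper name, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd

def split_features_labels(frame, height=64, width=64):
    """Split a dataset frame into an (n, height, width) image array and a label vector."""
    values = frame.to_numpy()
    images = values[:, :-1].reshape(-1, height, width).astype(np.uint8)
    labels = values[:, -1].astype(int)
    return images, labels

# Tiny synthetic frame with the same layout: 4096 pixel columns plus a label column.
demo = pd.DataFrame(np.zeros((3, 64 * 64 + 1), dtype=int))
demo.iloc[:, -1] = [0, 1, 1]

images, labels = split_features_labels(demo)
print(images.shape, labels)
```

Each entry of `images` is then a 64 × 64 grayscale array again, and `labels` holds 0 for cats and 1 for dogs.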

Note: Saving images to a CSV file can take a long time. So, in the next tutorial, we will see how we can store images in NumPy format.