We explore how the cutting-edge object detection model YOLOv8 can impact in-store analytics and transform the retail experience.
As the world evolves, so does consumer behavior, bringing significant implications for the retail sector. While uncertainty still surrounds the future of consumer spending, there are several reasons for optimism: inflation is easing, the economy is on the mend, and technological advancements are driving down costs, boosting productivity, and enhancing the customer experience.
For businesses, understanding customer behavior and optimizing store operations have become more critical than ever. In-store analytics play a crucial role in this process, providing insights that enable data-driven decision-making, which in turn helps retailers improve customer experiences, streamline operations, and boost sales.
In-store analytics encompasses the entire process of collecting, analyzing, and interpreting data generated within a retail environment. This data is typically gathered through cameras, sensors, and other technologies, offering insights into customer behavior, store performance, and operational efficiency.
Through in-store analytics, retailers can track metrics such as foot traffic, product interactions, and conversion rates by integrating various technologies into their systems. Artificial intelligence (AI) and machine learning (ML) are particularly valuable in this context, allowing retailers to maximize the insights they gain from the data collected via cameras and sensors.
Retailers can leverage these insights to understand how customers navigate their stores, identify which areas receive the most attention, and assess how store layouts influence purchasing decisions.
Object detection models are a type of AI and computer vision technology designed to identify objects within images or videos. For retailers, these models are invaluable for automating inventory management and preventing theft through real-time monitoring of store shelves and customer activity.
Several open-source object detection models are available, including MediaPipe and YOLO. Each serves different use cases and offers its own features and trade-offs depending on specific performance requirements.
In this project, we will use YOLOv8 to detect customers. YOLO (You Only Look Once) is renowned for its speed and accuracy, making it an excellent choice for real-time object detection tasks.
By implementing YOLOv8, we can count the number of customers present at any given time, providing valuable insights that can help optimize store operations and enhance the customer experience.
Check out the full code and implementation on GitHub.
The first step in building the object detection model is to install all the necessary libraries for capturing and detecting objects in a video frame.
pip install ultralytics
pip install opencv-python
Once the installation is complete, initialize the libraries and specify the model name. The model size can vary depending on the use case, so it's important to explore different models. Check out the different models here.
import cv2
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
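YOLOv8 ships in several sizes, from nano (yolov8n) up to extra-large (yolov8x); larger checkpoints trade inference speed for accuracy, and swapping one in is a one-line change:
# Larger variants of the same pretrained family (slower but more accurate):
# model = YOLO("yolov8s.pt")  # small
# model = YOLO("yolov8m.pt")  # medium
# model = YOLO("yolov8l.pt")  # large
# model = YOLO("yolov8x.pt")  # extra-large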
Next, declare the path to the video file. For this demonstration, a recorded video is used. To use a live camera feed instead, pass 0 to cv2.VideoCapture.
input_video_path = "testing.mp4"
cap = cv2.VideoCapture(input_video_path)  # for recorded video
# cap = cv2.VideoCapture(0)  # uncomment this for live video
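Before entering the processing loop, it is worth confirming that the source actually opened; a minimal check looks like this:
# Fail fast if the video file (or camera) could not be opened
if not cap.isOpened():
    raise RuntimeError(f"Could not open video source: {input_video_path}")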
Create a list containing all the classes that ‘yolov8n’ can detect. This list depends on the dataset the model was trained on: the pretrained YOLOv8 checkpoints all use the 80 COCO classes, while a custom-trained model will have its own class list.
class_list = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
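Rather than hard-coding the list, the same names can be read from the loaded model itself: Ultralytics exposes them as model.names, a dict mapping class IDs to names.
# Equivalent to the hard-coded list above
class_list = list(model.names.values())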
The following code snippet reads the video frame by frame and processes each frame individually. This approach is necessary because the detection model operates on individual images, so the video must be processed one frame at a time.
while True:
    success, img = cap.read()
    if not success:
        break
Each video frame is passed into the model, and the results are fetched as output. Each result contains the bounding-box coordinates, class ID, and confidence score for every detected object.
    results = model(img, stream=True)
    person = 0
    for r in results:
        boxes = r.boxes
        for box in boxes:
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            currentClass = class_list[int(box.cls[0])]
            currentConf = box.conf[0]
Next, filter the detections to keep only those with the class name 'person' and a confidence score greater than 0.3. For better visualization, draw a bounding box around each detected 'person', and overlay the running count once per frame.
            if currentClass == 'person' and currentConf > 0.3:
                person += 1
                cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 1)
                cv2.putText(img, currentClass, (x1, y1 - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv2.LINE_AA, False)
    cv2.putText(img, "person: " + str(person), (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2, cv2.LINE_AA, False)
Finally, display the output video, which detects and counts the number of people in a store, highlighting each person with a bounding box.
    cv2.imshow("Analyzed Video", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
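If you also want to keep a copy of the annotated footage, OpenCV's VideoWriter can save each processed frame. A minimal sketch is shown below; the output file name and codec are illustrative choices, not part of the original pipeline:
# Set up once, before the while loop
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if the source reports no FPS
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("analyzed.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
# Inside the loop, after drawing the boxes and the count:
#     writer.write(img)
# At the end, alongside cap.release():
#     writer.release()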
The sample video used for testing the model is sourced from pexels.com and can be downloaded here.
This setup provides a live preview of the number of people present in the store, allowing store owners to continuously monitor occupancy in real time.
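Beyond the live preview, the per-frame counts can also be logged for later analysis. The sketch below is one hypothetical approach: it appends a timestamped count to a CSV file (the file name and per-frame sampling are assumptions, not part of the original pipeline):
import csv
from datetime import datetime

# Inside the while loop, after the person count has been computed:
with open("occupancy_log.csv", "a", newline="") as f:
    csv.writer(f).writerow([datetime.now().isoformat(), person])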
Though the existing model is quite accurate, it can be improved further by fine-tuning it on data collected from shopping malls, supermarkets, and other stores.
This is done in two steps: annotating the collected data, then training the model on it.
First, annotate the gathered data to teach the model where the target objects appear in each image. This can be accomplished using cloud services like Roboflow or open-source tools like labelImg.
Once the annotation is complete and the dataset is exported in YOLO format, a file named ‘data.yaml’ describes the dataset: the class names and the paths to the training and validation images.
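A minimal data.yaml typically looks like the sketch below; the paths and class names are placeholders for your own dataset:
# hypothetical data.yaml for a single-class 'person' dataset
train: datasets/person/images/train
val: datasets/person/images/val
nc: 1
names: ['person']
With the dataset described, the second step is to train the model on it: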
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
results = model.train(
    data="LOCATION_TO_DATA.YAML_FILE",  # change this to the path of your data.yaml file
    epochs=10,
    batch=8,
    name='Person_model'
)
Once the training is complete, a file named ‘best.pt’ will be generated. Replace the pretrained model ‘yolov8n.pt’ with this new file, and update the ‘class_list’ with the objects you trained the model on. With enough training data, these changes will further tailor the model to your environment.
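Assuming the default Ultralytics output layout, training writes its results under runs/detect/<run name>/, and the fine-tuned weights can be loaded just like the pretrained ones:
from ultralytics import YOLO

# Load the fine-tuned weights (path assumes the default output directory
# and the 'Person_model' run name used above)
model = YOLO("runs/detect/Person_model/weights/best.pt")
print(model.names)  # now reflects the classes from the custom dataset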
This model can be applied to a variety of purposes across the retail environment.
In this project, we've explored the significant role of object detection in improving in-store analytics by focusing on counting the number of people inside a store at any given time. By leveraging the advanced capabilities of YOLOv8, we can accurately and efficiently detect customers, providing crucial data for optimizing store operations.
The model we developed not only aids in understanding customer behavior but also supports better resource management and improved customer service. As the retail sector continues to evolve, integrating such technologies will be essential to remain competitive and meet market demands.