We explore how the cutting-edge object detection model YOLOv8 can impact in-store analytics and transform the retail experience.
As the world evolves, so does consumer behavior, bringing significant implications for the retail sector. While uncertainty still surrounds the future of consumer spending, there are several reasons for optimism: inflation is easing, the economy is on the mend, and technological advancements are driving down costs, boosting productivity, and enhancing the customer experience.
For businesses, understanding customer behavior and optimizing store operations have become more critical than ever. In-store analytics play a crucial role in this process, providing insights that enable data-driven decision-making, which in turn helps retailers improve customer experiences, streamline operations, and boost sales.
In-store analytics encompasses the entire process of collecting, analyzing, and interpreting data generated within a retail environment. This data is typically gathered through cameras, sensors, and other technologies, offering insights into customer behavior, store performance, and operational efficiency.
Through in-store analytics, retailers can track metrics such as foot traffic, product interactions, and conversion rates by integrating various technologies into their systems. Artificial intelligence (AI) and machine learning (ML) are particularly valuable in this context, allowing retailers to maximize the insights they gain from the data collected via cameras and sensors.
Retailers can leverage these insights to understand how customers navigate their stores, identify which areas receive the most attention, and assess how store layouts influence purchasing decisions.
Object detection models are a type of AI and computer vision technology designed to identify objects within images or videos. For retailers, these models are invaluable for automating inventory management and preventing theft through real-time monitoring of store shelves and customer activity.
Several open-source object detection models are available, including MediaPipe and YOLO. Each serves different use cases and offers its own features and trade-offs depending on specific performance requirements.
In this project, we will use YOLOv8 to detect customers. YOLO (You Only Look Once) is renowned for its speed and accuracy, making it an excellent choice for real-time object detection tasks.
By implementing YOLOv8, we can count the number of customers present at any given time, providing valuable insights that can help optimize store operations and enhance the customer experience.
Check out the full code and implementation on GitHub.
The first step in building the object detection model is to install all the necessary libraries for capturing and detecting objects in a video frame.
pip install ultralytics
pip install opencv-python
Once the installation is complete, initialize the libraries and specify the model name. The model size can vary depending on the use case, so it's important to explore different models. Check out the different models here.
import cv2
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
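YOLOv8 ships in several sizes, from nano (yolov8n) up to extra-large (yolov8x); larger checkpoints trade inference speed for accuracy, and swapping one in is a one-line change:
# Larger variants of the same pretrained family (slower but more accurate):
# model = YOLO("yolov8s.pt")  # small
# model = YOLO("yolov8m.pt")  # medium
# model = YOLO("yolov8l.pt")  # large
# model = YOLO("yolov8x.pt")  # extra-large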
Next, declare the path to the video file. For this demonstration, a recorded video is used. To use a live camera feed instead, pass 0 to cv2.VideoCapture.
input_video_path = "testing.mp4"
cap = cv2.VideoCapture(input_video_path)  # for recorded video
# cap = cv2.VideoCapture(0)  # uncomment this for live video
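Before entering the processing loop, it is worth confirming that the source actually opened; a minimal check looks like this:
# Fail fast if the video file (or camera) could not be opened
if not cap.isOpened():
    raise RuntimeError(f"Could not open video source: {input_video_path}")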
Create a list containing all the classes that ‘yolov8n’ can detect. This list depends on the dataset the model was trained on: the pretrained YOLOv8 checkpoints all use the 80 COCO classes, while a custom-trained model will have its own class list.
class_list = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
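Rather than hard-coding the list, the same names can be read from the loaded model itself: Ultralytics exposes them as model.names, a dict mapping class IDs to names.
# Equivalent to the hard-coded list above
class_list = list(model.names.values())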
The following code snippet reads the video frame by frame and processes each frame individually. This approach is necessary because the detection model operates on individual images, so the video must be processed one frame at a time.
while True:
    success, img = cap.read()
    if not success:
        break
Each video frame is passed into the model, and the results are fetched as output. Each result contains the bounding-box coordinates, class ID, and confidence score for every detected object.
    results = model(img, stream=True)
    person = 0
    for r in results:
        boxes = r.boxes
        for box in boxes:
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            currentClass = class_list[int(box.cls[0])]
            currentConf = box.conf[0]
Next, filter the detections to keep only those with the class name 'person' and a confidence score greater than 0.3. For better visualization, draw a bounding box around each detected 'person', and overlay the running count once per frame.
            if currentClass == 'person' and currentConf > 0.3:
                person += 1
                cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 1)
                cv2.putText(img, currentClass, (x1, y1 - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv2.LINE_AA, False)
    cv2.putText(img, "person: " + str(person), (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2, cv2.LINE_AA, False)
Finally, display the output video, which detects and counts the number of people in a store, highlighting each person with a bounding box.
    cv2.imshow("Analyzed Video", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
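If you also want to keep a copy of the annotated footage, OpenCV's VideoWriter can save each processed frame. A minimal sketch is shown below; the output file name and codec are illustrative choices, not part of the original pipeline:
# Set up once, before the while loop
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if the source reports no FPS
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("analyzed.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
# Inside the loop, after drawing the boxes and the count:
#     writer.write(img)
# At the end, alongside cap.release():
#     writer.release()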
The sample video used for testing the model is sourced from pexels.com and can be downloaded here.
This setup provides a live preview of the number of people present in the store, allowing store owners to continuously monitor occupancy in real time.
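Beyond the live preview, the per-frame counts can also be logged for later analysis. The sketch below is one hypothetical approach: it appends a timestamped count to a CSV file (the file name and per-frame sampling are assumptions, not part of the original pipeline):
import csv
from datetime import datetime

# Inside the while loop, after the person count has been computed:
with open("occupancy_log.csv", "a", newline="") as f:
    csv.writer(f).writerow([datetime.now().isoformat(), person])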
Though the existing model is quite accurate, it can be improved further by fine-tuning it on data collected from shopping malls, supermarkets, and other stores.
This is done in two steps: annotating the collected data, then training the model on it.
First, annotate the gathered data to teach the model where the target objects appear in each image. This can be accomplished using cloud services like Roboflow or open-source tools like labelImg.
Once the annotation is complete and the dataset is exported in YOLO format, a file named ‘data.yaml’ describes the dataset: the class names and the paths to the training and validation images.
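A minimal data.yaml typically looks like the sketch below; the paths and class names are placeholders for your own dataset:
# hypothetical data.yaml for a single-class 'person' dataset
train: datasets/person/images/train
val: datasets/person/images/val
nc: 1
names: ['person']
With the dataset described, the second step is to train the model on it: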
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
results = model.train(
    data="LOCATION_TO_DATA.YAML_FILE",  # change this to the path of your data.yaml file
    epochs=10,
    batch=8,
    name='Person_model'
)
Once the training is complete, a file named ‘best.pt’ will be generated. Replace the pretrained model ‘yolov8n.pt’ with this new file, and update the ‘class_list’ with the objects you trained the model on. With enough training data, these changes will further tailor the model to your environment.
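Assuming the default Ultralytics output layout, training writes its results under runs/detect/<run name>/, and the fine-tuned weights can be loaded just like the pretrained ones:
from ultralytics import YOLO

# Load the fine-tuned weights (path assumes the default output directory
# and the 'Person_model' run name used above)
model = YOLO("runs/detect/Person_model/weights/best.pt")
print(model.names)  # now reflects the classes from the custom dataset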
This model can be applied to a variety of purposes across the retail environment.
In this project, we've explored the significant role of object detection in improving in-store analytics by focusing on counting the number of people inside a store at any given time. By leveraging the advanced capabilities of YOLOv8, we can accurately and efficiently detect customers, providing crucial data for optimizing store operations.
The model we developed not only aids in understanding customer behavior but also supports better resource management and improved customer service. As the retail sector continues to evolve, integrating such technologies will be essential to remain competitive and meet market demands.