We show how a YOLO-based object detection system can support crowd analytics in shopping malls by providing accurate, real-time footfall data.

Shopping malls rely heavily on footfall data to make informed business decisions. The more customers visit a mall, the more attractive it becomes for big brands and advertisers to set up shops and promote their products there. Accurate crowd analytics are crucial for understanding customer behavior, peak hours, and the most frequented areas within the mall. However, many malls lack tooling that counts and analyzes the crowd in real time. Traditional footfall counting is manual and does not provide real-time data, which often leads to inaccuracies, especially during peak hours. Understanding how customers move through the mall is also difficult without detailed analytics. Two further challenges are optimizing space utilization by identifying high-footfall areas, and allocating resources such as security and cleaning staff according to crowd density.
Implementing an object detection system using computer vision technology can solve these challenges by providing accurate and real-time crowd analytics. By installing cameras at strategic locations throughout the mall, an object detection system can count the number of people entering, exiting, and moving within the mall in real time. Advanced algorithms can differentiate between objects and accurately count individuals even in crowded scenarios. The system can track the movement patterns of customers, identifying the most frequented areas, popular routes, and time spent in different sections of the mall. Additionally, the system can generate heatmaps to visualize high-footfall areas, helping mall management make data-driven decisions about space utilization and store placements.
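To make the heatmap idea concrete, here is a minimal sketch of how detections could be accumulated into a footfall heatmap. It is an illustration under stated assumptions rather than part of the pipeline below: the helper name `accumulate_heatmap`, the 480x640 frame size, and the sample boxes are all hypothetical, and the bottom-center of each bounding box is assumed to approximate where a person is standing.

```python
import numpy as np

def accumulate_heatmap(heatmap, boxes):
    """Add one count at the bottom-center ("foot point") of each box.

    heatmap: 2-D array of shape (frame_height, frame_width), updated in place.
    boxes:   iterable of (x1, y1, x2, y2) pixel coordinates.
    """
    height, width = heatmap.shape
    for x1, y1, x2, y2 in boxes:
        # Clamp the foot point so boxes touching the frame edge stay in bounds
        col = min(max(int((x1 + x2) / 2), 0), width - 1)
        row = min(max(int(y2), 0), height - 1)
        heatmap[row, col] += 1
    return heatmap

# Hypothetical usage: two made-up boxes on a 480x640 frame
heatmap = np.zeros((480, 640), dtype=np.float32)
accumulate_heatmap(heatmap, [(100, 50, 140, 200), (100, 60, 140, 200)])
```

Calling the helper once per frame builds up a map of where people stand most often. For display, the array could be blurred, scaled to 8-bit, and colored with OpenCV's `cv2.applyColorMap`.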
YOLO (You Only Look Once) is a state-of-the-art, real-time object detection system designed to detect objects in images and videos with high accuracy and speed. YOLO was introduced by Joseph Redmon and his collaborators in their research paper "You Only Look Once: Unified, Real-Time Object Detection."
First, import the libraries.
import math
import cv2
import numpy as np
Now we specify the path of the video.
# Specify the path to the video file
video_path = 'video_path'  # placeholder; pass a camera index such as 0 to read from a live camera instead
# Start video capture
cap = cv2.VideoCapture(video_path)
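# Request a 640x480 frame size (property IDs 3 and 4)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)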
Next we will load the model; we’ll use the Ultralytics library to load YOLO.
from ultralytics import YOLO
# Load the YOLO model 
model = YOLO("yolov9c.pt")
Next, we’ll define the class labels.
# Labels for the classes we care about, indexed by class ID.
# The model can detect more classes, but we're only interested in persons (class ID 0).
classNames = ["person"]
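The single-entry list works because "person" is index 0 in the COCO label set that the pretrained weights use. As a more explicit alternative, the class index can be looked up by name. The sketch below is an illustration only: the helper `class_ids_for` is hypothetical, and the small dictionary stands in for the index-to-label mapping that Ultralytics models expose as `model.names`.

```python
def class_ids_for(names, wanted):
    """Return the sorted class indices whose label is in `wanted`.

    names:  mapping of class index -> label, e.g. {0: "person", 1: "bicycle"}
    wanted: iterable of label strings to keep
    """
    wanted = set(wanted)
    return sorted(idx for idx, label in names.items() if label in wanted)

# Stand-in for model.names, using the first three COCO labels
coco_subset = {0: "person", 1: "bicycle", 2: "car"}
person_ids = class_ids_for(coco_subset, ["person"])
```

The resulting indices could then be passed to the prediction call through Ultralytics' `classes` argument so that only persons are returned, which avoids drawing boxes for other objects.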
Now we’ll loop over the video frames, run detection on each one, and draw the results.
while True:
    success, img = cap.read()  # Read a frame from the video capture
    if not success:
        break  # Exit the loop if the video ends
    results = model(img, stream=True)  # Perform object detection on the frame
    
    # Process results and draw bounding boxes
    for r in results:
        boxes = r.boxes  # Get detected bounding boxes
        for box in boxes:
            # Extract bounding box coordinates
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # Convert coordinates to integer values
            # Draw bounding box on the frame
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
            # Confidence score, rounded up to two decimal places
            confidence = math.ceil(float(box.conf[0]) * 100) / 100
            # Extract class label
            cls = int(box.cls[0])
            if cls < len(classNames):
                class_name = classNames[cls]
            else:
                class_name = f"Class {cls}"
            # Display class name on the frame
            org = [x1, y1]  # Origin coordinates for text
            font = cv2.FONT_HERSHEY_SIMPLEX  # Font type
            fontScale = 1  # Font scale
            color = (255, 0, 0)  # Text color
            thickness = 2  # Text thickness
            cv2.putText(img, class_name, tuple(org), font, fontScale, color, thickness)
Still inside the loop, we’ll display each annotated frame.
    # Display the frame with bounding boxes and class labels
    cv2.imshow('Video', img)
    if cv2.waitKey(1) == ord('q'):  # Exit loop if 'q' is pressed
        break
# Release the video capture and destroy all OpenCV windows
cap.release()
cv2.destroyAllWindows()
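The loop above draws boxes but does not yet produce a footfall number. A minimal sketch of per-frame counting follows. The helper `count_people`, the 0.5 confidence threshold, and the sample detections are assumptions for illustration. In the loop, the class IDs and confidences could come from `r.boxes.cls.tolist()` and `r.boxes.conf.tolist()`.

```python
def count_people(class_ids, confidences, person_id=0, min_conf=0.5):
    """Count detections labelled as a person with confidence >= min_conf."""
    return sum(
        1
        for cls, conf in zip(class_ids, confidences)
        if int(cls) == person_id and float(conf) >= min_conf
    )

# Made-up detections for three consecutive frames: (class IDs, confidences)
frames = [
    ([0, 0, 2], [0.91, 0.42, 0.88]),
    ([0, 0, 0], [0.80, 0.75, 0.66]),
    ([], []),
]
counts = [count_people(cls, conf) for cls, conf in frames]
peak = max(counts)  # busiest frame in this sample
```

Note that this counts how many people are visible in each frame, not unique visitors. Counting entries and exits would additionally require tracking individuals across frames.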



You can view the video here: https://drive.google.com/file/d/1J36GEpfe4abQOQ3T-P-l42bAuklUgxWv/view?usp=sharing
By leveraging object detection technology, shopping malls can transform their approach to crowd analytics, leading to increased revenue, optimized space utilization, and an enhanced customer experience. Accurate real-time footfall data can attract big brands and advertisers by showcasing high-traffic areas. Optimizing space utilization can maximize rental income by identifying and leveraging high-footfall areas. Improved customer satisfaction can be achieved through better resource allocation and personalized services. Overall, this data-driven approach can enable mall management to make informed decisions about store placements, marketing strategies, and operational improvements, ultimately enhancing the shopping experience for customers.
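Reference: J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection." https://arxiv.org/abs/1506.02640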