This guide covers setting up YOLO11 for custom object detection, preparing datasets, training the model, and fine-tuning it for precise, accurate signature detection.
YOLO (You Only Look Once) has redefined real-time object detection by striking a strong balance between speed and accuracy. With the release of YOLO11, the framework has evolved even further, pushing the boundaries of what's possible in deep learning. Whether you're detecting objects in still images or live video streams, YOLO11 delivers excellent performance and flexibility. The latest version introduces enhanced convolutional layers, an improved backbone architecture, and faster inference times, making it a go-to tool for a variety of detection tasks.
In this guide, we're taking on a specific challenge: signature detection using YOLO11. Whether you're developing an application for document verification, signature authentication, or legal tech, detecting signatures accurately is crucial. And while working with custom datasets can often feel overwhelming, YOLO11 simplifies the process with its advanced features and easy-to-use framework.
By the end of this guide, you'll be equipped to set up YOLO11, prepare and structure a custom annotated dataset, train the model, and run signature detection on your own images. It's written for everyone, from those just starting with object detection to seasoned developers looking to implement YOLO11 for specialized use cases.
Using YOLO11 for signature detection offers several compelling advantages: real-time inference speed, strong detection accuracy even on small, irregular objects like handwritten signatures, and a simple workflow for training on custom datasets.
These features make YOLO11 particularly well-suited for applications in document verification and fraud detection, where accuracy and efficiency are paramount.
Installing a Python Library (ultralytics)
ultralytics: the package you are installing. This is the library used for YOLO models, particularly for object detection tasks. The -q flag simply keeps pip's output quiet.
!pip install -q ultralytics
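To confirm the installation succeeded, you can run the library's built-in environment check:

import ultralytics

# Prints the installed Ultralytics version plus Python, torch, and CUDA details
ultralytics.checks()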
Importing Required Libraries
This block of code imports the libraries and modules essential for object detection and image processing: YOLO from ultralytics (the model class), cv2 (OpenCV, for image manipulation and display), numpy (array operations), requests (fetching images over HTTP), BytesIO (treating raw bytes as a file-like object), and PIL's Image (decoding downloaded images). All of these work together to help you load, process, and analyze images or videos when using a YOLO model.
from ultralytics import YOLO  # YOLO model class for loading, training, and inference
import cv2  # OpenCV for image processing and display
import numpy as np  # numerical array operations
import requests  # HTTP requests to fetch images from URLs
from io import BytesIO  # wrap raw bytes as a file-like stream
from PIL import Image  # decode downloaded image bytes
Running the YOLO11 Model for Object Detection on an Image
!yolo: invokes the Ultralytics command-line interface (the leading ! runs it as a shell command inside a notebook).
task=detect: sets the task to object detection.
mode=predict: runs inference rather than training or validation.
model=yolo11n.pt: uses the pre-trained YOLO11 nano weights.
source="https://ultralytics.com/images/bus.jpg": the image to run detection on; this can also be a local path, a directory, or a video.
!yolo task=detect mode=predict model=yolo11n.pt source="https://ultralytics.com/images/bus.jpg"
Running this command will automatically download the pre-trained yolo11n.pt file if it isn't already present; it contains all the necessary weights for the nano variant of YOLO11. Once you have the model, we'll move on to training it using our custom signature detection dataset.
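If you prefer the Python API, the same prediction looks like this (save=True writes the annotated image under runs/detect/predict):

from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # downloads the weights on first use
results = model.predict(source="https://ultralytics.com/images/bus.jpg", save=True)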
Training a YOLO11 Model for Object Detection on a Custom Dataset
Before we begin, there are a few essential components to set up. First, you'll need an annotated custom dataset folder that contains all the images for training and their corresponding labels. The dataset should be organized into two main directories, train and test, each with an images subdirectory and a matching labels subdirectory (this images/labels layout is what Ultralytics expects when resolving label paths).
Additionally, a YAML configuration file is required to guide YOLO through the dataset's structure. This file includes the paths to the training and validation image directories, the number of classes, and the class names.
Here’s an example of the dataset structure:
/dataset
│
├── /train
│   ├── /images
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── /labels
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
│
├── /test
│   ├── /images
│   │   ├── imageA.jpg
│   │   ├── imageB.jpg
│   │   └── ...
│   └── /labels
│       ├── imageA.txt
│       ├── imageB.txt
│       └── ...
│
└── dataset.yaml
Here’s an example of how your YAML file might look:
train: /path/to/your/dataset/train/images
val: /path/to/your/dataset/test/images
nc: 1 # number of classes
names: ['signature'] # class name
Summarizing: to set up YOLO11 for custom signature detection, organize your dataset into train and test directories, each containing images and labels subdirectories. A YAML configuration file then guides YOLO through the dataset, specifying the paths to the image directories, the number of classes (one, for signatures), and the class name. Once the dataset and YAML file are properly set up, YOLO11 can be trained on your custom data for accurate signature detection.
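Each label file uses the standard YOLO format: one line per object, containing the class index followed by the normalized bounding-box center coordinates and size, all scaled to [0, 1]. A hypothetical image1.txt with a single signature might contain:

0 0.48 0.72 0.35 0.12

Here 0 is the signature class, 0.48 and 0.72 are the box center's x and y, and 0.35 and 0.12 are its width and height, each as a fraction of the image dimensions.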
Creating the signature.yaml file
# Ultralytics YOLO 🚀, AGPL-3.0 license
# Signature dataset by Ultralytics
# Documentation: https://docs.ultralytics.com/datasets/detect/signature/
# Example usage: yolo train data=signature.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── signature ← downloads here (11.2 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/signature # dataset root dir
train: train/images # train images (relative to 'path') 143 images
val: valid/images # val images (relative to 'path') 35 images

# Classes
names:
  0: signature

# Download script/URL (optional)
download: https://github.com/ultralytics/assets/releases/download/v0.0.0/signature.zip
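Before training, it's worth sanity-checking that the file parses as expected. Here's a quick sketch using PyYAML (which Ultralytics already depends on), assuming signature.yaml sits in your working directory:

import yaml

# Parse the dataset config and confirm the class mapping
with open("signature.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["names"])              # expected: {0: 'signature'}
print(cfg["train"], cfg["val"])  # image paths relative to 'path'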
As we start training YOLO, it's important to have all the necessary dependencies in place to avoid interruptions: the right libraries installed and the dataset available. Conveniently, when you pass data=signature.yaml, Ultralytics will fetch the dataset automatically using the URL in the download field if it isn't found locally. With everything in place, YOLO will be ready to work with our custom signature detection dataset, and we can focus on training the model.
Now that you've downloaded the yolo11n.pt file, place it in your current working directory. This is essential, as we'll be using it as the pre-trained starting point for training on our custom signature detection dataset. With the file in place, we're ready to start training and fine-tuning the model for our specific task.
Now, let’s train the model:
model = YOLO("<path to your yolo11n.pt model>")
# Train the model
results = model.train(data="<path to your signature.yaml file>", epochs=100, imgsz=640)
Transferred 643/649 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train2', view at http://localhost:6006/
Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLO11n...
AMP: checks passed ✅
train: Scanning /content/datasets/signature/train/labels.cache... 143 images, 0 backgrounds, 0 corrupt: 100%|██████████| 143/143 [00:00<?, ?it/s]
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
/usr/local/lib/python3.10/dist-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 1.4.17 (you have 1.4.15). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
val: Scanning /content/datasets/signature/valid/labels.cache... 35 images, 0 backgrounds, 0 corrupt: 100%|██████████| 35/35 [00:00<?, ?it/s]
Plotting labels to runs/detect/train2/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 106 weight(decay=0.0), 113 weight(decay=0.0005), 112 bias(decay=0.0)
TensorBoard: model graph visualization added ✅
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train2
Starting training for 100 epochs...
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/100 8.69G 1.033 3.177 1.327 28 640: 100%|██████████| 9/9 [00:08<00:00, 1.10it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:00<00:00, 3.19it/s]
all 35 35 0.871 0.714 0.84 0.771
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/100 8.63G 0.8318 1.52 1.171 40 640: 100%|██████████| 9/9 [00:04<00:00, 1.82it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]
all 35 35 0.906 0.886 0.892 0.659
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/100 8.69G 0.8441 0.9709 1.123 27 640: 100%|██████████| 9/9 [00:05<00:00, 1.62it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:00<00:00, 2.97it/s]
all 35 35 0.0667 0.833 0.0663 0.019
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/100 8.63G 1.058 1.047 1.305 34 640: 100%|██████████| 9/9 [00:05<00:00, 1.64it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:00<00:00, 2.81it/s]
all 35 35 0.00806 0.771 0.00729 0.00382
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/100 8.69G 0.9839 1.047 1.258 33 640: 100%|██████████| 9/9 [00:05<00:00, 1.74it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:00<00:00, 3.20it/s]
The output above gives you an overview of what the training process looks like. At the end, you should see validation results similar to these:
Validating runs/detect/train2/weights/best.pt...
Ultralytics 8.3.4 🚀 Python-3.10.12 torch-2.4.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
YOLO11m summary (fused): 303 layers, 20,030,803 parameters, 0 gradients, 67.6 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:00<00:00, 4.25it/s]
all 35 35 0.998 1 0.995 0.98
Speed: 0.1ms preprocess, 6.2ms inference, 0.0ms loss, 1.8ms postprocess per image
Results saved to runs/detect/train2
Once training is complete, the model is saved under runs/detect/train2 (the exact run name may vary; check the "Logging results to" line in your output). The best checkpoint lives at weights/best.pt inside that directory, and you can use it for making predictions or further evaluation.
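If you want to re-run validation on the saved weights yourself, here's a minimal sketch (adjust the run directory to match your own output):

best = YOLO("runs/detect/train2/weights/best.pt")
metrics = best.val(data="signature.yaml")  # recomputes metrics on the val split
print(metrics.box.map50)  # mAP@0.5 for the signature class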
You can download or copy the trained model from runs/detect/train2/weights and move it into your current working directory, making it convenient to load whenever you need it for detecting signatures.
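For example, a small sketch using the standard library (the source path assumes the training run above):

import shutil

# Copy the best checkpoint into the current working directory
shutil.copy("runs/detect/train2/weights/best.pt", "best.pt")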
Loading the Trained YOLO Model
model = YOLO('<path to model>'): this line creates an instance of the YOLO class by loading model weights from the specified file, in this case the custom signature weights you just trained (for example, the best.pt copied from runs/detect/train2). The model is then ready for tasks such as detection, inference, or further training.
model = YOLO('<your downloaded or copied model>')
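Before wiring up the full URL pipeline below, you can sanity-check the loaded model on a local image (the file name here is hypothetical):

# Run the trained model on a local image and save the annotated result
results = model.predict(source="sample_signed_doc.jpg", save=True)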
Function to Fetch an Image from a URL and Convert It to an OpenCV Frame
The url_to_cv2_frame function below works as follows:
Function definition and docstring: takes a single url argument and documents the expected input and return value.
Fetching the image: requests.get(url) downloads the image content.
Checking the response: a status code of 200 means the request succeeded.
Converting to a byte stream: BytesIO(response.content) wraps the raw bytes in a file-like object.
Opening with PIL: Image.open decodes the byte stream into a PIL image.
Converting to OpenCV format: np.array turns the PIL image into a NumPy array, and cv2.cvtColor swaps the channel order from RGB (PIL's convention) to BGR (OpenCV's).
Returning the image: if successful, the function returns the OpenCV image; if fetching failed, it prints an error message and returns None.
def url_to_cv2_frame(url):
    """
    Fetch an image from a URL and convert it to an OpenCV (cv2) frame.

    Args:
        url (str): URL of the image.

    Returns:
        frame (numpy.ndarray): OpenCV frame of the image.
    """
    # Fetch the image from the URL
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Convert the content of the response into a byte stream
        image_bytes = BytesIO(response.content)

        # Open the image using PIL and convert it into an OpenCV-compatible format
        pil_image = Image.open(image_bytes)
        open_cv_image = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)

        return open_cv_image
    else:
        print("Error fetching image from URL")
        return None
Processing a Frame for Signature Detection
This function relies on two extra imports, math and cvzone, so bring those in first:

import math  # used to round confidence scores
import cvzone  # drawing helpers for corner rectangles and labeled text

def process(frame):
    # Run the trained signature model on the frame
    results = model(frame, stream=True)

    # Total image area, used to report how much of the frame the signatures cover
    sh, sw, c = frame.shape
    total_area = sh * sw

    signature_percentage = 0
    for r in results:
        signature_area = 0
        for box in r.boxes:
            # Bounding-box corners as integers
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

            # Confidence score rounded up to two decimal places
            conf = math.ceil(float(box.conf[0]) * 100) / 100

            # Accumulate the area covered by detected signatures
            w, h = x2 - x1, y2 - y1
            signature_area += w * h

            # Draw a corner rectangle and a confidence label on the frame
            cvzone.cornerRect(frame, (x1, y1, w, h), l=9, rt=2, colorC=(0, 255, 0))
            cvzone.putTextRect(frame, f'Signature : {conf}', (max(0, x1), max(35, y1)), scale=1, thickness=1, offset=10)

        signature_percentage = int((signature_area / total_area) * 100)

    # Overlay the share of the frame covered by signatures
    cv2.putText(frame, f'Signature Area : {signature_percentage}%', (10, 70), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 0), 1)
    return frame
URL definition: url holds the address of the image to test; replace it with a link to a scanned, signed document.
Fetching the image: url_to_cv2_frame(url) downloads the image and converts it to an OpenCV frame.
Display condition: the frame is only processed and shown if fetching succeeded (frame is not None).
Processing the frame: process(frame) runs signature detection and draws the results.
Displaying the result: cv2.imshow opens a window with the annotated image, and cv2.waitKey(0) waits for a key press.
cv2.destroyAllWindows(): closes all OpenCV windows after the key press.
url = "https://deadline.com/wp-content/uploads/2020/10/blue-ridge-fire.jpeg?w=681&h=383&crop=1" # Replace with your image URL
frame = url_to_cv2_frame(url)
result = process(frame)
if frame is not None:
# Display the fetched frame
cv2.imshow("Image from URL", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
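Note that cv2.imshow opens a desktop window, which won't work in a headless environment like Colab (where the training log above was captured). In that case, a minimal alternative is to write the annotated frame to disk instead:

if frame is not None:
    result = process(frame)
    # Save the annotated frame; open it from the notebook's file browser
    cv2.imwrite("signature_result.jpg", result)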
Results
By now, you've taken a deep dive into the process of leveraging YOLO11 for signature detection, from setting up your environment to training and fine-tuning your custom model. Whether you started from scratch or used a pre-trained model, you've seen firsthand how YOLO11's features, like enhanced multi-scale detection and an optimized backbone, make it a powerful tool for detecting signatures with precision and speed.
The ability to work with custom datasets and fine-tune models for specific tasks like signature detection opens up numerous possibilities across industries. From document verification and fraud detection to legal tech and beyond, automating the process of signature identification can save both time and resources.
Now, with the knowledge you’ve gained, you’re equipped to implement YOLO11 in your projects and push the boundaries of what’s possible with object detection. Remember, the key to success lies in iterative improvement: continue experimenting with data augmentation, adjusting model parameters, and refining your training process to get the best results for your specific use case.