ā† Back to blog

What is Sliding Window in Object Detection: Complete Overview of Methods & Tools

Sergey Sych
ā€¢
ā€¢

How to use sliding window method with object detection models to improve accuracy in Computer Vision.

What is Sliding Window in Object Detection: Complete Overview of Methods & Tools

Introduction

Computer vision has made significant progress, but object detection is still one of the key challenges. For some specific tasks, such as detection of small objects and high-density images, standard methods may not be sufficient. Combination with the sliding window method can be useful for significant improvement.

After image
Before image
Improved predictions with Sliding Window

In this article, we explore the potential of the sliding window, a model-agnostic inference method, and investigate how it enhances object detection capabilities when used in combination with the widely discussed YOLOv8 model and the Supervisely application Apply Neural Networks to Images Project.

Video Tutorial

In this video tutorial you will learn how to configure, visualize and use sliding window inference in Supervisely with any object detection model.

Understanding Sliding Window

The sliding window algorithm creates a small window (or box) of a fixed size, usually a square or rectangle, in the top-left corner of the image. This window slides across the image systematically. At each position, the model within the window analyzes the content inside it. It could be looking for objects, patterns, or anything else it's been trained to detect. After analyzing one section, the window slides a bit to the right and repeats the analysis on the next section. It keeps moving and analyzing until it has covered the entire image. In some cases, there may be an overlap between nearby windows to ensure that no detail is missed.

This can improve the accuracy of object detection models. By doing this, the model effectively "reads" the image piece by piece, just like you might read a book line by line. It helps the model handle images of various sizes because it treats them all in a consistent and systematic manner. Each piece gets the same analysis, regardless of the image's overall size.

Sliding window at work

Use Cases and Limitations

To use this method effectively, you need to know when to use it and when not to.

When it's best to use šŸ‘ When not to use šŸ‘Ž
Images with objects of the same class of different sizes

This typically applies to images with perspective. In sliding window image processing, the model can analyze numerous small regions. Within these regions, objects of the same class may vary in size, although not as significantly as when considering the entire image simultaneously. When analyzing the entire image, objects in closer proximity can differ significantly in size from those from the farthest perspective, which can negatively affect object detection if not using sliding window.
Limited resources

If you have limited computational resources, using a sliding window on large images may be too costly.
Images with complex scenes

Where are hundreds / thousands of small objects, partially hidden or close together, have different orientations, positions. Or if the objects are located on the edges of the images.
Specialized architectures

In some cases, specialized architectures (e.g., pyramidal networks, multi-scale networks with attentional multi-resolution, and so on) may be able to process images of different sizes more efficiently without using a sliding window.
Images of different sizes in the dataset

When the images in the dataset come in all sorts of different sizes, using a sliding window can make it easier for the model to work with them.
Similar images

If the images in your dataset are similar in size and objects have a constant scale, using a sliding window may be unnecessary.

Sliding Strategies

The sliding algorithm can utilize different strategies such as:

Shift the window to fit the size

As the window with specified dimensions moves across the image, if its current position exceeds the image boundaries, the window is shifted to align to the nearest edge. This ensures that the entire window stays within the valid data area, preventing incomplete or out-of-bounds analysis.

"Shift the window" strategy

Add padding to the image

Padding involves adding extra values (often zeros) around the image data increasing its dimensions. This additional space allows the window to move across the image without encountering edge-related issues.

"Add padding" strategy

Resize sliding window

Moving over the image with a certain step, the sliding window reaches the borders of the image, at which it's resized to process only the remaining part of the image.

"Resize sliding window" strategy

Easy to Use in Supervisely

In our ecosystem we have a ready-made solution that will take your project, walk through each and perform object detection task for every image.

It works simply, all you need to do is follow these steps:

  1. Deploy any model you prefer.

    Supervisely supports a lof of different Neural Networks frameworks for training and inference such as MMDetection (train, serve), Detectron2 (train, serve), MMSegmentation (train, serve), YOLOv5 (train, serve), YOLOv8 (train, serve).

    Follow this tutorial to learn how to train and use custom model. And since you can use your own models with our applications, this would be a useful read.

    But for this tutorial, we'll be using such as the well-known one - YOLOv8. So, launch Serve YOLOv8 application, select from a table pre-trained model, task type and device you will use for this purpose.

  2. Once model is deployed launch application named Apply NN to Images Project.

    During startup, select the project for which the object detection task will be applied. Then going to the application interface select previously deployed model and connect to it.

    As soon as the application starts, preview with default settings appears - object detection on the whole image at once.

    Object detection on the whole image at once
  3. Set the necessary settings.

    Select the "Mode" on "Sliding Window", activate the "Preview" on the same image and analyze the result.

    Sliding window mode

    To improve the detection result, reconfigure the Sliding window - change its size and overlap. In order to choose the right window size, watch the demo video in "Sliding Window Inference Preview" section, where it shows how much of the image is captured by one step with the current settings.

    Adjusted sliding window

    In this example the window was too large, and many sheep were caught in one pass, so the window was reduced by 40% in dimensions, which increased the detection quality quite a lot.

  4. By clicking Apply model to input data the model will be applied to all project images in Sliding Window mode. At the end of the process a new project with predictions will be created.

Conclusion

The sliding window approach proves to be an important tool in the fields of computer vision and data analysis. It allows systems to break down information into smaller blocks for more detailed analysis, significantly enhancing the accuracy of object, pattern, and anomaly detection. This method brings significant benefits across multiple domains, providing local context and improving the analysis and decision-making process.

In this article, we have seen how this algorithm improves object detection using pre-trained segment models and how easy it is to use in Supervisely Ecosystem.

. . .

Supervisely for Computer Vision

Supervisely is online and on-premise platform that helps researchers and companies to build computer vision solutions. We cover the entire development pipeline: from data labeling of images, videos and 3D to model training.

Get Supervisely for free

The big difference from other products is that Supervisely is built like an OS with countless Supervisely Apps ā€” interactive web-tools running in your browser, yet powered by Python. This allows to integrate all those awesome open-source machine learning tools and neural networks, enhance them with user interface and let everyone run them with a single click.

You can order a demo or try it yourself for free on our Community Edition ā€” no credit card needed!

. . .
Sergey Sych
About the author

Python Developer at Supervisely

Connect on LinkedIn

Table of Contents