What is Sliding Window in Object Detection: Complete Overview of Methods & Tools
How to use sliding window method with object detection models to improve accuracy in Computer Vision.
Computer vision has made significant progress, but object detection remains one of its key challenges. For some tasks, such as detecting small objects or processing high-density images, standard methods may not be sufficient. Combining a detection model with the sliding window method can noticeably improve results in these cases.
In this article, we explore the potential of the sliding window, a model-agnostic inference method, and investigate how it enhances object detection capabilities when used in combination with the widely discussed YOLOv8 model and the Supervisely application Apply Neural Networks to Images Project.
In this video tutorial you will learn how to configure, visualize and use sliding window inference in Supervisely with any object detection model.
Understanding Sliding Window
The sliding window algorithm places a small window (or box) of a fixed size, usually a square or rectangle, in the top-left corner of the image and then moves it across the image systematically. At each position, the model analyzes only the content inside the window, looking for objects, patterns, or anything else it has been trained to detect. After analyzing one section, the window slides a bit to the right and repeats the analysis on the next section; when it reaches the end of a row, it moves down and starts the next row. It keeps moving and analyzing until it has covered the entire image. In some cases, neighboring windows overlap to ensure that no detail is missed.
Processing an image this way can improve the accuracy of object detection models. The model effectively "reads" the image piece by piece, just as you might read a book line by line. It also helps the model handle images of various sizes, because every piece gets the same analysis regardless of the image's overall size.
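The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular tool: the function name `sliding_windows` and its parameters are assumptions made for this example, and it only yields windows that fit completely inside the image (how to treat the image borders is a separate choice).

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    overlap_x: int = 0, overlap_y: int = 0) -> Iterator[Window]:
    """Yield fixed-size windows row by row, starting at the top-left corner.

    Hypothetical helper for illustration. Only windows that fit entirely
    inside the image are produced; border handling is left out on purpose.
    """
    # The step between neighboring windows shrinks as the overlap grows.
    stride_x = win_w - overlap_x
    stride_y = win_h - overlap_y
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("overlap must be smaller than the window size")

    for top in range(0, img_h - win_h + 1, stride_y):        # move down row by row
        for left in range(0, img_w - win_w + 1, stride_x):   # slide to the right
            yield left, top, left + win_w, top + win_h


# A 100x60 image covered by 50x30 windows without overlap gives four windows.
for window in sliding_windows(100, 60, 50, 30):
    print(window)
```

Each yielded tuple can be used to crop the image before passing the crop to a detection model.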
Use Cases and Limitations
To use this method effectively, you need to know when to use it and when not to.
|When it's best to use 👍|When not to use 👎|
|---|---|
|Images with objects of the same class but different sizes. This typically applies to images with perspective. When the entire image is analyzed at once, objects close to the camera can be much larger than distant ones, which can hurt object detection. With a sliding window, the model analyzes many small regions, and within each region objects of the same class vary in size far less than across the whole image.|If you have limited computational resources, using a sliding window on large images may be too costly.|
|Images with complex scenes. These contain hundreds or thousands of small objects that are partially hidden, close together, or in different orientations and positions, or objects located at the edges of the image.|In some cases, specialized architectures (e.g., pyramidal networks, multi-scale networks with attentional multi-resolution, and so on) may be able to process images of different sizes more efficiently without using a sliding window.|
|Images of different sizes in the dataset. When the images in the dataset come in many different sizes, using a sliding window can make it easier for the model to work with them.|If the images in your dataset are similar in size and objects have a constant scale, using a sliding window may be unnecessary.|
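To get a feel for the computational cost mentioned above, it helps to count how many windows (and therefore how many model runs) a given configuration produces. The sketch below is a rough estimate under stated assumptions: the helper names `count_windows` and `total_windows` are made up for this example, the overlap is given in pixels, and the last window on each axis is assumed to be adjusted so that the whole image is covered.

```python
import math


def count_windows(img_size: int, win_size: int, overlap: int) -> int:
    """Number of window positions along one axis (hypothetical helper)."""
    stride = win_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the window size")
    if img_size <= win_size:
        return 1  # a single window already covers the axis
    # One window at position 0, plus enough steps to reach the far border.
    return math.ceil((img_size - win_size) / stride) + 1


def total_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                  overlap_x: int = 0, overlap_y: int = 0) -> int:
    """Total number of model runs needed for one image."""
    return (count_windows(img_w, win_w, overlap_x)
            * count_windows(img_h, win_h, overlap_y))


# A 4000x3000 image with 640x640 windows:
print(total_windows(4000, 3000, 640, 640))          # no overlap -> 35
print(total_windows(4000, 3000, 640, 640, 64, 64))  # 64 px overlap -> 42
```

Inference time grows roughly in proportion to this count, so a smaller window or a larger overlap makes processing noticeably slower.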
When the window reaches the image border, the sliding window algorithm can use different strategies, such as:
Shift the window to fit the size
As the window with specified dimensions moves across the image, if its current position exceeds the image boundaries, the window is shifted to align to the nearest edge. This ensures that the entire window stays within the valid data area, preventing incomplete or out-of-bounds analysis.
Add padding to the image
Padding adds extra values (often zeros) around the image data, increasing its dimensions. This additional space allows the window to move across the image without encountering edge-related issues.
Resize sliding window
The sliding window moves over the image with a fixed step. When it reaches the image border, it is resized to process only the remaining part of the image.
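The three border strategies differ only in how the last window on each axis is placed. The following sketch compares them along a single axis; the function name `axis_spans` and the mode names `"shift"`, `"pad"` and `"resize"` are assumptions chosen for this illustration and do not correspond to any specific library or application setting.

```python
from typing import List, Tuple


def axis_spans(img_size: int, win_size: int, stride: int, mode: str) -> List[Tuple[int, int]]:
    """Return (start, end) window spans along one axis (hypothetical helper).

    mode: "shift"  - move the last window back so it ends at the border
          "pad"    - let the last window run past the border (overhang = padding)
          "resize" - shrink the last window to the remaining part of the image
    """
    if not (0 < stride <= win_size <= img_size):
        raise ValueError("expected 0 < stride <= win_size <= img_size")

    spans = []
    start = 0
    # Regular windows that end strictly before the border.
    while start + win_size < img_size:
        spans.append((start, start + win_size))
        start += stride

    # `start` is now the first position whose window reaches or crosses the border.
    if mode == "shift":
        spans.append((img_size - win_size, img_size))
    elif mode == "pad":
        spans.append((start, start + win_size))
    elif mode == "resize":
        spans.append((start, img_size))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return spans


for mode in ("shift", "pad", "resize"):
    print(mode, axis_spans(100, 40, 40, mode))
# shift  [(0, 40), (40, 80), (60, 100)]
# pad    [(0, 40), (40, 80), (80, 120)]
# resize [(0, 40), (40, 80), (80, 100)]
```

With "shift" every window keeps its full size at the price of extra overlap, with "pad" the model sees 20 pixels of padding in the last window, and with "resize" the last window is only 20 pixels wide.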
Easy to Use in Supervisely
In our ecosystem, we have a ready-made solution that takes your project, walks through each image, and performs the object detection task on it.
Serve YOLOv8: Deploy YOLOv8 as REST API service
Apply NN to Images Project: NN Inference on images in project or dataset
It is simple to use. All you need to do is follow these steps:
Deploy any model you prefer.
Supervisely supports a lot of different neural network frameworks for training and inference, such as MMDetection (train, serve), Detectron2 (train, serve), MMSegmentation (train, serve), YOLOv5 (train, serve), and YOLOv8 (train, serve).
Follow this tutorial to learn how to train and use a custom model. Since you can use your own models with our applications, it is a useful read.
For this tutorial, however, we'll use the well-known YOLOv8. Launch the Serve YOLOv8 application and select a pre-trained model from the table, along with the task type and the device you will use.
Once the model is deployed, launch the Apply NN to Images Project application.
During startup, select the project to which object detection will be applied. Then, in the application interface, select the previously deployed model and connect to it.
As soon as the application starts, a preview with the default settings appears: object detection on the whole image at once.
Object detection on the whole image at once
Configure the settings.
Set "Mode" to "Sliding Window", activate "Preview" on the same image, and analyze the result.
Sliding window mode
To improve the detection result, reconfigure the sliding window by changing its size and overlap. To choose the right window size, watch the demo video in the "Sliding Window Inference Preview" section, which shows how much of the image is captured in one step with the current settings.
Adjusted sliding window
In this example, the window was too large and many sheep were caught in one pass, so the window dimensions were reduced by 40%, which noticeably improved the detection quality.
Apply model to input data.
The model will be applied to all project images in Sliding Window mode. At the end of the process, a new project with predictions will be created.
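For readers who want to see what sliding window inference involves conceptually, here is a minimal sketch. It is not how the Supervisely application is implemented: `detect_with_windows` is a hypothetical helper, and `detect` stands for any function that runs a model on one window and returns boxes in that window's local coordinates. The key step is shifting every box back into the coordinates of the full image.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]     # (left, top, right, bottom)
Box = Tuple[int, int, int, int, float] # (x1, y1, x2, y2, score)


def detect_with_windows(windows: List[Window],
                        detect: Callable[[Window], List[Box]]) -> List[Box]:
    """Run `detect` on every window and map boxes to full-image coordinates.

    `detect` is an assumed callable: it receives a window and returns boxes
    relative to that window's top-left corner.
    """
    results: List[Box] = []
    for window in windows:
        left, top, _, _ = window
        for x1, y1, x2, y2, score in detect(window):
            # Shift window-local coordinates by the window offset.
            results.append((x1 + left, y1 + top, x2 + left, y2 + top, score))
    return results


# Stand-in detector for demonstration: pretends to find one object per window.
def fake_detect(window: Window) -> List[Box]:
    return [(1, 2, 3, 4, 0.9)]


boxes = detect_with_windows([(0, 0, 10, 10), (10, 0, 20, 10)], fake_detect)
print(boxes)  # [(1, 2, 3, 4, 0.9), (11, 2, 13, 4, 0.9)]
```

When windows overlap, the same object can be detected more than once. A common way to handle this is to merge duplicates with non-maximum suppression, which is not shown here.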
The sliding window approach is an important tool in computer vision and data analysis. It allows systems to break down information into smaller blocks for more detailed analysis, which can improve the accuracy of object, pattern, and anomaly detection. The method is useful across multiple domains, providing local context and improving analysis and decision-making.
In this article, we have seen how this algorithm improves object detection with pre-trained models and how easy it is to use in the Supervisely Ecosystem.
Supervisely for Computer Vision
Supervisely is an online and on-premise platform that helps researchers and companies build computer vision solutions. We cover the entire development pipeline, from labeling images, videos, and 3D data to model training.
The big difference from other products is that Supervisely is built like an OS with countless Supervisely Apps: interactive web tools that run in your browser yet are powered by Python. This allows us to integrate all those awesome open-source machine learning tools and neural networks, enhance them with a user interface, and let everyone run them with a single click.