OpenCV for fast detection of any dynamic object

José Luis Domínguez
4 min read · Feb 26, 2023

Artificial vision in OpenCV


OpenCV is a powerful library that opens the door to many portable and stationary intelligent systems, such as surveillance, activity monitoring, detection of a particular object, vehicle license plate reading, and facial recognition, among others.

Although OpenCV offers many functions, this article shows how combining a few of them can address a wide variety of opportunity areas in different industries.



The data will be collected directly from a webcam as a sequence of frames. Each frame is a matrix of pixel values, so comparing consecutive frames pixel by pixel reveals what has changed in the scene, and the changed regions can then be isolated as objects. Combining these two ideas, frame differencing and object detection, allows us to identify any activity (movement) in the defined observation window.


1.- Frame. Two variables (frame1 and frame2) are defined to hold two consecutive frames read from a webcam. Each new frame is compared to the previous one using a difference function: the difference between each pair of corresponding matrix elements (pixels) is calculated and taken in absolute value, for example |65 − 226| = |−161| = 161.

2.- Color space. To make the differences even clearer, the color space of the resulting absolute-difference image is changed to grayscale, so that each pixel is represented by a single intensity value that can later be binarized. The result is an image in which a pixel is whiter the larger the difference between the two frames at that position.

3.- Smoothing filter. Since the threshold decision will depend on exact numerical values, the grayscale difference image must be smoothed first so that isolated noisy pixels do not trigger detections. For this, Gaussian filtering is applied: each point in the input matrix is convolved with a Gaussian kernel (a normalized window of weights), and the weighted neighboring values are summed to produce the output matrix. This is a linear filter that blurs the image and suppresses high-frequency noise.

4.- Segmentation threshold. Thresholding, one of the basic artificial vision techniques, will allow us to separate the objects that interest us (the foreground) from the background. As the first argument, the source image must be a grayscale image (in this case, already smoothed with a Gaussian filter). As the second argument, for binary segmentation we must define a simple decision threshold: if the pixel value is greater than the threshold, it is assigned one value (white); otherwise, it is assigned another value (black). For this to work correctly, there must be high contrast between the foreground and the background of the image, which in practice means controlled or strategic lighting conditions.

5.- Morphological transformation. Since object detection depends to a great extent on the lighting of the scene, one of the main problems is that a single object often appears as several scattered fragments in the segmented frame. To address this, we apply a morphological dilation to the segmented frame; in simple words, this joins the broken parts of the segmentation. As the kernel slides over the image, an output pixel is set to 1 (white) if at least one pixel under the kernel is 1, which grows the white regions and adds more pixels to the object of interest.

6.- Contours. A fair question at this point is why we need contours if we have already joined the scattered parts of the segmentation. The parts are joined, but they are not yet defined as a geometric object. For that reason, on the segmented and morphologically transformed image, we compute a curve that joins all the continuous points along the boundary of a region that have the same intensity. With this we can measure the size of the object of interest more precisely, since the contours mark the limits of the objects present in the frame.

7.- Outline features. With the contours well defined (as geometric entities), the next step is shape analysis. The first measurement is the area of the object of interest in the frame. The area is calculated using Green’s formula, so the returned value can differ from the number of non-zero pixels the contour covers when drawn. To highlight the object of interest, an upright bounding rectangle is then created from four values: 1) x: x coordinate of the top-left corner, 2) y: y coordinate of the top-left corner, 3) w: width, and 4) h: height.




José Luis Domínguez

Data scientist who develops sustainable reliability in the processes driven by the development of artificial intelligence in future society.