Machine Learning + OpenCV for complex RGB image classification

José Luis Domínguez
6 min read · Aug 28, 2022

Machine learning + OpenCV = AI

Background

Electronic devices today have greatly enhanced our ability to turn large volumes of information into data that supports decision-making. One of the most important sources of this information is images (unstructured data), whether captured by a simple or professional camera, extracted from a video frame, or taken as a panoramic view by a drone or satellite.

However, the complexity of the data in an image varies significantly along two main factors: dimension and information.

Dimension refers to the pixel size of the object of study. If there are few elements in an image, the pixels are large and similar; if the image is highly detailed, it contains many pixels of different sizes, which increases the volume of information and the file size. In real life, a single pixel may cover what a common photograph captures, but it can also represent 1 centimeter, 1 kilometer, or millions of kilometers of terrain (Figure 1).

Figure 1. Dimension of the images by pixels

Regarding information, an image with three RGB channels can represent more than 16 million different colors (256³ = 2²⁴) by combining 256 levels of red with 256 levels of green and 256 levels of blue. The range 0–255 might seem arbitrary, but it makes a lot of sense: 8 bits (1 byte) can represent the numbers 0 through 255, so the RGB system requires exactly 3 bytes per pixel.
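As a quick sanity check, this arithmetic can be reproduced in a few lines of Python with OpenCV and NumPy (a minimal sketch; the file name is a hypothetical placeholder):

```python
import cv2

# Number of representable colors with 8 bits per channel:
# 256 levels of red x 256 of green x 256 of blue = 2^24.
n_colors = 256 ** 3
print(f"{n_colors:,} colors")                # 16,777,216

# Each pixel occupies exactly 3 bytes (1 byte per channel).
img = cv2.imread("example.jpg")              # hypothetical RGB photo (loaded as BGR)
h, w, c = img.shape
print(f"{h}x{w} pixels, {c} channels -> {img.nbytes:,} bytes")
```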

However, when an image carries multispectral or hyperspectral channels, the number of representable values grows as 256ⁿ (2⁸ per channel) and the raw storage grows to n bytes per pixel, where n is the number of channels in the image (Figure 2).

Figure 2. Different number of channels for each type of image

Multispectral images have between 10 and 15 channels, while hyperspectral images have more than 100. Since hyperspectral channels are often stored at 2 or more bytes each, a single pixel can carry 300 or more bytes of information, and one image could contain up to 1 terabyte of data (Figure 3).
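A back-of-the-envelope sketch of how raw storage scales with channel count (the resolutions, channel counts, and bit depths below are illustrative assumptions, not measurements):

```python
# Illustrative storage estimates; all parameters are assumptions.
def image_bytes(width, height, n_channels, bytes_per_channel=1):
    """Raw (uncompressed) size of an image in bytes."""
    return width * height * n_channels * bytes_per_channel

# RGB photo: 3 channels, 1 byte each.
print(image_bytes(4000, 3000, 3))            # 36,000,000 bytes (~36 MB)

# Multispectral scene: 13 channels, 2 bytes each.
print(image_bytes(10000, 10000, 13, 2))      # 2,600,000,000 bytes (~2.6 GB)

# Hyperspectral scene: 224 channels, 2 bytes each.
print(image_bytes(10000, 10000, 224, 2))     # ~44.8 GB for a single scene
```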

Figure 3. More bytes of information per pixel in more complex images

Image processing today demands solutions at small, medium, and large scales. A small-scale data science solution might classify images by type of fruit, clothing, or animal. A medium-scale solution might identify elements within each image. A large-scale solution makes a prediction for every pixel of the image in order to classify each one into a category.

Given the above, this article addresses how to solve a large-scale image processing problem, proposing a methodology that enables rapid information extraction, prediction, and classification.

Description of the problem

To exemplify a large-scale data science problem in image processing, we will address a critical present-day issue: the degradation of environmental assets caused by mining activity.

An environmental asset should be understood as a physical space that sustains various ecological communities, environmental services, and goods, which together support an integral system characteristic of a particular geographic space.

This system is defined by its capacity to carry out production processes that promote environmental resilience in the face of the various phenomena that destabilize the distribution of space and territorial occupation, which in turn synergistically affects society.

At present, mining's extractive activity causes a gradual degradation of these environmental assets. This demands a management instrument that justifies investment and prioritizes conservation, preservation, and restoration initiatives for natural spaces, all for the benefit of the planet and society.

How does Artificial Intelligence participate as a solution?

The integration of technology with the environment is related to the first law of geography (Tobler's law): everything is related to everything else, but near things are more related than distant things. This conceptualization allows us to formulate an integral territorial model that evaluates natural heritage by identifying trends and scenarios, exploiting the distinctive spectral characteristics of the different elements that make up the Earth's surface.

Advances in image processing and interpretation have improved our ability to make decisions. By applying mathematical and statistical methods to this kind of information, we can measure, analyze, and model different environmental scenarios, which, given the extractive nature of mining, yields a powerful management tool for the spaces to be rehabilitated and protected.

This is where high-performance artificial intelligence systems can deliver more precise and effective indicators for detecting, describing, quantifying, and monitoring environmental changes driven by human activities such as mining.

What problem should AI solve?

Artificial intelligence systems make it possible to turn land-cover mapping and analysis into a repeatable process, in particular by fitting statistical models for the various categories or classes. This reduces the time needed to evaluate, monitor, and obtain results on how the environment's resilience evolves with respect to anthropogenic activity.

Data Scientist Goals

Through the implementation of computer vision and machine learning algorithms, an artificial intelligence system can be developed that yields the parameters and indicators needed to adjust the environmental quality management system of the mining operation. The workflow consists of the following steps:

  1. Obtain a large-scale RGB territorial image of the study area, specifically an area with anthropogenic impact from mining.
  2. Evaluate the pixel volume and size of the selected image in order to estimate how much information is needed to train the algorithms.
  3. Once the data volume is dimensioned, split the image into several partitions to speed up data collection and algorithm training (see the tiling sketch after this list).
  4. Create spatial training areas for each image partition to collect information from the RGB spectral band channels of every interpretable element in the image.
  5. Since this is a supervised classification problem, build a data frame containing the average pixel values captured by each training area.
  6. Apply the predictive model to each image partition so that every pixel in its observation window is assigned a class.
  7. With a prediction made for each partition, merge all the segments to obtain the full classification of the entire image, improving training and prediction throughput (see the classification sketch after this list).
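A minimal sketch of steps 1–3 with OpenCV (the file name and tile size are illustrative assumptions):

```python
import cv2

# Steps 1-3: load the large RGB image and split it into square tiles.
image = cv2.imread("mining_area_rgb.png")   # hypothetical large satellite image
tile_size = 1024                            # illustrative partition size

tiles = []
for y in range(0, image.shape[0], tile_size):
    for x in range(0, image.shape[1], tile_size):
        tiles.append(((y, x), image[y:y + tile_size, x:x + tile_size]))

print(f"Image split into {len(tiles)} partitions")
```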
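Continuing that sketch, steps 4–7 can be approximated with pandas and scikit-learn. The random forest model, the structure of `training_areas` (a class label mapped to boolean pixel masks), and the `train_df` table are assumptions on my part, not the author's exact implementation; `image` and `tiles` come from the previous sketch, and class labels are assumed to be small integer codes:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Steps 4-5: average the RGB values inside each training area to build
# one row of the training table per area.
def training_dataframe(tile, training_areas):
    rows = []
    for label, masks in training_areas.items():
        for mask in masks:
            mean_b, mean_g, mean_r = tile[mask].mean(axis=0)  # OpenCV loads BGR
            rows.append({"R": mean_r, "G": mean_g, "B": mean_b, "class": label})
    return pd.DataFrame(rows)

# Step 6: fit a per-pixel classifier on the collected training table.
# `train_df` is assumed to be the concatenation of the per-tile tables.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train_df[["R", "G", "B"]].to_numpy(), train_df["class"].to_numpy())

# Steps 6-7: predict a class for every pixel of every tile, then mosaic
# the tile predictions back into a full-size classification map.
classified = np.zeros(image.shape[:2], dtype=np.uint8)
for (y, x), tile in tiles:
    h, w = tile.shape[:2]
    features = tile.reshape(-1, 3)[:, ::-1]   # BGR -> RGB, one row per pixel
    preds = model.predict(features)           # integer class codes assumed
    classified[y:y + h, x:x + w] = preds.reshape(h, w)
```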

The guiding question to answer: do machine learning and computer vision provide effective solutions for environmental management focused on risk prevention?

Type of research and development

This is quantitative research, since it relies on the vectorization of pixels into numerical values (Figure 4).

Figure 4. Pixel vectorization
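In code, vectorization amounts to reshaping the H × W × 3 image into a matrix with one row per pixel (a minimal sketch using a random stand-in image):

```python
import numpy as np

# Stand-in 100x100 RGB image; in practice this would be a satellite tile.
image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# Vectorize: one row per pixel, one column per channel.
pixel_matrix = image.reshape(-1, 3)
print(pixel_matrix.shape)                    # (10000, 3)
```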

The methodology integrates steps to transform the images and correct for luminosity, brightness, color balance, and deformation in a set of free, low-cost RGB satellite images, such as those from the Google Earth platform. This subsequently allows the creation of training areas in which similar spectral characteristics are recognized (Figure 5).

Figure 5. Average of the values of all pixels in a training area
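For the luminosity and color-balance corrections mentioned above, one plausible approach (an assumption on my part, not necessarily the author's exact method) is contrast-limited histogram equalization on the lightness channel with OpenCV's CLAHE:

```python
import cv2

def normalize_lighting(bgr_tile):
    """Equalize luminosity with CLAHE on the L channel of LAB space."""
    lab = cv2.cvtColor(bgr_tile, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # illustrative params
    l_eq = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```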

Classes and regions are then generated that have the highest probability of belonging homogeneously to the same characteristic of the territory (Figure 6).

Figure 6. Regions most likely to belong to the same class

Workflow Development

Results


José Luis Domínguez

Data scientist working on sustainable reliability in processes driven by the development of artificial intelligence for the society of the future.