Preventing robberies in urban environments using deep learning and OpenCV

5 min read · Apr 24, 2022

Deep learning + OpenCV = AI

Background

Crime prevention is the aspect of public security that addresses and combats the social phenomenon of crime in order to safeguard the integrity and rights of people, as well as to preserve order and social peace. Theft is defined as an act in which a criminal assaults someone or seizes property, regardless of whether the act is completed.

Offenders do not have a particular appearance or mode of operation. The stereotype of the poorly dressed criminal is outdated: many people have been approached at traffic lights by elegant criminals in suits and ties, or by older people.

Public insecurity is one of the main problems at the national and international level, and the crime with the highest rate is robbery. The state’s failure in terms of public security has forced citizens to create their own security strategies. Given this deficiency in the prevention and security system, it is necessary to find solutions to the problem of public insecurity in an interdisciplinary manner and with the support of emerging technology.

Description of the problem

In Mexico in particular, the National Institute of Statistics and Geography (INEGI) reports, based on the National Survey of Victimization and Perception of Public Security (ENVIPE, 2019), that 24.7 million people were victims of some crime, a figure equivalent to 28.3% of the country's 87.37 million people over 18 years of age.

According to the results, in 47.8% of the cases the victims identified two criminals, and in 37% they identified three. In 45% of the cases of robbery on the street or on public transport, the victims identified two criminals, and in 27.4% of the cases three criminals took part. Robbery of passers-by is therefore one of the main crimes committed in cities, and it often involves two or more criminals, which currently puts the integrity of citizens in urban environments at high risk.

How does Artificial Intelligence participate as a solution?

While today’s basic technology isn’t necessarily revolutionary, the algorithms it uses and the results it can produce are. Traditional criminal detection systems detect objects based on size and location, but do not recognize the type of objects or a dynamic proximity risk.

High-performance artificial intelligence systems can fill this gap by automating the monitoring of high-risk sites, providing a high level of security and monitoring efficiency for citizens.

What problem should AI solve?

Artificial intelligence-based crime-fighting solutions represent a strategic action to prevent vandalism and theft, along with the consequences and risks that accompany them.

Data Scientist Goals

Using computer vision and deep learning, build an algorithm that raises an insecurity alert when the proximity risk between people in urban environments is high.

Create an image dataset from videos using individual frames.
Using a Convolutional Neural Network, extract the features of each training image by evaluating multiple layers of convolutional filters of one or more dimensions, with downsampling.
With the features of people extracted, calculate how many pixels separate two objects.
Once the separation between objects is obtained, define a geometric reference value for the space that each person occupies, to later convert the pixel separation to a real distance.
Generate a dynamic visualization of the behavior of the objects defined in the geometric window and their proximity relationship.
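
The feature-extraction step in the list above can be illustrated with a small sketch. It applies one 2D convolutional filter and then a 2x2 max-pooling downsample to a toy grayscale image, in plain Python. The function names convolve2d and max_pool2x2, the kernel values, and the toy image are assumptions made for illustration; a real network learns many such filters and would normally be built with a deep learning framework.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2x2(feature_map):
    """Downsample by keeping the maximum of each non-overlapping 2x2 block."""
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    return [
        [
            max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2 (hypothetical data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1], [-1, 1]]

features = convolve2d(image, edge_kernel)  # 4x4 feature map
pooled = max_pool2x2(features)             # 2x2 map after downsampling
```

The filter responds only where the edge is, and pooling keeps that response while shrinking the map, which is the role downsampling plays between convolutional layers.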

Answer the question: Does machine vision through deep learning and computer vision provide effective solutions by detecting proximity security risks in real time?

Type of research and development

This is quantitative research: pixels are vectorized to numerical values and subsampled to identify the geometric delimiters of each object classified by the Convolutional Neural Network. The Euclidean distance between those geometric objects can then be calculated in a Euclidean space, as the length of the line segment between two points given in Cartesian coordinates, using the Pythagorean theorem.
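
As a worked form of this calculation, take two centroids p = (x_1, y_1) and q = (x_2, y_2) in pixel coordinates (the symbols are introduced here for illustration and do not come from the original project). The Euclidean distance between them is:

```latex
d(p, q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
```

For example, centroids at (100, 100) and (130, 140) are sqrt(30^2 + 40^2) = 50 pixels apart.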

The dataset for robust model training is built from different videos by sampling the frames of each event present in the video, second by second (Figure 1).
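
A minimal sketch of this extraction step with OpenCV is shown below. The function names, the output file naming, the default of one saved frame per second, and the 30 FPS fallback are assumptions for illustration, not the settings used in the original project.

```python
import os


def sampling_step(video_fps, samples_per_second):
    """Number of frames to advance between two saved frames (at least 1)."""
    return max(1, round(video_fps / samples_per_second))


def extract_frames(video_path, out_dir, samples_per_second=1):
    """Save sampled frames of a video as JPEG files and return how many were saved."""
    import cv2  # imported here so sampling_step works without OpenCV installed

    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback when FPS is unknown
    step = sampling_step(fps, samples_per_second)

    saved = 0
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video or a read error
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1

    capture.release()
    return saved
```

Calling extract_frames("street.mp4", "frames") on a hypothetical 30 FPS clip would keep every 30th frame, which gives roughly one training image per second of video.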

A Convolutional Neural Network was implemented to classify the objects in the image according to the defined typology and, based on this, obtain the probability that each object belongs to the correct class. Forward and backward propagation were used to train the model. During the training phase, an image was passed to the model and forward propagation was executed until an output y was obtained (Figure 2).

Figure 2. Convolutional Neural Network Training.
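
To make the training loop concrete, here is a simplified sketch of forward and backward propagation. It is not a full Convolutional Neural Network: it trains only a single linear layer with a softmax output and cross-entropy loss, in plain Python. The feature vector, the two class labels, the learning rate, and the function names are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, features):
    """Forward propagation: linear scores followed by softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def train_step(weights, features, true_class, learning_rate=0.5):
    """One forward and one backward pass; updates weights in place, returns the loss."""
    probs = forward(weights, features)
    loss = -math.log(probs[true_class])  # cross-entropy for the true class
    for k, row in enumerate(weights):
        # Gradient of the loss with respect to score k: p_k - 1 for the true class, p_k otherwise.
        grad_score = probs[k] - (1.0 if k == true_class else 0.0)
        for j in range(len(row)):
            row[j] -= learning_rate * grad_score * features[j]
    return loss


# Hypothetical 3-value feature vector and two classes (0 = person, 1 = background).
features = [1.0, 0.5, -0.2]
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

losses = [train_step(weights, features, true_class=0) for _ in range(5)]
y = forward(weights, features)  # the output y: one probability per class
```

With each step the loss falls and the probability assigned to the true class rises, which is the behavior the training phase relies on.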

To track the object of interest, a bounding box was defined for detection and the centroid of the object was obtained from it. At this point, if an object spans more than one grid cell, it is assigned only to the cell in which its midpoint is located. To improve this part of the modeling process, you can reduce the chances of multiple objects appearing in the same grid cell by increasing the number of cells in the grid (N x N) (Figure 3).

Figure 3. Construction of the bounding box for detection and the centroid of the object.
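
The centroid and grid-cell assignment can be sketched as follows. The bounding-box format (x, y, width, height), the 640x480 frame size, the example boxes, and the grid sizes 3 and 13 are assumptions chosen for illustration.

```python
def centroid(box):
    """Center (cx, cy) of a bounding box given as (x, y, width, height) in pixels."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def grid_cell(point, frame_width, frame_height, n):
    """(row, col) of the cell in an n x n grid that contains the point."""
    cx, cy = point
    col = min(int(cx * n / frame_width), n - 1)   # clamp points on the right edge
    row = min(int(cy * n / frame_height), n - 1)  # clamp points on the bottom edge
    return (row, col)


# Two hypothetical detections on a 640x480 frame.
box_a = (100, 120, 60, 160)
box_b = (170, 130, 60, 160)
center_a = centroid(box_a)
center_b = centroid(box_b)

# With a coarse 3x3 grid both midpoints fall in the same cell.
cell_a_coarse = grid_cell(center_a, 640, 480, n=3)
cell_b_coarse = grid_cell(center_b, 640, 480, n=3)

# With a finer 13x13 grid they are assigned to different cells.
cell_a_fine = grid_cell(center_a, 640, 480, n=13)
cell_b_fine = grid_cell(center_b, 640, 480, n=13)
```

In this example the two people share cell (1, 0) on the 3x3 grid but land in cells (5, 2) and (5, 4) on the 13x13 grid, which shows why a larger N reduces collisions.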

Finally, once the objects are detected in the image grid, it is necessary to calculate how many pixels separate two objects and then convert this separation to a real distance. The real distance is obtained by applying the Pythagorean theorem, that is, by calculating the Euclidean distance between the different objects that appear in the video frames.
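
A minimal sketch of this step, under a strong simplifying assumption: a single constant scale (meters per pixel) applies to the whole frame, which ignores camera perspective. The scale of 0.01 m per pixel, the 1.0 m alert threshold, the example centroids, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pixel_distance(p, q):
    """Euclidean distance in pixels between two centroids (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def proximity_alerts(centroids, meters_per_pixel, min_distance_m):
    """Return (i, j, distance_m) for every pair of people closer than min_distance_m."""
    alerts = []
    for (i, p), (j, q) in combinations(enumerate(centroids), 2):
        distance_m = pixel_distance(p, q) * meters_per_pixel
        if distance_m < min_distance_m:
            alerts.append((i, j, distance_m))
    return alerts


# Hypothetical centroids in pixels and an assumed calibration of 0.01 m per pixel.
centroids = [(100, 100), (130, 140), (500, 400)]
alerts = proximity_alerts(centroids, meters_per_pixel=0.01, min_distance_m=1.0)
```

Here only the first two people trigger an alert: they are 50 pixels apart, about 0.5 m under the assumed scale, while the third person is several meters from both.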

Conceptually, the Euclidean distance algorithm works as follows: for each cell (x, y), the distance to each source cell is determined by computing the hypotenuse of a right triangle whose other two sides are x_max and y_max. This calculation yields the true Euclidean distance rather than the cell distance. The shortest distance to a source is then determined, and if it is less than the specified maximum distance, the value is assigned to the corresponding location in the output (Figure 4).
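
The procedure just described can be sketched in plain Python as follows. The grid size, the source cell, and the maximum distance are hypothetical, and using None for cells beyond the maximum distance is an assumption of this sketch.

```python
import math


def euclidean_distance_grid(rows, cols, sources, max_distance):
    """For each cell, distance to the nearest source cell, or None beyond max_distance."""
    output = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            # Hypotenuse of the row and column offsets to each source; keep the shortest.
            nearest = min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            row_out.append(nearest if nearest < max_distance else None)
        output.append(row_out)
    return output


# Hypothetical 4x5 grid with a single source cell at row 0, column 0.
grid = euclidean_distance_grid(4, 5, sources=[(0, 0)], max_distance=5.0)
```

The cell at row 3, column 3 gets the true diagonal distance of about 4.24 rather than a count of cells, and the cell at row 3, column 4 is left empty because its distance of exactly 5.0 is not below the maximum.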