Field
This invention relates generally to a system and method for detecting and classifying objects in a two-dimensional digital image and, more particularly, to a system and method for detecting and classifying objects in a stream of pixilated two-dimensional digital images, identifying a location of the detected objects in the image and determining the relative velocity and direction of the objects that are moving, where the system includes a multi-layer feed-forward convolutional neural network (CNN) for detecting and classifying the objects and a recurrent neural network (RNN) for determining the relative velocity of the objects.
Discussion
Artificial intelligence (AI) is a part of computer science that employs algorithms that allow software applications to learn from their environment and make decisions therefrom to achieve a certain result. Machine learning is a part of AI that employs software applications that acquire their own knowledge by analyzing vast amounts of raw input data in an iterative manner to extract patterns from the data and to allow the software application to learn to perform a task without being specifically programmed to perform that task. Deep learning is a particular type of machine learning that provides greater learning performance through representing a certain real-world environment as a hierarchy of increasing complex concepts.
Deep learning typically employs a software structure comprising several layers of neural networks that perform nonlinear processing, where each successive layer receives the output from the previous layer. Generally, the layers include an input layer that receives raw data from a sensor, a number of hidden layers that extract abstract features from the data, and an output layer that identifies a certain thing based on the feature extraction from the hidden layers. The neural networks include neurons or nodes that each has a “weight” that is multiplied by the input to the node to obtain a probability of whether something is correct. More specifically, each of the nodes has a weight that is a floating point number that is multiplied with the input to the node to generate an output for that node that is some proportion of the input. The weights are initially “trained” or set by causing the neural networks to analyze a set of known data under supervised processing and through minimizing a cost function to allow the network to obtain the highest probability of a correct output.
Deep learning neural networks are usually employed to provide image feature extraction and transformation for the visual detection and classification of objects in an image, where a video or stream of images can be analyzed by the network to identify and classify objects and learn through the process to better recognize the objects. Thus, in these types of networks, the system can use the same processing configuration to detect certain objects and classify them differently based on how the algorithm has learned to recognize the objects.
Deep learning algorithms and networks continue to improve as data processing capabilities increase. Specific areas of improvement include discriminators that increase the detection quality of the images and the speed at which the objects are recognized and classified.