Computer vision includes the acquisition, processing, analysis, and understanding of images and, more generally, of high-dimensional data from the real world in order to produce numerical or symbolic information, for example, in the form of decisions. Image understanding may be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. The image data may take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. Further areas of computer vision may include scene reconstruction, event detection, video tracking, object recognition, learning, indexing, motion estimation, and image restoration.
Computer vision technologies are typically complex, often consuming large amounts of computing resources while still lacking accuracy in many cases. For example, existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is “artificial” and may reduce the recognition accuracy for images or sub-images of arbitrary size or scale.
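As a rough illustration of why a fixed-size input may cost accuracy, the following sketch assumes the fixed size is met either by center cropping or by warping (non-uniform resizing). These are two common approaches that the text above does not name. The function names and the 640×480 example image are hypothetical and chosen only for illustration.

```python
FIXED_SIZE = 224  # example input side length taken from the text


def center_crop_retained_fraction(width, height, size=FIXED_SIZE):
    """Fraction of the original pixels kept by a center crop to size x size.

    Assumes no resizing before the crop. If the image is smaller than the
    crop along a dimension, that dimension is kept whole (padding would
    then be needed to reach the fixed size).
    """
    kept_w = min(width, size)
    kept_h = min(height, size)
    return (kept_w * kept_h) / (width * height)


def warp_scale_factors(width, height, size=FIXED_SIZE):
    """Horizontal and vertical scale factors when warping to size x size."""
    return size / width, size / height


def aspect_distortion(width, height, size=FIXED_SIZE):
    """Ratio of horizontal to vertical scaling; 1.0 means no distortion."""
    sx, sy = warp_scale_factors(width, height, size)
    return sx / sy


if __name__ == "__main__":
    w, h = 640, 480  # hypothetical arbitrary-size input image
    print("fraction of pixels kept by cropping:",
          center_crop_retained_fraction(w, h))
    print("aspect distortion from warping:", aspect_distortion(w, h))
```

Under these assumptions, cropping discards most of a larger image, while warping changes the relative proportions of the objects in it. Either effect may plausibly contribute to the accuracy loss described above.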