Object recognition is a subset of computer vision technologies whereby a computer extracts information from a scene to detect and identify the object(s) present and to provide useful information about those objects. A primary problem to be solved, therefore, is determining whether the scene information contains a specific object, feature, or element of interest. In contrast to computer-based methods, humans are very efficient at such recognition, even when the object(s) of interest appear in the scene under varied visual conditions, such as differing viewpoint, size/scale, translation, or rotation, or even where the target object is partially obstructed or obscured in a given image.
Some types of computer-based object recognition can provide generally satisfactory results today. For well-known, well-characterized objects, such as the Eiffel Tower or storefronts in an urban area, object recognition is less challenging because such objects have been imaged and characterized so broadly that knowledge about each object and its location is largely indexed and retrievable for use. For arbitrary objects that might be present in a scene, however, conventional computer-based methods can, at best, solve only for specific target objects, such as simple geometric objects (e.g., polyhedra), the presence or absence of human faces, or printed or handwritten characters, or for situations in which the images are generated so as to substantially standardize the appearance of the object(s), such as by capturing the image under well-defined illumination, against a known background, and with a known object pose, that is, a known position and orientation of the target object relative to the camera.
To provide identifications and other information for one or more objects of interest that may be fully or partially present in a scene, substantially without human intervention and where the objects may be arbitrary, current object recognition techniques typically apply machine learning algorithms trained on both positive and negative examples to extract object information, such as labels or other identifying properties, after suitable processing of image data from the scene. In recent years, such machine learning algorithms have improved; however, limitations in the quality of arbitrary image recognition remain. That is, there are instances in which an object that belongs to a predefined class nevertheless cannot be identified by the given method. This commonly occurs when the appearance of the object deviates from the canonical appearance of its class because of a particular pose or vantage point, or because the object has uncommon characteristics.
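For illustration only (and not as part of any claimed method), the positive/negative training scheme described above can be sketched with a minimal logistic-regression stand-in for the machine learning algorithm; the feature vectors and labels below are synthetic placeholders rather than real image features:

```python
import numpy as np

def train_classifier(features, labels, lr=0.1, epochs=500):
    """Fit a binary classifier on positive (1) and negative (0) examples
    by gradient descent on the cross-entropy loss (illustrative sketch)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=features.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = features @ w + b
        p = 1.0 / (1.0 + np.exp(-z))   # predicted probability "object present"
        grad = p - labels              # gradient of cross-entropy w.r.t. z
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def predict(features, w, b):
    """Label a feature vector 1 (object of interest) or 0 (background)."""
    return (features @ w + b > 0).astype(int)

# Toy data: positive examples cluster near +1, negatives near -1.
pos = np.ones((20, 4)) + np.random.default_rng(1).normal(scale=0.2, size=(20, 4))
neg = -np.ones((20, 4)) + np.random.default_rng(2).normal(scale=0.2, size=(20, 4))
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [0] * 20)
w, b = train_classifier(X, y)
acc = (predict(X, w, b) == y).mean()
```

Real systems would of course use far richer models and image-derived features; the sketch only shows the role the positive and negative training examples play.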
The quality of the object recognition results and other object-specific information produced by the machine learning algorithms is greatly influenced by the quality of the image data itself. For example, detecting and distinguishing objects in image data acquired from views of uncontrolled environments (urban streets, etc.) can be challenging due to inconsistent, poor, or variable scene illumination; changing features (e.g., sunlight, shadows, reflections, rain, snow, night-time street lighting); or the perspective from which the object is seen. The image data incorporating the object(s) of interest may also be acquired from low-resolution cameras, thus providing less processable image information. Additionally, with images acquired from cameras that move among and around the scene, objects may partially occlude each other as they move through the scene relative to a camera viewpoint, particularly at high object densities. Acquired images may also be crowded with multiple objects that are not of interest, cluttered with distracting visual information, include fast-moving objects, or exhibit variable object lighting and image resolutions within a common scene. If the image data provided for processing does not incorporate the necessary quantity and quality of processable information about the actual object(s) of interest, it is less likely that the object(s) will be accurately identified, even with the most sophisticated machine learning algorithms. Accordingly, it would be beneficial to enhance the image data that is input into machine learning algorithms used in object identification tasks.
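As one hypothetical example of such an image-data enhancement (illustrative only, and not the enhancement contemplated by this disclosure), a percentile-based contrast stretch can recover detail from a poorly illuminated patch before it is fed to a recognition algorithm:

```python
import numpy as np

def stretch_contrast(image, low_pct=2, high_pct=98):
    """Rescale pixel intensities so the low_pct..high_pct percentile range
    spans 0..255, clipping outliers (a simple, generic enhancement)."""
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:                      # degenerate (flat) image: leave as-is
        return image.astype(float)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0) * 255.0

# Simulated under-lit, low-contrast patch (values confined to 40..80).
rng = np.random.default_rng(0)
dim = rng.uniform(40, 80, size=(16, 16))
enhanced = stretch_contrast(dim)
```

After stretching, the patch uses the full 0–255 range, giving downstream algorithms more intensity variation to work with.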
Alternatively, incorporating non-ideal representations of a class into the training data can help account for such commonly occurring issues. By constructing the dataset with both ideal and non-ideal representations of the given classes, the machine learning algorithms can model such conditions. This includes, but is not limited to, instances of the classes exhibiting noise, obstructions, variations in object appearance by style or other characteristics, blur, variations in illumination, and the like.
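For illustration, the non-ideal variants listed above can be generated synthetically from an ideal exemplar, a common data-augmentation practice; the sketch below (numpy-only, with assumed noise and occlusion parameters) produces noisy, occluded, re-illuminated, and blurred versions of a grayscale image:

```python
import numpy as np

def augment(image, rng):
    """Produce non-ideal variants of a 2D grayscale image: sensor noise,
    partial obstruction, illumination change, and a crude 3x3 box blur."""
    h, w = image.shape
    variants = {}
    # Sensor noise: additive Gaussian noise, clipped to valid intensities.
    variants["noise"] = np.clip(image + rng.normal(scale=10, size=image.shape), 0, 255)
    # Obstruction: zero out a random rectangle covering part of the object.
    occluded = image.copy()
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    occluded[y0:y0 + h // 3, x0:x0 + w // 3] = 0
    variants["occlusion"] = occluded
    # Illumination variation: global brightness scaling.
    variants["illumination"] = np.clip(image * rng.uniform(0.5, 1.5), 0, 255)
    # Blur: 3x3 box filter built from shifted copies of an edge-padded image.
    padded = np.pad(image, 1, mode="edge")
    variants["blur"] = sum(padded[dy:dy + h, dx:dx + w]
                           for dy in range(3) for dx in range(3)) / 9.0
    return variants

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)
aug = augment(img, rng)
```

Each variant keeps the original image dimensions, so the augmented set can be fed to training alongside the ideal exemplars.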
Recently, it has become possible to extract accurate measurements of an object of interest directly from point clouds derived from images of scenes. An example of such a methodology using a single passive imaging device is described in U.S. Pat. No. 9,460,517 (the "'517 patent"), the disclosure of which is hereby incorporated by reference in its entirety. Accurate measurements can also be generated from point clouds derived from stereoscopic images. Again, however, the quality of the data, in this case the accuracy of the measurements and other dimensional information about the object, will be affected by the form and content of the information from which the object measurements are derived.
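To illustrate what "measurements from a point cloud" can mean in the simplest case (this sketch is generic and does not reproduce the '517 patent's methodology), the extents of an object can be read off the axis-aligned bounding box of its 3D points:

```python
import numpy as np

def object_dimensions(points):
    """Return the (x, y, z) extents of the axis-aligned bounding box
    enclosing an object's point cloud (an N x 3 array; metres assumed)."""
    extents = points.max(axis=0) - points.min(axis=0)
    return tuple(extents)

# Synthetic point cloud uniformly sampled from a 2 m x 1 m x 3 m box.
rng = np.random.default_rng(0)
cloud = rng.uniform([0.0, 0.0, 0.0], [2.0, 1.0, 3.0], size=(5000, 3))
width, depth, height = object_dimensions(cloud)
```

With a dense, accurate cloud the recovered extents closely match the true 2 m, 1 m, and 3 m dimensions; sparse or noisy clouds degrade the measurements, which is the data-quality point made above.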
Currently, object information for use in libraries is generated from 2D image information. Object recognition techniques continue to improve, with attendant improvements in the object libraries, as well as in the results obtained when machine learning algorithms are used with such libraries. However, object information generated from 2D information generally lacks the measurement, dimension, and topological context that can improve the ability to accurately identify and label objects in scenes. For example, a window might be recognized in a scene as a "window," but existing object recognition techniques based primarily on 2D object identification may not be able to discern the size of the window, the number of other windows in the scene, or the placement of the window relative to other objects in the scene or to the scene itself. The absence of such mathematical context can reduce the accuracy of predictions about the object(s) in the scene, as well as the overall accuracy of the object libraries themselves.
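The window example can be made concrete. Assuming hypothetical detections expressed as metric bounding boxes on a building facade (the labels and coordinates below are invented for illustration), the missing mathematical context, instance count, size, and relative placement, is straightforward to derive once dimensional information is available:

```python
import numpy as np

# Hypothetical detections: (x_min, y_min, x_max, y_max) in metres on a facade.
detections = {
    "window_1": (1.0, 2.0, 2.0, 3.5),
    "window_2": (4.0, 2.0, 5.0, 3.5),
    "door_1":   (2.5, 0.0, 3.5, 2.2),
}

def context(dets, label_prefix):
    """Derive the context a bare 2D label lacks: how many instances of a
    class are present, each instance's area, and pairwise centre spacing."""
    boxes = {k: v for k, v in dets.items() if k.startswith(label_prefix)}
    sizes = {k: (x1 - x0) * (y1 - y0) for k, (x0, y0, x1, y1) in boxes.items()}
    centers = {k: ((x0 + x1) / 2, (y0 + y1) / 2)
               for k, (x0, y0, x1, y1) in boxes.items()}
    keys = sorted(boxes)
    spacing = {(a, b): float(np.hypot(centers[a][0] - centers[b][0],
                                      centers[a][1] - centers[b][1]))
               for i, a in enumerate(keys) for b in keys[i + 1:]}
    return len(boxes), sizes, spacing

count, sizes, spacing = context(detections, "window")
```

Here the scene is known to contain two windows of 1.5 m² each, spaced 3 m apart, information a purely 2D "window" label cannot carry.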
In view of the above, there remains a need for improvements in the form and content of the scene and object information used by object recognition techniques as applied to objects present in a scene. There further remains a need for improvements in scene data that can be used to generate measurements of objects present in a scene from images or other sources of processable information about those objects. There also remains a need for improved object recognition techniques whereby mathematical context about the objects in the scene can be incorporated into the object recognition results. The present disclosure provides these and other benefits.