Related background fields for this disclosure include photogrammetry, Structure from Motion (SFM) and related Bundle Adjustment, and Simultaneous Localization and Mapping (SLAM) formulations. Software toolkits developed in these fields are used to analyze images, such as a video sequence of a scene, or a collection of images of a scene from various viewpoints, and extract the 3D structure of the scene or objects in that scene.
Our work in these areas stemmed from our interest in Intuitive Computing Platforms (ICPs), in which computing devices, typically mobile devices, are equipped with cameras, microphones, RF radios and a host of other sensors, and process streams of sensor input to recognize signals and objects around them, discern user intent, and deliver relevant services. See, e.g., our US Patent Application Publication 20110161076, entitled “Intuitive Computing Methods and Systems,” which is hereby incorporated by reference in its entirety. The processes for recognizing signals and objects include feature extraction and feature tracking, image and object recognition, audio recognition, detection of machine readable signals and patterns, etc., with the aid of supporting technologies, such as machine learning, Kalman filtering, and teachings from the above fields. Photogrammetric techniques, like SFM and SLAM for example, are used to extract 3D structure from the sensor inputs in the ICP platform, which in turn aids in object recognition and identification, and other applications, as outlined in this disclosure.
In this disclosure, we build upon our ICP disclosures as well as our signal processing work described in Ser. No. 14/466,869 (US Patent Application Publication 20150055837), and our multi-spectral and machine learning work described in Ser. No. 14/201,852 (US Patent Application Publication 20140293091), 62/054,294 and Ser. No. 14/836,878, as well as various other works cited throughout. In particular, we describe various forms of feature vector transforms, and use of the feature vector transforms to extract dense feature sets that are exploited in applications. These applications include recovering surface micro-topology (e.g., surface texture extraction from motion), precise object counting and measurement, and object recognition and identification, to name a few. Some of the applications are designed to be carried out on smartphones (e.g., with access to cloud computing as needed), while others are adapted for application domain devices, such as fruit and vegetable identification in point-of-sale scanner devices.
One aspect of the invention is a method for computing surface texture in which image frames of a scene (e.g., video frames from a user passing a smartphone camera over an object) are transformed into dense feature vectors, and the feature vectors are correlated to obtain high-precision depth maps.
In one implementation, for example, a six-dimensional (6D) pose is determined from the video sequence and then used to register patches of pixels from the frames. Registered patches are aligned and then correlated to obtain local shifts. These local shifts are converted to precision depth maps.
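By way of a non-limiting illustration, the conversion of local shifts to a depth map can be sketched as follows. This sketch assumes a pinhole camera and that the local shifts act as a stereo-style disparity between registered frames separated by a known baseline; the function and parameter names are illustrative, not part of the claimed method.

```python
import numpy as np

def shifts_to_depth(shift_px, focal_px, baseline_m):
    """Convert per-pixel local shifts (in pixels) between two registered
    frames into a depth map using the standard stereo relation
    depth = focal * baseline / disparity.

    shift_px   -- array of local shift measurements, in pixels
    focal_px   -- camera focal length, in pixels (from the camera model)
    baseline_m -- camera translation between the frames, in meters
    """
    shift = np.asarray(shift_px, dtype=np.float64)
    depth = np.full_like(shift, np.inf)  # zero shift -> depth at infinity
    valid = np.abs(shift) > 1e-6
    depth[valid] = focal_px * baseline_m / np.abs(shift[valid])
    return depth
```

In practice the local shifts are sub-pixel quantities recovered by correlation, so the precision of the depth map is governed by the precision of the shift measurements.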
Feature vector transforms that provide dense feature vectors are described, as are several methods and systems for exploiting them. For example, these feature vector transforms are leveraged in a signal processing method comprising several levels of interacting loops. At a first loop level, a structure from motion loop process extracts anchor features from image frames. At another level, an interacting loop process extracts surface texture. At additional levels, object forms are segmented from the images, and objects are counted and/or measured. At still a higher level, the lower level data structures providing feature extraction, 3D structure and pose estimation, and object surface registration are exploited by higher level loop processes for object identification (e.g., using machine learning classification), digital watermark or bar code reading and image recognition from the registered surfaces stored in lower level data structures.
Another aspect of the invention is a method of obtaining surface detail of an object from a video sequence captured by a moving camera over the object. The method provides a camera model and the video sequence. The method determines pose estimation from the video sequence using the camera model and registers images from different frames using the pose estimation. The method performs a feature vector transform on the images to produce an N-dimensional feature vector per pixel of the images. The feature vector transform produces, for each pixel in an array of pixels, a first vector component corresponding to plural comparisons between a center pixel and pixels at plural directions around the center pixel for a first scale, and a second vector component corresponding to plural comparisons between the center pixel and pixels at plural directions around the center pixel for a second scale. The method correlates the feature vector transforms of the images to obtain shift measurements between the images, and obtains surface height detail of the object from the shift measurements.
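By way of example, a dense per-pixel feature vector transform of the kind described above can be sketched as a multi-scale, census-style encoding: at each scale (neighbor radius), the center pixel is compared against eight surrounding pixels, and the sign of each comparison contributes one component of the feature vector. This is an illustrative sketch, not the claimed transform; the function name, the eight-neighbor sampling, and the wrap-around border handling are assumptions made for brevity.

```python
import numpy as np

def dense_feature_transform(img, scales=(1, 2)):
    """Per-pixel feature vectors from center-vs-neighbor comparisons.

    For each scale r in `scales`, compares each pixel to its eight
    neighbors at radius r (plural directions), recording 1 where the
    neighbor is brighter and 0 otherwise.  Output shape: (H, W, 8*len(scales)).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Eight directions around the center pixel, clockwise from upper-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    feats = np.zeros((h, w, len(scales) * len(offsets)), dtype=np.int8)
    for s_i, r in enumerate(scales):
        for o_i, (dy, dx) in enumerate(offsets):
            # neighbor[y, x] = img[y + dy*r, x + dx*r]
            # (borders wrap around; acceptable for a sketch)
            neighbor = np.roll(img, (-dy * r, -dx * r), axis=(0, 1))
            feats[:, :, s_i * len(offsets) + o_i] = neighbor > img
    return feats
```

Because the components encode only the ordering of intensities, the resulting feature vectors are robust to smooth illumination changes, which aids the correlation step that follows.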
Feature vector transforms are used to improve pose estimation by providing dense feature sheets per image to refine the pose estimation vector. A feature vector transform is applied to image frames to provide a feature vector per pixel for the pose estimation process. The pose estimation process finds shifts between feature vector arrays and determines the pose estimation from the shifts.
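The shift-finding step between feature vector arrays can be illustrated, in greatly simplified form, by an exhaustive integer-translation search that scores each candidate shift by the fraction of matching feature components in the overlapping region. A practical implementation would instead use phase correlation or a coarse-to-fine search with sub-pixel refinement; the names and the search strategy below are illustrative assumptions.

```python
import numpy as np

def estimate_shift(feats_a, feats_b, max_shift=3):
    """Estimate the integer (dy, dx) translation taking feats_a into
    feats_b by exhaustive search over candidate shifts, scoring each
    candidate by the fraction of equal feature components in the
    overlapping region of the two dense feature arrays."""
    h, w = feats_a.shape[0], feats_a.shape[1]
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows such that b[y+dy, x+dx] aligns with a[y, x].
            a = feats_a[max(0, -dy):h - max(0, dy),
                        max(0, -dx):w - max(0, dx)]
            b = feats_b[max(0, dy):h - max(0, -dy),
                        max(0, dx):w - max(0, -dx)]
            score = np.mean(a == b)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```

The recovered shifts between frames, accumulated across the sequence, supply the corrections used to refine the 6D pose estimate.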
These methods are implemented in software instructions. In one application, the instructions are executed by a processor in a mobile device, which captures the video sequence of an object via a camera.
Another aspect of the invention is a system for obtaining surface detail of an object from a video sequence captured by a moving camera over the object, the system comprising:
means for estimating pose of the object relative to the camera from the video sequence;
means for transforming the images into dense feature vector arrays, the feature vector arrays comprising a feature vector per pixel, the feature vector having a first vector component corresponding to plural comparisons between a center pixel and pixels at plural directions around the center pixel for a first scale, and a second vector component corresponding to plural comparisons between the center pixel and pixels at plural directions around the center pixel for a second scale; and
means for obtaining surface height detail of the object from the dense feature vector arrays.
In one variation, the means for estimating pose comprises a processor programmed with instructions to:
determine a coarse 6D pose from the video sequence based on a camera model;
obtain dense feature vector transforms of images in the video sequence;
align the feature vector transforms with the coarse 6D pose; and
determine a refined 6D pose from the aligned feature vector transforms.
In another, the means for obtaining surface height detail comprises a processor programmed with instructions to:
obtain shift measurements between the images from the dense vector arrays; and
obtain surface height detail of the object from the shift measurements.
The foregoing and other features and advantages of the present technology will be more readily apparent from the following Detailed Description, which proceeds with reference to the accompanying drawings.