Current object tracking mechanisms are costly, inaccurate, and/or computationally intensive, and they fail to provide a means for annotating a moving object with related information.
For example, U.S. Pat. No. 7,688,349 is directed to a system that tracks individuals to determine whether they belong to the same group. However, the '349 patent relies on video cameras and does not provide a unique identification (ID) for each individual in the group.
Similarly, U.S. Pat. No. 8,630,460 is directed to a system and method for augmenting a 2D image into a 3D representation. This system, like most similar systems, uses video cameras to capture the environment and does not provide a means for annotating the detected objects with metadata as provided herein.
Along the same lines, U.S. Pat. No. 7,327,362 discloses a method for providing volumetric representations of three-dimensional objects. However, this method requires assigning foreground and background voxels to a silhouette, a step that the present disclosure does not require.
Using a depth sensor to determine the contour and volumetric representation of an object, and annotating that volumetric representation with additional data, overcomes the challenges of the prior art and affords an inexpensive solution for tracking and annotating moving objects. The additional data include, but are not limited to, data obtained from additional sensors such as microphones or video cameras, data from pre-trained Convolutional Neural Networks (CNNs), and/or new data derived from the data provided by those sensors. The present disclosure overcomes one or more of the problems found in the prior art.
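To make the idea of an annotated volumetric representation more concrete, the following Python sketch shows one possible way to organize such data. It is illustrative only and is not the method of this disclosure. Segmentation by a fixed depth band, the 4-neighbour boundary rule for the contour, the image-space voxel grid (not a metric 3D back-projection), and the names `TrackedObject`, `segment_depth_band`, `contour_of`, `voxelize`, and `annotate` are all hypothetical assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedObject:
    # Hypothetical record: a unique id, contour pixels, voxels, and annotations keyed by source.
    object_id: int
    contour: set = field(default_factory=set)
    voxels: set = field(default_factory=set)
    annotations: dict = field(default_factory=dict)


def segment_depth_band(depth, near, far):
    """Return the (row, col) pixels whose depth lies in [near, far).

    Assumption: a fixed depth band stands in for real foreground segmentation.
    """
    return {
        (r, c)
        for r, row in enumerate(depth)
        for c, z in enumerate(row)
        if near <= z < far
    }


def contour_of(mask):
    """Boundary pixels: mask pixels with at least one 4-neighbour outside the mask."""
    return {
        (r, c)
        for (r, c) in mask
        if any(n not in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def voxelize(depth, mask, pixel_block, depth_step):
    """Quantize each masked pixel into an image-space voxel (col block, row block, depth bin)."""
    return {
        (c // pixel_block, r // pixel_block, int(depth[r][c] // depth_step))
        for (r, c) in mask
    }


def annotate(obj, source, data):
    """Append data from a named source (e.g. a microphone or a CNN) to the object."""
    obj.annotations.setdefault(source, []).append(data)


# Usage: a tiny 4x4 depth frame with a 2x2 object between 100 and 200 depth units.
depth = [
    [0, 0, 0, 0],
    [0, 120, 125, 0],
    [0, 130, 128, 0],
    [0, 0, 0, 0],
]
mask = segment_depth_band(depth, near=100, far=200)
obj = TrackedObject(
    object_id=1,
    contour=contour_of(mask),
    voxels=voxelize(depth, mask, pixel_block=1, depth_step=50),
)
annotate(obj, "cnn_label", "person")
print(len(obj.voxels), obj.annotations)  # 4 {'cnn_label': ['person']}
```

Keying annotations by source keeps data from different sensors or models separate, so derived data can later be added under its own key without altering the volumetric representation.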