Object localization is a key operation in many applications, such as surveillance, monitoring and tracking. Tracking systems are widely used in areas ranging from military applications to hospitals and offices. In these systems, the accuracy of object localization is critical and poses a considerable challenge. Acoustic sensors have been widely used because of their flexibility, low cost and ease of deployment. However, an acoustic sensor is sensitive to its surrounding environment, produces noisy data, and does not fully satisfy the requirement for consistent data. Thus, as a more reliable tracking method, visual sensors are often applied to tracking and monitoring systems as well.
One simple method for visual localization allows a robot to determine its absolute position from a view of a single landmark in one image. In this algorithm, the image plane is perpendicular to the optical axis and located along it at a distance equal to the focal length. To track the landmark model, the Lucas-Kanade optical flow algorithm is applied using gradient descent. This algorithm achieves feasible real-time performance in indoor environments. However, the approach is limited by its pinhole camera model, in which only one correspondence can be established.
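The pinhole geometry described above can be illustrated with a minimal sketch. It assumes a camera frame whose Z axis is the optical axis and an image plane at Z = f; the function name `project_pinhole` and the sample values are hypothetical and are not taken from the cited method. The sketch shows one well-known consequence of this model: two points on the same projection line map to the same image point, so a single image point alone does not determine depth.

```python
def project_pinhole(point, focal_length):
    """Project a 3D point onto the image plane of an ideal pinhole camera.

    Assumptions (illustrative only): the point is given in the camera frame,
    the Z axis is the optical axis, and the image plane sits at Z = focal_length.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Similar triangles: image coordinates scale with focal_length / depth.
    return (focal_length * x / z, focal_length * y / z)


# Two hypothetical points on the same projection line through the camera center.
near = project_pinhole((0.5, 0.25, 2.0), focal_length=1.0)
far = project_pinhole((1.0, 0.5, 4.0), focal_length=1.0)
```

Here `near` and `far` are the same image point even though the depths differ by a factor of two, so extra information (such as a landmark of known size or a second view) is needed to recover range.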
An adaptive, optical flow-based person tracking algorithm using multiple cameras has also been presented for indoor environments. In this algorithm, each camera tracks the target person independently. By exchanging information among the cameras, the three-dimensional positions and velocities of the targets are estimated. The algorithm is particularly effective when one of the cameras repeatedly loses the target due to occlusion, since the target position and velocity in that camera's image are estimated from the information supplied by the other cameras. The target position is obtained from the intersection of the projection lines from at least two tracking cameras. Furthermore, to estimate the range of depth in a single camera, the algorithm uses a tracking window, which represents the boundary and the height of a detected person in the image plane; the algorithm therefore requires a reliable horizontal position from a completely extracted object region.
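The intersection of projection lines from two cameras can be sketched as follows. With noisy measurements two lines in space rarely meet exactly, so this sketch swaps in a common substitute, the midpoint of the shortest segment between the two lines; it is not claimed to be the cited algorithm's own procedure. The function name `triangulate_midpoint`, the camera centers and the ray directions are all hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two projection lines.

    c1, c2: camera centers; d1, d2: ray directions toward the target.
    Returns the midpoint of the shortest segment joining the two lines
    (an assumed stand-in for an exact intersection).
    """
    d1, d2 = _unit(d1), _unit(d2)
    w = tuple(a - b for a, b in zip(c1, c2))
    b = _dot(d1, d2)
    denom = 1.0 - b * b
    if denom < 1e-12:
        raise ValueError("projection lines are (nearly) parallel")
    # Closed-form minimizer of |(c1 + s*d1) - (c2 + t*d2)|^2 over s and t.
    s = (b * _dot(d2, w) - _dot(d1, w)) / denom
    t = (_dot(d2, w) - b * _dot(d1, w)) / denom
    p1 = tuple(c + s * d for c, d in zip(c1, d1))
    p2 = tuple(c + t * d for c, d in zip(c2, d2))
    return tuple(0.5 * (a + e) for a, e in zip(p1, p2))


# Two hypothetical cameras observing a target located at (2, 3, 1).
estimate = triangulate_midpoint((0.0, 0.0, 0.0), (2.0, 3.0, 1.0),
                                (4.0, 0.0, 0.0), (-2.0, 3.0, 1.0))
```

The parallel-line check reflects a practical constraint of this kind of setup: when the two viewing directions are nearly the same, the depth estimate becomes ill-conditioned.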
A particle filter-based tracking framework has also been proposed that performs multimodal sensor fusion to track people efficiently in a video-conferencing environment, using multimodal information from multiple cameras and multiple microphone arrays to track objects in a scene. For localization, the image coordinates (u,v) in a viewable image are translated to (X,Y,Z) coordinates by using direct linear transformation (DLT). However, this approach requires a calibration object placed in a known geometry to estimate the matrix Pi, which has eleven parameters. In addition, whenever the camera pans or zooms (i.e., the camera setting is altered), the calibration of the cameras must be repeated. As described above, the various conventional algorithms for localizing an object have disadvantages due to a tradeoff between the accuracy of localization and the complexity of calculation.
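To make the calibration burden of the DLT-based approach concrete, the following minimal sketch uses a common textbook form of the eleven-parameter DLT model. The parameter ordering, the function names `dlt_project` and `dlt_calibration_rows`, and the sample values are assumptions for illustration; whether they match the cited framework's matrix Pi exactly is not confirmed by the text above. The sketch shows the forward mapping from (X,Y,Z) to (u,v) and the two linear equations that each point of a calibration object contributes.

```python
def dlt_project(params, point):
    """Map a 3D point (X, Y, Z) to image coordinates (u, v) with 11 DLT parameters."""
    if len(params) != 11:
        raise ValueError("expected eleven DLT parameters")
    l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, l11 = params
    x, y, z = point
    w = l9 * x + l10 * y + l11 * z + 1.0
    if w == 0:
        raise ValueError("point maps to infinity under these parameters")
    u = (l1 * x + l2 * y + l3 * z + l4) / w
    v = (l5 * x + l6 * y + l7 * z + l8) / w
    return u, v


def dlt_calibration_rows(point, pixel):
    """Return the two linear equations (row, right-hand side) that one known
    3D point and its observed pixel contribute to the calibration system."""
    x, y, z = point
    u, v = pixel
    row_u = [x, y, z, 1.0, 0.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u * z]
    row_v = [0.0, 0.0, 0.0, 0.0, x, y, z, 1.0, -v * x, -v * y, -v * z]
    return (row_u, u), (row_v, v)


# Hypothetical parameters and calibration point, chosen only for illustration.
params = [1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 3.0, 0.0, 0.0, 0.5]
pixel = dlt_project(params, (2.0, 4.0, 2.0))
(row_u, rhs_u), (row_v, rhs_v) = dlt_calibration_rows((2.0, 4.0, 2.0), pixel)
```

Because each calibration point yields two equations in eleven unknowns, at least six points with known, non-coplanar positions are needed, and the whole system has to be solved again whenever panning or zooming changes the parameters.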