Target tracking is a classical problem, so a diverse set of algorithms exists in the literature. However, most tracking algorithms assume a predetermined target location and size for track initialization. Hence, in many applications, the target size and location are required as input from human users. Track initialization can drastically change tracker performance, since it determines what the tracker is to track, i.e. features, appearance, contours. Thus, any insignificant or false information may result in mislearning of the target appearance.
A request for a target bounding box for track initialization can be answered by the user if there is no time constraint on target selection. However, for many real-time applications, manually drawing a bounding box around the target is inappropriate, since the target should be marked instantly. Therefore, in real-time applications, erroneous input is usually provided by the user due to the time restriction. Moreover, system delays are unavoidable in such systems, which further increases the possibility of false track initialization. Erroneous user input results in suboptimal tracking performance and yields premature track losses. If long-term tracking performance is desired, this erroneous input should be compensated. Moreover, even when the user provides a perfect bounding box for the object, this initialization may not always be preferred, depending on the appearance of the target. For example, if the track window is given as in (b) of FIG. 9, it may result in redundant features or a deceptive appearance, depending on the type of tracker, due to high resemblance between target and background, and may not provide long-term tracking. Hence, in order to achieve longer tracks, the proposed method selects as the target the segment that is most salient (distinctive) from the background, as illustrated in (c) of FIG. 9.
Chinese patent document CN101329767 discloses an automatic inspection method of a significant object sequence based on studying videos. In the method of that document, static significant features and then dynamic significant features are calculated and self-adaptively combined according to the spatial continuity of each image frame and the temporal continuity of significant objects in neighboring images. Since this method generally takes several seconds to process an image, it is not appropriate for real-time applications.
United States patent document US2012288189, an application in the state of the art, discloses an image processing method which includes a segmentation step that segments an input image into a plurality of regions by using an automatic segmentation algorithm, and a computation step that calculates a saliency value of one region of the plurality of segmented regions by using a weighted sum of color differences between the one region and all other regions. Accordingly, it is possible to automatically analyze visual saliency regions in an image, and the result of the analysis can be used in application areas including significant object detection, object recognition, adaptive image compression, content-aware image resizing, and image retrieval. However, a change in image resolution results in a change in processing time, which may exceed real-time application limits.
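The computation step described above can be illustrated with a minimal sketch. It is not the implementation of the cited document: the function name `region_contrast_saliency`, the use of each region's pixel count as the weight, the Euclidean distance between mean colors as the color difference, and the normalization to [0, 1] are all assumptions made for illustration, and the segmentation step is assumed to have already produced the regions.

```python
import math


def region_contrast_saliency(regions):
    """Score each region by a weighted sum of color differences to all others.

    regions: list of (mean_color, pixel_count) tuples, where mean_color is a
    tuple of channel values. Both the weighting by pixel count and the
    Euclidean color distance are illustrative assumptions.
    Returns one saliency value per region, scaled so the maximum is 1.0.
    """
    raw_scores = []
    for i, (color_i, _) in enumerate(regions):
        score = 0.0
        for j, (color_j, size_j) in enumerate(regions):
            if i == j:
                continue
            # Larger regions contribute more to the contrast of region i.
            score += size_j * math.dist(color_i, color_j)
        raw_scores.append(score)

    peak = max(raw_scores, default=0.0)
    return [s / peak if peak > 0 else 0.0 for s in raw_scores]


# Hypothetical example: two background-like regions and one small, distinct one.
example_regions = [
    ((10, 10, 10), 900),   # large dark background
    ((12, 10, 10), 80),    # slightly different background patch
    ((200, 50, 50), 20),   # small region with a distinct color
]
saliency = region_contrast_saliency(example_regions)
```

In this sketch the small, distinctly colored region receives the highest value because it differs strongly from the large regions that dominate the weights. The nested loop visits every pair of regions, so the cost grows with the square of the number of regions, which in turn depends on the segmentation step that precedes it.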
United States patent document US2008304740, an application in the state of the art, discloses methods for detecting a salient object in an input image. For this, the salient object in an image may be defined using a set of local, regional, and global features including multi-scale contrast, center-surround histogram, and color spatial distribution. These features are optimally combined through conditional random field learning. The learned conditional random field is then used to locate the salient object in the image. The methods can also use image segmentation, where the salient object is separated from the image background. However, this approach is not suitable for real-time use.
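Of the features listed above, the center-surround histogram can be sketched briefly. The following is a simplified illustration under stated assumptions, not the method of the cited document: it works on a grayscale image given as a list of rows, uses a surrounding ring of fixed width `margin`, and compares the two normalized histograms with a chi-square distance. The names `histogram`, `chi_square`, and `center_surround_distance` are hypothetical.

```python
def histogram(values, bins=8, max_value=256):
    """Normalized intensity histogram of integer values in [0, max_value)."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // max_value, bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts


def chi_square(h1, h2):
    """Chi-square distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def center_surround_distance(image, box, margin):
    """Histogram distance between a window and the ring of pixels around it.

    image: 2D list of grayscale integers.
    box: (top, left, bottom, right), with bottom and right exclusive.
    margin: width of the surrounding ring in pixels (an assumed parameter).
    """
    top, left, bottom, right = box
    rows, cols = len(image), len(image[0])
    center, surround = [], []
    for r in range(max(0, top - margin), min(rows, bottom + margin)):
        for c in range(max(0, left - margin), min(cols, right + margin)):
            if top <= r < bottom and left <= c < right:
                center.append(image[r][c])
            else:
                surround.append(image[r][c])
    return chi_square(histogram(center), histogram(surround))


# Hypothetical 6x6 image: a bright 2x2 patch on a dark background.
image = [[20] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 220

distinct = center_surround_distance(image, (2, 2, 4, 4), margin=2)
uniform = center_surround_distance([[20] * 6 for _ in range(6)], (2, 2, 4, 4), margin=2)
```

A window whose intensity distribution differs completely from its surroundings gives a distance near 1, while a window on a uniform background gives 0. Evaluating such a distance over many candidate windows and scales adds to the processing time.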
United States patent document US20120294476, an application in the state of the art, discloses methods for detecting a salient object in an input image. To find the salient objects, a computing device determines saliency measures for locations in the image based on the costs of composing those locations from parts of the image outside of them. At the beginning of the process, the input image is segmented into parts; saliency measures are then calculated based on appearance and spatial distances for locations defined by sliding windows. Consequently, this system is not suitable for real-time use.
The PCT application document 8058-143 is also intended to achieve the same goal by using center-surround histogram differences and a suboptimal thresholding methodology together with the same saliency map generation. Although both methodologies use geodesic saliency as the discrimination measure, the present invention uses a completely different initial window selection mechanism together with an optimal thresholding methodology, which yields superior track initialization performance with improved error compensation and time efficiency. Moreover, an initial window correction step is introduced in the present invention, which directly improves robustness together with the thermal core alignment step.