A traditional general-purpose TV system often cannot meet users' needs. When requesting a program on a traditional TV system, a user needs to know not only the program name but also the approximate plot of the program, or must choose an actor or style according to the user's preferences. If no appropriate information is entered as search criteria, no results are found in the online databases, or the results found are still not adequate to satisfy the user's request. This degrades the user experience.
With the development of image processing technology, intelligent TV is becoming a trend, and there is a growing need for a powerful yet intuitive user-interaction control system based on object detection. When a user sends an object request (e.g., for merchandise) from a remote control to the TV, the intelligent TV may find matching objects in one or more online databases and send the requested content (e.g., videos, webpages, Wikipedia entries, shopping information, and so on) to the user. Further, the intelligent TV may search both TV channels and the Internet based on object detection to find exactly the content the user is looking for, and may update that content through push notifications by tracking the user's browsing history. This provides an interactive video experience in which the user can browse the objects within a video program.
However, object detection is often a challenging task, especially the detection of moving objects in video sequences. The task becomes even more difficult when the objects are complex, which forces a trade-off between detection accuracy and detection speed. Complex objects are those that either lack a rigid form or can appear in a variety of poses. For example, bags are very difficult to detect because they have no definitive shape, can deform, can be occluded by hands or arms, and can appear in many poses. In such cases, low-complexity object detectors are not sufficiently powerful, and detection using motion estimation alone is not feasible: because a bag can be carried by a person, foreground motion estimation would detect the person along with the bag rather than the bag alone.
In real-time systems, however, it may be infeasible or impractical to apply a high-complexity object detector to every frame of the input video sequence. That is, the system may not have sufficient computational resources to apply a powerful object detector to every frame and still generate results within the specified computational constraints.
The disclosed methods and systems are directed to solving one or more of the problems set forth above, as well as other problems.