An increasingly common use of computers and computerized devices is the processing of video, such as video captured in real time, or video captured or otherwise input from a storage medium, such as a hard disk drive, a digital video disc (DVD), a video cassette recorder (VCR) tape, etc. Processing video usually requires that objects within the video be extracted. Objects can correspond to, for example, semantic objects, which are objects as defined perceptually by the viewer. For example, a video of a baseball game may have as its objects the various players on the field, the baseball after it is thrown or hit, etc. Object extraction is useful for object-based coding techniques, such as MPEG-4, as known within the art; for content-based visual database query and indexing applications, such as MPEG-7, as also known within the art; for the processing of objects in video sequences; etc.
Prior art object extraction techniques generally fall into one of two categories: automatic extraction and semi-automatic extraction. Automatic extraction is relatively easy for the end user to perform, since he or she needs to provide little or no input for the objects to be extracted. Automatic extraction is also useful in real-time processing of video, where user input cannot feasibly be provided in real time. The primary disadvantage of automatic extraction, however, is that, as performed within the prior art, it does not define objects precisely. That is, only the rough contours of objects are identified. For example, parts of the background may be included in the definition of a given object.
Conversely, semi-automatic object extraction from video requires user input. Such user input can provide the exact contours of objects, for example, so that the objects are defined more precisely than with prior art automatic object extraction. The disadvantage of semi-automatic extraction, however, is that user input is in fact necessary. For the lay user, this may be at best inconvenient, and at worst infeasible where the user is not proficient in video applications and does not know how to provide the necessary optimal input. Furthermore, semi-automatic extraction is ill-suited for real-time processing of video, even where a user is proficient, since the user typically cannot identify objects in real time.
Therefore, there is a need to combine the advantages of automatic and semi-automatic video object extraction techniques. That is, there is a need to combine the precise object definitions afforded by semi-automatic techniques with the ability to perform object extraction in real time, as automatic techniques allow. For these and other reasons, there is a need for the present invention.