Manipulation of digitized video images, both still pictures and moving video presentations, is an important aspect of the present trend toward the introduction of "multimedia" into many areas of our lives, as well as into modern forms of more traditional endeavors such as, for example, the creation of motion pictures. A copending U.S. patent application Ser. No. 08/146,964, having an inventor in common with this present invention, teaches a method for converting a conventional "moving picture" video into a computer/user interface means. In accomplishing the method of that previous invention, it is necessary to identify, within the video presentation, particular objects of concern. As discussed in the above-referenced disclosure, such identification can be quite laborious, and it was anticipated that methods for transferring some of that labor from the human operator to the computer might be developed in the future.
It was disclosed that the designation of "hot spots", consisting of objects within a moving video, was, ". . . accomplished by viewing each key frame and, at least until a more automated system is developed therefor, manually designating which, if any, objects or items of interest in the key frame are to be designated as the hot spots." (Reference numbers relating to the prior designation have been deleted in this quotation.) This present application is directed to a method and means for automating the identification of such objects and maintaining such identification through time. Although the present inventive method is intended to be used in conjunction with the production of interactive computer interface systems, it is not restricted to such applications.
An object in animated, or other specially prepared, moving video images can be rather easily identified, since such an object can be created according to a specific, easily distinguishable criterion (such as color, or the like) or, indeed, the pixel location of the object can be made a part of the data which describes the object within the computer even as the object is created. However, objects within a live action video, which video has been neither specially produced nor specially prepared, cannot be so easily segregated.
Prior art methods for identifying objects in a video image, such that the object is defined according to computer understandable criteria, have included identifying edges, colors or color patterns and/or brightness levels which define the object. Such methods have been relatively effective for the identification and/or manipulation of still video images. For example, an object can be distinguished by known methods for automatically defining the outer edges thereof, and the object can then be operated upon. As examples of such operations, the object can be moved within the image, removed from the image, or changed in color, luminosity, or the like. More in the context of the present invention, the object could even then, once the image is defined in terms of a bit map, be used in the manner of an icon or a "hot spot", such that clicking on the area of the image within the object could cause the computer to initiate a response or further interaction with the end user. It should be remembered, however, that this sort of procedure does not transfer well into the realm of moving video images. Firstly, keeping track of the location of objects within a moving video image by storing a bit map of all such objects for all frames of the moving image would require a morass of data which would tax a computer's data storage capacity and slow down the operation of the computer. Secondly, although the amount of user interaction and labor required to perform the above described operations is quite tolerable when working with a single still video image, an attempt to repeat such an operation thirty or so times for each second of a moving video would quickly reveal that this method is outside the realm of practicality.
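By way of illustration only, the storage burden described above can be estimated with a short calculation. The following is a minimal sketch under stated assumptions: the frame size of 640 by 480 pixels, the use of one bit per pixel for each object's bit map, the one minute duration, and the function name `mask_storage_bytes` are all hypothetical and are not taken from the referenced disclosure. Only the rate of roughly thirty frames per second comes from the discussion above.

```python
def mask_storage_bytes(width, height, fps, seconds, num_objects, bits_per_pixel=1):
    """Estimate bytes needed to store one uncompressed bit map per object per frame.

    All parameters are illustrative assumptions; no compression is modeled.
    """
    bits_per_mask = width * height * bits_per_pixel
    bytes_per_mask = (bits_per_mask + 7) // 8  # round up to whole bytes
    total_frames = int(fps * seconds)
    return bytes_per_mask * total_frames * num_objects


# One object, one second of video at an assumed 640x480 and 30 frames per second.
per_second = mask_storage_bytes(640, 480, fps=30, seconds=1, num_objects=1)

# Three objects tracked for one minute under the same assumptions.
per_minute_three_objects = mask_storage_bytes(640, 480, fps=30, seconds=60, num_objects=3)

print(per_second)                # 1152000 bytes, a little over 1 MB per second
print(per_minute_three_objects)  # 207360000 bytes, roughly 200 MB per minute
```

Under these assumed figures, even a one bit per pixel bit map for a single object consumes more than a megabyte for each second of video, which suggests why storing a bit map for every object in every frame scales poorly.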
One accepted method for separating objects within a moving video image has been based upon the color of portions of the image. One skilled in the art will be familiar with the "blue screen" method wherein portions of an image which are of a specific color (often, but not necessarily, blue) can be selectively removed from an image. This technique was used in television prior to the advent of digital image manipulation, and has been found to work well also when applied to the field of digital image manipulation. While this method works well for its intended purpose, it will generally only successfully distinguish, for example, a background from the foreground object(s). Furthermore, it requires a special setting in that the object(s) of concern must be originally "shot" (meaning photographed, videotaped, or the like) against the special background color. Most importantly, although the background is distinguished from the foreground objects such that a computer can calculate the location of the objects in order to perform operations thereon (such as overlaying the objects upon a different background), different objects are usually not sufficiently identifiable in terms usable by the computer such that the objects can serve as means for computer/user interaction. Moreover, even in those special situations in which a video scene can be shot in front of a blue background or the like, and even in those unusual instances wherein there may be only a single object in the foreground such that there will be no confusion between objects, such prior art solutions do not address the problem of extensive data storage requirements and drain on computation resources, as discussed above.
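The limitation noted above, namely that a color key separates background from foreground but does not distinguish one foreground object from another, can be seen in a simple sketch. The following is not the method of any particular prior art system; the function name `foreground_mask`, the sum-of-differences color distance, the tolerance value, and the tiny test frame are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component 0-255


def foreground_mask(frame: List[List[Pixel]],
                    key: Pixel = (0, 0, 255),
                    tolerance: int = 60) -> List[List[bool]]:
    """Return True for every pixel that is NOT close to the key color.

    Closeness is an assumed, simplistic measure: the sum of absolute
    differences of the R, G and B components.
    """
    def is_key(p: Pixel) -> bool:
        return sum(abs(a - b) for a, b in zip(p, key)) <= tolerance

    return [[not is_key(p) for p in row] for row in frame]


BLUE = (0, 0, 255)     # assumed background ("blue screen") color
RED = (200, 30, 30)    # first hypothetical foreground object
GREEN = (30, 180, 40)  # second hypothetical foreground object

# A 2x4 frame: blue background with a red column and a green column.
frame = [
    [BLUE, RED, BLUE, GREEN],
    [BLUE, RED, BLUE, GREEN],
]

mask = foreground_mask(frame)
for row in mask:
    print(row)
```

In the resulting mask, the red and green pixels are both simply marked as foreground. Nothing in the mask itself indicates that they belong to two different objects, so some further step would be needed before either object could serve individually as a "hot spot".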
Methods for identification of edges or complex color patterns within a video image are more effective for segregating specific "real world" portions of a video image, as compared to the more artificially induced background "blue screen" methods. However, such edge or color identification methods generally require relatively sophisticated computer analysis, and so are not suitable for real time image tracking, at least unless a great deal of expensive computing power is dedicated to such tracking. Even where a practically unlimited quantity of computer resources is available to the task, attempting to track moving objects within a video image according to such complex criteria has proven to be undesirably complicated. Where more than one object is to be tracked within the video, or where the objects are rapidly moving and/or changing relative shape within the video, the problems associated with such methods are exacerbated.
It has been brought to the inventor's attention that several prominent manufacturers of computer products have a need for a better means and/or method for identifying moving objects within video images such that the objects may be followed by a computer, in order to implement their own products. However, in spite of the fact that some of these companies have extensive research budgets and large and experienced research staffs, they have turned to the present inventor seeking a solution to this known problem.
To the inventor's knowledge, no workable method has existed in the prior art for quickly and easily identifying, for computer tracking and manipulation, objects within moving video images in a manner that is inexpensive, easy to implement, and reliable. All prior art methods have been extremely labor intensive, and/or have required an inordinate amount of computing power to implement (or, even worse, have required an inordinate amount of computing power for an end user to utilize the product of such methods), and/or have not reliably identified objects such that a computer can track the objects within a video presentation without "losing" the objects or confusing them with the backgrounds or other objects in the video.