The present invention relates to systems for tracking the movement of multiple objects within a predefined area.
As camera, microelectronic and computer system technology continues to advance rapidly, there has been an increasing supply of machine vision systems intended to take over well-defined, repetitive vision/recognition tasks that had previously been performed by humans. Early systems were designed to recognize parts moving along assembly lines to aid the manufacturing process. More recently, many inventions have been put forth to recognize humans and their movement. The variability of humans and their clothing, as well as the complexity of the backgrounds in which they move, has presented a significant challenge to state-of-the-art technology. Considerable attention has been paid to techniques for discerning the human shape from its background using edge detection methods that seek to remove stationary, i.e. background, information. Two major factors affect the success of these techniques. The first is image resolution, which determines the amount of information, and hence detail, available to the accompanying computer system for differentiating foreground from background. Ideally, the greater the resolution, the better. However, as resolution increases, so does the cost of the camera and of the accompanying computer system. The second, and even more important, factor is processing time: as resolution increases, processing time increases significantly, which impedes the ability of these systems to operate in real time.
The following are five examples of machine vision systems designed, in one way or another, to recognize human or object motion within a predefined area.
In November of 1994, U.S. Pat. No. 5,363,297, entitled Automated Camera-Based Tracking System for Sports Contests, issued to Larson et al. This system employed multiple cameras to continuously monitor the area of an ongoing sporting event. Each camera fed its information to an accompanying computer system for analysis, which consisted of extracting the players from the stationary background and thereby tracking their silhouettes. The inventors anticipated problems when individual players collided or otherwise engaged each other, thereby merging their individual silhouettes. They also recognized a need to initialize the system by first identifying each player as he or she appeared within the system's field of view. Larson et al. specified two solutions to these problems. First, they proposed attaching monitors to the tracking system, to be operated by humans who would perform the initial recognition as well as all subsequent re-identifications whenever the system lost track of a player due to a merging of silhouettes. Second, they proposed attaching electronic tracking devices and triangulating the received signals in order to identify and track individual players. There are at least five major problems with the Larson patent. First, the amount of digital processing required to perform the player extraction in real time greatly exceeds the cost-effective computer technology of today, let alone that of 1994. Secondly, performing this extraction would require a greater amount of image detail, thereby increasing the cost of implementation by requiring more cameras and related computer systems. The additional detail would also tend to further slow the responsiveness of the system. Thirdly, the requirement of one or more operators to initially recognize and then re-identify players is extremely limiting and costly.
This requirement essentially made the invention economically impractical for monitoring non-professional youth sporting events, where the system cost, including the ongoing cost of the human operator, would greatly exceed the smaller revenue streams. It should be noted that this operator would more than likely be a parent of one of the youths, who would probably not know all of the players and would likely find it stressful to make so many decisions in real time. This approach would also require training and retraining operators, which would also be prohibitive. Fourthly, the electronics necessary to track players in real time would have to operate at higher frequencies, making them more expensive and providing a further economic drawback. Fifthly, the system could have difficulty determining the orientation of a player. For example, while Larson's invention could detect the direction of an individual's motion, it could not determine whether the person was facing forwards or backwards, let alone whether the person's head was turned in another direction.
In December of 1995, U.S. Pat. No. 5,473,369, entitled Object Tracking Apparatus, issued to Keiko Abe. This system was concerned with the actual image processing techniques used to follow an object from frame to frame. The inventor described prior art that compared images block by block from one frame to the next, where a block is one or more pixels of the image. It was pointed out that such systems depended upon time-consuming and error-prone statistical calculations that were especially susceptible to misinterpretation when the object changed size within the field of view or disappeared altogether. Abe proposed taking the same video frames but first separating them into luminance and color histograms that are then compared frame by frame. By comparing histograms rather than blocks, Abe argued, the system would be more accurate and efficient than block-matching systems. However, there are at least five major problems with Abe's patent. First, the effectiveness and reliability of this technique depend heavily upon the lighting conditions within the field of view being tracked, both initially and over time. For instance, if the initial frame was taken under well-lit conditions, the luminance histograms of the object may be ideal. However, when the lighting conditions are poor to begin with, or worse yet change from frame to frame, as might happen with sudden bursts of ambient light, the luminance histograms will be subject to considerable error. Secondly, relying upon color histograms is equally uncertain, due partly to the sensitivity of color detection to lighting conditions, which again may vary from frame to frame, and partly to the potential for object and background blurring when color schemes overlap. Thirdly, Abe's system does not lend itself to tracking multiple objects that may have similar or identical luminance/color information and that may overlap from frame to frame.
A fourth problem is apparent from Abe's specification, which requires a human operator to initialize the system by selecting the portion of the video image that contains the object to be tracked, the so-called region designating frame. This requirement would be even more restrictive when tracking multiple objects that may move in and out of the field of view or temporarily overlap each other. Finally, a fifth problem is alluded to in the specification, where the ability to automatically control the pan, tilt and zoom of a camera is presented as an advantage of the system. In so doing, Abe states that the system is capable of "coping with any change in the size of the object and which can photograph the target object always in a desirable size, thereby attaining a substantial improvement in terms of the facility with which the apparatus can be used." This indicates that the method/apparatus remains highly resolution dependent, much like the block-matching methods it attempts to improve upon.
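To make the first of Abe's problems concrete, the following Python sketch compares two image regions by their normalized luminance histograms using histogram intersection. It is an illustrative stand-in, not Abe's disclosed procedure: the bin count, the intersection measure, the pixel values and the function names are all assumptions. A uniform brightening of the same region, as from a burst of ambient light, is enough to move every pixel into different bins.

```python
from typing import List, Sequence


def luminance_histogram(pixels: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit luminance values (0-255)."""
    counts = [0] * bins
    for p in pixels:
        # Map 0..255 onto bins 0..bins-1.
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """1.0 for identical normalized histograms, 0.0 for fully disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical object region: half dark pixels, half bright pixels.
region = [40] * 50 + [200] * 50
# The same region after a sudden burst of ambient light (+60, clipped at 255).
brightened = [min(p + 60, 255) for p in region]

same = histogram_intersection(luminance_histogram(region), luminance_histogram(region))
after_flash = histogram_intersection(luminance_histogram(region), luminance_histogram(brightened))
print(same, after_flash)  # 1.0 0.0
```

The object has not moved or changed, yet the similarity score drops from a perfect match to no match at all, which is the lighting sensitivity described above.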
In April of 1997, U.S. Pat. No. 5,617,335, entitled System for and Method of Recognizing and Tracking Target Mark, issued to Hashima et al. This invention addresses the problem of determining, from a single two-dimensional image, the three-dimensional coordinates of an object with respect to a tracking camera and processing mechanism, e.g. a robotic arm. Hashima expresses these coordinates as the attitude and position of a target mark that has been placed upon the object to be tracked. In his review of the prior art, Hashima lists several existing methods, many of which require too many calculations and/or have problems with multiple objects and background image noise. He discloses a technique for marking the object to be tracked with a white triangle inside a black circle. Once the special markings are captured, they are quickly converted into projected histograms of the image of the triangle mark in the X and Y directions, after which the centers of gravity as well as the maximum histogram values in the X and Y directions are determined. All of this information is then collectively used for "determining which of classified and preset attitude patterns the attitude of the triangle of said target mark belongs to based upon the position of the centers of gravity, the maximum histogram values, the X and Y-axis values, and the known geometrical data of said target mark". Even granting Hashima's assertion of increased efficiency and accuracy, his technique has at least three major limitations. First, the object to be tracked must be marked in a highly accurate fashion, and this mark must be visible to the tracking camera at all times. No provision is disclosed as to how the object can be tracked if the markings are temporarily blocked from the view of the tracking camera.
Secondly, by attempting to determine three-dimensional information from a single two-dimensional image, Hashima focuses his solution on situations where additional perspective cameras may not be available. Given such additional cameras, there exist more efficient and accurate methods for determining the third dimension. Thirdly, this invention teaches a system that functions well "even when an image either contains many objects or has a lot of noises". However, if every one of these multiple objects needed to be tracked within the same image, Hashima's invention would not perform optimally, since the preferred orientation of camera to object cannot be maintained simultaneously for multiple objects scattered in three dimensions.
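The feature-extraction step described for Hashima et al., projecting the binary image of the mark onto the X and Y axes and then finding each projection's center of gravity and maximum value, can be sketched as follows. The mask format (a list of rows of 0/1 values), the sample triangle and the function names are assumptions for illustration; the subsequent classification against preset attitude patterns is not shown.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels; 1 = part of the mark


def projected_histograms(mask: Mask) -> Tuple[List[int], List[int]]:
    """Project a binary mask onto the X axis (per column) and Y axis (per row)."""
    x_hist = [sum(col) for col in zip(*mask)]
    y_hist = [sum(row) for row in mask]
    return x_hist, y_hist


def center_of_gravity(hist: Sequence[int]) -> float:
    """Weighted mean index of a projection."""
    total = sum(hist)
    return sum(i * v for i, v in enumerate(hist)) / total


# A small upright triangle, standing in for the white triangle of the target mark.
triangle = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
]
x_hist, y_hist = projected_histograms(triangle)
print(x_hist, y_hist)             # [1, 2, 3, 2, 1] [1, 3, 5]
print(center_of_gravity(x_hist))  # 2.0
print(max(x_hist), max(y_hist))   # 3 5
```

The Y center of gravity (13/9, about 1.44) lies toward the wide base rather than at the middle row, the kind of asymmetry from which attitude can be inferred; it also shows why the mark must remain fully visible, since occluding part of it shifts these values.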
In March of 1998, U.S. Pat. No. 5,731,785, entitled System and Method for Locating Objects Including an Inhibiting Feature, issued to Lemelson et al. This invention teaches the tracking of objects by "an electronic code generating system or device carried by the object in a portable housing". This "system or device" is specified to receive locating signals, such as from the GPS constellation or a ground-based triangulation setup, and uses these signals to determine its own location. Lemelson anticipates that operators of a remote tracking system may at some point be interested in the exact whereabouts of one individual object from amongst the multiplicity of objects housing such tracking devices. To determine that object's location, the tracking system first transmits a unique "inquiry signal" coded for one particular device in one particular object. All of the individual tracking devices receive this signal, but only the device whose identifying code matches the "inquiry signal" responds. This response takes the form of a transmission that includes the tracking device's currently determined location. The tracking system then receives this signal and displays related information about the identified/located object on a computer system monitor. Lemelson et al.'s invention is primarily applicable to tracking many objects over a very wide area, so wide that these objects are out of range of any reasonably sized camera tracking system. As an apparatus and method for tracking objects that are within a range suitable for a camera network, this invention has at least three major problems. First, it requires that each object be capable of constantly monitoring and tracking its own location. Such a requirement involves a computing device that must be set up both to receive GPS or other tracking signals and to transmit locating signals.
Such a device will typically take up more space than a marking or indicia that can be placed on an object and then tracked with a camera. Furthermore, this device will require power. Secondly, Lemelson et al.'s invention assumes that the remote tracking station is interested in only one or a small fraction of all potential objects at a given time. However, in many situations it is desirable to follow the exact and continuous movements of all tracked objects as they move about within a predefined area. While it is conceivable that this invention could constantly transmit "inquiry signals" for all objects and constantly receive locating-signal responses, this volume of traffic would likely limit the movement resolution of the system unacceptably. Thirdly, such an electronics-based system has no understanding of an object's orientation with respect to its direction of movement. Hence, while it is possible to determine the direction in which a tracked car or person is moving, it is not shown how the system could determine whether that same car or person was facing toward or turned away from its current direction of travel.
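The addressed inquiry/response exchange described for Lemelson et al., and the second problem with it, can be modeled in a few lines. This is a hypothetical simulation rather than Lemelson's protocol: the class, device codes, coordinates and round-trip time are illustrative assumptions. Every device hears the broadcast inquiry but only the matching device answers, so polling all N objects in turn divides the update rate available to each object by N.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Location = Tuple[float, float]


@dataclass
class TrackingDevice:
    """Hypothetical object-borne device that already knows its own location."""
    code: str
    location: Location  # e.g. self-determined from GPS

    def on_inquiry(self, inquiry_code: str) -> Optional[Tuple[str, Location]]:
        # All devices receive the inquiry; only the addressed one replies.
        if inquiry_code == self.code:
            return (self.code, self.location)
        return None


def broadcast_inquiry(devices: List[TrackingDevice], inquiry_code: str) -> List[Tuple[str, Location]]:
    replies = []
    for device in devices:
        reply = device.on_inquiry(inquiry_code)
        if reply is not None:
            replies.append(reply)
    return replies


def per_object_update_hz(n_objects: int, round_trip_s: float) -> float:
    """Update rate each object gets if every object is polled in turn."""
    return 1.0 / (n_objects * round_trip_s)


devices = [TrackingDevice(f"ID{i}", (float(i), 2.0 * i)) for i in range(3)]
print(broadcast_inquiry(devices, "ID1"))  # [('ID1', (1.0, 2.0))]
print(per_object_update_hz(40, 0.05))     # about 0.5 updates per second per object
```

With an assumed 50 ms round trip and 40 objects, each object would be located only about once every two seconds, far too coarse to follow continuous movement.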
In June of 1998, U.S. Pat. No. 5,768,151, entitled System for Determining the Trajectory of an Object in a Sports Simulator, issued to Lowery et al. This invention teaches the use of stereoscopic cameras trained upon a limited field of view to follow the trajectory of an anticipated object. As the object traverses the field of view, the cameras capture images at a suitably slow rate such that the object creates a blur as it moves. This blurred path is then analyzed and converted into the object's trajectory vectors within the field of view. Another key feature of Lowery et al.'s apparatus is its ability to determine when it should begin tracking. To this end, a sound-detecting device is specified to sense the presence of the object within the field of view, after which the image capture system is immediately activated. There are at least four major limitations of Lowery et al.'s invention that would hinder its broader applicability. First, this invention expects a very narrow range of motion of the object and as such has a significantly restricted field of view. If the concept were applied to a larger area, multiple perspective cameras would need to be employed. The system would then also need to determine which cameras should be activated once the object is detected within the tracking region. However, without first determining where the object is located, for instance by attempting to triangulate the sound emitted by the object, the system would have no way of knowing which cameras to activate. Hence all cameras would need to capture images, creating a very large data set that the tracking computer would need to parse in order to determine the location of the object. Second, this system does not attempt to uniquely identify each object that it senses.
Hence, while it is capable of determining a trajectory vector for individual objects, it neither anticipates a need for, nor discloses a method of, determining the unique identity of each object as it passes by. The third limitation concerns the ability to track multiple objects simultaneously within the same field of view. Since this invention anticipates only one object at a time, it merely determines trajectory vectors, not the object's identity. Hence if two or more objects are traveling through the tracking region and collide in such a way as to affect each other's path of travel, the system has no means of determining which object continued on which path after the merge event. The fourth limitation is the system's inability to distinguish the object from its background when there is insufficient color and/or luminance difference between the two.
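The conversion of a blur streak into a motion vector described for Lowery et al. rests on simple kinematics: the streak's extent divided by the exposure time gives the image-plane velocity. The sketch below shows only that single-camera step, with assumed function names, endpoints and exposure time; Lowery's stereoscopic combination into three dimensions is not reproduced. It also hints at a further weakness: a single uniform streak does not by itself say which end is the start, so the sign of the vector must come from other information.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def streak_velocity(start: Point, end: Point, exposure_s: float) -> Tuple[float, float]:
    """Image-plane velocity (pixels/s) implied by a blur streak from start to end."""
    return ((end[0] - start[0]) / exposure_s, (end[1] - start[1]) / exposure_s)


def speed_and_heading(v: Tuple[float, float]) -> Tuple[float, float]:
    """Magnitude (pixels/s) and heading (degrees from the +x image axis)."""
    return math.hypot(v[0], v[1]), math.degrees(math.atan2(v[1], v[0]))


# Hypothetical streak endpoints measured in one camera over a 1/64 s exposure.
v = streak_velocity((100.0, 200.0), (130.0, 240.0), 1 / 64)
speed, heading = speed_and_heading(v)
print(v, speed)  # (1920.0, 2560.0) 3200.0
```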
All of the prior art listed above was attempting, in one way or another, to track the movement of at least one object within a predefined area. Taken in combination, the limitations that must be overcome are as follows:
1. If the tracking system attempts to differentiate between the object and its background purely on the basis of pixel-by-pixel comparison, as does Larson et al., then the video image must have higher resolution to be accurate, and the resulting computer processing time prohibits real-time operation.
2. If the tracking system attempts to reduce processing time by performing averaging techniques based upon separated color and luminance information, as does Abe, then accuracy is compromised, especially as colors merge between multiple objects and their background or as lighting conditions fluctuate substantially between image frames. Such reduction techniques are further hampered as object size diminishes, which reduces the amount of object versus background information and thereby increases "noise". The only solution is to zoom in on the object being tracked to keep the proper ratio of object to background information. This in turn implies that each object being tracked must have its own camera, greatly reducing the effectiveness of these techniques for tracking more objects and/or wider fields of view.
3. If the tracking system, such as Lowery et al.'s, employs two perspective cameras and an image-blurring technique to capture three-dimensional trajectory information, it reduces image processing requirements but loses important video image detail.
4. If the tracking system, such as Hashima et al.'s, employs detailed indicia placed upon the object to be tracked, this can be effective in reducing the amount of image processing. However, Hashima faces significant issues when trying to determine three-dimensional information from a single two-dimensional image, which is one of his mandatory requirements. His resultant techniques preclude the tracking of multiple fast-moving objects over wider fields of view, where the object's indicia may at times be blocked from view or at least be at significantly changing perspectives from the tracking camera.
5. All of the video/camera-only techniques, such as those of Larson, Abe, Hashima and Lowery, are prone to error if used to track multiple objects whose paths intersect and/or collide. Only Larson specifically anticipates this type of multiple-object tracking, and suggests the use of human operators to resolve object overlap. Such operators are cost prohibitive and also limited in their capacity to keep up with multiple fast-moving objects in real time. While, as Larson suggests, it is possible to use passive electronics to help identify objects once the system determines that their identities have been lost, these devices have their own resolution and speed-of-response restrictions, which are cost sensitive.
6. Furthermore, both Larson's and Abe's video/camera solutions anticipate the requirement of a human operator to initialize the system. Larson would require the operator to identify each object for the system. These objects would then be tracked automatically until they merged in some way with another object, at which time the operator would be needed to re-initialize the tracking system. Abe would require the operator to crop the initial image down to a "region designating frame", which reduces the processing required to find, if not also track, the object. The intervention of any operator is both cost prohibitive and limiting to real-time response.
7. Lowery's video/camera solution anticipates automatic tracking activation based upon the sound-detected presence of an object within the field of view. This technique is inherently limited to objects that make distinguishing sounds. It is also unable to track multiple objects that might be making similar noises simultaneously within the given field of view.
8. If the tracking system attempts to eliminate image processing requirements by employing active electronic tracking devices, as does Lemelson et al., then the objects are required to house powered devices capable of receiving and processing both locating and inquiry signals. Such devices limit the range and type of objects that can be tracked, based upon the practicality and cost of embedding computing devices. In systems such as Lemelson's that employ electronics alone, the wealth of information available from image data is lost. Furthermore, such systems may be able to track location but cannot track object orientation, e.g. whether the object is traveling forwards or backwards.
9. With the exception of Hashima's indicia technique, none of these solutions captures object orientation information. Such information can be extremely important for anticipating future object movement.
10. All of the video/camera-based solutions will have difficulty picking up fast-moving objects whose color and/or luminance information is sufficiently close to that of other tracked objects or the image background, no matter what technique is employed. All non-video-based solutions give up valuable image information.
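Item 1 above, and the resolution trade-off discussed at the outset, can be illustrated with a minimal background-differencing sketch. It is a generic pixel-by-pixel comparison, not Larson's disclosed method, and the threshold, tiny sample frames, image sizes and frame rate are assumed values. Because every pixel is compared on every frame, the work grows with width times height, so doubling the resolution in each dimension quadruples the comparisons per second.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit grayscale values


def foreground_mask(frame: Frame, background: Frame, threshold: int = 25) -> List[List[int]]:
    """Mark pixels that differ from the stored background by more than threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def comparisons_per_second(width: int, height: int, fps: int) -> int:
    """One comparison per pixel per frame."""
    return width * height * fps


background = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame = [[10, 12, 10], [10, 200, 10], [9, 10, 10]]
print(foreground_mask(frame, background))     # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(comparisons_per_second(640, 480, 30))   # 9216000
print(comparisons_per_second(1280, 960, 30))  # 36864000
```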
While the present invention will be specified in reference to one particular example of multi-object tracking, as will be described forthwith, this specification should not be construed as a limitation on the scope of the invention, but rather as an exemplification of preferred embodiments thereof. The inventors envision many related uses of the apparatus and methods herein disclosed, only some of which will be mentioned in the conclusion to this application's specification. For purposes of teaching the novel aspects of this invention, the example of multi-object tracking used is that of a sporting event such as hockey. The aspects of hockey that make it a difficult series of events to track, and therefore a good example of the strengths of the present invention over the prior art, are as follows:
1. There are no other human-based activities known to the present inventors in which humans as objects can travel at a faster speed or change direction and orientation more quickly than in hockey. On skates, a player's speed can approach twenty-five miles per hour, which is considerably faster than any walking or running activity conducted on the ground without the aid of a vehicle of some sort. Tracking these faster movements, especially given the variability of the human form, challenges the real-time performance of the system.
2. The object being contested by the players, i.e. the puck, can travel at speeds of up to one hundred miles per hour and is also subject to sudden and quick changes of direction. This combination of speed and redirection presents a difficult tracking problem that is unique in athletics. While the puck is easier to track than a player in that it does not change shape, it travels at roughly four times the speed, is on the order of one hundred times smaller, and may travel in three dimensions.
3. The individual players are constantly entering and exiting the tracking field of view and as such must be identified efficiently and automatically by the tracking system to achieve real-time performance.
4. While in the field of view, both the puck and the players are routinely fully or partially hidden from view as their paths merge with those of other players. This makes it challenging to follow movements with often limited or no image data.
5. The lighting conditions are difficult to work with, since the ice surface creates a highly reflective background that tends to saturate the CCD elements of the camera, while the area itself may be subject to sudden bursts of light from spectators' camera flashes or in-house lighting systems. This places limitations on luminance-based tracking techniques.
6. The colors of the players on the same team are identical and may often match the markings on the ice surface and surrounding rink boards. This places limitations on color-based tracking techniques.
7. It is not unusual for a hockey game to be played while a certain level of fog exists within the arena. This challenges any camera-based system, since fog can greatly reduce the visibility of the players and puck.
8. Hockey is a filmed event and as such presents the opportunity not only to track the movement of multiple objects but also to determine a center of interest that is constantly and abruptly changing. Once this center is determined, there is a further advantage in automatically directing the tilt, pan and zoom of a broadcaster's camera to follow the action from a perspective view in real time. Automatically directing a camera situated for a perspective view presents a difficult problem for a machine vision system, since following the objects in three dimensions in real time, as a perspective view requires, is considerably harder.
9. Each individual player, as well as the coaches, may at any time during a game wish to obtain information regarding themselves, a group of players, their entire team and/or the other team. This information may furthermore pertain to the entire duration of the activities from start to present or to some subset of it. Such requirements place a demand on the tracking system to store information quickly and efficiently in a way that allows it to be easily recalled from many viewpoints.
10. The enclosed metal and cement-block arena precludes the use of GPS and presents difficulties for the use of passive electronic tracking devices due to the many potential reflections of in-house triangulation signals. The players themselves, and the nature of the game with its potential for high-impact collisions, limit the desirability of placing active electronic devices within their equipment. Since these devices must carry a power source, they will in practice take up enough space to present a potential hazard to the players. Furthermore, such devices would be extremely cost prohibitive at the local rink level, where literally hundreds of children play games every week and would each need their own device or would have to share devices.
11. Player orientation and the location of limbs with respect to the body are very important information to track. A player may be traveling forward and facing forward while turning his or her head to the side to view the development of play. The turn of the head becomes important information for coaching analysis, since it is an indication of a player's on-ice awareness. Furthermore, a player may almost instantly rotate his or her orientation while still proceeding in the same direction, such that the player is now traveling backwards instead of forwards. This is also very critical information. A player's head may be tilted down towards the ice, which is not desirable for prolonged periods of time or if often repeated. All of this information is important to track but presents problems over and above simply tracking the player as a whole.
12. Limiting the size of the tracking area is desirable during practice sessions, where individual drills may be conducted on a limited portion of the ice with a small number of players at a time. Under these conditions it would be desirable to easily restrict the system's tracking area within its field of view.
13. The number and speed of player changes and collisions are so great that utilizing human intervention to identify and re-identify players would be significantly stressful and error prone, if not practically impossible, especially at the local rink level.
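The speeds given in items 1 and 2 can be translated into how far a player or the puck moves between consecutive video frames, which is the gap a frame-to-frame tracker must bridge. The 30 frames per second rate below is an assumption for illustration; the mile-to-meter conversion is exact by definition.

```python
MPH_TO_M_PER_S = 1609.344 / 3600  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s


def displacement_per_frame_m(speed_mph: float, fps: float) -> float:
    """Distance in meters an object covers between consecutive frames."""
    return speed_mph * MPH_TO_M_PER_S / fps


FPS = 30  # assumed camera frame rate
player = displacement_per_frame_m(25, FPS)  # skating player, item 1
puck = displacement_per_frame_m(100, FPS)   # puck, item 2
print(f"player {player:.2f} m/frame, puck {puck:.2f} m/frame")
# player 0.37 m/frame, puck 1.49 m/frame
```

At the assumed frame rate, the puck can move roughly one and a half meters between frames, about four times the distance a fast-skating player covers.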
Given the current state of the art in camera systems, non-visible energy sources and filters, digital image processing and automated camera controls, it is possible to create an entirely automated multi-object tracking system that operates within a predefined area and continuously tracks the location, orientation and direction of movement of each and every object within the field of view. Such a system greatly increases the ability of participants and observers to understand, analyze and enjoy the given activity.
The current state of the art for transmitting live information concerning the movements of individual players and equipment within a sporting contest is based upon traditional methods for filming the event, followed by communication via broadband, cable or direct satellite systems. Such techniques have three major problems. First, the various methods for filming the event limit the information gathered to the angles and perspectives chosen by the producers. Second, the data stream for these techniques is very large, whether it is stored in analog or digital format, although digital formats can be compressed using standard compression algorithms. These first two problems work against each other, in that any attempt to film sporting events from a multiplicity of viewpoints that could later be interactively selected by end users at their remote locations would multiply the size of the data stream linearly with the number of viewpoints. This increased stream would exceed even the transmission capacities of cable and direct satellite. Third, the information collected at the sports venue is not first analyzed and then transmitted with encoded markers and other information that the remote receiving system could employ to further enhance the viewing.
Given the state of the art in communications over the network of phone lines used by the World Wide Web (WWW), it is now possible to transmit a limited amount of information in real time simultaneously to a large number of connected remote sites. Further, given the state of the art in computer processing speeds and computer-generated animation, it is now becoming possible to generate "life-like" real-time images with which the end user may interact. The present invention employs sophisticated tracking techniques, such as those previously described in the primary embodiment, to translate and encode in real time all of the motions of the players and their equipment in a sporting contest. This encoded information requires much less bandwidth and can therefore be transmitted over a bandwidth-limited communications channel such as the WWW and regenerated locally via graphics animation. Such a technique further allows the end user to dynamically alter and select a preferred viewing angle throughout the entire presentation. Such choices of viewing angle will also include the first-person effect of experiencing the play from a specific player's perspective.
Employing the techniques taught in the primary embodiment, this continued embodiment specifically describes how the motions tracked by both the overhead X-Y tracking cameras and the Z perspective cameras are to be translated into a streaming data set. It will be shown that this data set is compact enough to be transmitted over lower-bandwidth phone lines in real time. It will be further shown that, using this same data set, the end user's computer system will be able to regenerate the pertinent player activity in a life-like fashion in real time. During this continuous regeneration, the user will be able to constantly change and select new view angles to gain further clarity and enjoyment. Also, given the motion quantification techniques employed by this continued embodiment, the end user's computer system will also be able to create statistics regarding the players' motion that have heretofore been unavailable.
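To give a feel for why a stream of tracked points is so much smaller than video, the following sketch packs each player's points into a fixed binary record and estimates the resulting bit rate. The record layout (a one-byte player id and point count, then x, y, z as signed 16-bit centimeters), the 12 players, 20 points per player and 30 updates per second are all hypothetical choices, not the data format of the present invention. Signed 16-bit centimeter values span about plus or minus 327 m, comfortably covering a rink.

```python
import struct
from typing import List, Sequence, Tuple

HEADER = struct.Struct("<BB")  # hypothetical: player id, number of points
POINT = struct.Struct("<hhh")  # hypothetical: x, y, z in centimeters

Point = Tuple[int, int, int]


def encode_player(player_id: int, points_cm: Sequence[Point]) -> bytes:
    """Pack one player's tracked points into a compact binary record."""
    parts = [HEADER.pack(player_id, len(points_cm))]
    parts.extend(POINT.pack(*p) for p in points_cm)
    return b"".join(parts)


def decode_player(buf: bytes) -> Tuple[int, List[Point]]:
    """Inverse of encode_player, as a remote receiver might apply it."""
    player_id, n = HEADER.unpack_from(buf, 0)
    points = [POINT.unpack_from(buf, HEADER.size + i * POINT.size) for i in range(n)]
    return player_id, points


def stream_bits_per_second(players: int, points_per_player: int, fps: int) -> int:
    record = HEADER.size + points_per_player * POINT.size
    return players * record * fps * 8


record = encode_player(7, [(1250, -340, 180), (1262, -335, 95)])
print(decode_player(record))  # (7, [(1250, -340, 180), (1262, -335, 95)])

points_bps = stream_bits_per_second(players=12, points_per_player=20, fps=30)
raw_video_bps = 640 * 480 * 24 * 30  # one uncompressed 24-bit 640x480 camera feed
print(points_bps, raw_video_bps)     # 351360 221184000
```

Under these assumptions the point stream is more than 600 times smaller than a single raw camera feed, and it is independent of viewpoint, since the receiver renders whatever angle the user selects. Further reduction, for instance by sending only changes between updates, is possible but not shown.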
While the present invention will be specified in reference to one particular example of transmitting live action, as will be described forthwith, this specification should not be construed as a limitation on the scope of the invention, but rather as an exemplification of the preferred embodiments thereof. The inventors envision many related uses of the apparatus and methods herein disclosed, only some of which will be mentioned in the conclusion to this application's specification. Given the current state of the art in internet connectivity and personal computer (PC) based graphics generation, as well as the teachings of the primary embodiment that is the basis for this continued application, it is now possible to capture the movements of players, equipment and the game object in such a way that they may be compressed into a minimal data set of related points, which may then be transmitted to a remote computer system and reassembled into a life-like representation of the captured event.
Accordingly, the objects and advantages of the present invention are to provide a system for capturing and quantifying the stream of motion of a real-time event, compressing it into a data stream for transmission to multiple remote sites, and then decompressing and reconstructing this same stream of motion in real time from any selected viewpoint, with the following capabilities:
to provide a system for capturing the three-dimensional relative locations of selected points of interest on each participant, piece of equipment and/or game object during the live event without human assistance;
to provide a system for automatically translating these selected points of interest into a compressed data set;
to provide a system for automatically and selectively transmitting this compressed data set in real time over any bandwidth-limited communications network, an example being the internet;
to provide a system for automatically receiving and then selectively decompressing the transmitted data set into a life-like representation of the original event; and
to provide a system whereby the end user can dynamically change their point of view to any fixed or moving position within the virtual event field-of-play and where all of the previously mentioned capabilities are conducted in real-time.