There are many situations in which it is desirable to extract information from data captured during live action events, such as sporting events, in order to provide further insight into what is taking place. Broadcasters can use such information to enhance the viewing experience of those watching the event. Techniques for extracting information from captured data include, for example, three-dimensional tracking of where a ball lands in relation to a line on a tennis court, in order to determine whether the ball should be called in or out. Another well-known example is extrapolating the projected path of a ball that has struck a batsman on his pads in a game of cricket, in order to determine whether he should be given out leg before wicket.
Another approach is to process video images of a scene to identify objects, such as human beings, within the scene. In many cases, such as televised sporting events, processing video images in this way can be more convenient because the video images are already available. However, extracting information from video images is difficult for two reasons. Firstly, a camera captures the data in only two dimensions. Secondly, the processing required to extract the desired information can be computationally intensive and error prone, because objects or players must be recognised from a low or variable resolution representation: objects near the camera are captured at higher resolution than objects further from it. Furthermore, the high degree of variability in human movement makes recognition of players difficult. Other image processing techniques require many cameras in order to capture video images of a subject from several different angles. Moeslund et al. 2006, “A survey of advances in vision-based human motion capture and analysis”, presents a review of the academic literature examining the available techniques for estimating human motion from captured image data. As discussed in that paper, most techniques require controlled studio capture conditions, high-resolution imagery and multiple cameras (typically at least four), and have very high computational requirements.