1. Field of the Invention
This invention relates to motion vector estimation in television images. Such motion vector estimation is particularly, but not exclusively, used in television standards converters and in slow motion processors.
2. Description of the Prior Art
International television program exchange necessitates standards converters due to the different television standards used in different countries, for example, the 625-line 50-fields per second (625/50) PAL system used in the U.K., and the 525-line 60-fields per second (525/60) NTSC system used in the U.S.A.
Many different standards converters have been previously proposed. One of the best known is the ACE (Advanced Conversion Equipment) developed by the British Broadcasting Corporation. Basically ACE operates on an input digital television signal line-by-line to derive interpolated samples required to form an output digital television signal. Interpolation is done not only spatially using four successive horizontal scan lines of the input television signal, but also temporally using four successive fields of the input television signal. Thus, each line of the output television signal is derived by multiplying respective samples from sixteen lines of the input television signal by respective weighting coefficients.
Further details of ACE will be found in U.K. patent specification No. GB-A-2 059 712 and in `Four-field digital standards converter for the eighties` by R. N. Robinson and G. J. Cooper at Pages 11 to 13 of `Television` (the journal of the Royal Television Society) for January/February 1982.
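By way of illustration only, the sixteen-line interpolation described above for ACE can be sketched as follows. The sketch assumes that the sixteen input lines are held as four fields of four lines each and that one weighting coefficient applies to each whole line; the names `interpolate_line`, `input_lines` and `weights` are hypothetical and are not taken from ACE or from the cited references.

```python
FIELDS = 4           # four successive input fields (temporal aperture)
LINES_PER_FIELD = 4  # four successive scan lines per field (vertical aperture)


def interpolate_line(input_lines, weights):
    """Form one output line as a weighted sum of sixteen input lines.

    input_lines[f][l] is the list of samples of line l in field f.
    weights[f][l] is the weighting coefficient applied to that line.
    Both layouts are assumptions made for this sketch.
    """
    if len(input_lines) != FIELDS or len(weights) != FIELDS:
        raise ValueError("expected four fields")
    for f in range(FIELDS):
        if len(input_lines[f]) != LINES_PER_FIELD or len(weights[f]) != LINES_PER_FIELD:
            raise ValueError("expected four lines per field")

    width = len(input_lines[0][0])
    output = []
    for x in range(width):
        # Each output sample is the sum of sixteen products: one co-sited
        # sample from each input line times that line's coefficient.
        acc = 0.0
        for f in range(FIELDS):
            for l in range(LINES_PER_FIELD):
                acc += weights[f][l] * input_lines[f][l][x]
        output.append(acc)
    return output
```

With all sixteen coefficients equal to 1/16, a flat input picture is reproduced unchanged; with a single coefficient equal to one and the rest zero, the output line is simply a copy of the selected input line.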
Although ACE gives good results, there is the problem that the equipment is very bulky. To overcome this problem, we have previously proposed a television standards converter comprising three field stores and four 4-line stores for receiving an input digital television signal of one standard and deriving therefrom arrays of sixteen lines, each array consisting of four successive lines from each of four successive fields of the input television signal. A weighting coefficient store stores sets of sixteen weighting coefficients, respective sets corresponding to positions both spatial and temporal of respective lines of an output digital television signal of a different standard, relative to the sixteen lines of the input television signal. Two interpolation filters then derive the output television signal line-by-line by multiplying corresponding sample values from each of the sixteen lines of the input television signal by a respective weighting coefficient in a set of weighting coefficients and summing the resulting products to form an interpolated sample value, and four output field stores receive and store the derived lines of the output television signal. To store the additional lines which are derived when the output television signal has more lines than the input television signal, a 45-line store is interposed between one of the interpolation filters and the output field stores. Further details will be found in our U.K. patent specification No. GB-A-2 140 644.
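The manner in which a set of weighting coefficients might be selected according to the spatial and temporal position of an output line can be sketched as follows. This is an assumption-laden illustration and not the arrangement of GB-A-2 140 644: it assumes that the position of an output line or field is found simply by scaling its index by the ratio of the input and output rates (ignoring interlace and blanking), and that the fractional part of that position is quantized to a fixed number of phases. The names `position` and `coefficient_set_index`, and the phase counts `n_v` and `n_t`, are hypothetical.

```python
from fractions import Fraction


def position(out_index, in_rate, out_rate):
    """Return (whole, frac): where an output line or field falls on the input grid.

    Assumes a plain rate ratio, e.g. 625/525 for lines or 50/60 for fields.
    """
    pos = Fraction(out_index * in_rate, out_rate)
    whole = pos.numerator // pos.denominator
    return whole, pos - whole


def coefficient_set_index(out_line, out_field, n_v, n_t,
                          in_lines=625, out_lines=525,
                          in_fields=50, out_fields=60):
    """Pick one of n_v * n_t coefficient sets (hypothetical addressing scheme)."""
    _, v_frac = position(out_line, in_lines, out_lines)
    _, t_frac = position(out_field, in_fields, out_fields)
    v_phase = int(v_frac * n_v)  # quantized vertical (spatial) phase
    t_phase = int(t_frac * n_t)  # quantized temporal phase
    return t_phase * n_v + v_phase
```

Under these assumptions, for a 625/50 to 525/60 conversion, every twenty-first output line coincides with an input line and every sixth output field coincides with an input field, so the phases repeat with those periods and only a finite number of coefficient sets need be stored.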
The performance of such standards converters, which employ vertical/temporal interpolation techniques, represents a compromise between two outcomes: generating blurred pictures while maintaining good motion portrayal, or maintaining vertical resolution at the expense of `judder`. The former is a result of post-filtering in order to prevent disturbing alias effects; the latter is a result of the intrusion of the adjacent 2-dimensional repeat sample structures.
We have therefore proposed that motion vector estimation should be incorporated in television standards converters and in slow motion processors. The problem with the majority of existing motion vector estimation methods is that their use is biased towards video-conference-type applications, where generally the subject matter is either a single person's head and shoulders or a small group of people seated around a table. With television images of this type the motion is relatively simple in comparison with broadcast television images where, for example, at a horse race meeting the camera could be following the leaders in a race. In this situation the motion would be complex because, for example, the camera would be panning. Thus, the background may well be moving at speeds greater than eight pixels per field, while in the foreground there would be at least one horse galloping. This means that the motion vector estimation method must try to track the horses' legs, which may well be moving in directions different from that of the already moving background.