This invention relates to the derivation of information regarding the position of a television camera from image data acquired by the camera.
In television production, it is often required to video live action in the studio and electronically superimpose the action on a background image. This is usually done by shooting the action in front of a blue background and generating a 'key' from the video signal to distinguish between foreground and background. In the background areas, the chosen background image can be electronically inserted.
One limitation of this technique is that the camera in the studio cannot move, since this would generate motion of the foreground without commensurate background movement. One way of allowing the camera to move is to use a robotic camera mounting that allows a predefined camera motion to be executed, the same camera motion being used when the background images are shot. However, the need for predefined motion places severe artistic limitations on the production process.
Techniques are currently under development that aim to generate background images electronically, changing them as the camera moves so that they are appropriate to the present camera position. Thus a means of measuring the position of the camera in the studio is required. One way in which this can be done is to attach sensors to the camera to determine its position and angle of view; however, the use of such sensors is not always practical.
The problem addressed here is how to derive the position and motion of the camera using only the video signal from the camera, so that the method can be used on an unmodified camera without special sensors.
The derivation of the position and motion of a camera by analysis of its image signal is a task often referred to as passive navigation; there are many examples of approaches to this problem in the literature, the more pertinent of which are as follows:
1. Brandt et al. 1990. Recursive motion estimation based on a model of the camera dynamics. Brandt, A., Karmann, K., Lanser, S. Signal Processing V: Theories and Applications (Ed. Torres, L. et al.), Elsevier, pp. 959-962, 1990.
2. Buxton et al. 1985. Machine perception of visual motion. Buxton, B. F., Buxton, H., Murray, D. W., Williams, N. S. GEC Journal of Research, Vol. 3 No. 3, pp. 145-161.
3. Netravali and Robbins 1979. Motion-compensated television coding: Part 1. Netravali, A. N., Robbins, J. D. Bell System Technical Journal, Vol. 58 No. 3, pp. 631-670, Mar. 1979.
4. Thomas 1987. Television motion measurement for DATV and other applications. Thomas, G. A. BBC Research Department Report No. 1987/11.
5. Uomori et al. 1992. Electronic image stabilisation system for video cameras and VCRs. Uomori, K., Morimura, A., Ishii, J. SMPTE Journal, Vol. 101 No. 2, pp. 66-75, Feb. 1992.
6. Wu and Kittler 1990. A differential method for simultaneous estimation of rotation, change of scale and translation. Wu, S. F., Kittler, J. Signal Processing: Image Communication 2, Elsevier, pp. 69-80, 1990.
For example, if a number of feature points can be identified in the image and their motion tracked from frame to frame, it is possible to calculate the motion of the camera relative to these points by solving a number of non-linear simultaneous equations [Buxton et al. 1985]. The tracking of feature points is often achieved by measuring the optical flow (motion) field of the image. This can be done in a number of ways, for example by using an algorithm based on measurements of the spatiotemporal luminance gradient of the image [Netravali and Robbins 1979].
A similar method is to use Kalman filtering techniques to estimate the camera motion parameters from the optical flow field and depth information [Brandt et al. 1990].
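The gradient-based idea can be illustrated with a minimal sketch. It is not the pel-recursive algorithm of Netravali and Robbins; it is a plain least-squares fit of a single global shift (u, v) to the constraint Ix*u + Iy*v + It = 0, where Ix and Iy are spatial luminance gradients and It is the frame difference. The function name, the list-of-lists frame layout and the singularity threshold are assumptions made for illustration only.

```python
def estimate_translation(prev, curr):
    """Estimate one global shift (u, v) in pixels between two luminance frames.

    prev and curr are equal-sized 2D lists of floats indexed [y][x]
    (a hypothetical layout chosen for this sketch). Content that moves by
    (u, v) satisfies, to first order, Ix*u + Iy*v + It = 0 at every pixel.
    """
    height, width = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Spatial gradients: central differences averaged over both frames.
            ix = (prev[y][x + 1] - prev[y][x - 1]
                  + curr[y][x + 1] - curr[y][x - 1]) / 4.0
            iy = (prev[y + 1][x] - prev[y - 1][x]
                  + curr[y + 1][x] - curr[y - 1][x]) / 4.0
            # Temporal gradient: difference between the two frames.
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Normal equations: [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("not enough image detail to resolve both components")
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

The estimate is only valid for shifts that are small compared with the scale of the image detail. A featureless frame makes the 2x2 system singular, which is why the sketch raises an error rather than returning a value.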
However, in order to obtain reliable (relatively noise-free) information relating to the motion of the camera, it is necessary to have a good number of feature points visible at all times, and for these to be distributed in space in an appropriate manner. For example, if all points are at a relatively large distance from the camera, the effect of a camera pan (rotation of the camera about the vertical axis) will appear very similar to that of a horizontal translation at right angles to the direction of view. Points at a range of depths are thus required to distinguish reliably between these types of motion.
Simpler algorithms exist that allow a sub-set of camera motion parameters to be determined, while placing fewer constraints on the scene content. For example, horizontal and vertical image motions such as those caused by camera panning and tilting can be measured relatively simply for applications such as the steadying of images in hand-held cameras [Uomori et al. 1992].
In order to derive all required camera parameters (three spatial coordinates, pan and tilt angles and degree of zoom) from analysis of the camera images, a large number of points in the image would have to be identified and tracked. Consideration of the operational constraints in a TV studio suggested that providing an appropriate number of well-distributed reference points in the image would be impractical: markers would have to be placed throughout the scene at a range of different depths in such a way that a significant number were always visible, regardless of the position of the camera or actors.
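The ambiguity between pan and sideways translation can be made concrete with a short sketch under an idealised pinhole camera model, which is an assumption not stated in the text. A point on the optical axis shifts by f*tan(theta) when the camera pans by theta, and a point at depth Z shifts by f*T/Z when the camera moves sideways by T, with f the focal length in pixel units. The function names and the numbers in the note below are illustrative only, and only magnitudes are considered.

```python
import math


def pan_shift(focal_px, pan_rad):
    """Image shift (pixels) of a point on the optical axis for a pan of pan_rad."""
    return focal_px * math.tan(pan_rad)


def translation_shift(focal_px, sideways_m, depth_m):
    """Image shift (pixels) of a point at depth_m for a sideways camera move."""
    return focal_px * sideways_m / depth_m


def shift_spread(focal_px, sideways_m, depths_m):
    """Largest minus smallest image shift over a set of point depths.

    A pan produces the same shift at every depth in this model, so a small
    spread means the translation is hard to tell apart from a pan.
    """
    shifts = [translation_shift(focal_px, sideways_m, z) for z in depths_m]
    return max(shifts) - min(shifts)
```

With a hypothetical focal length of 1000 pixels and a sideways move of 0.1 m, points at 50 m and 60 m shift by about 2.0 and 1.7 pixels, which a pan of roughly 0.002 rad reproduces to within a fraction of a pixel. Points at 2 m and 10 m shift by 50 and 10 pixels, a spread that no single pan angle can match.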
We have appreciated that measurements of image translation and scale change are relatively easy to make; from these measurements it is easy to calculate either
1. pan, tilt and zoom under the assumption that the camera is mounted on a fixed tripod: the scale change is a direct indication of the amount by which the degree of camera zoom has changed, and the horizontal and vertical translation indicate the change in pan and tilt angles; or
2. horizontal and vertical movement under the assumption that the camera is mounted in such a way that it can move in three dimensions (but cannot pan or tilt) and is looking in a direction normal to a planar background: the scale change indicates the distance the camera has moved along the optical axis and the image translation indicates how far the camera has moved normal to this axis.
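The two calculations above can be sketched as follows, assuming an idealised pinhole camera and a measured image transform of the form x' = scale * x + dx (and likewise for y). The function names, the use of a focal length expressed in pixel units, the sign conventions and the need for the initial camera-to-background distance in the second case are all assumptions introduced for illustration.

```python
import math


def pan_tilt_zoom_change(dx_px, dy_px, scale, focal_px):
    """Case 1: camera on a fixed tripod.

    dx_px, dy_px: measured image translation in pixels.
    scale: measured scale change (1.0 means no change).
    focal_px: focal length before the change, in pixel units.
    Returns (pan change, tilt change, new focal length), angles in radians.
    """
    new_focal_px = focal_px * scale        # scale change read as zoom change
    # Assumed sign convention: positive angle = image content moves in +x / +y.
    pan = math.atan2(dx_px, new_focal_px)
    tilt = math.atan2(dy_px, new_focal_px)
    return pan, tilt, new_focal_px


def camera_translation(dx_px, dy_px, scale, focal_px, distance_m):
    """Case 2: no pan or tilt, planar background normal to the optical axis.

    distance_m: camera-to-background distance before the move.
    Returns the camera movement (move_x, move_y, move_z) in metres, with
    x and y following the image axes and z pointing towards the background.
    """
    new_distance_m = distance_m / scale    # the image grows as the camera nears
    move_z = distance_m - new_distance_m
    # A background point imaged at x maps to scale * x + dx_px, where
    # dx_px = -focal_px * move_x / new_distance_m in this model.
    move_x = -dx_px * new_distance_m / focal_px
    move_y = -dy_px * new_distance_m / focal_px
    return move_x, move_y, move_z
```

In the first case a pan is only approximately a pure image translation, so the angle formula is best read as applying to a point near the image centre. Each call returns the change since the previous measurement, so absolute values follow by accumulating these changes from the known initial state.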
This approach does not require special markers or feature points in the image, merely sufficient detail to allow simple estimation of global motion parameters. Thus it should be able to work with a wide range of picture material. All that is required is measurement of the initial focal length (or angle subtended by the field of view) and the initial position and angle of view of the camera.
The invention is defined by the independent claims to which reference should be made. Preferred features are set out in the dependent claims.
The approach described may be extended to more general situations (giving more freedom on the type of camera motion allowed) if other information such as image depth can be derived [Brandt et al. 1990]. Additional information from some sensors on the camera (for example to measure the degree of zoom) may allow more flexibility.
In order to allow the translation and scale change of the image to be measured, there must be sufficient detail present in the background of the image. Current practice is usually based upon the use of a blue screen background, to allow a key signal to be generated by analysing the RGB values of the video signal. Clearly, a plain blue screen cannot be used if camera motion information is to be derived from the image, since it contains no detail. Thus it will be necessary to use a background that contains markings of some sort, but is still of a suitable form to allow a key signal to be generated.
One form of background that is being considered is a 'checkerboard' of squares of two similar shades of blue, each closely resembling the blue colour used at present. This should allow present keying techniques to be used, while providing sufficient detail to allow optical flow measurements to be made. Such measurements could be made on a signal derived from an appropriate weighted sum of RGB values designed to accentuate the differences between the shades of blue.
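One possible choice of weighted sum is sketched below. The text does not specify the weights; this sketch assumes they are taken along the RGB difference between the two shades, which among all unit-length weight vectors gives the largest separation between them. The function names and any shade values used with them are hypothetical.

```python
import math


def detail_weights(shade_a, shade_b):
    """Unit-length RGB weights along the colour difference of two shades.

    shade_a and shade_b are (R, G, B) triples for the two blues of the
    checkerboard (hypothetical inputs chosen by the user of this sketch).
    """
    diff = [a - b for a, b in zip(shade_a, shade_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two shades are identical")
    return [d / norm for d in diff]


def detail_signal(rgb, weights):
    """Weighted sum of the R, G and B values of one pixel."""
    return sum(w * c for w, c in zip(weights, rgb))
```

Applying detail_signal to every pixel yields a single-channel image in which the two shades differ by the full length of their RGB difference vector, whereas a conventional luminance signal would show only part of that difference.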
The key signal may be used to remove foreground objects from the image prior to the motion estimation process. Thus the motion of foreground objects will not confuse the calculation.
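The effect of excluding foreground pixels can be sketched in a simplified form. Here a plain average of local motion measurements stands in for the global motion estimator, which is an assumption made to keep the example short. The function name and the data layout are hypothetical.

```python
def background_motion(vectors, key):
    """Average the motion measurements of background pixels only.

    vectors: list of (u, v) local motion measurements, one per pixel or block.
    key: list of booleans of the same length, True where the key signal
         marks the pixel or block as background.
    """
    kept = [vec for vec, is_background in zip(vectors, key) if is_background]
    if not kept:
        raise ValueError("no background pixels left after keying")
    u = sum(vec[0] for vec in kept) / len(kept)
    v = sum(vec[1] for vec in kept) / len(kept)
    return u, v
```

If an actor moving independently of the camera covers part of the frame, the measurements under the actor are dropped before averaging, so the result reflects the background motion alone.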