The present invention relates to processing and recovering video images, each constituted by a matrix of pixels and represented by a signal which usually originates from a camera, but which could be a recorded signal.
An object of the invention is to provide a simple solution to the problem of recovering and displaying a selected fraction of an original video image, optionally together with a zoom effect.
Current practice for shooting a scene in which the center of interest moves is to use a camera controlled by an operator, who varies the direction in which the camera points as well as its degree of zoom and its focus so as to provide the “best” image. That image is then broadcast and displayed as such, except where it needs to be cropped in order to display a 16/9 format image on a 4/3 format TV screen (format being expressed as the ratio width/height). The only modifications that might be made are the inlaying of subtitles or symbols, with the background image remaining unchanged.
The present invention makes use of the fact that presently available resampling and interpolation techniques make it possible to take a digital video image of the kind supplied by a high definition camera and to obtain from it a representation of sufficient quality of only a fraction of the actual image, in the form of a matrix of pixels displayed at a spatial definition greater than the original definition of that fraction of the image. This can be done using techniques referred to generically as “warping”.
Consequently, there is provided a method of displaying a sequence of video images in the form of a matrix of pixels from a signal obtained from at least one fixed large-field camera whose definition is generally greater than that of the unit displaying a selected portion. A zone whose format corresponds to that of the display member (generally 4/3) is selected dynamically from the field of the camera, and the pixels of an image of the size and definition of the display member are generated by interpolation from the pixels in the selected zone of the image supplied by the camera.
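The interpolation step can be illustrated with a minimal Python sketch. It assumes a single-channel image stored as a list of rows, square pixels (so the window format is simply out_w:out_h), and bilinear interpolation; the text does not specify the interpolation kernel, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # image[row][col], single channel for simplicity


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at real-valued (x, y) by bilinear interpolation, clamping at the borders."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def render_window(img: Image, left: float, top: float, win_w: float,
                  out_w: int, out_h: int) -> Image:
    """Generate an out_w x out_h display image from the camera window whose top-left
    corner is (left, top) and whose width is win_w camera pixels. The window height
    follows from the display format, so the selected zone keeps that format."""
    win_h = win_w * out_h / out_w
    sx, sy = win_w / out_w, win_h / out_h  # camera pixels per display pixel
    # Map each display pixel centre to the corresponding camera position and interpolate.
    return [[bilinear_sample(img, left + (i + 0.5) * sx - 0.5, top + (j + 0.5) * sy - 0.5)
             for i in range(out_w)]
            for j in range(out_h)]
```

When win_w is smaller than out_w, each camera pixel is spread over several display pixels, which is the electronic zoom; when the window equals the full image at the same size, the output reproduces the input.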
Implementation of the method is made easier by the fact that successive images of a video sequence present a high degree of natural correlation, since the camera is fixed; this makes it easier to implement the compression techniques needed to reduce the data rate required to represent a very high definition image.
When using a single fixed camera, the camera is typically given a focal length such that the whole scene appears in its field. That leads to a short focal length, which makes focusing easier and generally makes it possible to keep the focus setting permanently unchanged.
The zone of interest or selected portion can be selected by designating the location and the size of the zone or “window” to be displayed. Selection can be performed simply in the control room by means of a joystick or mouse type control unit for pointing to a corner of the window, together with a control member such as a set of pushbuttons for controlling zoom.
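One way to turn those two controls into a window is sketched below. The mapping is an assumption, not taken from the text: the pointed position is the top-left corner, the zoom factor is measured relative to the widest window of the display format that fits in the camera field, and the window is clamped so that it stays inside that field. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    left: float
    top: float
    width: float
    height: float


def window_from_controls(corner_x: float, corner_y: float, zoom: float,
                         cam_w: int, cam_h: int,
                         aspect_w: int = 4, aspect_h: int = 3) -> Window:
    """Hypothetical mapping from a pointed corner and a zoom factor to a window."""
    if zoom < 1:
        raise ValueError("zoom must be >= 1 (the window cannot exceed the camera field)")
    # Widest window of the display format that fits inside the camera field.
    full_w = min(cam_w, cam_h * aspect_w / aspect_h)
    width = full_w / zoom
    height = width * aspect_h / aspect_w
    # Clamp the corner so the whole window stays inside the camera image.
    left = min(max(corner_x, 0), cam_w - width)
    top = min(max(corner_y, 0), cam_h - height)
    return Window(left, top, width, height)
```

For a 1920×1152 camera, a zoom of 2 gives a 768×576 window, and a corner pointed too close to the bottom-right edge is pulled back to (1152, 576).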
Instead of a single large-field camera, it is possible to use a plurality of cameras whose individual fields overlap so as to limit edge effects. The images from the cameras are merged in the control room. The cameras can be focused at different distances to take account of their distances from the part of the scene at which they are pointing.
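The text does not say how the overlapping images are merged. As one possible approach, assumed here, two horizontally adjacent images can be cross-faded linearly ("feathered") across their common columns. The sketch further assumes the images are already registered, so that rows line up and the overlap width in pixels is known; the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]


def merge_pair(left_img: Image, right_img: Image, overlap: int) -> Image:
    """Merge two row-aligned images whose last / first `overlap` columns view the
    same part of the scene, ramping the weights linearly across the overlap."""
    if overlap < 1:
        raise ValueError("overlap must be at least one column")
    if len(left_img) != len(right_img):
        raise ValueError("images must have the same number of rows")
    out: Image = []
    for row_l, row_r in zip(left_img, right_img):
        wl = len(row_l)
        merged = row_l[: wl - overlap]  # slice copy: the inputs are not modified
        for k in range(overlap):
            a = (k + 1) / (overlap + 1)  # weight of the right camera grows across the overlap
            merged.append((1 - a) * row_l[wl - overlap + k] + a * row_r[k])
        merged.extend(row_r[overlap:])
        out.append(merged)
    return out
```

A wider overlap gives a smoother transition between cameras at the cost of fewer distinct pixels in the merged image.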
The invention can be implemented in a highly flexible manner. As explained below, the window can be selected in the control room during a live broadcast from a studio; otherwise, selection can be performed by a cable network supplier or by an Internet access supplier receiving the original signal over the air, by wire, or from a satellite. In all cases, there is no need for an operator to be present behind the camera, or at least the operator need do no more than set up the camera's initial adjustment and possibly change the camera focus.
The invention can be applied to images originating from:
a high definition 16/9 camera as presently available, which supplies a 1920×1152 matrix of pixels making it possible to extract directly a window of 720×576 pixels (or some other window) suitable for displaying on present-day TV screens and monitors;
a single custom-made camera having a definition that is very much greater than that of current cameras; and
a prior step of merging images from a bank of cameras having overlapping fields of view and the same magnification.
In a first embodiment of the invention, the window is selected by an operator in the control room. The image in its final state is broadcast (e.g. on air) or transmitted (e.g. to a cable company, a satellite, or an Internet access provider). This solution has the advantage of not increasing the required data rate, except on the link between the camera and the control room. The operator in the control room can thus pan, tilt or zoom by purely electronic means, without the camera moving.
In a second embodiment, selection is performed by the user. The data rate required for the link is then considerably increased when a bank of cameras or a very high definition camera is used. Because this approach requires a very large bandwidth, it is suitable only for professional TV or for a link between a production site and a cable network head station, which then broadcasts or retransmits to end users at a lower data rate.
In yet another embodiment, applicable when the link with the end user includes a return channel and when images are being transmitted as opposed to being broadcast, processing can take place in two stages. The user makes use of the return path to specify a determined fraction of the overall image which contains the window he wishes to see. The fractions that a user can select may optionally be predefined. Thereafter the user acts locally to select the position and the size of the “window” that he wishes to see within the fraction of the image that he receives. Warping is performed locally.
This solution is particularly suitable for TV on the Internet (or, more generally, on any extended computer network), since the user then has a home computer with a high level of computing power. In addition, the number of users connected to an access supply center remains limited, so implementation is compatible with the available bandwidth, which must in any case be high from the center given that Internet-type links are individual.
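The two stages can be sketched as follows, assuming the predefined fractions are axis-aligned rectangles of the overall image and that the smallest fraction containing the requested window is the one requested over the return channel. The selection rule, the Rect type and the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int

    def contains(self, other: "Rect") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and other.left + other.width <= self.left + self.width
                and other.top + other.height <= self.top + self.height)


def choose_fraction(fractions: List[Rect], window: Rect) -> Optional[Rect]:
    """Stage 1 (request over the return channel): smallest predefined fraction
    that contains the desired window, or None if no fraction does."""
    candidates = [f for f in fractions if f.contains(window)]
    return min(candidates, key=lambda f: f.width * f.height, default=None)


def local_window(fraction: Rect, window: Rect) -> Rect:
    """Stage 2 (at the user's end): express the window in the coordinates of the
    received fraction, ready for local warping."""
    return Rect(window.left - fraction.left, window.top - fraction.top,
                window.width, window.height)
```

With the four quadrants of a 1920×1152 image plus the whole image as predefined fractions, a window lying inside the top-right quadrant only requires that quadrant to be sent, while a window straddling two quadrants falls back to the whole image.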
The high definition TV signal is always subjected to a high degree of compression prior to being broadcast or transmitted. This compression is made easier by the high degree of time correlation between successive images within a given sequence. The compression modes used at present, such as the several versions of the MPEG standard, are entirely compatible with the method of the invention.
Use may be made of a “pyramid” type compression algorithm, the principles of which are described in the article by Peter J. Burt, “Pyramid-based extraction of local image features with applications to motion and texture analysis”, SPIE, Vol. 360, page 114. This algorithm provides a representation in the form of a first model at very low definition, followed by successive models that add the data required to reach the desired maximum definition.
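The structure of such a representation (a very low definition base plus successive detail layers) can be shown with a simplified, Laplacian-style pyramid. This is a sketch in the spirit of Burt's pyramid, not his exact scheme: it uses 2×2 block averaging and pixel replication instead of a smooth generating kernel, assumes image dimensions divisible by 2 at every level, and uses hypothetical function names.

```python
from typing import List, Optional

Image = List[List[float]]


def shrink(img: Image) -> Image:
    """Halve each dimension by averaging 2x2 blocks (dimensions assumed even)."""
    return [[(img[2 * j][2 * i] + img[2 * j][2 * i + 1]
              + img[2 * j + 1][2 * i] + img[2 * j + 1][2 * i + 1]) / 4
             for i in range(len(img[0]) // 2)]
            for j in range(len(img) // 2)]


def grow(img: Image) -> Image:
    """Double each dimension by pixel replication."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [base, detail_1, ..., detail_levels]: the low definition base,
    then detail layers ordered from coarsest to finest."""
    details: List[Image] = []
    current = img
    for _ in range(levels):
        smaller = shrink(current)
        predicted = grow(smaller)
        # Detail layer = what the coarser level fails to predict.
        details.append([[c - p for c, p in zip(rc, rp)]
                        for rc, rp in zip(current, predicted)])
        current = smaller
    return [current] + details[::-1]


def reconstruct(pyramid: List[Image], n_details: Optional[int] = None) -> Image:
    """Rebuild from the base and the first n_details detail layers (all by default)."""
    if n_details is None:
        n_details = len(pyramid) - 1
    current = pyramid[0]
    for detail in pyramid[1:1 + n_details]:
        predicted = grow(current)
        current = [[p + d for p, d in zip(rp, rd)] for rp, rd in zip(predicted, detail)]
    return current
```

Stopping reconstruction after fewer detail layers yields a lower definition image of the same scene, which is what allows transmission to be truncated at the level a given window needs.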
In yet another implementation, the original image is subjected to such compression, and selection causes data to be transmitted only up to the level required to achieve satisfactory resolution in the window selected for reception, taking account of the extent of that window.
However, using the pyramid algorithm imposes a constraint: the number of scales that can be selected is limited, and therefore so is the number of window sizes.
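The level selection and the scale constraint can be made concrete under an assumed dyadic pyramid, in which level k holds 1/2^k of the full definition and level 0 is the full definition. The rule below, which keeps the coarsest level that still supplies at least one sample per display pixel across the window, is an illustrative assumption, and the function names are hypothetical.

```python
from typing import List


def coarsest_sufficient_level(win_w: int, out_w: int, max_level: int) -> int:
    """Coarsest level k whose definition still gives at least out_w samples across
    a window win_w full-definition pixels wide; finer layers need not be sent."""
    k = 0
    while k < max_level and win_w >= out_w * 2 ** (k + 1):
        k += 1
    return k


def detail_layers_to_send(win_w: int, out_w: int, max_level: int) -> int:
    """Number of detail layers needed on top of a base stored at level max_level."""
    return max_level - coarsest_sufficient_level(win_w, out_w, max_level)


def exact_window_widths(out_w: int, max_level: int) -> List[int]:
    """Window widths that map onto a pyramid level with no further rescaling."""
    return [out_w * 2 ** k for k in range(max_level + 1)]
```

With a 720-pixel display and three levels, a window 5760 pixels wide needs only the base, a 1440-pixel window needs two detail layers, and any window narrower than 1440 pixels needs all three; only the widths 720, 1440, 2880 and 5760 fall exactly on a scale, which illustrates the limited choice of window sizes.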
The representation of a window obtained merely by warping does not give rise to the same deformation or the same degree of optical defocusing as are generated by the imperfections of the optical systems in the cameras to which television viewers are presently accustomed. To ensure that the image as displayed is of an appearance comparable to that obtained during direct broadcasting of a television signal, artificial deformation can be added to the edges by adding a deformation computation term to the zoom and position selection terms in the resampling algorithm so as to provide an effect which is comparable to that of spherical projection.
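A minimal sketch of a resampling map with the three terms named above: a position term (window centre), a zoom term (half window width) and a deformation term. The radial polynomial 1 + k·r² used for the deformation term is an assumption, a common approximation of lens-like distortion rather than an exact spherical projection; k and the function name are hypothetical. The returned camera coordinates would then be passed to an interpolator such as a bilinear sampler.

```python
from typing import Tuple


def source_coords(i: int, j: int, out_w: int, out_h: int,
                  cx: float, cy: float, half_w: float, k: float) -> Tuple[float, float]:
    """Map display pixel (i, j) to camera coordinates.

    cx, cy : centre of the selected window (position term)
    half_w : half the window width in camera pixels (zoom term)
    k      : hypothetical radial deformation coefficient (deformation term);
             k = 0 gives plain warping, k > 0 samples progressively further out
             towards the edges, giving a barrel-like appearance
    """
    scale = out_w / 2
    xn = (i + 0.5 - out_w / 2) / scale  # -1..1 across the display width
    yn = (j + 0.5 - out_h / 2) / scale  # same units vertically (square pixels assumed)
    factor = 1 + k * (xn * xn + yn * yn)
    return cx + half_w * xn * factor, cy + half_w * yn * factor
```

With k = 0 the mapping is the plain affine zoom-and-position warp; increasing k leaves the centre of the window almost unchanged while compressing more of the scene into the edges.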
The invention is particularly well suited to replacing a fixed designated “target” in the image supplied by the camera with a representation of a “model” or a “pattern” constituted by a stored small-size synthesized image or by a video sequence. When the camera is fixed and has a constant magnification, all that needs to be done is to point once to the target which is to be replaced. Once the model has been inlaid, the operations or processing are the same as when there is no prior inlaying. That approach constitutes a considerable simplification compared with U.S. Pat. No. 5,353,392 (Luquet et al.), to which reference may be made.
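Because the camera is fixed and its magnification constant, the target position designated once remains valid for every frame. The sketch below assumes the target is an axis-aligned rectangle whose top-left corner is given in camera pixels and that the pattern has exactly the target's size; a video pattern is cycled frame by frame. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def inlay(frame: Image, pattern: Image, left: int, top: int) -> Image:
    """Return a copy of frame with pattern pasted at the designated (left, top)."""
    if (left < 0 or top < 0 or top + len(pattern) > len(frame)
            or left + len(pattern[0]) > len(frame[0])):
        raise ValueError("pattern does not fit at the designated position")
    out = [list(row) for row in frame]  # copy so the camera frame is left untouched
    for dy, prow in enumerate(pattern):
        out[top + dy][left:left + len(prow)] = prow
    return out


def inlay_sequence(frames: List[Image], patterns: List[Image],
                   left: int, top: int) -> List[Image]:
    """Inlay a still (one pattern) or a video sequence (cycled) at a fixed position."""
    return [inlay(f, patterns[n % len(patterns)], left, top)
            for n, f in enumerate(frames)]
```

The inlaid frames are then windowed and warped exactly like frames without inlaying.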
There is also provided apparatus enabling the above method to be implemented.