1. Field of the Invention
The present invention relates to a video composition apparatus, a video composition method and a video composition program, used to receive a plurality of video signals and output a plurality of video signals obtained by composing those received video signals in different patterns.
2. Description of the Background
In a so-called multipoint video conference, in which a plurality of terminals communicate using microphone voices and camera video images, simply connecting the terminals in a full mesh places a heavy load on both communication and terminal processing. In general, therefore, an MCU (Multipoint Conference Unit) is provided. The MCU is a kind of server. It connects with the terminals, receives voices and video images from them, composes them, and transmits the resultant composite video images and voices to the terminals. Owing to the MCU, each terminal can obtain the voices and video images of all participants merely by communicating with the MCU and receiving one composite voice and one composite video image, which is highly efficient in terms of communication and terminal processing. The MCU thus plays an important role in the multipoint video conference, and a video composition technique is utilized in it.
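The difference in load between the full mesh form and the MCU form can be illustrated by counting streams. The following is a minimal sketch, assuming that each pair of terminals in a full mesh exchanges one stream in each direction and that each terminal exchanges one stream in each direction with the MCU; the function names are hypothetical and are not taken from any cited publication.

```python
def full_mesh_load(n_terminals):
    """Return (streams handled per terminal, total links) for a full mesh.

    Assumption: every terminal sends to and receives from every other terminal.
    """
    per_terminal = 2 * (n_terminals - 1)            # n-1 sent + n-1 received
    total_links = n_terminals * (n_terminals - 1) // 2
    return per_terminal, total_links


def mcu_load(n_terminals):
    """Return (streams handled per terminal, total links) when an MCU is used.

    Assumption: every terminal sends one stream to the MCU and receives
    one composite stream from it.
    """
    per_terminal = 2                                # one sent + one received
    total_links = n_terminals                       # one link per terminal
    return per_terminal, total_links


for n in (3, 5, 10):
    print(n, full_mesh_load(n), mcu_load(n))
```

Under these assumptions, the number of streams each terminal must handle grows with the number of participants in the full mesh form, whereas it stays constant when the MCU is used.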
Another application of the video composition technique is a screen splitting unit for surveillance cameras. In general, a plurality of surveillance cameras are installed in a building or the like. If they are observed and recorded using separate monitors, the equipment becomes large in scale and less convenient. Typically, therefore, a screen splitting unit is used to compose a plurality of camera video images into one video signal, which is viewed on a single monitor or recorded by a single video recorder.
However, most conventional video composition techniques output a single composite video image. Few conventional techniques propose outputting a plurality of composite video images.
Consider, for example, the case of the video conference. The video conference can be implemented even with a model in which a composite video image of one kind is generated and watched by all participants, as in the conventional technique. In pursuit of greater convenience, however, there is also a demand, for example, that a specific video image selected from among the plurality of composed video images be zoomed and displayed. One solution is described in Japanese Patent Application Laid-Open Publication No. 11-88854. According to the technique described in that publication, however, when a certain video image is zoomed, the video images other than the zoomed video image cannot be watched. It would be more convenient if the way of composition could be changed so that, for example, a composite video image in which the other downscaled images are embedded in the zoomed video image is presented. Alternatively, it would be more convenient if a composite video image could be presented in which a video image is zoomed only to some degree, rather than to the whole screen, and the other downscaled participants are displayed around it. In such cases, different participants will naturally desire to zoom in on different video images. Therefore, there is a demand to compose video images in different patterns for the respective terminals and to output a plurality of different composite video images.
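The idea of composing in a different pattern for each terminal can be sketched as follows. This is only an illustration under stated assumptions: the screen size, the thumbnail scale, the placement of the downscaled images along the bottom edge, and the names `pip_layout` and `per_terminal` are all hypothetical and do not appear in the cited publications.

```python
def pip_layout(sources, zoomed, width=640, height=480, thumb_scale=4):
    """Return {source: (x, y, w, h)} for one composite video image.

    The `zoomed` source fills the whole screen, and the other sources are
    embedded as downscaled images along the bottom edge (assumed layout).
    Thumbnails run past the right edge if there are more than
    `thumb_scale` of them; this sketch does not handle that case.
    """
    thumb_w, thumb_h = width // thumb_scale, height // thumb_scale
    layout = {zoomed: (0, 0, width, height)}
    others = [s for s in sources if s != zoomed]
    for i, src in enumerate(others):
        layout[src] = (i * thumb_w, height - thumb_h, thumb_w, thumb_h)
    return layout


sources = ["A", "B", "C"]
# Each terminal chooses its own zoomed source (hypothetical choices).
wanted = {"terminal_A": "B", "terminal_B": "C", "terminal_C": "B"}
per_terminal = {t: pip_layout(sources, z) for t, z in wanted.items()}
for terminal, layout in per_terminal.items():
    print(terminal, layout)
```

Because each terminal may select a different zoomed source, one composition pattern per terminal is needed, which is why a plurality of different composite video images must be output.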
To our knowledge, a model in which a plurality of composite video images are output is described only in Japanese Patent Application Laid-Open Publication No. 5-103324. In that publication, a downscaling circuit downscales input video images stored in a video memory, and a composition circuit composes the downscaled video images and outputs the resultant composite video images. If this model is actually implemented, however, the circuit becomes complicated. Specifically, the following problems occur.
First, the composition circuit, which outputs composed video signals, must typically output signals at timing based on standards for output signals. Therefore, input signals must be input to the composition circuit at timing that causes the output timing to satisfy the standards, with the processing delay in the composition circuit itself taken into consideration. (If a buffer is provided between the input signals and the composition circuit, the input signals must be input within allowed timing.) On the other hand, in order for the composition circuit to output signals at the above-described timing, the downscaling circuit, which generates the input signals to the composition circuit, must read data from the video memory at suitably adjusted timing, with the processing delay in the downscaling circuit itself taken into consideration. In this way, starting from the final output timing of signals from the composition circuit, it is necessary to determine the output timing of the downscaling circuit with the processing delay in the composition circuit taken into consideration, and then to determine the timing for reading out data from the video memory with the processing delay in the downscaling circuit taken into consideration.
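The backward determination of timing described above can be sketched as simple arithmetic. The following is a minimal illustration, assuming that each circuit has a single fixed processing delay expressed in clock cycles; the function name and the numerical values are hypothetical and are chosen only for the example.

```python
def schedule_backwards(output_time, compose_delay, downscale_delay):
    """Work backwards from the required output timing.

    All values are in clock cycles (assumed unit). Returns the time at which
    the downscaling circuit must deliver data to the composition circuit and
    the time at which data must be read from the video memory.
    """
    compose_input_time = output_time - compose_delay
    memory_read_time = compose_input_time - downscale_delay
    return compose_input_time, memory_read_time


# Hypothetical example: output required at cycle 1000.
print(schedule_backwards(1000, compose_delay=30, downscale_delay=120))
# The same output timing with a longer downscaling delay needs an earlier read.
print(schedule_backwards(1000, compose_delay=30, downscale_delay=200))
```

In this sketch the memory read timing depends on both delays, so a change in either one requires the read timing to be determined again.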
If, in the above-described configuration, the composition circuit is formed so that it can dynamically change the composition pattern and the downscaling circuit is formed so that it can dynamically change the downscaling factors, the delay timing also changes dynamically. As a result, the processing that copes with the delays is no longer simple, resulting in a complicated circuit and an increased circuit scale.