1. Field of the Invention
The present invention relates to a technique for controlling, from a video display apparatus, video image composition processing performed in a video composition delivery apparatus, where the video composition delivery apparatus and the video display apparatus are connected to a network. The present invention is used in, for example, a multipoint video conference system utilizing an image composition (video composition) server.
2. Related Art
It is possible to construct a multipoint video conference system by exchanging video images and voices between information devices capable of transmitting and receiving data via a network.
When constructing a multipoint video conference system including a plurality of conference terminals, there are two methods. In one method, the conference terminals mutually exchange video images. In the other method, a conference server is utilized: video images are transmitted from the conference terminals to the conference server, the conference server composes the video images received from the plurality of conference terminals to form one video image, and the resultant video image is then delivered to the terminals. In the latter method, each terminal only has to receive a video image from the single conference server, and consequently the network load can be reduced as compared with the former method. A conference using the former method and a conference using the latter method are sometimes called a distributive multipoint conference and a concentrated multipoint conference, respectively.
The conference server is sometimes also called an MCU (Multipoint Control Unit).
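The difference in network load between the two methods can be illustrated by counting video streams. The following is a minimal sketch only; it assumes that every stream is sent by unicast and that, in the former method, each terminal sends a separate stream to every other terminal. The function names are hypothetical and do not appear in any cited system.

```python
def streams_received_per_terminal(num_terminals: int, use_server: bool) -> int:
    """Number of video streams each terminal has to receive."""
    if num_terminals < 2:
        return 0
    # With a conference server, each terminal receives one composite image.
    # Without it, each terminal receives one stream from every other terminal.
    return 1 if use_server else num_terminals - 1


def total_streams_on_network(num_terminals: int, use_server: bool) -> int:
    """Total number of unicast video streams carried by the network."""
    if num_terminals < 2:
        return 0
    if use_server:
        # One upstream (terminal to server) and one downstream
        # (server to terminal) stream per terminal.
        return 2 * num_terminals
    # Every terminal sends to every other terminal (assumed full mesh).
    return num_terminals * (num_terminals - 1)
```

Under these assumptions, with five terminals each terminal receives four streams in the former method and one stream in the latter, and the network carries 20 streams and 10 streams, respectively.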
Video images received from the respective terminals are referred to as video sources. The positions at which the respective video sources are arranged in a composite video image are determined either automatically by the conference server or under control exercised by the respective terminals. For example, in the case where the number of video sources is four, there are various composition patterns for the arrangement positions of the video sources, such as a pattern in which the composite image is divided into four parts, and a pattern in which the remaining three video images are arranged within one video image in a picture-in-picture manner. In the case where control is exercised from each terminal, there is a method in which one pattern is selected from among predetermined patterns and a notice thereof is sent to the conference server to change the composite video image. Besides this method of changing the video layout by specifying a pattern, a method of specifying the arrangement positions of the video sources from the terminal side is also conceivable.
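The two composition patterns mentioned above for four video sources can be sketched as follows. This is an illustration under stated assumptions, not the layout logic of any particular conference server: the Region type, the pattern names "quad" and "pip", the inset scale, and the margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Rectangle occupied by one video source in the composite image."""
    source: str
    x: int
    y: int
    width: int
    height: int


def quad_split(sources, width, height):
    """Divide the composite image into four equal parts (2 x 2 grid)."""
    if len(sources) != 4:
        raise ValueError("quad_split expects exactly four video sources")
    w, h = width // 2, height // 2
    return [Region(s, (i % 2) * w, (i // 2) * h, w, h)
            for i, s in enumerate(sources)]


def picture_in_picture(sources, width, height, inset_scale=4, margin=8):
    """First source fills the image; the rest are insets along the bottom."""
    main, *insets = sources
    iw, ih = width // inset_scale, height // inset_scale
    regions = [Region(main, 0, 0, width, height)]
    for i, s in enumerate(insets):
        x = margin + i * (iw + margin)   # insets laid out left to right
        y = height - ih - margin         # aligned to the bottom edge
        regions.append(Region(s, x, y, iw, ih))
    return regions


# Predetermined patterns from which a terminal could select one by name.
PATTERNS = {"quad": quad_split, "pip": picture_in_picture}
```

Selecting a pattern then amounts to sending only its name to the server, for example PATTERNS["pip"](["B", "C", "D", "E"], 640, 480), whereas specifying arrangement positions from the terminal side would require sending the Region values themselves.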
On the other hand, apart from the multipoint video conference system, a system which receives screen information from a remote device and sends the remote device a control signal for updating the screen information is also utilized in remote control of a personal computer (PC).
For example, as a method for operating a remote PC, a concept called “remote desktop” has been proposed. This remote desktop function is mounted by default on PCs running Windows XP, an OS of Microsoft Corporation. According to the “remote desktop,” it becomes possible to operate a remote PC connected via a network as if it were at hand. Operation information, such as a mouse click, generated by the device at hand is transmitted to the remote device; the remote device which has received the operation information conducts processing and creates screen information of the result; and the screen information is transmitted to the device at hand. When the screen information is transmitted, the network load is lowered by, for example, transmitting only screen difference information or compressing the transmitted image information.
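Transmission of only screen difference information can be sketched as follows. This is a simplified illustration and not the protocol actually used by the remote desktop function; it assumes that a frame is a list of rows of pixel values, that the frame dimensions are shared by both devices, and that differences are detected per fixed-size tile. The function names and the tile size are hypothetical.

```python
def changed_tiles(prev, curr, tile=2):
    """Return (x, y, pixels) for each tile of curr that differs from prev."""
    height, width = len(curr), len(curr[0])
    updates = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            old = [row[tx:tx + tile] for row in prev[ty:ty + tile]]
            new = [row[tx:tx + tile] for row in curr[ty:ty + tile]]
            if old != new:
                # Only tiles whose contents changed are transmitted.
                updates.append((tx, ty, new))
    return updates


def apply_tiles(frame, updates):
    """Patch a copy of the previously displayed frame with received tiles."""
    patched = [list(row) for row in frame]
    for tx, ty, pixels in updates:
        for dy, row in enumerate(pixels):
            patched[ty + dy][tx:tx + len(row)] = row
    return patched
```

The remote device would call changed_tiles on consecutive frames and transmit the result, and the device at hand would call apply_tiles on the frame it currently displays. When little of the screen changes, the transmitted data is much smaller than a full frame.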
Furthermore, in the “remote desktop” described above, screen information, i.e., image data itself, is transmitted from the remote device which constructs the screen information to the device at hand. However, a method of transmitting only a drawing instruction and conducting display processing, on the basis of the drawing instruction, in the device which has received it has also been proposed (Japanese Patent Application Laid-Open Publication No. 8-297617). As a concept of the same kind, a method called VNC (Virtual Network Computing) has been implemented.
In the ensuing description of the multipoint video conference system using a conference server and transmitting one composite video image from the conference server to a terminal, the conference server serving as an apparatus which provides a composite video image is referred to simply as “server,” and a terminal serving as an apparatus which receives and displays the composite video image is referred to as “client.” Furthermore, in the remote desktop as well, the apparatus which creates and provides a screen, typically called a terminal, is referred to as “server,” and the apparatus which displays the screen, called a viewer, is referred to as “client.” The video image or screen transmitted from the server to the client is a moving picture (such as MPEG4, MPEG2, H.263, or H.264) or a still picture (such as Motion JPEG, continuous transmission of JPEG images, or transmission of only the difference information that has changed in a still picture). In the ensuing description, however, the video image or screen transmitted from the server to the client is referred to simply as “composite video image.”
For example, it is supposed in the multipoint video conference system that the server composites video images of participants B, C, D and E received respectively from terminals B, C, D and E into one video image, and transmits a resultant composite video image to a client which is a terminal A. In this case, the client itself does not recognize the four video images individually. Only the server recognizes the four video images individually.
On the other hand, for example, on the remote desktop in a PC having Windows XP mounted thereon, a window or the like is displayed in the composite video image received by the client. A user who operates the client can freely move the window within the display screen of the client and change the size of the window. If the window is opened by a drawing application and, for example, a rectangular figure is drawn in the window, it is also possible to move the rectangular figure and change its size. If the window or the rectangular figure is clicked with a mouse, its display changes to indicate that it has been selected. By further conducting mouse operations on the changed figure, it becomes possible to change its position and size. In these operations, however, the client does not recognize the window or the rectangular figure itself; only the server recognizes them. As for the display change of the window and the rectangular figure as well, only the position information of the mouse is conveyed to the server at the time of a mouse click. The server judges what processing the mouse operation calls for, and creates a composite video image in which the window and the rectangular figure have been changed. The client merely displays the composite video image received from the server.
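The division of roles described above, in which the client conveys only the mouse position and the server decides which object was clicked, can be sketched as follows. This is an illustrative sketch only; the class names, the rectangle representation, and the back-to-front ordering of objects are assumptions and are not taken from the remote desktop function itself.

```python
from dataclasses import dataclass


@dataclass
class ScreenObject:
    """An object known only to the server, e.g. a window or a figure."""
    name: str
    x: int
    y: int
    width: int
    height: int
    selected: bool = False

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


class Server:
    def __init__(self, objects):
        # Assumed drawing order: back to front, so the last one is topmost.
        self.objects = list(objects)

    def handle_click(self, px, py):
        """Resolve a click position to the topmost object and select it."""
        for obj in self.objects:
            obj.selected = False
        for obj in reversed(self.objects):
            if obj.contains(px, py):
                obj.selected = True   # reflected in the next composite image
                return obj.name
        return None
```

For example, with a window at (100, 100) of size 400 x 300 and a rectangular figure at (150, 150) of size 50 x 50 drawn on top of it, the client sends only the coordinates (160, 160), and the server judges that the rectangular figure has been selected.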
In the client having a function of receiving a composite video image created by the server and merely displaying the composite image, the composite video image is nothing but one video image. It is not clear that the composite video image includes a plurality of video sources, and boundaries between video sources are not clear, either.
In the above-described pattern changeover method used in the multipoint video conference system, detailed layout information representing the portions of the composite video image in which objects are arranged is not managed. On the other hand, Japanese Patent Application Laid-Open Publication Nos. 5-103324 and 9-149396 show a concept in which the composite image is changed by transmitting object layout information to the server. Accordingly, it can be presumed that the client manages the object layout information. In Japanese Patent Application Laid-Open Publication Nos. 5-103324 and 9-149396, however, only the configuration of the server is shown, and how the client comes to know the object layout information is not described at all. As a method for the client to know the object layout information, for example, a method in which the server sends a notice of the object layout information is conceivable. In that case, however, a mechanism that allows control signals to be exchanged bidirectionally between the client and the server becomes necessary. Moreover, the composite image in the server may be changed automatically because of an increase or decrease in conference participants. Therefore, either a mechanism for sending a notice from the server to the client each time a change is made, or a mechanism for the client to check for changes as necessary, is required. In addition, it is necessary to consider the processing to be conducted when the timing at which the client transmits a control signal and the timing at which the server sends a notice of layout information overlap each other, which results in complicated processing.
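The overlap in timing described above can be illustrated by the following sketch, which shows the problem only and does not propose any remedy. It assumes a hypothetical server that keeps the video sources in numbered slots, a hypothetical "enlarge" control signal that identifies a source by slot number, and a layout notice that has not yet reached the client when the control signal is sent.

```python
class Server:
    def __init__(self, slots):
        self.slots = list(slots)          # slot number -> video source

    def participant_left(self, source):
        """Automatic layout change; returns the layout notice to be sent."""
        self.slots.remove(source)         # remaining sources shift forward
        return list(self.slots)

    def enlarge_slot(self, index):
        """Control signal from the client, identified by slot number only."""
        if 0 <= index < len(self.slots):
            return self.slots[index]      # the source actually affected
        return None


server = Server(["B", "C", "D", "E"])
client_view = list(server.slots)          # layout as known to the client

# The client decides to enlarge the source it sees in slot 2.
intended = client_view[2]

# Meanwhile participant C leaves; the notice is still in transit.
notice_in_transit = server.participant_left("C")

# The control signal arrives at the server before the client has
# processed the notice, so it is applied to the changed layout.
affected = server.enlarge_slot(2)
```

In this run the client intended to enlarge source D, but the server applies the control signal to source E, because the layout held by the client was already out of date.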