1. Field of the Invention
The present invention relates to a multi-point conference system and a conference terminal device, and more particularly to a multi-point conference system in which a video conference is held using decentralized conference terminal devices and a conference terminal device used in the system.
2. Description of the Related Art
A multi-point conference system by which users who are in remote places can participate in a conference has been put to practical use. In such a multi-point conference system, conference terminal devices, each of which has at least a camera device, a display device, a microphone and a loudspeaker, are connected by a network such as dedicated lines, public lines or a LAN (Local Area Network). In this system, it is desirable that the image displayed by each of the conference terminal devices be switched smoothly.
FIGS. 1A and 1B illustrate a conventional multi-point conference system. Referring to FIGS. 1A and 1B, a multi-point control unit 100 is connected with a plurality of conference terminal devices. In this case, four conference terminal devices 101-1 to 101-4 are connected to the multi-point control unit 100.
Each of the conference terminal devices has a transmitting/receiving function for voice signals and image signals and is configured to switch its coding operation between an interframe coding operation and an intra-frame coding operation. A conference terminal device at a transmitting side codes an initial frame of image data by the intra-frame coding operation and codes the frames after the initial frame by the interframe coding operation. The interframe coding operation is then switched to the intra-frame coding operation at predetermined intervals (every predetermined number of frames). The intra-frame coded data and the interframe coded data are successively transmitted from the conference terminal device at the transmitting side. A conference terminal device at a receiving side decodes the intra-frame coded data and stores the decoded frame data. When the next interframe coded data is received, the interframe coded data is decoded using the stored decoded frame data. A dynamic image is then displayed using the decoded frame data.
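The alternation between the two coding operations can be illustrated by the following minimal sketch. It assumes a lossless toy coder in which a frame is a flat list of pixel values and the interframe data is simply the pixel-wise difference from the preceding frame. The names encode_sequence, decode_sequence and intra_interval are hypothetical and do not appear in the system described above.

```python
from typing import List, Optional, Tuple

Frame = List[int]                # one frame as a flat list of pixel values
Packet = Tuple[str, List[int]]   # ("intra", pixels) or ("inter", differences)


def encode_sequence(frames: List[Frame], intra_interval: int) -> List[Packet]:
    """Code the initial frame, and every intra_interval-th frame, as intra-frame data."""
    packets: List[Packet] = []
    reference: Optional[Frame] = None
    for index, frame in enumerate(frames):
        if reference is None or index % intra_interval == 0:
            packets.append(("intra", list(frame)))
        else:
            diff = [cur - ref for cur, ref in zip(frame, reference)]
            packets.append(("inter", diff))
        reference = list(frame)  # toy coder is lossless, so the reference is the frame itself
    return packets


def decode_sequence(packets: List[Packet]) -> List[Optional[Frame]]:
    """Decode packets in order; the result is None until intra-frame data has arrived."""
    stored: Optional[Frame] = None
    output: List[Optional[Frame]] = []
    for kind, payload in packets:
        if kind == "intra":
            stored = list(payload)
        elif stored is not None:
            stored = [ref + diff for ref, diff in zip(stored, payload)]
        output.append(None if stored is None else list(stored))
    return output
```

In this sketch, a receiver that starts decoding in the middle of the stream produces no picture until the next intra-frame coded data arrives, because it has no stored frame to which the differences can be added.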
In addition, the multi-point control unit 100 has a function for distributing image data from each conference terminal device to the other conference terminal devices. Various methods of distributing the image data have been proposed. For example, the following system is known. In the system, one or more participants who are currently speaking are selected as speakers. To the conference terminal device of a speaker, image data items of the speakers other than that speaker are transmitted. To each of the conference terminal devices of the participants other than the speakers, image data items from the conference terminal devices of the speakers are transmitted.
Image information items 1 to 4 from the conference terminal devices 101-1 to 101-4 are transmitted to the multi-point control unit 100. When a user of the conference terminal device 101-2 speaks, the multi-point control unit 100 transmits the image information item 2 from the conference terminal device 101-2 to the other conference terminal devices 101-1, 101-3 and 101-4. When a user of the conference terminal device 101-1 speaks after the speech of the user of the conference terminal device 101-2 is terminated, the image information item 1 is transmitted from the conference terminal device 101-1 to the multi-point control unit 100. The multi-point control unit 100 then transmits the image information item 2 from the conference terminal device 101-2 to the conference terminal device 101-1 and transmits the image information item 1 from the conference terminal device 101-1 to the other conference terminal devices 101-2, 101-3 and 101-4. This situation is shown in FIG. 1A.
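The distribution rule in this example can be sketched as follows, assuming a single current speaker and a single previous speaker. The function name route_images and the use of terminal reference numerals as dictionary keys are hypothetical choices made for illustration only.

```python
from typing import Dict, List


def route_images(terminals: List[str], current_speaker: str,
                 previous_speaker: str) -> Dict[str, str]:
    """Return, for each receiving terminal, the terminal whose image it is sent."""
    routing: Dict[str, str] = {}
    for terminal in terminals:
        if terminal == current_speaker:
            # The speaker does not receive its own image; it receives the
            # image of the previous speaker instead.
            routing[terminal] = previous_speaker
        else:
            # Every other terminal receives the image of the current speaker.
            routing[terminal] = current_speaker
    return routing


terminals = ["101-1", "101-2", "101-3", "101-4"]
fig_1a = route_images(terminals, current_speaker="101-1", previous_speaker="101-2")
```

Under these assumptions, fig_1a maps the terminal 101-1 to the source 101-2 and maps each of the terminals 101-2, 101-3 and 101-4 to the source 101-1, which corresponds to the state of FIG. 1A.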
When a user of the conference terminal device 101-3 speaks in this state, the multi-point control unit 100 transmits the image information item 1 from the conference terminal device 101-1 to the conference terminal device 101-3. The multi-point control unit 100 then transmits the image information item 3 from the conference terminal device 101-3 to the other conference terminal devices 101-1, 101-2 and 101-4. If the multi-point control unit 100 merely switches the transmission of the image information, there may be a case where the transmission of the image information is switched while the interframe coded data is being transmitted. In this case, a conference terminal device at the receiving side cannot correctly decode the coded image data until the intra-frame coded data is received.
Thus, the multi-point control unit 100 requests that the conference terminal device of a new speaker first transmit intra-frame coded data. The multi-point control unit 100 then supplies, to the other conference terminal devices, a display freeze instruction that holds the displayed image unchanged until the intra-frame coded data is received. This situation is shown in FIG. 1B.
For example, when the state shown in FIG. 1A is changed to a state in which the user of the conference terminal device 101-3 speaks, the multi-point control unit 100 supplies the request for the intra-frame coded data and the display freeze instruction to the conference terminal device 101-3. After the display freeze instruction, the image information item 3 from the conference terminal device 101-3 is transmitted to the conference terminal device 101-1. Likewise, after the display freeze instruction, the image information item 3 is transmitted to the conference terminal devices 101-2 and 101-4 in place of the image information item 1.
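The effect of the display freeze instruction at a receiving terminal can be sketched as follows. This is an illustrative model only: the class ReceivingTerminal and its methods are hypothetical, frames are flat lists of pixel values, and interframe data is assumed to be a lossless pixel-wise difference.

```python
from typing import List, Optional, Tuple

Packet = Tuple[str, List[int]]   # ("intra", pixels) or ("inter", differences)


class ReceivingTerminal:
    """Toy receiver that holds its picture while a display freeze is in force."""

    def __init__(self) -> None:
        self.reference: Optional[List[int]] = None   # stored decoded frame
        self.displayed: Optional[List[int]] = None   # picture currently shown
        self.frozen = False

    def freeze(self) -> None:
        """Model the arrival of a display freeze instruction."""
        self.frozen = True

    def receive(self, packet: Packet) -> Optional[List[int]]:
        """Process one packet and return the picture shown afterwards."""
        kind, payload = packet
        if kind == "intra":
            self.reference = list(payload)
            self.frozen = False                       # intra-frame data releases the freeze
        elif not self.frozen and self.reference is not None:
            self.reference = [r + d for r, d in zip(self.reference, payload)]
        else:
            return self.displayed                     # keep showing the old picture
        self.displayed = list(self.reference)
        return self.displayed
```

While the freeze is in force, interframe data belonging to the new source is ignored rather than added to a stored frame of the old source, so no corrupted picture is shown.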
FIG. 2 shows a coding process unit of each of the conventional conference terminal devices. Referring to FIG. 2, the coding process unit has a coding portion 111, a local decoding portion 112, a frame memory 113, a control portion 114, a selector (SEL) 115, a subtractor 116 and a motion vector searching portion 117.
An image signal to be transmitted is supplied from a video camera (not shown) to the coding process unit. The image signal is then input to the motion vector searching portion 117 and the selector 115. Speaker specifying information is input to the control portion 114. The control portion 114 controls the selector 115 based on coding operation selecting information so that the coding operation is switched between the intra-frame predictive coding operation and the interframe predictive coding operation. When the image signal is selected by the selector 115, the image signal is input to the coding portion 111. The coding portion 111 codes the image signal so that the intra-frame coding operation is carried out. The difference between the input image signal and reference image information from the frame memory 113 is calculated by the subtractor 116. The subtractor 116 outputs an interframe difference signal. When the interframe difference signal is selected by the selector 115, the interframe difference signal is supplied to the coding portion 111. The coding portion 111 codes the interframe difference signal so that the interframe coding operation is carried out.
Thus, when the request for the intra-frame coded data is supplied from the multi-point control unit 100 shown in FIGS. 1A and 1B, the control portion 114 controls the selector 115 so that image information coded by the intra-frame coding operation can be transmitted. In addition, the local decoding portion 112 decodes data coded by the coding portion 111. The decoded data is stored as reproduced image information in the frame memory 113. As a result, the contents of the frame memory 113 are almost the same as the contents of the frame memory of a conference terminal device at the receiving side.
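The data flow through the selector, the subtractor, the coding portion and the local decoding portion can be sketched as follows. The sketch assumes that the coding portion is a simple uniform quantizer with a hypothetical step size STEP, and it omits motion compensation. The class and function names are illustrative and are not taken from FIG. 2.

```python
from typing import List, Optional, Tuple

STEP = 4  # hypothetical quantizer step size, chosen only for illustration


def quantize(values: List[int]) -> List[int]:
    """Stand-in for the coding portion: lossy uniform quantization."""
    return [(v + STEP // 2) // STEP for v in values]


def dequantize(levels: List[int]) -> List[int]:
    """Stand-in for the local decoding portion."""
    return [level * STEP for level in levels]


class CodingProcessUnit:
    def __init__(self) -> None:
        self.frame_memory: Optional[List[int]] = None  # reproduced reference image

    def encode(self, image: List[int], force_intra: bool = False) -> Tuple[str, List[int]]:
        """Code one frame; force_intra models a request for intra-frame coded data."""
        use_intra = force_intra or self.frame_memory is None
        if use_intra:
            selected = list(image)                     # selector passes the image signal
        else:
            # Subtractor: difference between the input and the reference image.
            selected = [p - r for p, r in zip(image, self.frame_memory)]
        coded = quantize(selected)
        decoded = dequantize(coded)                    # local decoding of the coded data
        if use_intra:
            self.frame_memory = decoded
        else:
            self.frame_memory = [r + d for r, d in zip(self.frame_memory, decoded)]
        return ("intra" if use_intra else "inter"), coded
```

Because the frame memory is updated from the locally decoded data rather than from the input image, it tracks what a receiver applying the same dequantization would reconstruct, which is the purpose of the local decoding portion.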
The motion vector searching portion 117 searches a predetermined area of the frame which is input at the present time using a block, having a predetermined size, of the reference image information from the frame memory 113. As a result, the motion vector searching portion 117 obtains motion vector information indicating the direction in which the image has changed. The motion vector information is supplied to the coding portion 111. Another process, such as a discrete cosine transform (DCT) process, may be added to the coding process.
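One common way to perform such a search is exhaustive block matching, sketched below. The text does not state which matching criterion is used; the sum of absolute differences (SAD) and the names search_motion_vector and search_range are assumptions made for this sketch.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # image as rows of pixel values


def block_cost(current: Image, reference: Image, top: int, left: int,
               dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between the reference block at (top, left)
    and the area of the current frame displaced from it by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + dy + y][left + dx + x]
                         - reference[top + y][left + x])
    return total


def search_motion_vector(current: Image, reference: Image, top: int, left: int,
                         size: int, search_range: int) -> Tuple[int, int]:
    """Return the displacement (dy, dx) at which the reference block best
    matches the current frame within the search area."""
    height, width = len(current), len(current[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # candidate area would fall outside the frame
            cost = block_cost(current, reference, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best
```

For example, if a 2x2 block located at row 1, column 1 of the reference image appears at row 2, column 3 of the current frame, the search returns the displacement (1, 2).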
FIG. 3 shows a decoding process unit of the conventional conference terminal device. Referring to FIG. 3, the decoding process unit has a decoding portion 121, a motion compensation portion 122, a frame memory 123, an adder 124 and a selector (SEL) 125. Coded image information from a conference terminal device of a speaker is input to the decoding portion 121. The coded image information includes control information indicating whether the data is intra-frame coded data or interframe coded data, together with the motion vector information. The decoding portion 121 controls the selector 125 based on decoding operation selecting information.
In a case of the intra-frame decoding operation, the selector 125 selects image information decoded by the decoding portion 121 and the selected image information is supplied to a display device (not shown). In a case of the interframe decoding operation, predictive difference information is decoded by the decoding portion 121. The predictive difference information and reference image information supplied via the motion compensation portion 122 from the frame memory 123 are added by the adder 124. As a result, reproduced image information is obtained. The reproduced image information is supplied via the selector 125 to the display device (not shown). The display device displays a dynamic image of a speaker based on the reproduced image information. The motion compensation portion 122 carries out a motion compensation process using the motion vector information. In a case where the discrete cosine transform process is carried out in the conference terminal device at the transmitting side, the inverse discrete cosine transform process is carried out in addition to the decoding process.
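The roles of the selector, the adder and the motion compensation portion can be sketched as follows. For brevity the sketch treats a frame as a one-dimensional list of pixel values and a motion vector as a single integer shift applied to the whole frame. Both are simplifying assumptions, and the names DecodingProcessUnit and motion_compensate are hypothetical.

```python
from typing import List, Optional


def motion_compensate(reference: List[int], shift: int) -> List[int]:
    """Shift the reference image by `shift` pixels, repeating the edge pixels."""
    last = len(reference) - 1
    return [reference[min(max(i - shift, 0), last)] for i in range(len(reference))]


class DecodingProcessUnit:
    def __init__(self) -> None:
        self.frame_memory: Optional[List[int]] = None  # stored reproduced image

    def decode(self, mode: str, payload: List[int], motion_vector: int = 0) -> List[int]:
        """Return the reproduced image for one received frame."""
        if mode == "intra":
            reproduced = list(payload)          # selector passes the decoded image directly
        else:
            if self.frame_memory is None:
                raise ValueError("interframe data received before any intra-frame data")
            predicted = motion_compensate(self.frame_memory, motion_vector)
            # Adder: prediction plus the decoded predictive difference.
            reproduced = [p + d for p, d in zip(predicted, payload)]
        self.frame_memory = reproduced
        return list(reproduced)
```

The error raised when interframe data arrives first corresponds to the situation in which a receiving terminal has no reference image from which to reproduce the picture.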
In the conventional multi-point conference system, the multi-point control unit 100 switches the image information transmitted therefrom and transmits the request for the intra-frame coding operation to a conference terminal device of a speaker. The coding operation is switched to the intra-frame coding operation in accordance with the request. Thus, in a conference terminal device at the receiving side, intra-frame coded data is initially received and interframe coded data after the intra-frame coded data is decoded, so that a reproduced dynamic image can be displayed using decoded frame data.
However, in the intra-frame coding operation, a relatively large amount of data is generated in comparison with the interframe coding operation. That is, the compression ratio is lower. Thus, when the network connecting the conference terminal devices has a low transmission speed, a long transmission time is required for the image data obtained by coding the head picture in the intra-frame coding operation. As a result, a relatively long time is required before a normal reproduced image can be displayed. Thus, when the speaker changes, a long time is required to switch the displayed image to the image corresponding to the new speaker.
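The relationship between the amount of coded data and the delay can be made concrete with a short calculation. The line speed and the frame sizes below are hypothetical figures chosen only for illustration; none of them is taken from the system described above.

```python
def transmission_seconds(frame_bits: int, line_bits_per_second: int) -> float:
    """Time needed to send one coded frame over a line of the given speed."""
    return frame_bits / line_bits_per_second


# All three figures are hypothetical and serve only to illustrate the ratio.
LINE_SPEED = 64_000          # bits per second
INTRA_FRAME_BITS = 128_000   # assumed size of one intra-frame coded picture
INTER_FRAME_BITS = 6_400     # assumed size of one interframe coded picture

intra_delay = transmission_seconds(INTRA_FRAME_BITS, LINE_SPEED)
inter_delay = transmission_seconds(INTER_FRAME_BITS, LINE_SPEED)
```

With these assumed figures, the intra-frame coded picture occupies the line for 2 seconds, whereas an interframe coded picture occupies it for 0.1 second, so the picture of a new speaker cannot appear until about 2 seconds after the switch.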