The present invention relates to an image communication apparatus for transmitting and receiving dynamic images, still images, audio signals, and the like, and more particularly to timing synchronization when previously accumulated dynamic images and still images are inserted into real-time dynamic images and still images for transmission and reception.
FIG. 1 is a structural diagram showing a standard H.324 terminal device in which ITU-T Recommendation H.324 is applied as a prior art.
In FIG. 1, raw data such as audio data, video data, and control data generated by a microphone, a camera, and the system are encoded by an audio encoder, a video encoder, and a control encoder, respectively, in a standard transmission capacity generation unit 101 having the standard transmission capacity of H.324. Using the transmission protocols AL2, AL3, and SRP corresponding to the encoded data, they are issued as a bit stream 104 of audio encoded data, a bit stream 105 of video encoded data, and a bit stream 106 of connection control encoded data. From an application specification storage unit 102, which stores an application specification for using a non-standard capacity, a bit stream 107 of data is issued through a protocol storage unit 103, which stores a transmission protocol for the data channel. A multiplexer 108 assembles the bit streams 104 to 107 into one packet and sends it out to a transmission path 113 as multiplexed data. A demultiplexer 109 separates the multiplexed data transmitted through the transmission path 113 into individual bit streams. The separated bit streams of audio encoded data, video encoded data, and connection control encoded data are received in a standard reception capacity generation unit 110 having the standard reception capacity of H.324 by using the reception protocols corresponding to AL2, AL3, and SRP, are decoded in an audio decoder, a video decoder, and a control decoder, respectively, and are issued to a speaker, a display unit, and the system as audio data, video data, and control data. Data of non-standard capacity is issued, through a protocol storage unit 111 for storing a reception protocol of the non-standard data channel, to an application specification storage unit 112 for storing an application specification for using the non-standard capacity.
The standard H.324 terminal device shown in FIG. 1 is composed of three channels, that is, a control channel for managing the connection information between terminal devices, a dynamic image channel for transferring a dynamic image bit stream encoded in real time, and an audio channel for transferring an audio bit stream encoded in real time. For such data transfer, a standard specific transfer protocol designated by ITU-T Recommendation is used, and therefore exchange of real-time sound, real-time dynamic image and connection information is possible among all terminal devices.
If functions other than these standard functions are required, a data channel having a transfer protocol for such a function (a protocol stored in the protocol storage unit 103) is needed, together with an application specification for using such data (an application specification stored in the application specification storage unit 102). There are various types of transfer protocol and application specification depending on the purpose. It is, however, not always guaranteed that the terminal device at the destination side has the same protocol and specification as the sender's side. Accordingly, at the start of connection, the terminal devices use the control channel to check each other's additional functions and judge whether each function is usable.
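The mutual check of additional functions at connection start can be pictured as a set intersection. The following is only a minimal sketch: the function name `usable_extensions` and the capability labels are hypothetical, and the actual exchange over the control channel is not modeled.

```python
from typing import Iterable, Set


def usable_extensions(local: Iterable[str], remote: Iterable[str]) -> Set[str]:
    """Return the additional functions that both terminals support.

    Each argument is a collection of hypothetical capability labels, one per
    non-standard transfer protocol / application specification pair.
    """
    # A non-standard function is usable only if both sides declare it.
    return set(local) & set(remote)


if __name__ == "__main__":
    sender = {"still-image-v1", "whiteboard"}   # hypothetical labels
    destination = {"still-image-v1"}
    print(usable_extensions(sender, destination))
```

If the intersection is empty, only the standard control, audio, and dynamic image channels remain available, which is the situation the invention addresses.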
In a television conference system, real-time video and audio are entered at the same time. After being encoded in the individual encoders, the video and audio bit streams are immediately sent to the multiplexer, and the multiplexed bit stream is transmitted to the terminal device at the destination. Since they are multiplexed and transmitted immediately after generation, the synchronization of video and audio is nominally maintained. However, the required encoding time differs between audio and video signals: for the same time span, video encoding takes longer than audio encoding. If the data are multiplexed right after generation, synchronization therefore actually deviates by this time difference.
Incidentally, the video bit stream specified, for example, in H.263 carries in every frame time information TR, an absolute time value that wraps around approximately every eight seconds, in order to indicate the display start time of the image. The transmission side has a time counter that wraps around every eight seconds, and on the basis of this time counter, the time information TR is inserted in each frame of the bit stream. The decoder adjusts the display changeover timing of each frame composing a dynamic image on the basis of the time axis managed by itself and the time information TR in the video frame.
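The wrapping time counter can be illustrated with a short sketch. It assumes, as in the usual H.263 picture header, that TR is an 8-bit field counted in ticks of a 29.97 Hz picture clock; these two constants and the function names are assumptions of the sketch and are not stated in the text above.

```python
from fractions import Fraction

TR_MODULUS = 256                      # assumption: 8-bit TR field
TICK_SECONDS = Fraction(1001, 30000)  # assumption: one tick of a 29.97 Hz picture clock


def tr_for_time(elapsed_seconds: Fraction) -> int:
    """Return the TR value the sender's wrapping counter shows at the given time."""
    ticks = int(elapsed_seconds / TICK_SECONDS)  # whole ticks elapsed so far
    return ticks % TR_MODULUS                    # the counter wraps around


def wrap_period_seconds() -> float:
    """Time the counter needs to go around once."""
    return float(TR_MODULUS * TICK_SECONDS)


if __name__ == "__main__":
    print(wrap_period_seconds())                # roughly 8.5 seconds
    print(tr_for_time(TICK_SECONDS * 300))      # 300 ticks wrap to 44
```

Under these assumptions the counter period is about 8.5 seconds, which agrees with the "approximately eight seconds" stated above.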
Problems occurring when inserting preliminarily accumulated images into real-time images are discussed below while referring to an example of television conference system. FIG. 2 (a) shows the stream state of encoder and decoder, and the stream state of accumulated video data when passing accumulated video data in video channels in the conventional television conference system, and FIG. 2 (b) shows the display timing of each frame when changing over from the real-time image to accumulated image.
In FIG. 2, reference numeral 201 shows a first frame of accumulated video data, and 202 shows a last frame of real-time video data issued from an encoder. Reference numeral 203 shows the display start position of last frame of real-time video data issued from the encoder, and 204 is the display start position of first frame of accumulated video data.
In the television conference system, when changing over from the video bit stream flowing in the video channel to the accumulated video bit stream, the difference between the time information TR (202) of the last frame 202 of the real-time dynamic image bit stream and the time information TR (201) of the first frame 201 of the accumulated video bit stream is the time width from the display start point 203 of the last frame of the real-time dynamic image on the decoder until the display start point 204 of the first frame of the accumulated image. In the example in FIG. 2 (b), the time width 205 from the display start point 203 of the last frame of the real-time dynamic image until the display start point 204 of the first frame of the accumulated image varies with the changeover timing and may be about eight seconds at worst; during this period, the last real-time image is displayed as a still image.
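The dependence of the time width 205 on the two TR values can be sketched as a modular difference. The 8-bit TR range and the 29.97 Hz tick are assumptions of this sketch, and the function names are hypothetical.

```python
TR_MODULUS = 256               # assumption: 8-bit TR field
TICK_SECONDS = 1001 / 30000    # assumption: 29.97 Hz picture clock


def display_gap_ticks(tr_last_realtime: int, tr_first_accumulated: int) -> int:
    """Ticks the decoder waits between the last real-time frame and the
    first accumulated frame, given only their TR values."""
    # The decoder sees only the wrapped counter, so the gap is the forward
    # distance from one TR value to the other on the wrapping time axis.
    return (tr_first_accumulated - tr_last_realtime) % TR_MODULUS


def display_gap_seconds(tr_last_realtime: int, tr_first_accumulated: int) -> float:
    """The same gap expressed in seconds."""
    return display_gap_ticks(tr_last_realtime, tr_first_accumulated) * TICK_SECONDS


if __name__ == "__main__":
    print(display_gap_ticks(200, 203))   # favorable changeover: 3 ticks
    print(display_gap_seconds(10, 9))    # unfavorable changeover: about 8.5 seconds
```

When the stored TR happens to lie just behind the real-time TR, the forward distance is almost a full counter period, which corresponds to the worst case of about eight seconds described above.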
In the conventional television conference system, for the purpose of strict synchronization, the synchronization deviation information called skew is transmitted to the terminal device at the destination separately through a control channel, and by adjusting the synchronization by its value, strict synchronization of video and audio is realized on the terminal device at the destination.
In the television conference system, while a dynamic image is being transmitted in real time, transmission is often switched midway to previously accumulated images (dynamic or still). For example, suppose an accumulated still image is to be transmitted over the dynamic image channel in accordance with H.263. When the image bit stream flowing in the video channel in real time is changed over to the accumulated image bit stream, the time difference between the time information TR of the last frame of the dynamic image bit stream immediately before the changeover and the time information TR of the first frame of the accumulated image bit stream is variable, and, depending on the circumstances, there is a considerably long waiting time until reproduction of the still image starts at the reception side.
If the transmission side and reception side have a non-standard, original transfer protocol for transferring still images, the changeover to the still image can be made according to that transfer protocol, but both terminals to be connected are then required to have a transfer protocol and an application specification of the same specification.
The invention is devised to solve the problems of the prior art mentioned above, and it is an object thereof to provide an image communication apparatus capable of transmitting and receiving accumulated dynamic images and still images without causing the aforesaid inconvenience and without requiring any particular transfer protocol or application at the destination side, provided that the destination has a standard transfer protocol.
A first aspect of the invention relates to an image communication apparatus for reading and displaying preliminarily accumulated images, or directly displaying the images in the process of encoding or decoding, which comprises means for correcting and issuing time information existing in every frame in the bit stream of accumulated images when sending out the bit stream being read out from an image accumulation device.
In this constitution, in the transmitting direction or receiving direction, the time from the display start point of the last frame of the image in the process of encoding or decoding until the display start point of the accumulated bit stream can be forcibly kept constant, and the time difference between the bit stream of the image in the process of encoding or decoding flowing in the video channel and the bit stream of the previously accumulated image can be absorbed.
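One way to picture such a correction is to shift every TR value of the accumulated stream by a common offset, so that its first frame follows the last real-time frame by a fixed gap. The following is a minimal sketch under stated assumptions: `Frame` is a hypothetical container (in an actual bit stream TR is a field of the picture header), the 8-bit TR range is assumed, and `rebase_tr` is not presented as the correcting means of the invention itself.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator

TR_MODULUS = 256  # assumption: 8-bit TR field


@dataclass(frozen=True)
class Frame:
    """Hypothetical container: one coded frame and its TR value."""
    tr: int
    payload: bytes


def rebase_tr(frames: Iterable[Frame], tr_last_realtime: int,
              gap_ticks: int = 1) -> Iterator[Frame]:
    """Yield the accumulated frames with TR shifted so that the first one
    starts gap_ticks after the last real-time frame."""
    offset = None
    for frame in frames:
        if offset is None:
            # Fix the offset once, from the first accumulated frame.
            offset = (tr_last_realtime + gap_ticks - frame.tr) % TR_MODULUS
        # Only the time information changes; the coded image data is untouched.
        yield replace(frame, tr=(frame.tr + offset) % TR_MODULUS)


if __name__ == "__main__":
    stored = [Frame(100, b"a"), Frame(103, b"b"), Frame(106, b"c")]
    for f in rebase_tr(stored, tr_last_realtime=250, gap_ticks=3):
        print(f.tr)   # 253, 0, 3
```

Because the same offset is applied to every frame, the frame spacing inside the accumulated stream is preserved while the changeover gap becomes the constant `gap_ticks`.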
A second aspect of the invention relates to an image communication apparatus for transferring a real-time dynamic image or still image, which comprises time information correcting means for correcting the time information portion in each frame of the video bit stream, in which, when sending out the bit stream of a still image read out from an image accumulation device, a bit stream carrying the same video information as the still image bit stream already transmitted, with only the time information in each frame corrected, is issued at a specific interval.
In this constitution, breakdown of the image decoder can be prevented, and the non-standard use of preliminarily accumulated still images is enabled.
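The periodic re-issue of the same still image with only its time information advanced can be sketched as a generator. The function name `repeat_still`, its parameters, and the 8-bit TR range are assumptions of this sketch; the interval is expressed in TR ticks for simplicity.

```python
from typing import Iterator, Tuple

TR_MODULUS = 256  # assumption: 8-bit TR field


def repeat_still(payload: bytes, first_tr: int, interval_ticks: int,
                 count: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (tr, payload) pairs: the same coded still image, re-issued
    count times, with only the TR advanced by a fixed interval."""
    for i in range(count):
        # The video information is identical each time; only TR changes.
        yield (first_tr + i * interval_ticks) % TR_MODULUS, payload


if __name__ == "__main__":
    for tr, data in repeat_still(b"img", first_tr=250, interval_ticks=30, count=3):
        print(tr, data)   # TR values 250, 24, 54 with the same payload
```

Keeping the interval shorter than the counter period means consecutive TR values never differ by a full wrap, so the receiving decoder keeps seeing frames at a regular pace.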
A third aspect of the invention relates to an image communication apparatus which comprises an accumulation device for accumulating video or audio bit streams, and an accumulation processing unit for accumulating the data and encoding end time of each frame of bit streams in the accumulation device, in which when accumulating the bit streams in the accumulation device, the accumulation processing unit stores the data and encoding end time of each frame in the accumulation device while distinguishing between audio and video data.
In this constitution, when reproducing at the own terminal or the destination terminal, since the take-out timing of video and audio bit streams can be adjusted on the basis of the data and encoding end time of each frame, at the time of data output from the accumulation device to the multiplexer or encoder, audio and video synchronization is achieved. Moreover, since the accumulation processing unit accumulates also the skew of the own terminal, strict synchronous take-out of audio and video signals is realized.
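The accumulation and timed take-out described here can be sketched as two time-stamped lists merged in time order. Everything named below (`StoredFrame`, `Accumulator`, millisecond units) is hypothetical, and delaying audio by the stored skew is an assumption of the sketch, since the text does not specify how the skew is applied.

```python
import heapq
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class StoredFrame:
    """Hypothetical record: frame data plus its encoding end time."""
    kind: str          # "audio" or "video"
    end_time_ms: int   # encoding end time of this frame
    data: bytes


class Accumulator:
    """Stores audio and video frames separately, each with its end time."""

    def __init__(self, skew_ms: int = 0) -> None:
        self.skew_ms = skew_ms               # skew of the local terminal
        self.audio: List[StoredFrame] = []
        self.video: List[StoredFrame] = []

    def store(self, kind: str, end_time_ms: int, data: bytes) -> None:
        frame = StoredFrame(kind, end_time_ms, data)
        if kind == "audio":
            self.audio.append(frame)
        elif kind == "video":
            self.video.append(frame)
        else:
            raise ValueError(f"unknown kind: {kind}")

    def take_out(self) -> Iterator[Tuple[int, StoredFrame]]:
        """Yield (release_time_ms, frame) in release order.

        Assumption: audio is delayed by the stored skew so that it lines
        up with the slower video encoding. Each list is assumed to be in
        ascending end-time order, as produced by the encoder.
        """
        audio = ((f.end_time_ms + self.skew_ms, f) for f in self.audio)
        video = ((f.end_time_ms, f) for f in self.video)
        yield from heapq.merge(audio, video, key=lambda item: item[0])


if __name__ == "__main__":
    acc = Accumulator(skew_ms=40)
    acc.store("audio", 0, b"a0")
    acc.store("video", 50, b"v0")
    acc.store("audio", 30, b"a1")
    for t, frame in acc.take_out():
        print(t, frame.kind)   # 40 audio, 50 video, 70 audio
```

Keeping the two media in separate lists mirrors the requirement that audio and video data be distinguished in the accumulation device, while the merge step reproduces their relative timing at take-out.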
A fourth aspect of the invention relates to an image communication apparatus for receiving and accumulating video and audio bit streams sent from another communication terminal, which comprises an accumulation device for accumulating video or audio bit streams, and an accumulation processing unit for accumulating the data of each frame of the bit stream and the reception time of each frame beginning position in the accumulation device, in which, when accumulating the received bit stream in the accumulation device, the accumulation processing unit stores the data of each frame and the reception time of each frame beginning position in the accumulation device while distinguishing between audio and video data.
In this constitution, when reproducing, since the take-out timing of video and audio bit streams can be adjusted on the basis of the data of each frame and reception time of frame beginning position, at the time of data output from the accumulation device to the multiplexer or encoder, audio and video synchronization is achieved. Moreover, since the accumulation processing unit accumulates also the skew of the transmission side terminal, strict synchronous take-out of audio and video signals is realized.