1. Field of the Invention
The present invention relates to a videoconference terminal and an image/voice regeneration method to be used for this videoconference terminal, and relates, more particularly, to a method of regenerating an image and voice at a videoconference terminal.
2. Description of Related Art
In the transmission of image and voice signals by using a videoconference terminal, the received timings of the image signal and voice signal after the compression and expansion of these signals become different from the timings when the original signals were transmitted. This is because the time required for compressing and expanding the image signal is different from the time required for compressing and expanding the voice signal.
In general, an image signal involves a larger amount of information to be processed than a voice signal. Therefore, at the receiving end, the processing of the image signal is completed later than that of the voice signal. This results in an unnatural regeneration of the signals in which the image appears after the voice.
In order to solve this problem, there has been a conventional videoconference terminal that has a function of selecting, from among preset fixed values, a delay amount for delaying the timing of regenerating the voice at the receiving end. When the videoconference terminal having this function is used, it is possible to match to some extent the timing of regenerating the image with the timing of regenerating the voice.
As another conventional technique for solving the above problem, there is the image/voice synchronization system of MPEG-2 (Moving Picture Experts Group phase 2), which is one of the motion picture compression systems.
According to the MPEG-1 (Moving Picture Experts Group phase 1) system and the MPEG-2 system (hereinafter to be collectively referred to as the MPEG system), each of an image packet and a voice packet has its own time stamp called a PTS (Presentation Time Stamp).
The PTS is stored in the header (packet header) of the image packet and the voice packet respectively at the time of transmitting the signals to the receiving end.
FIG. 1 shows a state in which the image and the voice are multiplexed into packets according to MPEG-2. A packet header is embedded in each of the image packet and the voice packet, and a value of the PTS is stored in the packet header.
Meanwhile, at the receiving end, there is a counter (STC: System Time Clock) that is accurately synchronized with the transmitting end. A decoder regenerates the image and the voice when the value of the PTS stored in each packet header of the received image and voice becomes equal to the value of the STC of the receiver.
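The comparison rule described above can be illustrated with a minimal Python sketch. The `Packet` class, the `present_in_sync` function, and the tick values are hypothetical names and numbers chosen for illustration only; they are not taken from the MPEG specification, and a real decoder would run against a hardware clock rather than a simulated loop.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str  # "image" or "voice"
    pts: int   # presentation time stamp, in STC ticks (illustrative units)


def present_in_sync(packets, stc_start, stc_end):
    """Step a simulated STC and release each packet when its PTS equals the STC."""
    presented = []
    for stc in range(stc_start, stc_end + 1):
        for packet in packets:
            if packet.pts == stc:
                presented.append((stc, packet.kind))
    return presented


if __name__ == "__main__":
    received = [Packet("voice", 5), Packet("image", 5), Packet("voice", 8)]
    for stc, kind in present_in_sync(received, 0, 10):
        print(f"STC={stc}: regenerate {kind}")
```

Packets that carry the same PTS are released on the same tick, which is how the image and the voice end up being regenerated together.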
In other words, when the value of the PTS (a regeneration time) is stored in advance at the transmitting end such that the image and the voice are regenerated at the same time at the receiving end, the receiver can obtain an output image and an output voice that are synchronized with each other. The MPEG system is described in detail in "The Latest MPEG Textbook" (in Japanese), Ascii Publishing Co., Ltd., 1995.
According to the above-described conventional videoconference terminal, however, there has been the following problem. When the image and the voice are transmitted, the time required for the compression processing and the expansion processing is not constant and varies depending on the contents of the input signal. Therefore, according to the conventional method of fixing the delay amount, it is not always possible to make the timing of regenerating the image and the timing of regenerating the voice coincide with each other.
Further, according to the image/voice synchronization system that is employed in the MPEG system, it is always possible to make the timing of regenerating the image and the timing of regenerating the voice coincide with each other. However, this system has the following problems.
First, according to the above image/voice synchronization system, a large amount of information is required for the synchronized regeneration of signals. The settable range of the PTS value is made large (24 hours or more) at the receiving end. Therefore, the data width of the PTS is as large as 44 bits. As a result, the circuit scale becomes large. Further, as the PTS is stored in the header, the length of the header becomes large.
Second, the above-described image/voice synchronization system is exclusive to MPEG. As the PTS is stored in the packet header that is specific to the MPEG system, only a system that uses the MPEG system can utilize the PTS. Therefore, there is no compatibility with other motion picture encoding systems.
The present invention has been made to solve the above-described conventional problems. It is, therefore, an object of the present invention to provide a videoconference terminal and an image and voice regeneration system to be used therefor, which are capable of easily achieving a videoconference with a sense of realism by regenerating the image and the voice at the same timing as that of the transmitting end, without the need for increasing the header information and regardless of the compression/expansion system.
In order to meet the above object, according to the present invention, there is provided a videoconference terminal that regenerates an image and voice by always accurately matching the regeneration timing with that at the transmitting end. Therefore, at the receiving end, it is possible to regenerate the image and voice at the same timing as that at the transmitting end.
More specifically, the videoconference terminal of the present invention comprises a transmitter and a receiver, wherein the transmitter comprises: an analog-to-digital converter for converting input analog image and analog voice signals to input digital image and voice signals, respectively; a marker for simultaneously and periodically embedding a marking signal in the input digital image signal and the input digital voice signal corresponding to the input digital image signal to produce digital image and voice signals; and a data compressor for compressing the digital image signal and the digital voice signal to produce a compressed image signal and a compressed voice signal which are transmitted to another end of the videoconference.
The receiver comprises: a data expander for expanding received image signal and received voice signal to produce received digital image signal and received digital voice signal; a time difference detector for detecting an arrival time difference between the received digital image signal and the received digital voice signal based on marking signals detected from the received digital image signal and the received digital voice signal, respectively; a digital-to-analog converter for converting the received digital image signal and the received digital voice signal to a received analog image signal and a received analog voice signal; and an adjuster for adjusting timings of the received analog image signal and the received analog voice signal depending on the arrival time difference.
The analog image signal input from a camera or the like is quantized by an image A/D converter. The quantized signal is then passed through a marking signal adding circuit, and is compressed according to a transmission speed in a transmission path by an image compressing circuit. Thereafter, the signal is multiplexed with the voice signal by a multiplexing circuit, and the multiplexed signal is sent to the transmission path.
On the other hand, the analog voice signal input from a microphone or the like is quantized by a voice A/D converter. The quantized signal is then passed through a marking signal adding circuit, and is compressed by a voice compressing circuit. Thereafter, the signal is multiplexed with the image signal by a multiplexing circuit, and the multiplexed signal is sent to the transmission path. The image compressing circuit and the voice compressing circuit compress the quantized image and voice signals respectively by using a reversible encoding algorithm.
At the receiving end, the signal received from the other side of the communication through the transmission path is separated into the image signal and the voice signal by a separating circuit. The image signal is passed through an image expanding circuit, a marking signal detecting circuit, and a D/A converter. Thus, the signal is regenerated as an analog image output signal.
Similarly, the voice signal is also passed through a voice expanding circuit, a marking signal detecting circuit, and a D/A converter. Thus, the signal is regenerated as an analog voice output signal. The image expanding circuit and the voice expanding circuit expand the compressed image signal and the compressed voice signal respectively by the algorithm reversed from that of the transmitting end.
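The pairing of a reversible compressing circuit with its inverse expanding circuit can be sketched as follows. The source does not name a specific algorithm; the DEFLATE codec from Python's standard `zlib` module is swapped in here purely as a readily available lossless codec, and the function names are hypothetical.

```python
import zlib


def compress(quantized: bytes) -> bytes:
    """Stand-in for a compressing circuit: lossless DEFLATE via zlib (an assumption)."""
    return zlib.compress(quantized)


def expand(compressed: bytes) -> bytes:
    """Stand-in for an expanding circuit: the exact inverse of compress()."""
    return zlib.decompress(compressed)


def survives_round_trip(quantized: bytes) -> bool:
    """True when the expanded signal is bit-for-bit identical to the input."""
    return expand(compress(quantized)) == quantized


if __name__ == "__main__":
    samples = bytes(i % 7 for i in range(1000))  # arbitrary quantized samples
    print(survives_round_trip(samples))
```

Because the round trip is exact, any bits altered in the quantized signal before compression reappear unchanged after expansion.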
According to the videoconference terminal of the present invention, the A/D converter quantizes the input analog image signal and the input analog voice signal respectively. Immediately after the signal quantization, that is, before these signals are compressed, marking signals are embedded in the quantized image signal and the quantized voice signal respectively by the marking signal adding circuit simultaneously and periodically. Each marking signal is embedded in each signal by replacing a part of the bits of the signal with the marking signal.
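One possible way to replace a part of the bits of a quantized signal with a marking signal is sketched below. The 8-bit `MARKER` pattern, the choice of the least significant bit of eight consecutive samples, and the function names are all assumptions made for illustration; the source does not specify which bits are replaced or what the marking signal looks like. Treating the image and voice streams as sharing one sample index is likewise a simplification.

```python
MARKER = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical marking pattern


def embed_marker(samples, start):
    """Return a copy of samples with MARKER written into the least
    significant bits of samples[start:start + len(MARKER)]."""
    out = list(samples)
    for i, bit in enumerate(MARKER):
        out[start + i] = (out[start + i] & ~1) | bit
    return out


def embed_periodically(samples, period):
    """Embed MARKER every `period` samples, wherever a full pattern still fits."""
    out = list(samples)
    for start in range(0, len(out) - len(MARKER) + 1, period):
        out = embed_marker(out, start)
    return out


def mark_both(image_samples, voice_samples, period):
    """Mark both quantized streams at the same sample positions (simplified
    stand-in for 'simultaneously and periodically')."""
    return (embed_periodically(image_samples, period),
            embed_periodically(voice_samples, period))
```

Only the lowest bit of each affected sample changes, so the upper bits of the image and voice data are left as quantized.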
Because a reversible algorithm is used for the compression and expansion of the signals, the signals before the compression are completely regenerated at the receiving end. Therefore, the marking signal embedded at the transmitting end can be detected by the marking signal detecting circuit at the receiving end. Thus, it is possible to know an arrival time difference t between the arrival time of the marking signal embedded in the image signal and the arrival time of the marking signal embedded in the voice signal.
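The detection of the marking signal and the calculation of the arrival time difference t can be sketched as follows. This is only an illustration under stated assumptions: the `MARKER` pattern, its placement in the least significant bits of consecutive samples, the representation of each expanded stream as `(arrival_time_ms, sample)` pairs, and all function names are hypothetical. A practical detector would also need to guard against ordinary signal content that happens to match the pattern.

```python
MARKER = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical pattern; must match the transmitter


def find_marker(samples):
    """Index of the first sample run whose least significant bits equal MARKER, or None."""
    n = len(MARKER)
    for i in range(len(samples) - n + 1):
        if [s & 1 for s in samples[i:i + n]] == MARKER:
            return i
    return None


def marker_arrival_time(stream):
    """stream: list of (arrival_time_ms, sample) pairs leaving an expanding circuit."""
    index = find_marker([sample for _, sample in stream])
    if index is None:
        raise ValueError("no marking signal found")
    return stream[index][0]


def arrival_time_difference(image_stream, voice_stream):
    """t = image marker arrival - voice marker arrival; positive means the image is later."""
    return marker_arrival_time(image_stream) - marker_arrival_time(voice_stream)
```

The sign convention (image minus voice) is a choice made for this sketch, so that a positive t indicates that the voice arrived first.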
Then, the delay circuit delays the output of the first-arrived signal out of the image signal and the voice signal by the arrival time difference t. As a result, at the receiving end, it becomes possible to regenerate the image and the voice at the same timing as that when the signals are transmitted at the transmitting end. In other words, even when there is a time difference between the image signal arrival time and the voice signal arrival time, it is always possible to regenerate at the receiver's videoconference terminal the image and the voice at the same timing as the transmission timing of the signals.
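The delaying of the first-arrived signal by the arrival time difference t can be sketched as follows. The function name, the `(output_time_ms, sample)` stream representation, and the sign convention for t (image marker arrival minus voice marker arrival) are assumptions introduced for this illustration.

```python
def delay_first_arrived(image_stream, voice_stream, t):
    """Push back the output times of whichever stream arrived first by |t|.

    image_stream, voice_stream: lists of (output_time_ms, sample) pairs.
    t: image marker arrival minus voice marker arrival, in milliseconds.
    Returns (image_stream, voice_stream) with aligned output times.
    """
    def shift(stream, amount):
        return [(time + amount, sample) for time, sample in stream]

    if t > 0:   # the voice arrived first, so the voice is delayed
        return image_stream, shift(voice_stream, t)
    if t < 0:   # the image arrived first, so the image is delayed
        return shift(image_stream, -t), voice_stream
    return image_stream, voice_stream  # already aligned
```

For example, with t = 40 the voice samples that arrived at 0 ms and 1 ms are output at 40 ms and 41 ms, alongside the image samples that arrived at those times.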
According to the method of the present invention, the marking signals are processed immediately after the image signal and the voice signal have been quantized, or immediately before the signals are converted into analog signals. Therefore, when the reversible algorithm is used, it is possible to provide a synchronized regeneration function for synchronously regenerating the image and the voice without depending on the compression and expansion system. Further, as the short marking signal is directly embedded in the image signal and the voice signal respectively without using a long time stamp, it is possible to achieve a synchronized regeneration of the image and voice based on a smaller amount of information.