1. Field of the Invention
The present invention relates to a TV telephone system having a plurality of TV telephone apparatuses communicating by video signals and audio signals via an Internet protocol network.
2. Related Art
In a TV telephone system of the past, communication by video and audio signals was conducted between TV telephone apparatuses via an internet protocol network such as an intranet or an internet.
In a TV telephone system, communication is conducted using video and audio signals, based for example on the Recommendation H.323 or H.225 of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
A TV telephone apparatus used in a TV telephone system of the past is described below.
FIG. 6 of the accompanying drawings is a block diagram showing an example of the configuration of a TV telephone apparatus used in a TV telephone system of the past.
The TV telephone apparatus 100 in this example of prior art conducts communication using a video signal and an audio signal via a network 114, with a TV telephone apparatus (not shown in the drawing) having the same configuration as the TV telephone apparatus 100.
As shown in FIG. 6, the TV telephone apparatus 100 of the prior art is formed by an audio I/O section 101, an audio CODEC 102, a receiving path delay section 103, a video CODEC 104, a video I/O section 105, a user data application section 106, a system controller 107, a system controller unnumbered information (UI) section 111, an H.225 layer section 112, and a network interface 113.
The audio CODEC 102 performs compression and encoding processing of an audio signal received from the audio I/O section 101, so as to generate compressed audio data. The compressed audio data output from the audio CODEC 102 passes through the receiving path delay section 103, the H.225 layer section 112, the network interface 113, and the network 114 and is transmitted to the other party's TV telephone apparatus.
The audio CODEC 102 also performs decompression and decoding of compressed audio data received from the other party's TV telephone apparatus via the network 114, the network interface 113, the H.225 layer section 112, and the receiving path delay section 103, so as to reproduce the audio signal and output it to the audio I/O section 101, which outputs the audio signal from the audio CODEC 102 to a speaker (not shown in the drawing) or the like.
At the audio CODEC 102, compression/decompression and encoding/decoding processing are performed in accordance with any one of ITU-T Recommendations G.711, G.722, G.723, G.728, and G.729.
The video CODEC 104 performs interframe predictive coding of a video signal captured by a video camera (not shown in the drawing) at the video I/O section 105, so as to generate compressed image data. The compressed image data output from the video CODEC 104 is sent to the other party's TV telephone apparatus, via the receiving path delay section 103, the H.225 layer section 112, the network interface 113, and the network 114.
The video CODEC 104 also performs decompression and decoding of compressed image data sent from the other party's TV telephone apparatus via the network 114, the network interface 113, the H.225 layer section 112, and the receiving path delay section 103, so as to generate a video signal and output it to the video I/O section 105, which displays the video signal output from the video CODEC 104 on a display (not shown in the drawing).
In the video CODEC 104, interframe predictive coding of a video signal captured at the video I/O section 105 is performed in accordance with either of the ITU-T Recommendations H.261 and H.263, and the compressed video signal sent from the other party's TV telephone apparatus is expanded and decoded.
The ITU-T Recommendation H.261 (hereinafter referred to simply as H.261) is described below.
In H.261, intraframe coding (hereinafter referred to as INTRA) and interframe coding (hereinafter referred to as INTER) are both used as interframe predictive encoding.
Interframe predictive coding is coding in which data from an initial or previous frame is referenced when performing encoding processing, and intraframe coding is coding in which only data within the current frame is used in performing encoding processing.
For example, in the case in which there is little movement in the video signal captured by the video I/O section 105, because there is a large correlation with respect to previous and subsequent frames, interframe coding is performed. However, in the case in which there is a large amount of movement captured in the video signal at the video I/O section 105, because the correlation with respect to previous and subsequent frames is small, intraframe coding is used.
Thus, in H.261, in the case in which there is a large change between a frame being coded and an initial frame or a previous frame, intraframe coding, in which the data from the previous frame is not referenced, is performed. In coding other frames, the previous frame is referenced, and interframe coding is done.
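The mode selection described above can be sketched as follows. This is only an illustration of the idea, not the decision procedure of H.261 itself: the function name choose_mode, the mean-absolute-difference measure, and the threshold value are all assumptions introduced here.

```python
def choose_mode(current, previous, threshold=30.0):
    """Return "INTRA" or "INTER" for one frame (hypothetical helper).

    Frames are flat lists of pixel values of equal length. The first
    frame has no reference, so it is always coded as INTRA.
    """
    if previous is None:
        return "INTRA"
    # Mean absolute difference: a crude measure of change between frames.
    mad = sum(abs(c - p) for c, p in zip(current, previous)) / len(current)
    # Large change: low correlation with the previous frame, so do not reference it.
    return "INTRA" if mad > threshold else "INTER"
```

With these assumed values, a frame that differs only slightly from its predecessor is coded as INTER, while a scene change is coded as INTRA.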
In H.261, in the case of performing decompression and decoding of compressed image data that has been coded at the other party's TV telephone apparatus using interframe coding, previous frame data is referenced when performing decompression and decoding, and in the case of performing decompression and decoding of compressed data that has been coded at the other party's TV telephone apparatus using intraframe coding, decompression and decoding are performed using only the current frame.
H.263 is a partially improved version of H.261 for a general switched telephone network (GSTN) type of TV telephone system, and because features of interframe predictive coding and operation are very similar to H.261, it will not be described herein.
Therefore, compressed image data generated at the video CODEC 104 is a mixture of “INTRA” data without interframe predictive signals, obtained by intraframe coding, and “INTER” data with interframe predictive signals, obtained by interframe coding.
The receiving path delay section 103, in the case in which there is an offset between the audio signal at the audio I/O section 101 and the video signal at the video I/O section 105, compensates for this offset by delaying either the audio signal or the video signal. The offset between the audio signal and the video signal depends upon the communication conditions over the transmission path between the TV telephone apparatus 100 and the TV telephone apparatus of the other party.
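The compensation performed by the receiving path delay section can be sketched as follows. The function name compensate_offset and the use of millisecond delay values are assumptions made for illustration; the text does not specify how the offset is measured.

```python
def compensate_offset(audio_delay_ms, video_delay_ms):
    """Return (extra_audio_delay, extra_video_delay) in milliseconds.

    Only the stream that is ahead is delayed, so that both streams end
    up with the same total delay (hypothetical helper).
    """
    target = max(audio_delay_ms, video_delay_ms)
    return target - audio_delay_ms, target - video_delay_ms
```

For example, if audio arrives after 40 ms and video after 120 ms, the audio is held back by a further 80 ms and the video is passed through unchanged.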
The user data application section 106 executes various applications that use the user data channel in the TV telephone apparatus 100.
With respect to the compressed audio data output from the audio CODEC 102 and the compressed image data output from the video CODEC 104, the H.225 layer section 112 adds an RTP (Real-time Transport Protocol) header and performs UDP (User Datagram Protocol) processing. It also performs UDP protocol processing on, and removes the RTP header from, compressed audio data and compressed image data sent from the other party's TV telephone apparatus via the network 114 and the network interface 113.
The UDP protocol is a connectionless type of protocol (RFC 768) that has been standardized by the IETF (Internet Engineering Task Force), and is a type of communication protocol for IP networks such as the Internet and intranets.
Because of the simplicity of the UDP protocol, it features superior data communication throughput and provides an advantage in improving the simultaneity of image communication, making it suitable for use in real-time communication of both audio and video signals.
FIG. 7 of the accompanying drawings is a block diagram showing an example of the configuration of the H.225 layer section 112 of FIG. 6.
In the configuration example shown in FIG. 7, the H.225 layer section 112 is formed by an RTP section 120, an RAS (Registration, Admission and Status) section 121, a UDP layer section 122, a call signaling section 123, an H.245 section 124, a TCP (Transmission Control Protocol) layer section 125, and an IP layer section 127.
The RTP section 120 adds an RTP header to compressed audio data output via the receiving path delay section 103 from the audio CODEC 102, and the compressed image data output via the receiving path delay section 103 from the video CODEC 104, and also removes the RTP header from compressed audio data output from the UDP layer section 122 and compressed image data output from the UDP layer section 122.
The compressed audio data from which the RTP header has been stripped at the RTP section 120 is output to the audio CODEC 102 via the receiving path delay section 103, and the compressed image data from which the RTP header has been stripped at the RTP section 120 is output to the video CODEC 104 via the receiving path delay section 103.
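The adding and stripping of the RTP header by the RTP section 120 can be sketched as follows. The sketch assumes the 12-byte fixed header of RFC 3550 with no CSRC list, no header extension, and the marker bit cleared; the function names and the choice of field values are illustrative and are not taken from the apparatus described here.

```python
import struct

# Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence number,
# timestamp, SSRC (12 bytes in network byte order).
RTP_HEADER = struct.Struct("!BBHII")

def add_rtp_header(payload, payload_type, seq, timestamp, ssrc):
    """Prepend a minimal RTP header to compressed audio or image data."""
    first = 2 << 6                 # version 2, no padding/extension, CC = 0
    second = payload_type & 0x7F   # marker bit cleared
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def strip_rtp_header(packet):
    """Return (header fields, payload); assumes no CSRC list or extension."""
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    fields = {"payload_type": second & 0x7F, "seq": seq,
              "timestamp": timestamp, "ssrc": ssrc}
    return fields, packet[RTP_HEADER.size:]
```

The sequence number and timestamp carried in the header are what allow a receiver to detect loss and reordering, which UDP itself does not report.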
The RAS section 121 performs RAS data communications for management of the communication condition and bandwidth, in accordance with the ITU-T Recommendation H.225, with respect to a gatekeeper (not shown in the drawing) on the network 114.
The call signaling section 123 performs communication of call signaling data for making calls and connections and disconnecting with respect to the TV telephone apparatus of the other party, based on the ITU-T Recommendation H.225.
The H.245 section 124 performs H.245 data communication with the TV telephone apparatus of the other party, for the arbitration of the operating mode in accordance with the ITU-T Recommendation H.245.
The UDP layer section 122 performs UDP protocol processing on the RAS data output from the RAS section 121, and on the compressed audio data and compressed image data to which an RTP header has been added at the RTP section 120, and outputs the resulting RAS data, compressed audio data, and compressed image data to the IP layer section 127.
The UDP layer section 122 performs UDP protocol processing to RAS data, compressed audio data and compressed image data sent from the TV telephone apparatus of the other party via the network 114, the network interface 113, and the IP layer section 127, and outputs the resulting compressed audio data and compressed image data to the RTP section 120, the UDP protocol processed RAS data being output to the RAS section 121.
The TCP layer section 125 performs TCP protocol processing on call signaling data output from the call signaling section 123 and on H.245 data output from the H.245 section 124, and outputs the TCP protocol processed call signaling data and H.245 data to the IP layer section 127.
The TCP layer section 125 also performs TCP protocol processing on call signaling data and on H.245 data sent from the TV telephone apparatus of the other party via the network 114, the network interface 113 and the IP layer section 127, and outputs the TCP protocol processed call signaling data to the call signaling section 123 and the TCP protocol processed H.245 data to the H.245 section 124.
The IP layer section 127 performs IP protocol processing to various data output from the TCP layer section 125 and the UDP layer section 122, this IP protocol processed data being sent, via the network interface 113 and the network 114, to the TV telephone apparatus of the other party.
The IP layer section 127 also performs IP protocol processing on various data sent from the TV telephone apparatus of the other party via the network 114 and the network interface 113, and outputs the IP protocol processed call signaling data and the IP protocol processed H.245 data to the TCP layer section 125, and outputs compressed audio data, compressed image data and RAS data to the UDP layer section 122.
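The layering within the H.225 layer section 112 described above can be summarized as a lookup table. The section names and reference numerals follow the description of FIG. 7; the dictionary keys and the function names are assumptions introduced only for this sketch.

```python
# Upper section and transport layer used for each kind of data in the
# prior-art apparatus, as described for FIG. 7.
STACK = {
    "audio":          ["RTP section 120", "UDP layer section 122"],
    "video":          ["RTP section 120", "UDP layer section 122"],
    "ras":            ["RAS section 121", "UDP layer section 122"],
    "call_signaling": ["call signaling section 123", "TCP layer section 125"],
    "h245":           ["H.245 section 124", "TCP layer section 125"],
}

def outbound_path(kind):
    """Sections traversed, in order, when data of this kind is sent."""
    return STACK[kind] + ["IP layer section 127", "network interface 113"]

def inbound_path(kind):
    """Sections traversed, in order, when data of this kind is received."""
    return list(reversed(outbound_path(kind)))
```

Note that in this prior-art arrangement all compressed image data, whether INTRA or INTER, takes the UDP path.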
The system controller 107 is formed by a call controller 108 that exchanges call signaling signals with the call signaling section 123 so as to control the call signaling section 123, an H.245 controller 109 that performs exchange of an H.245 control signal with the H.245 section 124 so as to control the H.245 section 124, and an RAS controller 110 that performs exchange of an RAS control signal with the RAS section 121 so as to control the RAS section 121, thereby controlling the overall TV telephone apparatus 100.
The system control UI section 111 performs negotiation with the other party's TV telephone apparatus with regard to calling connection control and operating mode in accordance with a command from the system controller 107.
The network interface 113 sends various data output from the IP layer section 127 via the network 114 to the TV telephone apparatus of the other party, and receives various data sent from the TV telephone apparatus of the other party via the network 114, outputting the received data to the IP layer section 127.
The communication operation of the above-noted TV telephone apparatus for audio signals and video signals is described below.
In the system controller 107, call connection to the TV telephone apparatus of the other party, via the H.225 layer section 112, the network interface 113, and the network 114, is performed. When this is done, at the system control UI section 111, based on a command from the system controller 107, call connection control and negotiation with regard to the operating mode or the like are performed.
When the call connection with the TV telephone apparatus of the other party is established, communication operation for audio signals and video signals begins in the respective sections within the TV telephone apparatus 100.
The communication operation for an audio signal is as follows.
If an audio signal recorded by a microphone or the like at the audio I/O section 101 is to be sent to the TV telephone apparatus of the other party, the audio CODEC 102 performs compressing and encoding processing of the audio signal recorded at the audio I/O section 101, thereby generating compressed audio data, which is sent to the H.225 layer section 112, via the receiving path delay section 103.
In the H.225 layer section 112, an RTP header is added to the compressed audio data output from the audio CODEC 102 by the RTP section 120, and UDP protocol processing is performed by the UDP layer section 122.
Next, the compressed audio data that has been UDP protocol processed by the UDP layer section 122 is IP protocol processed by the IP layer section 127, after which it is sent, via the network interface 113 and the network 114, to the TV telephone apparatus of the other party.
In this example of prior art, the audio signal recorded by the audio I/O section 101 is compressed and encoded by the audio CODEC 102, after which it is UDP protocol processed by the UDP layer section 122, and further this compressed and UDP protocol processed audio data is sent to the TV telephone apparatus of the other party.
When playing back the compressed audio data sent from the TV telephone apparatus of the other party, the network interface 113 receives the compressed audio data.
In the H.225 layer section 112, the compressed audio data received in the network interface 113 is UDP protocol processed by the UDP layer section 122 and the RTP header thereof is removed by the RTP section 120, after which the compressed audio data without the RTP header is sent to the audio CODEC 102, via the receiving path delay section 103.
In the audio CODEC 102, the compressed audio data output from the H.225 layer section 112 via the receiving path delay section 103 is decompressed and decoded, thereby generating a decompressed and decoded audio signal, which is sent to the audio I/O section 101, which plays back the audio signal output from the audio CODEC 102 to a speaker or the like.
Next, the operation of communication with a video signal is described below.
In the case in which a video signal captured by a video camera or the like in the video I/O section 105 is to be sent to the TV telephone apparatus of the other party, in the video CODEC 104 compression and encoding processing is performed to the video signal captured at the video I/O section 105, thereby generating compressed image data, which is output to the H.225 layer section 112, via the receiving path delay section 103.
The compressed image data generated by the video CODEC 104 is a mixture of INTRA video data not containing interframe predictive coding, and INTER video data containing interframe predictive coding.
In the RTP section 120 of the H.225 layer section 112, an RTP header is added to the compressed image data output from the video CODEC 104, and UDP protocol processing is performed by the UDP layer 122 on the compressed image data.
Next, after the UDP protocol processing is performed to the compressed image data in the UDP layer section 122, IP protocol processing is performed to the UDP protocol processed compressed image data in the IP layer section 127, after which it is sent to the TV telephone apparatus of the other party, via the network interface 113 and the network 114.
In the above-noted example of prior art, a video signal obtained at the video I/O section 105 is compressed and encoded at the video CODEC 104, after which it is UDP protocol processed in the UDP layer section 122, and then IP protocol processed in the IP layer section 127, the resulting data being sent as compressed image data to the TV telephone apparatus of the other party.
When compressed image data sent from the TV telephone apparatus of the other party is to be displayed in the video I/O section 105, the network interface 113 receives the compressed image data from the other party's TV telephone apparatus via the network 114.
In the H.225 layer section 112, the compressed image data received by the network interface 113 is UDP protocol processed by the UDP layer section 122, and the RTP header is removed therefrom by the RTP section 120, after which the compressed image data, from which the RTP header has been stripped, is output to the video CODEC 104, via the receiving path delay section 103.
When this is done, in the receiving path delay section 103, in the case in which there is an offset in timing between the compressed audio data input to the audio CODEC 102 and the compressed image data input to the video CODEC 104, a delay is imparted to either the compressed audio data or the compressed image data, thereby compensating for this offset.
In the video CODEC 104, decompression and decoding are performed of the compressed image data output from the H.225 layer 112 via the receiving path delay section 103, thereby generating a video signal, which is output to the video I/O section 105, at which the video signal output from the video CODEC 104 is displayed on a display or the like.
In a TV telephone system of the past as described above, compressed image data is generated by interframe predictive coding, and communication of the compressed image data is performed using the UDP protocol.
While the UDP protocol has the advantage of providing superior throughput of communication, because it does not provide error correction and retransmission control when a communication error occurs, part of the compressed data can be corrupted when received and, in the case in which part of the compressed image data is corrupted, there is a great disturbance of the displayed image.
In the case in which the compressed image data is INTRA video data in particular, because INTRA video data has a greater amount of information than INTER video data, there is a greater possibility that part of the data becomes corrupted when it is received.
Additionally, because decompression and decoding of INTER video data is done by referencing data from a previous frame, in the case in which the data from the previous frame has been corrupted, so that the screen display is disturbed, the displayed image will continue to be disturbed.
Thus, in a TV telephone system of the past, once the displayed image becomes corrupted, this corrupted image continues for a long period of time.
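The way in which a single corrupted frame continues to disturb the display can be sketched with a small simulation. The model is deliberately simplified and is an assumption of this sketch: each frame is either received intact or not, an INTER frame is displayed correctly only if it is intact and its reference frame was displayed correctly, and an INTRA frame depends only on itself. The function name displayed_ok is hypothetical.

```python
def displayed_ok(frames):
    """frames: list of (mode, received_intact) tuples in display order.

    Returns a list of booleans saying whether each displayed frame is
    correct under the simplified model described above.
    """
    result = []
    reference_good = False  # no usable reference before the first frame
    for mode, intact in frames:
        if mode == "INTRA":
            good = intact                     # uses only the current frame
        else:
            good = intact and reference_good  # needs a good previous frame
        result.append(good)
        reference_good = good
    return result
```

In a sequence such as INTRA, INTER, INTER (corrupted), INTER, INTER, INTRA, the two INTER frames after the corrupted one are also displayed incorrectly, and the display recovers only when the next INTRA frame arrives intact.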
Accordingly, in view of the above-described drawbacks of the prior art, it is an object of the present invention to provide a TV telephone system in which the image displayed on a TV telephone apparatus is not disturbed for a long period of time, in the case in which compressed image data is sent via the IP network to the TV telephone apparatus.
In order to achieve the above-noted object, the present invention adopts the following basic technical constitution.
Specifically, the first aspect of the present invention is a TV telephone system in which encoding processing is performed on an audio signal so as to generate compressed audio data, and interframe predictive encoding processing is performed on a video signal, so as to generate INTRA video data not including interframe predictive coding signal and INTER video data including interframe predictive coding signal, the system having a plurality of TV telephone apparatuses which perform mutual communication with each other, via an IP network, using the compressed audio data, INTRA video data and INTER video data, wherein the TV telephone apparatuses of the TV telephone system perform INTRA video data communication using the TCP protocol.
In the second aspect of the present invention, each TV telephone apparatus comprises a microphone for capturing an audio signal; an audio signal compression section which generates the compressed audio data by performing audio compression and encoding of the audio signal captured by the microphone; a video camera for capturing a video signal; a video signal compression section which generates both the INTRA video data and the INTER video data by performing interframe predictive encoding for each new frame of the video signal captured by the video camera; a UDP section which performs UDP protocol processing of the compressed audio data output from the audio signal compression section, which performs UDP protocol processing of the INTER video data output from the video signal compression section, and which performs UDP protocol processing of compressed audio data and INTER video data received via the IP network; a TCP section which performs TCP protocol processing of the INTRA video data output from the video signal compression section, and which performs TCP protocol processing of INTRA video data received via the IP network; an IP section which performs IP protocol processing of the compressed audio data and the INTER video data output from the UDP section and the INTRA video data output from the TCP section, and which performs IP protocol processing of the compressed audio data, the INTRA video data and the INTER video data received via the IP network, outputs the compressed audio data and the INTER video data to the UDP section, and outputs the INTRA video data to the TCP section; and a video decompression section which performs decompression and decoding of the INTER video data output from the UDP section and the INTRA video data output from the TCP section.
In the third aspect of the present invention, the video signal compression section comprises a video compression and encoding section, which performs interframe predictive encoding for each frame of a video signal captured by the video camera, thereby generating and outputting either INTRA video data or INTER video data, and which also outputs an INTRA/INTER identification signal in synchronization with the generation of the INTRA video data and the INTER video data; and a switch, which, in response to the INTRA/INTER identification signal, causes the INTRA video data to be output to the TCP section and causes the INTER video data to be output to the UDP section.
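The behavior of the switch in the third aspect can be sketched as follows. The function name route_frames and the tagging of each frame with its position in the encoder output are assumptions of this sketch; the text specifies only that the INTRA/INTER identification signal selects between the TCP section and the UDP section.

```python
def route_frames(encoded_frames):
    """Split encoder output between the TCP section and the UDP section.

    encoded_frames: list of (identification_signal, data) tuples, where
    identification_signal is "INTRA" or "INTER". Each frame is tagged
    with its index so the receiver could restore the original order
    (the index is an illustrative addition, not part of the text).
    """
    to_tcp, to_udp = [], []
    for index, (signal, data) in enumerate(encoded_frames):
        if signal == "INTRA":
            to_tcp.append((index, data))   # reliable path
        else:
            to_udp.append((index, data))   # low-latency path
    return to_tcp, to_udp
```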
In the fourth aspect of the present invention, the video decompression section comprises an adder, in which the INTRA video data output from the TCP section and INTER video data output from the UDP section are added; and a video decompression and decoding section, which performs decompression and decoding of the INTER video data and the INTRA video data added by the adder, so as to reproduce a video signal.
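The adder of the fourth aspect combines the two received streams into one input for the video decompression and decoding section. A minimal sketch follows, assuming that each item carries a frame index and that each stream is already in ascending index order; the text does not state how the adder orders its inputs, so the index-based merge and the function name add_streams are assumptions.

```python
import heapq

def add_streams(from_tcp, from_udp):
    """Merge INTRA data from the TCP section with INTER data from the
    UDP section into a single sequence ordered by frame index.

    Both inputs are lists of (index, data) tuples in ascending order.
    """
    merged = heapq.merge(from_tcp, from_udp, key=lambda item: item[0])
    return [data for _index, data in merged]
```

For example, INTRA frames with indices 0 and 3 and INTER frames with indices 1 and 2 are handed to the decoder in the order 0, 1, 2, 3.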
The sixth aspect of the present invention is a TV telephone system in which encoding processing is performed on an audio signal so as to generate compressed audio data, and frame differential image encoding processing is performed on a video signal, so as to generate INTRA macroblock data and INTER macroblock data, the system having a plurality of TV telephone apparatuses, which perform mutual communication with each other, via an IP network, using the compressed audio data, the INTRA macroblock data, and the INTER macroblock data, wherein the TV telephone apparatuses perform communication of the INTRA macroblock data using the TCP protocol, and perform communication of the INTER macroblock data using the UDP protocol.
In the present invention configured as described above, in the case in which communication is performed between TV telephone apparatuses with a video signal and an audio signal, INTRA video data generated by performing interframe predictive encoding of a video signal is communicated by means of the TCP protocol.
By doing this, when performing communication of INTRA video data, because error correction and resend control are performed by the TCP protocol, data is not corrupted when receiving INTRA video data. Additionally, because decompression and decoding processing of the INTRA video data is performed without referencing a previous frame, the image of the INTRA video data is not disturbed.
In the case in which communication of INTER video data is done using the UDP protocol, in addition to improving the simultaneity of the image communication, even should part of the INTER video data become corrupted, when the next INTRA data is received, because the image is restored to a proper image, the image is not disturbed for a long period of time.