The transmission of HD video/audio data, in particular HD Television (HDTV or just HD), requires the processing of extremely high data rates both at the transmitter and the receiver. This is because HDTV has a resolution much higher than traditional television systems: about one million pixels per frame for 720p or about two million pixels per frame for 1080i. This is roughly five times that of standard TV. A further doubling of the data rate to be transmitted is required when 50 frames per second (fps) are to be displayed instead of the usual 25 fps. A rate of 50 fps avoids blur, particularly for moving objects, and thus results in significantly better pictures.
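The pixel counts and the doubling of the data rate can be checked with a short calculation. The following is only an illustrative sketch: the frame dimensions (1280x720, 1920x1080, and 720x576 for standard TV) and the 24 bits per pixel are assumptions not stated in the text, and the function names are hypothetical.

```python
def pixels_per_frame(width, height):
    """Number of pixels in one frame."""
    return width * height


def raw_bit_rate(width, height, fps, bits_per_pixel=24):
    """Uncompressed bit rate in bits per second (24 bits/pixel is an assumption)."""
    return pixels_per_frame(width, height) * bits_per_pixel * fps


hd_720 = pixels_per_frame(1280, 720)    # about one million pixels
hd_1080 = pixels_per_frame(1920, 1080)  # about two million pixels
sd = pixels_per_frame(720, 576)         # assumed standard TV frame size

ratio_1080_to_sd = hd_1080 / sd         # roughly five

rate_25 = raw_bit_rate(1920, 1080, 25)  # uncompressed rate at 25 fps
rate_50 = raw_bit_rate(1920, 1080, 50)  # exactly twice the 25 fps rate
```

Under these assumptions, an uncompressed 1080-line stream at 25 fps already exceeds 1 Gbit/s, which illustrates why compression is needed before transmission.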
Today, full high definition video/audio information, e.g. 1080i 50 fps or 1080p 60 fps, is mostly sent over proprietary links which are usually not compliant with the Internet Protocol (IP). Though it would be very advantageous to use the IP for such transmissions, there are numerous problems connected with the transmission and synchronization of HD video/audio data over an IP network or using the IP in a non-IP network. If HD video/audio content is transmitted using the IP, other use of the IP network is limited because of the bandwidth needed. Moreover, using H.264 (MPEG-4/AVC) introduces high latency, and using JPEG or Motion JPEG results in image quality limitations. JPEG2000, which solves some of the problems, is presently available only for professional high-end, i.e. expensive, equipment.
The usual way to transmit or broadcast HD video/audio data over a packet switching network, e.g. an Ethernet, IP, or UDP network, is to compress the video frames and, optionally, the audio data, and to pack the compressed frames into a series of packets. Packet switching networks have the advantage that the utilization of the network capacity is optimized, the response times are minimized, and the robustness of communication is increased. However, when the packets traverse network adapters, switches, and other network nodes, they must be buffered and queued, which results in variable delays that depend on the traffic load in the network. This is one of the inherent problems of transmitting HD video/audio data over packet switching networks.
Packet switching of video/audio data requires that the sending device or transmitter unit forms packets and identifies each of them. This is achieved by adding a header which contains at least a time stamp and a sequence number; in addition, the first packet of a video frame contains a so-called M flag. This identification allows the receiver to rearrange the packets into the correct sequence. To generate the time stamps, the transmitter unit contains a timer, usually crystal-controlled, which increments at a fixed frequency.
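The header described above can be illustrated with a minimal sketch. The byte layout (one flag byte, a 16-bit sequence number, a 32-bit time stamp), the 1400-byte payload size, and the names `HEADER`, `M_FLAG`, `packetize`, and `parse` are all assumptions chosen for illustration; this is not the wire format of any particular protocol.

```python
import struct

# Hypothetical header layout in network byte order:
# flags (1 byte), sequence number (2 bytes), time stamp (4 bytes).
HEADER = struct.Struct("!BHI")
M_FLAG = 0x01  # set on the first packet of a video frame, as described in the text


def packetize(frame, timestamp, first_seq, payload_size=1400):
    """Split one compressed frame into packets that share a time stamp."""
    packets = []
    for i, offset in enumerate(range(0, len(frame), payload_size)):
        flags = M_FLAG if i == 0 else 0
        seq = (first_seq + i) & 0xFFFF  # sequence number wraps at 16 bits
        header = HEADER.pack(flags, seq, timestamp & 0xFFFFFFFF)
        packets.append(header + frame[offset:offset + payload_size])
    return packets


def parse(packet):
    """Return (m_flag, sequence number, time stamp, payload)."""
    flags, seq, timestamp = HEADER.unpack_from(packet)
    return bool(flags & M_FLAG), seq, timestamp, packet[HEADER.size:]


# Usage: a 3000-byte frame becomes three packets.
packets = packetize(b"x" * 3000, timestamp=1234, first_seq=65535)
```

All packets of one frame carry the same time stamp, while the sequence number increases by one per packet, so the receiver can both group packets by frame and restore their order.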
When transmitting video/audio data, the video data packets are usually generated and transmitted separately from the audio packets. Also, the audio data, often multiplexed from several audio channels, are generally of a much smaller size than the video data.
In this textbook approach, so to speak, for transmitting video/audio data, whether HD or not, over a packet switching network, the receiver must be able to generate from the received video and audio packets an uninterrupted, “lip-synchronous” video/audio data stream with as little delay as possible. All this must be performed reliably and with high throughput. Obviously, this is not an easy task and requires solving a number of problems when HD video/audio data are to be transmitted.
One of these problems is the synchronization of the audio with the video data. Take a movie for example: the spoken word must be well synchronized with the lips of the person speaking. This is a problem because, usually, the video data are transmitted separately from the audio data.
A second problem is the unavoidable delay. As mentioned above, this problem is generic to packet switching networks because each packet is transmitted separately, so that the packets may well arrive at the receiver in a different sequence and/or with different delays than those with which they were originally sent. The solution is to buffer and rearrange the packets to obtain a usable data stream.
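Buffering and rearranging can be sketched as a small reorder buffer that holds packets until every earlier sequence number has arrived. This is a simplified illustration only: the class name `ReorderBuffer` is hypothetical, and the sketch assumes that sequence numbers do not wrap around and that no packet is lost.

```python
class ReorderBuffer:
    """Release payloads strictly in sequence-number order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number to deliver
        self.pending = {}          # packets that arrived early, keyed by sequence number

    def push(self, seq, payload):
        """Store one packet and return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []  # late or duplicate packet: ignore it
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


# Usage: packet 1 arrives before packet 0 and is held back.
buf = ReorderBuffer()
out_a = buf.push(1, "b")  # nothing deliverable yet
out_b = buf.push(0, "a")  # both packets are released in order
```

A practical receiver would additionally bound the waiting time, since a lost packet would otherwise stall the stream indefinitely.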
A third problem is that all of this has to be performed without interruption or delay. Failing that, any delay, in particular a varying delay, must be hidden, i.e. made invisible, because the customer sitting in front of an HDTV display will not accept, let alone pay for, a movie that comes with unintended and unwanted breaks.
Reasons for the above-mentioned problems are manifold. The compressed HD video images differ in size depending on picture contents and compression rate. Also, compression and decompression need time and thus introduce delays. Further, as mentioned above, transmission over Ethernet or an IP network or any other packet switching network introduces variable delays between transmitter and receiver, depending on the load on the network and the distance between receiver and transmitter. The receiver, however, must produce video images at a precise rate. Even further, since the transmitter and receiver clocks inevitably run asynchronously, even the smallest difference will result in distortions, e.g. cause a video image to blur or tear at some point in time. Finally, the audio information is difficult to synchronize with video information because of the separate transmission.
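The effect of asynchronous clocks can be made concrete with a short calculation of how long it takes until the accumulated drift equals one full frame period. The clock offset of 50 parts per million used below is an assumed, illustrative value, and the function name is hypothetical.

```python
from fractions import Fraction


def seconds_until_one_frame_slip(fps, offset_ppm):
    """Time after which two clocks differing by offset_ppm drift apart by one frame."""
    frame_period = Fraction(1, fps)                   # seconds per frame
    drift_per_second = Fraction(offset_ppm, 1_000_000)  # seconds of drift per second
    return frame_period / drift_per_second


# Assumed example: 50 fps video, clocks differing by 50 ppm.
slip_after = seconds_until_one_frame_slip(50, 50)
```

Under these assumptions the receiver would be a whole frame ahead of or behind the transmitter after a few minutes, so some form of clock correction is needed even for a very small offset.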
There are state-of-the-art JPEG2000 encoder/decoder ICs available, e.g. from Analog Devices in Norwood, Mass., USA, but, apart from their still high price, the maximum frequency of those presently available would be exceeded when compressing 1080p 60 fps HD video. Splitting the load onto several JPEG2000 encoder/decoders would introduce more latency/delay, depending on how the load is split amongst the processors. A less expensive solution would be to implement transmitter and receiver in ASICs, but this would also lower the flexibility.
The present invention provides a solution for these and other problems by furnishing a cost-effective, dynamic system for transmitting HD audio/video data over an IP network.