1. Field of the Invention
The present invention relates generally to audio and video decompression. More particularly, the present invention relates to a system and method for reducing audio breakups and other artifacts during audio and video decompression.
2. Description of the Related Art
Compression and subsequent decompression of audio and video data have widespread application. Applications include digital video transmission and digital television. In a typical digital video transmission, the participants transmit and receive audio and video signals that allow the participants to see and hear one another.
To efficiently transmit the large amount of video and audio data generated at particular digital video transmission sites, digital video transmission systems typically digitize and compress the video and audio data for transmission across digital networks. Various compression schemes are available, and various digital networks are available as well.
Decompression of audio and video data at a receiving end of a digital video transmission system may be implemented in hardware, software, or a combination of hardware and software. Decompression of video data typically includes decoding sequential frames of the video data, converting the decoded frames of video data from one format (e.g., luminance-chrominance (YUV) color space) to another (e.g., red, green, and blue (RGB) color space), and rendering the converted frames of data. Decoding frames of video data may include decoding blocks of pixels, performing motion compensation, inverse quantization to de-quantize data, inverse scanning to remove the zig-zag ordering of discrete cosine transform coefficients, and/or inverse discrete cosine transformation to convert data from the frequency domain back to the pixel domain. Compressed frames of audio data are received and sequentially decoded into decoded frames of audio data. The decoded frames of audio data are subsequently rendered.
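The YUV to RGB conversion step mentioned above can be illustrated concretely. The sketch below, in Python for brevity, applies the widely used ITU-R BT.601 full-range conversion formulas to a single pixel; the function names and the clamping helper are illustrative, not elements of the system described here.

```python
def _clamp(x):
    """Clamp a channel value to the displayable 0..255 range."""
    return max(0, min(255, int(round(x))))

def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to RGB using the common BT.601
    full-range formulas (U and V are offset by 128)."""
    r = _clamp(y + 1.402 * (v - 128))
    g = _clamp(y - 0.344136 * (u - 128) - 0.714136 * (v - 128))
    b = _clamp(y + 1.772 * (u - 128))
    return r, g, b
```

A neutral pixel (Y = 128, U = V = 128) maps to mid gray, and full or zero luminance with neutral chroma maps to white or black, respectively.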
FIG. 1 shows, in block diagram form, a prior art system for decompressing streams of video and audio frames in a digital video transmission system. The system in FIG. 1 includes a video decoder 102, a YUV to RGB converter 104, a video renderer 108, a video reference memory 106, an audio decoder 110, and an audio renderer 112. The video decoder 102, YUV to RGB converter 104, video renderer 108, audio decoder 110, and audio renderer 112 are implemented wholly or partly in a processor executing respective software algorithms.
In FIG. 1, compressed video frames are first decoded by the video decoder 102. Each decoded frame of video data is subsequently stored in the video reference memory 106. A subsequent compressed frame of video data is decoded as a function of the decoded frame of video data previously stored within the reference memory. After a frame of video data is decoded, the YUV to RGB converter 104 converts the format of the decoded frame of video data from YUV into RGB. Finally, once converted, the video renderer 108 renders the decoded frames of video data, which are then displayed on a monitor. Compressed frames of audio are received and subsequently decoded by the audio decoder 110. The decoded frames of audio data are subsequently rendered by the audio renderer 112. More particularly, FIG. 2 shows that decoded frames of audio data are first stored in a buffer 202 (typically a FIFO). Speaker 204 generates audio corresponding to the individual frames of decoded audio data stored in buffer 202. In a correctly operating digital video transmission system, speaker 204 constantly generates audio from the decoded audio data stored in buffer 202.
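The buffering arrangement of FIG. 2 can be sketched as a simple first-in, first-out queue. In this Python sketch (the class and method names are illustrative, not part of the prior art system), the decoder deposits decoded frames and the renderer pulls them in order; a pull from an empty buffer represents the underflow condition that is heard as an audio breakup.

```python
from collections import deque

class AudioRenderBuffer:
    """FIFO holding decoded audio frames awaiting rendering (buffer 202)."""

    def __init__(self):
        self._fifo = deque()

    def push(self, frame):
        # The audio decoder (110) deposits a newly decoded frame.
        self._fifo.append(frame)

    def pull(self):
        # The renderer (112) takes the oldest frame for the speaker (204).
        # An empty buffer means underflow: the speaker has nothing to
        # play, which is heard as an audio breakup.
        return self._fifo.popleft() if self._fifo else None
```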
The various activities of decoder 100 must be achieved in nearly real time. Failure to keep up with real time results in unnatural gaps, jerkiness, or slowness in the motion video or audio presentation. Prolonged failure to keep up with the incoming compressed data will eventually result in the overflow of some buffer and the loss of a valid video frame reference, which, in turn, results in chaotic video images due to the differential nature of the compressive coding.
The decompression algorithms described above may execute concurrently with an operating system. Frequently, software applications other than the digital video transmission decompression algorithms must be executed concurrently (i.e., multi-tasked) with those algorithms. The combined processing requirements of the operating system, the independent software applications, and the decompression algorithms may cause the processor to become congested. In addition, some portions of the coded video require more processing than others, so the decoder may become congested when an extended epoch of computationally complex video is received. When congested, the processor may not have enough processing power to execute the decompression algorithms at a rate sufficient to keep up with the source of the encoded data. Processor congestion is a state that is often incompatible with the real-time requirements of digital video transmission.
Processor congestion may cause noticeable effects in the decompression of audio and video data. FIGS. 3 and 4 contrast the effects of processor congestion during video and audio decompression. FIG. 3 illustrates video and audio decompression when the processor has sufficient processing power. FIG. 4 illustrates potential effects on video and audio decompression when the processor is overloaded or congested.
FIG. 3 shows the display timing of subsequent images I1 through I6 corresponding to compressed video frames VF1 through VF6, respectively, after video frames VF1 through VF6 have been decompressed. FIG. 3 also shows the timing aspects of generating audio A1 through A6 corresponding to compressed frames of audio data AF1 through AF6, respectively, after frames of audio data AF1 through AF6 have been decompressed. It is noted that various transmission formats may be used and the number of audio frames and video frames may be unequal in some transmission formats. When the processor has sufficient processing power (i.e., the processor is not congested), subsequent image frames I1 through I6 are displayed on a display screen at time intervals in general compliance with digital video transmission scheduling standards, thereby creating a continuous and artifact-free sequence of displayed images. Likewise, when the processor has sufficient processing power, subsequent intervals of audio A1 through A6 are generated at time intervals in general compliance with digital video transmission scheduling standards, thereby creating continuous and artifact-free audio. Audio artifacts occur when a noticeable time gap occurs between the generation of audio corresponding to any two consecutive frames of audio data.
FIG. 4, as noted above, illustrates the effects on video and audio decompression when the processor experiences congestion. With respect to video decompression, when the processor is congested, the scheduled decoding, converting, or rendering of one or more compressed frames of video data VF1 through VF6 may be delayed, which, in turn, may delay the display of one or more corresponding images I1 through I6 as shown in FIG. 4. Likewise, if the processor is congested, the scheduled decoding of one or more compressed frames of audio data AF1 through AF6 may be delayed, which, in turn, delays the generation of one or more corresponding intervals of audio A1 through A6 as shown in FIG. 4. The delay in audio generation manifests itself in the form of audio breakup. It is noted that digital video transmission participants are highly sensitive to audio breakups when compared to video artifacts.
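The breakup condition illustrated by FIG. 4 amounts to a gap between the scheduled end of one audio interval and the start of the next. A minimal sketch, assuming each frame nominally lasts `frame_duration_ms` and using an illustrative tolerance value (both assumptions, not elements of any claim):

```python
def find_breakups(start_times_ms, frame_duration_ms, tolerance_ms=5):
    """Return (frame index, gap in ms) pairs where the gap between the
    end of one audio interval and the start of the next exceeds the
    tolerance, i.e. where a listener may hear a breakup."""
    breakups = []
    for i in range(1, len(start_times_ms)):
        expected_start = start_times_ms[i - 1] + frame_duration_ms
        gap = start_times_ms[i] - expected_start
        if gap > tolerance_ms:
            breakups.append((i, gap))
    return breakups
```

For 20 ms frames starting at 0, 20, 40, and 75 ms, only the last frame is flagged: it begins 15 ms after the preceding interval ended.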
The present invention seeks to reduce audio breakups caused by processor congestion. A further goal of the present invention is to prevent loss of a valid video reference memory, with its consequential incorrect decoding of video images. In accordance with one embodiment of the present invention, one or more selected processes in the video processing pipeline are identified as not necessary for maintaining a valid decoder state, and one or all of such processes are temporarily bypassed. Bypassing results in skipping frames in the visual presentation and the liberation of processing power to address more important activities, such as maintaining a valid decoder state and presenting an uninterrupted audio stream.
In one embodiment, a method of the present invention comprises decoding first video data. First and second audio data are also decoded into first and second decoded audio data, respectively. First audio is generated from the first decoded audio data. Typically, audio is generated from a speaker of the computer system employing this embodiment of the present invention. The method then determines whether second audio can be generated from the second decoded audio data without substantial time delay between the time the first audio generation ends and the time the second audio generation begins. In one embodiment, substantial time delay may be defined as a time between first and second audio generation that creates noticeable audio breakup. If the second audio can be generated without the substantial time delay, then, according to the method, the second audio is generated from the second decoded audio data and a first image corresponding to the decoded first video data is displayed. However, if the second audio could not otherwise be generated without the substantial time delay, the second audio is generated without displaying the first image corresponding to the decoded first video data. In this embodiment, the method skips displaying the first image corresponding to the decoded first video data. To reduce the visual effects of skipping the display of the first image, the method, in one embodiment, redisplays an image corresponding to previously decoded and displayed video data. Since the first image corresponding to the decoded first video data will not be displayed, the computer system need not further process the decoded first video data, which in turn reduces the processing load on the processor.
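The skip decision described in this embodiment can be sketched as follows. The function names, the buffered-audio estimate, and the safety margin are illustrative assumptions rather than claimed elements; the sketch shows only the control flow of protecting audio generation by displaying either the newly decoded image or the previously displayed one.

```python
def should_skip_video(audio_buffered_ms, video_display_cost_ms,
                      safety_margin_ms=10):
    """Skip displaying the decoded frame when converting and rendering
    it would leave less audio buffered than the safety margin, risking
    a noticeable breakup (the values are illustrative)."""
    return audio_buffered_ms - video_display_cost_ms < safety_margin_ms

def present(decoded_image, last_image, audio_buffered_ms,
            video_display_cost_ms):
    """Return the image actually displayed for this frame period."""
    if should_skip_video(audio_buffered_ms, video_display_cost_ms):
        # Bypass conversion and rendering of the new image and redisplay
        # the previous one, freeing processing power for the audio path.
        return last_image
    return decoded_image
```

For example, with 25 ms of audio buffered and an estimated 30 ms display cost, the new image is skipped and the previous image is redisplayed; with 50 ms buffered, the new image is displayed normally.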