1. Field of the Invention
This invention relates to electronic recording and playback devices, and more particularly to electronic recording and playback devices for recording and playing back compressed digital video and audio data.
2. Description of Related Art
A number of approaches exist for recording audio-video sequences. Analog video tape solutions, such as VHS and Beta brand video cassette recorders, use a frame-to-frame recording process where each video frame is recorded on individual analog tracks at a fixed frame rate of 30 Hz for NTSC or 25 Hz for PAL. Each frame comprises two fields at 60 Hz for NTSC encoded video and two fields at 50 Hz for PAL encoded video. This analog method of recording allows for random access to each frame in the video sequence.
The Moving Picture Experts Group (MPEG) has promulgated widely accepted international standards for compression, decompression, and synchronization of digital video and audio signals. In particular, MPEG has defined a set of standards for video compression algorithms commonly used by broadcasters and studios for recording and broadcasting digitally encoded video. The video and audio specifications give the syntax and semantics of encoded video and audio bitstreams necessary for communicating compressed digital video as well as for storing such video on media in a standard format. The MPEG1 standard is officially described in ISO/IEC 11172 and the MPEG2 standard in ISO/IEC 13818.
More particularly, the MPEG standards define how elementary streams of encoded digital audio and video data are multiplexed and converted to an MPEG format, which may then be communicated on a channel in some channel-specific format for direct replay or for storage before channel transmission and replay. Within a channel is a Channel Stream, which could be a System Stream in MPEG1, and either a Program Stream or Transport Stream in MPEG2.
A processed Channel Stream is demultiplexed, and the elementary streams produced are input into Video and Audio Decoders, the outputs of which are decoded video and audio signals. FIG. 1 is a block diagram of a prior art MPEG decoder, showing the application of a Channel Stream of MPEG encoded data to a channel specific decoder 1 which decodes the channel-specific transmission format. The output of the channel specific decoder 1 is coupled to a system decoder 2, which demultiplexes the digital audio and video data. Video data is further applied to a video decoder 3, and audio data is further applied to an audio decoder 4. There is a flow of timing information among the several decoders, controlled by a clock controller 5. All elementary data streams are decoded and presented with time synchronization.
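By way of illustration only, the demultiplexing step performed by the system decoder 2 may be sketched as follows. The packet representation (a pair of a stream label and a payload) and the function name `demultiplex` are hypothetical and are not taken from the MPEG standards; an actual system decoder parses packet headers and also extracts timing information.

```python
from collections import defaultdict


def demultiplex(channel_stream):
    """Split a multiplexed packet sequence into per-stream payload lists.

    `channel_stream` is an iterable of (stream_kind, payload) pairs.
    This layout is an assumption made for illustration only.
    """
    elementary = defaultdict(list)
    for stream_kind, payload in channel_stream:
        # Payloads keep their arrival order within each elementary stream.
        elementary[stream_kind].append(payload)
    return dict(elementary)


# Hypothetical multiplexed input: video and audio packets interleaved.
packets = [("video", b"v0"), ("audio", b"a0"), ("video", b"v1")]
streams = demultiplex(packets)
```

In this sketch, the list under `"video"` would be supplied to the video decoder 3 and the list under `"audio"` to the audio decoder 4.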
The MPEG2 standard specifically defines three types of video pictures or frames: intra-coded, predicted, and bi-directional. Intra-coded or I-frames are coded using only information present in an image frame itself. I-frames provide random access points into a compressed video data stream. I-frames use only transform coding (discrete cosine transform, or "DCT"), and therefore provide moderate compression. Predicted or P-frames are coded based in part on information within the nearest previous I or P frame, using a technique called forward prediction. P-frames provide more compression than I-frames and serve as a reference for bi-directional or B-frames and for later P-frames. P-frames can propagate coding errors since P-frames are generally predicted from previous P-frames. B-frames are frames that use both past and future frames as a reference. Bi-directional frames provide the most compression of the three frame types and do not propagate errors because they are never used as a reference. The MPEG2 algorithm allows the encoder to choose the frequency and location of I-frames, and thus MPEG recordings have a non-fixed frame rate. This characteristic makes it difficult to randomly access a scene cut in a video sequence when the scene cut falls between I-frames. To achieve random access, a video sequence must start at an I-frame.
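The constraint that random access must begin at an I-frame may be illustrated by the following minimal sketch, which locates the nearest I-frame at or before a requested frame. The string representation of frame types and the name `random_access_point` are assumptions made for illustration, and the model is simplified in that it ignores any reordering between decode order and display order.

```python
from bisect import bisect_right


def random_access_point(frame_types, target):
    """Return the index of the nearest I-frame at or before `target`.

    `frame_types` is a string with one letter per frame ("I", "P" or "B").
    This representation is hypothetical and used for illustration only.
    """
    i_positions = [i for i, kind in enumerate(frame_types) if kind == "I"]
    k = bisect_right(i_positions, target)
    if k == 0:
        raise ValueError("no I-frame at or before the target frame")
    return i_positions[k - 1]


# A scene cut at frame 7 lies between the I-frames at indices 0 and 9,
# so decoding has to start back at frame 0 to reach it.
sequence = "IBBPBBPBBIBBPBB"
entry = random_access_point(sequence, 7)
```

Because the encoder chooses where I-frames fall, the distance between the requested frame and the entry point is not fixed and may vary along the sequence.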
It would be desirable to be able to record and play back video and audio that is digitally encoded at a non-fixed frame rate. However, two problems exist when attempting to perform such recording: a "channel changing" problem arising from attempts to record video sequences from two or more independent MPEG data streams, and a "pause and resume" problem arising from attempts to record a single MPEG video sequence while occasionally pausing such recording.
The MPEG standards provide a timing mechanism that ensures synchronization of audio and video. The MPEG1 standard defines two parameters used by an MPEG decoder: a system clock reference (SCR) and a presentation time stamp (PTS). The MPEG2 standard adds a program clock reference (PCR) that is equivalent to the SCR. Both the SCR and PCR have extensions in order to achieve a resolution of 27 MHz (the term "SCR/PCR" will be used herein to denote either clock reference). The SCR/PCR is a snapshot of the encoder system clock. The SCR/PCRs used by an MPEG video decoder 3 and audio decoder 4 must have approximately the same value for proper synchronization. The video decoder 3 and audio decoder 4 update their internal clocks using the SCR/PCR value sent by the system decoder 2. Each decoded video picture and decoded audio time sequence (both are also referred to as "presentation units") has a PTS associated with it. The PTS represents the time at which the video picture is to be displayed or the starting playback time for the audio time sequence.
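The relationship between a clock reference and its extension may be illustrated as follows. The sketch assumes the MPEG2 systems layout, in which the base portion counts at 90 kHz and the extension counts from 0 to 299 at 27 MHz, and assumes that PTS values are expressed in 90 kHz units. The function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000   # resolution reached with the extension
BASE_CLOCK_HZ = 90_000         # assumed rate of the base field and of PTS values
TICKS_PER_BASE_UNIT = SYSTEM_CLOCK_HZ // BASE_CLOCK_HZ  # 300


def clock_reference_ticks(base, extension):
    """Combine a base and an extension into a single 27 MHz tick count."""
    if not 0 <= extension < TICKS_PER_BASE_UNIT:
        raise ValueError("extension must lie in the range 0..299")
    return base * TICKS_PER_BASE_UNIT + extension


def clock_reference_seconds(base, extension):
    """Express an SCR/PCR sample in seconds."""
    return clock_reference_ticks(base, extension) / SYSTEM_CLOCK_HZ


def pts_seconds(pts):
    """Express a presentation time stamp in seconds."""
    return pts / BASE_CLOCK_HZ
```

Under these assumptions, a base of 90,000 with an extension of 0 corresponds to exactly one second of encoder system clock time, and a PTS of 180,000 marks a presentation time of two seconds.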
If video sequences from two or more independent streams are spliced together, the SCR/PCR values will be different for each of the streams. Hence, the PTS for the video and audio will become unlocked from the original SCR/PCR clock. Conventionally, the video decoder 3 and audio decoder 4 will either discard the affected presentation units if their PTS is earlier (has a smaller value) than the current SCR/PCR, or repeat the affected presentation units if their PTS is later (has a larger value) than the current SCR/PCR. In either case, the output is visually and audibly affected by the clock mismatch.
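The conventional decoder behavior described above may be sketched as a simple comparison. The function name `presentation_action`, the `tolerance` parameter, and the sample values are hypothetical; the sketch assumes the PTS and the clock reference are expressed in the same units.

```python
def presentation_action(pts, clock, tolerance=0):
    """Decide how a decoder treats one presentation unit.

    Returns "discard" when the PTS is earlier than the current clock,
    "repeat" when it is later, and "present" when the two agree within
    `tolerance` (a hypothetical parameter added for illustration).
    """
    if pts < clock - tolerance:
        return "discard"
    if pts > clock + tolerance:
        return "repeat"
    return "present"


# Units from the original stream track the decoder clock. After a splice,
# units from an independent stream carry PTS values on an unrelated time base.
clock = 3_000
before_splice = presentation_action(3_000, clock)
spliced_later = presentation_action(900_000, clock)
spliced_earlier = presentation_action(120, clock)
```

In this illustration the unit from the original stream is presented normally, while the two units from the spliced stream are repeated or discarded, which corresponds to the visible and audible disturbance noted above.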
When pausing during the recording of an MPEG video sequence, mixing of streams having B-frames or P-frames will result in improper decoding. Streams from the different recording sessions must be separated to avoid generation of predicted picture frames using inappropriate picture references (i.e., I or P frames from a different video sequence). This problem also occurs when splicing two independent sequences.
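The reference problem may be illustrated by a simplified check that flags frames whose picture references would come from a different recording session. The sketch assumes frames are listed in decode order as pairs of a session number and a frame type, that a P-frame needs one previously decoded reference from its own session, and that a B-frame needs two. These rules and the name `frames_with_foreign_references` are assumptions made for illustration and do not reproduce the full MPEG reference rules.

```python
def frames_with_foreign_references(frames):
    """Return indices of frames lacking valid references in their session.

    `frames` is a list of (session, kind) pairs in decode order, where
    `kind` is "I", "P" or "B". This layout is hypothetical.
    """
    needed = {"I": 0, "P": 1, "B": 2}
    bad = []
    current_session = None
    refs = 0  # usable reference frames decoded so far in this session
    for index, (session, kind) in enumerate(frames):
        if session != current_session:
            # A new recording session cannot reuse earlier references.
            current_session = session
            refs = 0
        if refs < needed[kind]:
            bad.append(index)
        elif kind in ("I", "P"):
            # Only correctly decoded I and P frames serve as references.
            refs = min(refs + 1, 2)
    return bad


# Session 1 is paused after its fifth frame; session 2 resumes mid-sequence.
recording = [(1, "I"), (1, "P"), (1, "B"), (1, "B"), (1, "P"),
             (2, "B"), (2, "B"), (2, "P"), (2, "I"), (2, "P"), (2, "B")]
affected = frames_with_foreign_references(recording)
```

In this example the first three frames of the second session are flagged, since no I-frame from that session has yet been decoded when they arrive.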
Digital Video (DV) is a relatively new video compression format standard. DV does not contain the bi-directional and predictive frames of MPEG2. Therefore, session boundaries can be at any frame. DV produces a fixed data rate of approximately 25 Mbps utilizing a fixed 5:1 compression based on 4:1:1 YUV video sampling. DV compression relies on discrete cosine transforms like MPEG, but adds the enhancement of field interpolation on low-motion scenes. When recording digital video data, such information is interleaved across the recording medium (e.g., tape) within a single frame. This technique is used to reduce dropouts and other tape artifacts commonly found in analog track formats. Although DV provides random access, it does not have the ability to record more highly compressed MPEG data streams, which include P-frames or B-frames.
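The approximate DV data rate may be checked with the following arithmetic sketch. The 720 by 480 frame size, the 30000/1001 frame rate, and the 8-bit sample depth are assumptions typical of NTSC material and are not stated above; only the 4:1:1 sampling and the 5:1 compression ratio come from the description.

```python
# Assumed NTSC-style parameters (not stated in the description above).
WIDTH, HEIGHT = 720, 480
FRAMES_PER_SECOND = 30000 / 1001
BITS_PER_SAMPLE = 8

# 4:1:1 sampling: every 4 luma samples share 1 Cb and 1 Cr sample,
# giving 6 samples per 4 pixels, or 1.5 samples per pixel on average.
SAMPLES_PER_PIXEL = (4 + 1 + 1) / 4

COMPRESSION_RATIO = 5  # fixed 5:1 compression


def dv_video_rate_bps():
    """Estimate the compressed DV video data rate in bits per second."""
    bits_per_frame = WIDTH * HEIGHT * SAMPLES_PER_PIXEL * BITS_PER_SAMPLE
    uncompressed_bps = bits_per_frame * FRAMES_PER_SECOND
    return uncompressed_bps / COMPRESSION_RATIO


rate_mbps = dv_video_rate_bps() / 1_000_000
```

Under these assumptions the estimate falls close to the figure of approximately 25 Mbps given above.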
Accordingly, the inventors have determined that it would be desirable to be able to intermittently record and pause MPEG video sequences, to splice independent sequences, and to provide random access to recorded video sequences without the problems of current technology. The present invention provides a method and apparatus for accomplishing this goal.