(1) Field of the Invention
The present invention relates to an encoding/recording device and an encoding/recording method for compressing and encoding video data based on time correlation properties of the video data, multiplexing the video data with audio data, and recording the multiplexed video data and audio data. More specifically, the present invention relates to a technology for enhancing a function to suspend recording by the encoding/recording device.
(2) Description of the Prior Art
In recent years, an increasing amount of information has been digitized. In particular, more and more sound and images are being digitized, since digital information suffers no degradation over time and is relatively easy to process. Hereafter, sound and images in digital format are collectively called “AV (audio-video) data”.
MPEG (Moving Picture Experts Group; in this specification, the term includes MPEG-2) is an international standard used to compress AV data so that it can be recorded efficiently.
MPEG for video data uses a compression method based on time correlation properties between different pictures, in addition to the conventionally used compression method based on the discrete cosine transform (DCT). The compression method based on the time correlation properties achieves a high compression rate by representing one picture as differential data between that picture and similar pictures reproduced before and after it. With this method, however, the presentation order and the decoding order of pictures differ, and therefore a picture that is referred to in encoding another picture must be recorded, or decoded and stored, before the picture that refers to it. In MPEG, a picture that is referred to for encoding of other pictures is called an I picture (intraframe predictively encoded picture) or a P picture (interframe predictively encoded picture). A picture that is encoded referring to other pictures (I or P pictures) is called a B picture (bi-directionally predictively encoded picture).
Video data is a sequence of sets of still image data per unit time (each set of still image data is hereafter called a video frame), and therefore video data usually contains similar images. Because MPEG achieves a higher compression rate for video data whose images more closely resemble one another, it is well suited to compressing video data.
MPEG can compress data effectively by varying the compression rate for each image, dynamically assigning encoding bits to each image in accordance with its complexity.
Audio data, on the other hand, is smaller than video data, and therefore a compression method different from that used for video data is usually employed.
For instance, a DVD recorder that records AV data onto a DVD-RAM according to MPEG allows a user to select whether to compress audio data. When selecting that the audio data should be compressed, the user can further select whether MPEG Audio or Dolby AC-3 should be used as the compression method. The DVD recorder then encodes the audio data using the selected compression method. When the user selects that no compression is to be performed for the audio data, LPCM (linear pulse code modulation) is applied to the audio data. The DVD recorder then encodes and compresses the video data according to MPEG, multiplexes the encoded video data and audio data into an MPEG System stream according to MPEG System, and records the MPEG System stream.
With MPEG System, audio data and video data that have been encoded and compressed are divided into audio packets and video packets of predetermined sizes and time-division multiplexed into an MPEG System stream. Hereafter, the terms “audio data” and “video data” refer to audio data and video data that have been encoded and compressed. An MPEG System stream has a hierarchical structure composed of packs and packets, with one pack being composed of one or more packets. For instance, a pack recorded on a DVD-RAM is composed of one packet. For ease of explanation, one pack is assumed to be composed of one packet in this specification, as is the case with a pack recorded on a DVD-RAM.
FIG. 1 shows a construction of a pack and a packet generated according to MPEG System.
Each pack is 2 KB and contains a pack header 11, a packet header 12, and a payload 13.
The pack header 11 contains an SCR (system clock reference) that shows a time at which the pack should be inputted to a video buffer or an audio buffer in an MPEG decoder.
The packet header 12 contains the following information: a stream ID that identifies the content of the payload 13; a DTS (Decoding Time Stamp) showing a decoding start time; and a PTS (Presentation Time Stamp) showing a presentation time. Note that an audio pack does not contain a DTS since audio data is decoded and presented almost simultaneously.
The payload 13 is audio data or video data.
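For illustration, the pack layout described above can be modeled as a simple record. The field names below are ours, not defined by the standard, and the time stamps are abstracted as plain integers:

```python
from dataclasses import dataclass
from typing import Optional

PACK_SIZE = 2048  # one pack (= one packet on a DVD-RAM) is 2 KB

@dataclass
class Pack:
    scr: int            # pack header 11: system clock reference (buffer input time)
    stream_id: int      # packet header 12: identifies the content of the payload 13
    dts: Optional[int]  # decoding time stamp; None for audio packs
    pts: int            # presentation time stamp
    payload: bytes      # payload 13: encoded audio or video data
```

An audio pack would carry `dts=None`, reflecting that audio data is decoded and presented almost simultaneously.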
Audio data is usually divided into audio packets that each contain audio data corresponding to one audio frame, and therefore the MPEG decoder does not require a large-capacity audio buffer. Video frames, however, differ in size, and the differences in size between video frames can be very large; for instance, video data corresponding to one video frame may be divided into a plurality of video packets. Accordingly, an MPEG decoder is required to have a video buffer at least as large as the largest video frame. Packs are arranged in an MPEG System stream in order of SCR, earliest first.
FIG. 2 is a diagram showing a standard decoder for MPEG System stream.
This MPEG decoder comprises the following elements: an STC (system time clock) 21 that generates a system time based on which the MPEG decoder operates; a demultiplexer 22 that separates the system stream into audio packets and video packets based on the stream ID of each packet; a video buffer 23 that temporarily buffers video data; a video decoder 24 that decodes video data; a reordering buffer 25 that temporarily stores video data to be referred to by other video data; a switch 26 that is used to adjust the output order of video data; an audio buffer 27 that temporarily buffers audio data; and an audio decoder 28 that decodes audio data.
The following describes decoding operations by this MPEG decoder.
A pack is extracted and inputted to the demultiplexer 22 when the system time generated by the STC 21 agrees with the SCR written in the pack. The demultiplexer 22 then refers to the stream ID of the inputted pack and sends the packet in the pack to either the video buffer 23 or the audio buffer 27 accordingly. The video buffer 23 accumulates the payloads of packets sent by the demultiplexer 22 and manages the DTS and PTS of each packet. The video decoder 24 reads, from the video buffer 23, video data that has a DTS equal to the current system time. This read video data corresponds to one video frame. The video decoder 24 then decodes the read video data. Following this, video data (i.e., an I picture or a P picture) that is referred to for encoding of other pictures is temporarily buffered by the reordering buffer 25 and selectively outputted in accordance with its PTS via the switch 26. The video decoder 24 decodes video data (i.e., a B picture) that is encoded referring to other pictures and outputs it immediately. On receiving audio data that is the payload of an audio packet from the demultiplexer 22, the audio buffer 27 buffers it and manages the PTS in the audio packet. The audio decoder 28 reads, from the audio buffer 27, audio data that has a PTS equal to the current system time. This read audio data corresponds to one audio frame. The audio decoder 28 then decodes the read audio data.
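The demultiplexer's dispatch step can be sketched as follows. The stream-ID ranges (0xC0–0xDF for audio, 0xE0–0xEF for video) follow MPEG System conventions, while the dictionary-based pack layout is our own simplification:

```python
VIDEO_IDS = range(0xE0, 0xF0)  # MPEG System video stream IDs
AUDIO_IDS = range(0xC0, 0xE0)  # MPEG System audio stream IDs

def dispatch_due_packs(packs, system_time):
    """Send the payload of every pack whose SCR has come due to the proper buffer.

    `packs` must be ordered by SCR, earliest first, as in an MPEG System stream.
    """
    video_buffer, audio_buffer = [], []
    for pack in packs:
        if pack["scr"] > system_time:
            break  # later packs are not yet due
        if pack["stream_id"] in VIDEO_IDS:
            video_buffer.append(pack["payload"])
        elif pack["stream_id"] in AUDIO_IDS:
            audio_buffer.append(pack["payload"])
    return video_buffer, audio_buffer
```

A real decoder would also track the DTS and PTS of each packet, as described above; the sketch keeps only the SCR-driven routing.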
In order to present images without delays, MPEG defines that an MPEG decoder starts decoding video data only after the video buffer 23 has become full. This generates a time lag between the start of accumulation of packets in the video buffer and the start of decoding of video data. This time lag is called “vbv_delay” in MPEG. MPEG also defines the capacity of a video buffer as 224 KB, and data exceeding this capacity is not allowed to be buffered. MPEG further defines that the video buffer cannot hold the same data for one second or longer.
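A minimal model of two of these buffer rules, namely that decoding may begin only once the buffer is full and that occupancy must never exceed 224 KB, might look like the following. The class and its interface are illustrative only, not part of the standard:

```python
VBV_BUFFER_SIZE = 224 * 1024  # video buffer capacity defined by MPEG, in bytes

class VideoBufferModel:
    """Toy occupancy model of the MPEG video buffer (illustrative only)."""

    def __init__(self):
        self.occupancy = 0            # bytes currently buffered
        self.decoding_allowed = False

    def feed(self, nbytes):
        """Buffer nbytes of incoming video data; overflow violates the standard."""
        if self.occupancy + nbytes > VBV_BUFFER_SIZE:
            raise OverflowError("video buffer overflow")
        self.occupancy += nbytes
        if self.occupancy == VBV_BUFFER_SIZE:
            self.decoding_allowed = True  # buffer full: vbv_delay has elapsed

    def remove_frame(self, frame_bytes):
        """The decoder removes one frame's worth of data."""
        self.occupancy -= frame_bytes
```

The one-second limit on how long data may sit in the buffer is omitted here for brevity; modeling it would require time-stamping each fed byte range.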
To control a video buffer in accordance with MPEG in this way, an MPEG encoder assigns an appropriate SCR and DTS to each pack when the packs are recorded.
In this way, video data has to be inputted to a video buffer a certain time before it is presented, while audio data has to be inputted to an audio buffer only shortly before its presentation. Accordingly, when video data and audio data are to be presented simultaneously, the video data is multiplexed prior to the audio data.
When a sequence of images is recorded after the recording of another sequence has been completed, the aforementioned time lag “vbv_delay” is generated between the two sequences, which prevents the two sequences from being reproduced continuously. This can happen, for instance, when a program and commercials are broadcast and only the program is recorded, or when non-consecutive sequences of images are shot with a digital video camera.
When packets corresponding to non-consecutive sequences are joined together to prevent the vbv_delay from being generated, however, the video buffer in the MPEG decoder can no longer be operated in accordance with the MPEG standard and may therefore break down.
FIG. 3A shows transition of a size of video data buffered in a video buffer in an MPEG decoder used when video data for a sequence of consecutive images is reproduced. FIG. 3B shows transition of a size of video data buffered in the video buffer when video packets, which are not consecutive and have been joined together, are reproduced.
In FIG. 3B, the video buffer overflows at time t3. This is because the packets corresponding to the period from t1 to t2 in FIG. 3A have been discarded, and the remaining packets, i.e., those corresponding to the period before time t1 and those corresponding to the period after time t2, have been joined together.
Joining non-consecutive packets without special consideration also causes the following problems.
First, to present sound and images simultaneously, an audio pack is multiplexed into the MPEG System stream after a period equal to “vbv_delay” has passed since the corresponding video pack was multiplexed into the stream. As a result, if video packets and audio packets corresponding to a certain period in the MPEG System stream are discarded and the packets before and after that period are joined together, the sound corresponding to the images immediately before the joined part of the stream is lost, while sound corresponding to the discarded images remains.
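The mismatch can be made concrete with a toy calculation. Both delay values below are hypothetical, chosen only to illustrate that video is multiplexed well ahead of the audio presented at the same instant:

```python
VBV_DELAY_MS = 300  # hypothetical vbv_delay for this illustration
AUDIO_LEAD_MS = 10  # hypothetical audio multiplexing lead

def mux_times(presentation_ms):
    """Multiplexing times of the video pack and the audio pack presented at the same instant."""
    return presentation_ms - VBV_DELAY_MS, presentation_ms - AUDIO_LEAD_MS

# Cutting the stream between multiplexing positions t1 and t2 therefore removes
# video presented during [t1 + VBV_DELAY_MS, t2 + VBV_DELAY_MS) but audio
# presented during [t1 + AUDIO_LEAD_MS, t2 + AUDIO_LEAD_MS): the two intervals
# do not coincide, so some kept images lose their sound while some discarded
# images leave their sound behind.
```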
Secondly, since audio frames and video frames are generated at different frequencies, deleting certain audio frames and video frames in units of their respective frames generates a time lag, the so-called “lip sync (synchronization) lag”, between the sound and images of the frames that follow the deleted frames. For instance, Dolby AC-3, used to compress audio data for a DVD-RAM, generates an audio frame every 32 msec, although a video frame is generated every 33.3667 msec. Accordingly, a lip sync lag will almost certainly occur if certain audio frames and video frames are deleted in units of their respective frames.
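The residual lag can be computed exactly from the two frame durations given above (33.3667 msec is 1001/30 msec for NTSC video); the helper function is our own illustration:

```python
from fractions import Fraction

AUDIO_FRAME_MS = Fraction(32)        # AC-3 audio frame duration
VIDEO_FRAME_MS = Fraction(1001, 30)  # NTSC video frame duration = 33.3667 msec

def lip_sync_lag_ms(deleted_video_frames):
    """Lag remaining after also deleting the closest whole number of audio frames."""
    video_span = deleted_video_frames * VIDEO_FRAME_MS
    audio_frames = round(video_span / AUDIO_FRAME_MS)
    return video_span - audio_frames * AUDIO_FRAME_MS
```

Deleting 30 video frames (exactly 1001 msec) and the nearest whole number of audio frames (31, or 992 msec) still leaves a 9 msec lag; the two frame grids align only every 960 video frames, i.e., about every 32 seconds.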
Lastly, when two non-consecutive audio frames are joined together after certain audio frames have been deleted, the joined frames often bear no similarity to each other. As a result, noise is generated when these audio frames are inputted to an audio buffer and reproduced.
Accordingly, it is not appropriate to delete certain packets or frames from an MPEG System stream and join the remaining packets or frames together without special consideration.