The present invention relates to a disc-shaped recording medium for use in recording high-efficiency coded digital image data and a disc recording apparatus and a disc reproducing apparatus for use in recording and reproducing the high-efficiency coded digital image data on and from the disc-shaped recording medium.
As is well known, a CD-ROM (Compact Disc Read Only Memory) is standardized based on a music CD (Compact Disc Digital Audio: hereinafter simply abbreviated as CD-DA).
First, the physical format of the CD-ROM will be described briefly. The physical format is the format in which data is read out from a CD-ROM disc loaded into a CD-ROM drive.
One disc can hold up to 99 music tracks or data tracks. Information concerning these tracks is recorded in an area called the TOC (Table of Contents) at the start of the disc, i.e., at its innermost periphery. The portion in which the TOC is recorded is referred to as the lead-in track (Lead-in Track). On the other hand, the portion that follows the last piece of music recorded on the CD-DA is referred to as the lead-out track (Lead-out Track).
In the CD-DA, a stereo audio signal is sampled at 44.1 kHz, quantized to 16 bits and recorded as a digital audio signal, so that 2 (stereo) × 2 bytes (16 bits) × 44,100 = 176,400 bytes of data are recorded per second. In the CD-ROM, a sector, obtained by dividing one second equally into 75 parts, is handled as the minimum unit, and therefore one sector is formed of 176,400 / 75 = 2,352 bytes.
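The data-rate and sector-size arithmetic can be checked in a few lines. This is only a worked calculation; the constant names are illustrative, and only the numbers come from the text.

```python
# Worked arithmetic for the CD-DA data rate and the CD-ROM sector size.
CHANNELS = 2            # stereo
BYTES_PER_SAMPLE = 2    # 16-bit quantization
SAMPLE_RATE_HZ = 44_100
SECTORS_PER_SECOND = 75

bytes_per_second = CHANNELS * BYTES_PER_SAMPLE * SAMPLE_RATE_HZ

# One second divides evenly into 75 sectors.
assert bytes_per_second % SECTORS_PER_SECOND == 0
sector_bytes = bytes_per_second // SECTORS_PER_SECOND

print(bytes_per_second)  # 176400
print(sector_bytes)      # 2352
```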
In the case of CD-ROM MODE-1, one sector includes SYNC data (12 bytes), a header (4 bytes), an EDC (Error Detection Code: 4 bytes), 8 reserved bytes and an ECC (Error Correction Code: 276 bytes), and the remaining 2,048 bytes are recorded as user data. For data such as audio and image data, which need not be strictly error-corrected because errors can be concealed by data interpolation processing or the like, the EDC and ECC are omitted, and the 2,336 bytes other than the SYNC and header are recorded in one sector as user data. This is referred to as CD-ROM MODE-2.
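The two sector layouts can be written down as field tables and checked against the 2,352-byte sector. This is a sketch for illustration; the dictionary keys are names chosen here, and the 8 reserved bytes are the part of MODE-1 that the text groups under "or the like".

```python
SECTOR_BYTES = 2352

# CD-ROM MODE-1 field sizes in bytes.
MODE1_FIELDS = {
    "sync": 12,
    "header": 4,
    "user_data": 2048,
    "edc": 4,
    "reserved": 8,
    "ecc": 276,
}

# CD-ROM MODE-2: no EDC/ECC, so everything after SYNC and header is user data.
MODE2_FIELDS = {
    "sync": 12,
    "header": 4,
    "user_data": 2336,
}

for name, fields in (("MODE-1", MODE1_FIELDS), ("MODE-2", MODE2_FIELDS)):
    total = sum(fields.values())
    # Every layout must fill exactly one 2,352-byte sector.
    assert total == SECTOR_BYTES, (name, total)
    print(name, fields["user_data"], "user bytes per", total, "byte sector")
```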
Recently, a recordable and reproducible personal audio format called the mini disc (trademark) has been developed and is now commercially available.
The mini disc employs EFM (Eight to Fourteen Modulation) as its disc recording modulation system and CIRC (Cross Interleave Reed-Solomon Code) as its error correction code. Audio data compressed according to the ATRAC (Adaptive Transform Acoustic Coding) system is recorded in accordance with this format, in units of blocks called clusters, as shown in FIG. 6. This format is very close to the above-mentioned CD-ROM MODE-2.
The CD-ROM uses 98 CD frames as one sector, which corresponds to a playback time of 1/75 s, or about 13.3 ms. The CIRC interleave length is 108 frames (about 14.7 ms), which is longer than one CD-ROM sector. Therefore, when data is recorded with the CIRC error correction code, at least 3 extra sectors must be secured. This area is referred to as a link area. A link area of 108 frames (1 sector + α) must be maintained before data starts being written, and a similar area of 108 frames must be maintained after data has been written; otherwise the error correction interleaving is not completed.
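The timing figures follow directly from 75 sectors per second and 98 frames per sector. The sketch below derives the sector duration, the duration of the 108-frame CIRC interleave, and that interleave expressed in sectors, which shows why it is "1 sector + α".

```python
SECTORS_PER_SECOND = 75
FRAMES_PER_SECTOR = 98
CIRC_INTERLEAVE_FRAMES = 108

frames_per_second = SECTORS_PER_SECOND * FRAMES_PER_SECTOR           # 7350 frames/s
sector_ms = 1000 / SECTORS_PER_SECOND                                 # one sector
interleave_ms = 1000 * CIRC_INTERLEAVE_FRAMES / frames_per_second     # CIRC interleave
interleave_sectors = CIRC_INTERLEAVE_FRAMES / FRAMES_PER_SECTOR       # interleave in sectors

print(f"{sector_ms:.2f} ms per sector")            # 13.33 ms per sector
print(f"{interleave_ms:.2f} ms interleave")        # 14.69 ms interleave
print(f"{interleave_sectors:.2f} sectors")         # 1.10 sectors
```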
If data could be written from arbitrary positions, link areas would be scattered across the disc, lowering the efficiency with which data is recorded and reproduced. Therefore, data is written in recording units of a fixed size. In the mini disc, this recording unit is referred to as a cluster, and one cluster is formed of 36 sectors. Rewriting is always carried out in units of an integral multiple of one cluster. Data to be recorded is temporarily stored in a RAM and then written onto the disc. During reproduction, the same RAM can also serve as a shock-proof memory that provides a shock-proof function.
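Because rewriting happens only in whole clusters, any write has to be widened to the clusters it touches. The sketch below computes that cluster range; it assumes a flat 0-based sector numbering and ignores which sectors inside a cluster are link or sub-data sectors, and the function name is hypothetical.

```python
SECTORS_PER_CLUSTER = 36


def clusters_to_write(first_sector: int, sector_count: int) -> range:
    """Return the indices of the clusters touched by a write of sector_count
    sectors starting at absolute sector first_sector (both 0-based)."""
    if sector_count <= 0:
        return range(0)
    last_sector = first_sector + sector_count - 1
    first_cluster = first_sector // SECTORS_PER_CLUSTER
    last_cluster = last_sector // SECTORS_PER_CLUSTER
    return range(first_cluster, last_cluster + 1)


# Writing 10 sectors that straddle a cluster boundary forces 2 whole clusters
# (72 sectors) to be written.
touched = clusters_to_write(30, 10)
print(list(touched), len(touched) * SECTORS_PER_CLUSTER)  # [0, 1] 72
```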
In the recordable magneto-optical mini disc, 3 sectors of each cluster (36 sectors) are reserved as link sectors and the next 1 sector is reserved as a sub-data sector. Compressed data is recorded on the remaining 32 sectors.
When data is recorded, writing starts from about the 2nd link sector of the preceding cluster. When writing of the 36th sector is finished, error correction data also has to be written into the first link sector and the 2nd link sector.
In the preformatted (playback-only) mini disc, which is similar to a CD, data need not be rewritten in units of clusters and is recorded continuously, so the 3 link sectors need not be provided. Therefore all 4 sectors (the 3 link sectors and 1 sub-data sector) are assigned to sub-data, in which graphics data or the like can be stored.
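The two cluster layouts can be modelled as a list of sector roles. The sketch makes the capacity difference explicit: a recordable cluster has 1 sub-data sector, a preformatted cluster has 4, and both have 32 main-data sectors. The `Role` enum and `cluster_layout` function are hypothetical names for illustration.

```python
from enum import Enum

SECTORS_PER_CLUSTER = 36


class Role(Enum):
    LINK = "link"
    SUB_DATA = "sub-data"
    MAIN_DATA = "main data"


def cluster_layout(recordable: bool) -> list[Role]:
    """Role of each of the 36 sectors in one cluster."""
    if recordable:
        head = [Role.LINK] * 3 + [Role.SUB_DATA]   # 3 link sectors + 1 sub-data sector
    else:
        head = [Role.SUB_DATA] * 4                 # link sectors reused as sub-data
    return head + [Role.MAIN_DATA] * (SECTORS_PER_CLUSTER - len(head))


for recordable in (True, False):
    layout = cluster_layout(recordable)
    kind = "recordable" if recordable else "preformatted"
    print(kind, layout.count(Role.SUB_DATA), "sub-data sectors,",
          layout.count(Role.MAIN_DATA), "main-data sectors")
```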
As described above, the recordable mini disc and the preformatted mini disc differ in sub-data capacity, so that, if sub-data is included, data cannot be copied completely from a preformatted disc to a recordable disc.
When part of the previously recorded data on a recordable disc is rewritten, the whole cluster must be rewritten even if the amount of data to be updated is very small, because data is interleaved in units of clusters.
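The consequence is a read-modify-write cycle at cluster granularity. In the minimal sketch below, the disc is modelled, as an assumption, as a list of clusters, each a list of 36 sector payloads; updating a single sector still writes all 36 sectors back. The function name is hypothetical.

```python
SECTORS_PER_CLUSTER = 36


def rewrite_sector(disc: list[list[bytes]], sector: int, payload: bytes) -> int:
    """Replace the payload of one sector. Returns the number of sectors that
    had to be physically written."""
    cluster_index, offset = divmod(sector, SECTORS_PER_CLUSTER)
    cluster = list(disc[cluster_index])   # read the whole cluster into RAM
    cluster[offset] = payload             # modify only the target sector
    disc[cluster_index] = cluster         # write the whole cluster back
    return len(cluster)


disc = [[b"\x00"] * SECTORS_PER_CLUSTER for _ in range(2)]
print(rewrite_sector(disc, 40, b"x"))  # 36
```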
On the other hand, a high-efficiency coding system for image signals on digital storage media is standardized as MPEG1 (Moving Picture Experts Group Phase 1). The target storage media are those with a continuous transfer rate of 1.5 Mbit/sec or less, such as the CD, DAT (digital audio tape) and hard disk. Moreover, the storage media are assumed to be connected to a decoder not only directly but also through transmission media such as a computer bus, a LAN (local area network) or a telecommunication system. Further, the storage media are assumed to support not only normal forward playback but also special playback modes such as random access, high-speed playback and reverse playback.
A principle of image signal high-efficiency coding system based on the MPEG1 is as follows.
In accordance with this high-efficiency coding system, redundancy in the timebase direction is reduced by calculating differences between images, and redundancy in the spatial axis direction is then reduced by using the discrete cosine transform (DCT) and variable-length coding.
Redundancy in the timebase direction will be described below.
In a continuous moving picture, the target image (the image at a certain time) is generally very similar to the images immediately preceding and following it. Therefore, as shown in FIG. 16, for example, if the difference between an image to be coded and the preceding image is calculated and the resulting difference is transmitted, the amount of transmitted information can be reduced by reducing redundancy in the timebase direction. An image coded in this way is referred to as a predictive-coded picture (P picture or P frame), which will be described later. Similarly, if the difference between an image to be coded and the preceding image, the succeeding image, or an interpolated image generated from the preceding and succeeding images is calculated and the resulting small difference is transmitted, the amount of transmitted information can likewise be reduced. An image coded in this way is referred to as a bidirectionally predictive-coded picture (B picture or B frame), which will be described later. In FIG. 16, reference symbol I represents an intra-coded picture (I picture or I frame), which will also be described later, reference symbol P represents a P picture, and reference symbol B represents a B picture.
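The effect of transmitting a difference instead of the image can be seen on a toy example. The sketch uses two hypothetical 4 × 4 frames that differ in a single pixel: the current frame has 16 non-zero values, but its difference from the preceding frame has only one.

```python
def frame_difference(current, reference):
    """Pixel-wise difference current - reference for equally sized frames
    given as lists of rows."""
    return [[c - r for c, r in zip(crow, rrow)]
            for crow, rrow in zip(current, reference)]


def nonzero_count(frame):
    return sum(1 for row in frame for v in row if v != 0)


# Hypothetical frames: the current frame differs from the preceding one in one pixel.
prev_frame = [[10, 10, 10, 10],
              [10, 20, 20, 10],
              [10, 20, 20, 10],
              [10, 10, 10, 10]]
cur_frame = [row[:] for row in prev_frame]
cur_frame[1][1] = 25

diff = frame_difference(cur_frame, prev_frame)
print(nonzero_count(cur_frame), nonzero_count(diff))  # 16 1
```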
Motion compensation is carried out in order to generate a predictive image.
In motion compensation, a block of 16 × 16 pixels (referred to hereinafter as a macroblock), composed of unit blocks of 8 × 8 pixels, is extracted, for example, and the macroblock with the smallest difference is searched for in the preceding image around the position corresponding to the extracted macroblock. The amount of transmitted data can then be reduced by transmitting the difference between the extracted macroblock and the macroblock found by the search. In actual practice, a P picture (predictive-coded picture) is coded by selecting, for each 16 × 16 macroblock, whichever yields the smaller amount of data: the difference from the motion-compensated predictive image, or the image itself without calculating a difference.
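The search step can be sketched as full-search block matching. The text only says "smallest difference"; this sketch assumes the sum of absolute differences (SAD) as the difference measure and a search range of ±7 pixels, both common choices rather than anything the text prescribes. The frames are hypothetical random data in which the content moves down 1 line and right 2 pixels.

```python
import random


def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between the size x size block of cur at
    (cy, cx) and the block of ref at (ry, rx)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def motion_search(cur, ref, cy, cx, size=16, search=7):
    """Full search in ref around (cy, cx). Returns ((dy, dx), best_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(cur, ref, cy, cx, ry, rx, size)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best


rng = random.Random(0)
SIZE = 32
ref = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
# Current frame: reference content moved down 1 line and right 2 pixels.
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(SIZE)]
       for y in range(SIZE)]
print(motion_search(cur, ref, 8, 8))  # ((-1, -2), 0)
```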
However, in the above-mentioned case, a large amount of data has to be transmitted for image portions, such as background, that are uncovered after an object has moved. Therefore, in the B picture (bidirectionally predictive-coded picture), the following candidates are evaluated: the difference between the image to be coded and the decoded forward (preceding) image after motion compensation, the difference from the decoded backward (succeeding) image after motion compensation, the difference from an interpolated image generated by averaging these two images, and the image to be coded itself with no difference calculated. The candidate with the smallest amount of data is coded.
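The per-macroblock choice for a B picture can be sketched as picking the candidate with the smallest cost. Two simplifications are assumptions of this sketch: the sum of absolute values stands in for "amount of data", and the interpolated prediction is a rounded average. The predictions are assumed to be already motion-compensated.

```python
def block_cost(block):
    """Stand-in for the coded data amount: sum of absolute values."""
    return sum(abs(v) for row in block for v in row)


def residual(target, prediction):
    return [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, prediction)]


def interpolate(fwd, bwd):
    # Rounded average of the forward and backward predictions.
    return [[(f + b + 1) // 2 for f, b in zip(fr, br)] for fr, br in zip(fwd, bwd)]


def choose_b_mode(target, fwd, bwd):
    """Return the name of the cheapest of the four candidates."""
    candidates = {
        "forward": residual(target, fwd),
        "backward": residual(target, bwd),
        "interpolated": residual(target, interpolate(fwd, bwd)),
        "intra": target,  # no difference: code the block itself
    }
    return min(candidates.items(), key=lambda item: block_cost(item[1]))[0]


fwd = [[100, 100], [100, 100]]
bwd = [[120, 120], [120, 120]]
print(choose_b_mode([[110, 110], [110, 110]], fwd, bwd))  # interpolated
print(choose_b_mode(bwd, fwd, bwd))  # backward, e.g. background visible only later
```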
Redundancy in the spatial axis direction will be described below.
Differences of image data are not transmitted as they are but are processed by the discrete cosine transform (DCT) in unit blocks of 8 × 8 pixels. The DCT expresses an image not by pixel levels but by the amplitudes of cosine-function frequency components. A 2-dimensional DCT, for example, converts the data of a unit block of 8 × 8 pixels into a coefficient block of 8 × 8 cosine-function components. In general, an image signal representing a natural picture captured by a television camera is often a smooth signal. In this case, the data amount can be reduced efficiently by processing the image signal with the DCT.
Specifically, if a smooth signal, such as an image signal representing a natural picture, is processed by the DCT, large values are concentrated in a few coefficients, mainly the low-frequency ones. When the coefficients are quantized, almost all of the coefficients in the 8 × 8 coefficient block become zero and only the few large coefficients remain.
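Energy compaction can be demonstrated with a direct (slow but simple) orthonormal 2-D DCT-II followed by quantization. The single uniform quantizer step of 16 is an assumption of this sketch; a real coder uses quantization matrices and scale factors. For a smooth horizontal ramp, only a couple of coefficients in the first row survive quantization.

```python
import math

N = 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) formula)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantization with one step size for every coefficient."""
    return [[round(v / step) for v in row] for row in coeffs]


# A smooth, hypothetical block: a gentle horizontal ramp.
smooth = [[100 + 2 * y for y in range(N)] for x in range(N)]
q = quantize(dct_2d(smooth))
nonzero = [(u, v, q[u][v]) for u in range(N) for v in range(N) if q[u][v] != 0]
print(len(nonzero), "of", N * N, "coefficients are non-zero:", nonzero)
```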
Thus, when the data of an 8 × 8 coefficient block is transmitted, the coefficients are read out in zig-zag scanning order, and each non-zero coefficient is transmitted together with its 0-run, i.e., the number of zeros preceding it, as a Huffman code. This reduces the amount of transmitted data. The decoding side reconstructs the image by the reverse procedure.
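The scan and run-length step, before any Huffman coding, can be sketched as follows. The zig-zag order generated here is the conventional one that starts (0,0), (0,1), (1,0), (2,0). This sketch treats every coefficient alike and simply drops trailing zeros, where a real coder would send an end-of-block code; the Huffman tables themselves are not shown.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scanning order."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col (anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                  # even diagonals run upward
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """(zero_run, level) pairs in zig-zag order; trailing zeros are dropped."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 54, -2, 3
print(run_level_pairs(block))  # [(0, 54), (0, -2), (1, 3)]
```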
The structure of the data handled by the above-mentioned coding system is illustrated in FIG. 17. From the bottom up, the data structure shown in FIG. 17 is composed of a block layer, a macroblock layer, a slice layer, a picture layer, a group of pictures (GOP) layer and a video sequence layer. This data structure will be described in order, starting from the bottom layer in FIG. 17.
First, in the block layer, each unit block is composed of 8 × 8 pixels (8 lines of 8 pixels) of luminance or color difference. Each unit block is processed by the above-mentioned DCT.
In the macroblock layer, each macroblock is composed of 6 blocks: 4 luminance blocks (luminance unit blocks) Y0, Y1, Y2 and Y3, adjoining one another horizontally and vertically, and 2 color difference blocks (color difference unit blocks) Cb and Cr, which cover the same image area as the luminance blocks. These blocks are transmitted in the order Y0, Y1, Y2, Y3, Cb, Cr. In this coding system, the type of predictive image (the reference image from which a difference is calculated) and whether a difference is transmitted are decided in units of macroblocks.
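Splitting a macroblock into its six unit blocks can be sketched as below. The sketch assumes the color difference components have already been subsampled to 8 × 8 for the macroblock area, and it places Y0 to Y3 top-left, top-right, bottom-left, bottom-right. The function name and dictionary keys are hypothetical.

```python
def split_macroblock(luma, cb, cr):
    """luma: 16x16 rows; cb, cr: 8x8 rows (already subsampled).
    Returns the six 8x8 unit blocks keyed by name."""
    def sub(block, top, left):
        return [row[left:left + 8] for row in block[top:top + 8]]

    return {
        "Y0": sub(luma, 0, 0),   # top-left
        "Y1": sub(luma, 0, 8),   # top-right
        "Y2": sub(luma, 8, 0),   # bottom-left
        "Y3": sub(luma, 8, 8),   # bottom-right
        "Cb": cb,
        "Cr": cr,
    }


luma = [[r * 16 + c for c in range(16)] for r in range(16)]
chroma = [[128] * 8 for _ in range(8)]
blocks = split_macroblock(luma, chroma, chroma)
print(blocks["Y1"][0][0], blocks["Y2"][0][0])  # 8 128
```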
The slice layer is composed of one or more macroblocks that are consecutive in the scanning order of the image. At the head of each slice, the differentially coded motion vectors and DC (direct current) components within the image are reset, and the first macroblock carries data indicating its position within the image. Thus, when an error occurs, decoding can recover at the next slice. For this reason, the length and starting position of a slice are arbitrary and can be varied depending on the error conditions of the transmission line.
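Why resetting at the slice header confines errors can be shown with differential (DPCM) coding of DC values. In this sketch, the reset value of 1024 is an assumption, and the slices contain hypothetical DC values. Corrupting one difference damages only the rest of that slice; the next slice decodes correctly because its predictor starts again from the reset value.

```python
DC_RESET = 1024  # assumption: predictor value restored at each slice start


def encode_dc(slices):
    """slices: list of slices, each a list of DC values in scan order.
    Returns per-slice lists of differences."""
    coded = []
    for dc_values in slices:
        predictor = DC_RESET          # reset at the slice header
        diffs = []
        for dc in dc_values:
            diffs.append(dc - predictor)
            predictor = dc
        coded.append(diffs)
    return coded


def decode_dc(coded):
    decoded = []
    for diffs in coded:
        predictor = DC_RESET          # the decoder resets at the same point
        values = []
        for d in diffs:
            predictor += d
            values.append(predictor)
        decoded.append(values)
    return decoded


slices = [[1000, 1010, 1020], [900, 905]]
coded = encode_dc(slices)
coded[0][1] += 50                     # simulate a transmission error in slice 0
print(decode_dc(coded))               # slice 0 damaged, slice 1 intact
```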
In the picture layer, each picture (i.e., each image) is composed of one or more slices. In accordance with this coding system, pictures are classified into four types: the intra-coded picture (I picture or I frame), the forward predictive-coded picture (P picture or P frame), the bidirectionally predictive-coded picture (B picture or B frame) and the DC intra-coded picture (D picture).
In the intra-coded picture (I picture), only information closed within that one picture is used for coding. In other words, upon decoding, the picture can be reconstructed from the information of the I picture alone. In actual practice, no difference is calculated; the image data is processed by the DCT and coded. Although this coding is generally low in efficiency, inserting I pictures at intervals makes random access and high-speed reproduction possible.
In the forward predictive-coded picture (P picture), a temporally preceding I picture or P picture that has already been decoded is used as the predictive picture (the picture that serves as the reference for calculating a difference). In actual practice, the more efficient of two methods is selected for each macroblock: coding the difference from the motion-compensated predictive picture, or intra-coding the macroblock without calculating a difference.
In the bidirectionally predictive-coded picture (B picture), three kinds of predictive pictures are used: the temporally preceding I picture or P picture and the temporally succeeding I picture or P picture, both of which have already been decoded, and an interpolated picture generated from these two pictures. Thus, for each macroblock, the most efficient method can be selected from coding the difference from one of the three motion-compensated predictive pictures and intra-coding.
The DC intra-coded picture (D picture) is an intra-coded picture formed of only the DC coefficients of the DCT, and it cannot exist in the same sequence as the other three types of pictures.
The above-mentioned group of pictures (GOP) layer is composed of one or more I pictures and zero or more non-I pictures.
When the input order to the encoder is, for example, 1I, 2B, 3B, 4P * 5B, 6B, 7I, 8B, 9B, 10I, 11B, 12B, 13P, 14B, 15B, 16P * 17B, 18B, 19I, 20B, 21B, 22P, the order of the output from the encoder, i.e., the input to the decoder, is 1I, 4P, 2B, 3B * 7I, 5B, 6B, 10I, 8B, 9B, 13P, 11B, 12B, 16P, 14B, 15B * 19I, 17B, 18B, 22P, 20B, 21B.
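The reordering in this example can be reproduced by holding back B pictures until the next I or P picture has been emitted. The sketch below takes labels such as "2B" in display order (GOP boundary markers omitted) and returns the coding order; the handling of trailing B pictures with no later reference is only an edge case of the sketch.

```python
def display_to_coding_order(pictures):
    """pictures: labels like '1I', '2B' in display order. Each I or P picture
    is emitted before the B pictures that precede it in display order,
    because those B pictures need it as their backward reference."""
    out, pending_b = [], []
    for pic in pictures:
        if pic.endswith("B"):
            pending_b.append(pic)
        else:
            out.append(pic)
            out.extend(pending_b)
            pending_b.clear()
    out.extend(pending_b)  # edge case: B pictures with no later I/P picture
    return out


display = ("1I 2B 3B 4P 5B 6B 7I 8B 9B 10I 11B 12B 13P 14B 15B "
           "16P 17B 18B 19I 20B 21B 22P").split()
print(" ".join(display_to_coding_order(display)))
```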
The order is rearranged in the encoder because, when a B picture, for example, is coded or decoded, the temporally later I picture or P picture that serves as its predictive picture must already have been coded or decoded. The interval between I pictures (e.g., 9) and the interval between I or P pictures (e.g., 3) can be set arbitrarily, and the interval between I or P pictures may of course be changed within a group of pictures. A boundary between groups of pictures is represented by "*". Reference symbol I denotes an I picture, P denotes a P picture and B denotes a B picture.
The video sequence layer shown at the top of FIG. 17 is composed of one or more groups of pictures that have the same picture size and picture rate.
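The layer hierarchy can be sketched as nested data classes, with a check for the two structural rules stated above: every GOP contains at least one I picture, and D pictures do not share a sequence with the other types. The class names, the example dimensions, and the exemption of D-only sequences from the I-picture rule are assumptions of this sketch.

```python
from dataclasses import dataclass, field

Block = list[list[int]]  # one 8x8 unit block of samples or DCT coefficients


@dataclass
class Macroblock:
    blocks: list[Block]  # 4 luminance blocks and 2 color difference blocks


@dataclass
class Slice:
    macroblocks: list[Macroblock] = field(default_factory=list)


@dataclass
class Picture:
    kind: str  # "I", "P", "B" or "D"
    slices: list[Slice] = field(default_factory=list)


@dataclass
class GroupOfPictures:
    pictures: list[Picture]


@dataclass
class VideoSequence:
    # One picture size and one picture rate for the whole sequence, so every
    # GOP it contains shares them by construction.
    width: int
    height: int
    picture_rate: float
    gops: list[GroupOfPictures]

    def validate(self) -> None:
        kinds = {p.kind for g in self.gops for p in g.pictures}
        if "D" in kinds:
            # Assumption: a D-only sequence is exempt from the I-picture rule.
            if kinds != {"D"}:
                raise ValueError("D pictures cannot share a sequence with I, P or B pictures")
            return
        for g in self.gops:
            if not any(p.kind == "I" for p in g.pictures):
                raise ValueError("each GOP needs at least one I picture")
```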
If digital image data high-efficiency-coded according to the MPEG1 system is recorded on the previously mentioned mini disc, the following problems arise.
If the recording unit of one GOP is set arbitrarily, without regard to the cluster, the image data of one GOP may be recorded over two or more clusters. In this case, a GOP may start somewhere in one cluster and end somewhere in another. As a result, it becomes difficult to carry out editing, such as replacing one GOP with another, using the GOP as the cut unit. Even if such editing can be carried out, the average transfer rate is lowered.
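How often an unaligned GOP straddles clusters can be counted directly. This sketch counts only the 32 main-data sectors of a recordable-disc cluster and uses hypothetical GOP sizes and start offsets measured in those sectors. A 32-sector GOP fits one cluster only when it starts on a cluster boundary; otherwise it touches two.

```python
DATA_SECTORS_PER_CLUSTER = 32  # main-data sectors in a recordable-disc cluster


def clusters_spanned(start_sector: int, gop_sectors: int) -> int:
    """Number of clusters touched by a GOP occupying gop_sectors consecutive
    main-data sectors, starting start_sector sectors into the main-data stream."""
    first = start_sector // DATA_SECTORS_PER_CLUSTER
    last = (start_sector + gop_sectors - 1) // DATA_SECTORS_PER_CLUSTER
    return last - first + 1


print(clusters_spanned(0, 32))   # 1: aligned GOP
print(clusters_spanned(10, 32))  # 2: same size, unaligned start
print(clusters_spanned(30, 40))  # 3
```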
When a P picture or B picture that uses the final frame of the immediately preceding GOP as its predictive picture (reference picture) is placed at the start of the second GOP, the immediately preceding GOP also has to be decoded in order to decode that P picture or B picture. As a result, during seek reproduction such as fast-forward or reverse reproduction, it becomes difficult to reproduce pictures rapidly.
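Whether a GOP depends on its predecessor can be checked from the coding-order stream. In this sketch, a GOP is flagged when any of its B pictures is displayed before its first I picture, since such a B picture is predicted from the last reference picture of the preceding GOP. The stream is the encoder output from the ordering example earlier in this section, with "*" as the GOP boundary; the helper names are hypothetical.

```python
def display_number(label: str) -> int:
    return int(label[:-1])


def gop_needs_previous(gop: list[str]) -> bool:
    """gop: picture labels such as '7I', '5B' in coding order. True if a B
    picture is displayed before the GOP's first I picture."""
    first_i = min(display_number(p) for p in gop if p.endswith("I"))
    return any(p.endswith("B") and display_number(p) < first_i for p in gop)


stream = ("1I 4P 2B 3B * 7I 5B 6B 10I 8B 9B 13P 11B 12B 16P 14B 15B "
          "* 19I 17B 18B 22P 20B 21B")
gops = [g.split() for g in stream.split("*")]
print([gop_needs_previous(g) for g in gops])  # [False, True, True]
```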
An object of the present invention is to enable editing in units of GOPs, as well as special reproduction such as fast-forward and reverse reproduction, to be carried out by simple and rapid processing when high-efficiency-coded digital image data is recorded on a disc-shaped recording medium such as a mini disc.