1. Field of the Invention
The present invention relates to a video and audio data compression system for compressing and coding moving pictures and sounds (hereinafter referred to as "video and audio data"), and particularly to a video and audio data compression system which is readily usable by another system.
2. The Related Art
As this type of video and audio data compression system, a video and audio compressing device is disclosed in Japanese Patent Application No. Hei-7-185596, which had not been published at the time the present application was filed in the Japanese Patent Office. According to this video and audio compressing device, moving pictures and sounds (video and audio data) are compressed in conformity with MPEG (Moving Picture Experts Group), which is at present widely used as a video and audio data compression system.
FIG. 11 is a block diagram showing the construction of a video and audio data compression system.
As shown in FIG. 11, the video and audio data compression system comprises device control means 1 for controlling the operation of plural applications, applications 2 and video and audio compressing means 5 which are controlled by the device control means 1, input means 3 for inputting command data from a keyboard, a mouse or the like, and output means 4 for outputting a compression status to a display or the like.
The video and audio compressing means 5 comprises interface (I/F) processing means 10 for interfacing with the device control means 1, process selecting means 11 for determining which one of video data (moving picture) compression processing, audio data (sounds) compression processing and system processing should be selected and performed, video compression processing means 12 for compressing and coding the video data, audio compression processing means 13 for compressing and coding the audio data, system processing means 14 for integrating the compressed and coded video data and the compressed and coded audio data, compression target file device 15, and compressed file device 16.
FIGS. 13A to 13C are a diagram showing the construction of video data which are compressed and encoded in conformity with MPEG, wherein FIG. 13A shows a video sequence, FIG. 13B shows the structure of GOP (group of pictures), and FIG. 13C shows an arrangement of I, P and B pictures.
As shown in FIG. 13A, the video sequence starts with a sequence header, which is followed by one or more GOPs (Groups of Pictures), and ends with a sequence end code.
As shown in FIG. 13B, each GOP comprises one or more pictures, and one picture represents one image (one frame). Three types of pictures are defined. A first type is the I picture (Intra-picture), which is expandable on the basis of only its own frame; a second type is the P picture (Predictive picture), which is expandable on the basis of preceding frames; and a third type is the B picture (Bidirectional Predictive picture), which is expandable on the basis of preceding and following frames. One GOP contains one or more I pictures, and zero or more P pictures and B pictures.
FIGS. 14A and 14B show the data structure of the audio data which are compressed and encoded in conformity with MPEG. FIG. 14A shows an audio sequence, which comprises a plurality of AAUs (Audio Access Units). As shown in FIG. 14B, each AAU comprises an AAU header containing information on a synchronous word, a bit rate, a sampling frequency, etc., and compressed audio data.
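The dependency between the three picture types can be illustrated by the following minimal sketch. It assumes that a GOP is given as a string of picture types in display order, and that a P picture refers to the nearest preceding I or P picture while a B picture refers to the nearest preceding and following I or P pictures, as is usual in MPEG. The function name `reference_pictures` is hypothetical and is introduced only for illustration.

```python
def reference_pictures(gop):
    """For each picture of a GOP (display order), list the indices it is predicted from.

    `gop` is a string such as "IBBP". A value of None means that the reference
    lies outside this GOP (for example, B pictures after the last I/P picture).
    """
    # I and P pictures are the only pictures other pictures may refer to.
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = []
    for i, picture_type in enumerate(gop):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if picture_type == "I":
            refs.append(())                          # expandable from its own frame only
        elif picture_type == "P":
            refs.append((prev_anchor,))              # preceding frame only
        elif picture_type == "B":
            refs.append((prev_anchor, next_anchor))  # preceding and following frames
        else:
            raise ValueError(f"unknown picture type: {picture_type!r}")
    return refs


if __name__ == "__main__":
    for index, ref in enumerate(reference_pictures("IBBPBBP")):
        print(index, "IBBPBBP"[index], ref)
```

Because a B picture depends on a following picture, that following I or P picture has to be compressed before the B picture, which is why the compression order generally differs from the display order.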
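As an illustration of the AAU header fields named above, the following sketch reads the synchronous word, the bit rate and the sampling frequency from the first four bytes of an AAU. It assumes the MPEG-1 audio header layout (12-bit synchronous word, 1-bit ID, 2-bit layer, 1-bit protection bit, 4-bit bit rate index, 2-bit sampling frequency index) and handles Layer II only; the function name `parse_aau_header` and the returned dictionary are hypothetical.

```python
# Bit rates in kbit/s for MPEG-1 audio Layer II, indexed by the 4-bit bit rate field.
# Index 0 (free format) and index 15 (forbidden) are not handled in this sketch.
LAYER2_BITRATES_KBPS = [None, 32, 48, 56, 64, 80, 96, 112,
                        128, 160, 192, 224, 256, 320, 384, None]
# Sampling frequencies in Hz, indexed by the 2-bit sampling frequency field.
SAMPLING_FREQUENCIES_HZ = [44100, 48000, 32000, None]


def parse_aau_header(header):
    """Extract bit rate and sampling frequency from a 4-byte Layer II AAU header."""
    if len(header) < 4:
        raise ValueError("an AAU header needs at least 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 20 != 0xFFF:                      # 12-bit synchronous word
        raise ValueError("synchronous word not found")
    if (word >> 17) & 0x3 != 0b10:               # layer field: 10 means Layer II
        raise ValueError("only Layer II is handled in this sketch")
    bitrate = LAYER2_BITRATES_KBPS[(word >> 12) & 0xF]
    sampling = SAMPLING_FREQUENCIES_HZ[(word >> 10) & 0x3]
    if bitrate is None or sampling is None:
        raise ValueError("unsupported bit rate or sampling frequency index")
    return {"bitrate_kbps": bitrate, "sampling_hz": sampling}


if __name__ == "__main__":
    print(parse_aau_header(bytes([0xFF, 0xFD, 0x80, 0x00])))
```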
FIGS. 15A to 15C show the data structure of the video and audio data (moving pictures and sounds) which are compressed and encoded in conformity with MPEG. As shown in FIG. 15A, the video and audio data comprises plural packs, and each pack comprises one pack header and one or more packets as shown in FIG. 15B. The packets are classified into two types: video packets and audio packets.
As shown in FIG. 15C, each video packet comprises a packet header and video data, and the video data of respective video packets are extracted and continuously linked to one another to construct a video sequence.
As shown in FIG. 15C, each audio packet comprises a packet header and audio data, and the audio data of respective audio packets are extracted and continuously linked to one another to construct an audio sequence.
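The extraction and linking of packet data described above may be sketched as follows. The in-memory representation is an assumption made only for illustration: each pack is a dictionary with a `packets` list, and each packet is a dictionary with a `kind` ("video" or "audio") and a `payload`. None of these names appear in the MPEG bit stream itself.

```python
def demultiplex(packs):
    """Link the payloads of video packets and audio packets into two sequences."""
    video = bytearray()
    audio = bytearray()
    for pack in packs:
        for packet in pack["packets"]:
            if packet["kind"] == "video":
                video += packet["payload"]   # continue the video sequence
            elif packet["kind"] == "audio":
                audio += packet["payload"]   # continue the audio sequence
            else:
                raise ValueError(f"unknown packet kind: {packet['kind']!r}")
    return bytes(video), bytes(audio)


if __name__ == "__main__":
    packs = [
        {"packets": [{"kind": "video", "payload": b"V1"},
                     {"kind": "audio", "payload": b"A1"}]},
        {"packets": [{"kind": "video", "payload": b"V2"}]},
    ]
    print(demultiplex(packs))
```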
FIG. 16 shows the construction of the video compression processing means 12 shown in FIG. 11. As shown in FIG. 16, the video compression processing means 12 comprises video compression control means 601 for controlling the entire video compression processing, original picture reading means 602 for reading original picture data from the compression target file device every compression unit, color signal converting means 603 for converting the original picture to a color signal format (YCrCb format) which is usable for MPEG, motion estimating means 604 for searching the motion between the picture of a preceding/following frame and the picture of the present frame on a block (the space area of 8 pels × 8 pels in MPEG) basis, motion-compensated predicting means 605 for calculating the differential values between the pixel values of the blocks of the preceding/following frame and the present frame on the basis of the picture motion which is searched by the motion estimating means 604, DCT (Discrete Cosine Transform) means 606 for performing discrete cosine transform, quantizing means 607 for performing quantization, VLC means 608 for performing high efficiency variable-length encoding, a compressed code buffer 609 for storing video data which are compressed after VLC, dequantizing means 610 for performing dequantization, IDCT means 611 for performing inverse discrete cosine transform, motion compensating means 612 for adding the differential value and the pel (pixel) value of the block of the preceding/following frame to calculate a new reference frame, and a reference frame unit 613 for storing the picture of the preceding/following frame to be referred to.
The video compression is performed as follows. As shown in FIG. 16, the original picture reading means 602 reads an original picture from the compression target file device for each compression unit, and the color signal converting means 603 converts the read-in data to YCrCb data. The motion estimating means 604 then searches, for each block area, the motion between the preceding/following frame picture and the present frame picture, after which the compression of the present frame is started.
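The block-basis motion search may be illustrated by the following sketch. The text above does not state how the search is carried out, so the sketch assumes a full search over a small window that minimizes the sum of absolute differences (SAD), which is one common choice. The names `find_motion_vector`, `size` and `search` are hypothetical, and frames are plain lists of rows of luminance values.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of the present frame and
    the block of the reference frame displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(cur, ref, bx, by, size=8, search=2):
    """Full search: return ((dx, dy), sad) of the best matching reference block.

    (bx, by) is the top-left corner of the block in the present frame, and
    (dx, dy) is the displacement from that position into the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would reach outside the reference frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + size > height or bx + dx + size > width:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


if __name__ == "__main__":
    ref = [[(7 * x * x + 13 * y * y + 3 * x * y) % 251 for x in range(12)] for y in range(12)]
    # The present frame is the reference frame displaced by 2 pels across and 1 pel down.
    cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)] for y in range(12)]
    print(find_motion_vector(cur, ref, 2, 2))
```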
When the compression of I picture is performed, the pixel values of each block of the present frame are subjected to discrete cosine transform by the DCT means 606, quantized by the quantizing means 607, and then highly efficiently compressed to Huffman codes by the VLC means 608. The data thus compressed are successively stored into the compressed code buffer 609.
Next, in order to decode a compressed picture to regenerate a picture used as a reference picture, the quantized data are dequantized by the dequantizing means 610, subjected to inverse discrete cosine transform by the IDCT means 611, and stored in the reference frame unit 613.
Further, when a P picture or a B picture is compressed, the motion-compensated predicting means 605 calculates the differential values between the pixel values of each block of the present frame and the pixel values of the corresponding block of the preceding frame or the preceding/following frames, which have been stored in the reference frame unit 613 and are referred to on the basis of the motion searched by the motion estimating means 604. Thereafter, the differential values are subjected to the discrete cosine transform by the DCT means 606, then quantized by the quantizing means 607, and then highly efficiently compressed to Huffman codes by the VLC means 608. The data thus compressed are successively stored into the compressed code buffer 609.
Next, in the compression processing of the P picture, in order to regenerate a compressed picture used as a reference picture, the quantized data are dequantized by the dequantizing means 610, and then subjected to the inverse discrete cosine transform by the IDCT means 611 to calculate the differential values. The differential values are added to the pixel values of each block of the preceding frame which has been stored in the reference frame unit 613 and is referred to by the motion-compensated predicting means 605, and then the sum is stored in the reference frame unit 613.
The video compression control means 601 controls the entire series of compression processing described above, and also temporarily suspends the compression processing at constant time intervals and then resumes it.
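The path from the differential values through DCT and quantization, and back through dequantization and IDCT to a new reference block, may be sketched as follows. The sketch assumes an orthonormal 8 × 8 DCT and a single uniform quantizer step for every coefficient, whereas MPEG uses quantization matrices; the variable-length coding stage is left out. The function names `encode_block` and `reconstruct_block` and the parameter `step` are hypothetical. For an I picture the same functions apply with a prediction of all zeros.

```python
import math

N = 8
# Orthonormal DCT-II basis matrix, C[u][x].
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Two-dimensional DCT of an 8 x 8 block: C * block * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    """Inverse of dct2: C^T * coeffs * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def encode_block(current, predicted, step):
    """Differential values -> DCT -> quantization (integer levels)."""
    residual = [[current[y][x] - predicted[y][x] for x in range(N)] for y in range(N)]
    return [[round(c / step) for c in row] for row in dct2(residual)]


def reconstruct_block(levels, predicted, step):
    """Dequantization -> IDCT -> add to the prediction to form the reference block."""
    residual = idct2([[level * step for level in row] for row in levels])
    return [[predicted[y][x] + residual[y][x] for x in range(N)] for y in range(N)]


if __name__ == "__main__":
    predicted = [[100] * N for _ in range(N)]
    current = [[100 + ((3 * x + 5 * y) % 7) for x in range(N)] for y in range(N)]
    levels = encode_block(current, predicted, step=2)
    reference = reconstruct_block(levels, predicted, step=2)
    print(max(abs(reference[y][x] - current[y][x]) for y in range(N) for x in range(N)))
```

The reference block is rebuilt from the quantized data rather than taken from the original picture, so that the encoder predicts from the same picture that a decoder will be able to regenerate.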
FIG. 17 shows the construction of the audio compression processing means 13 shown in FIG. 11. As shown in FIG. 17, the audio compression processing means 13 comprises audio compression control means 701 for controlling the entire audio compression processing, original sound data reading means 702 for reading out original audio data from the compression target file device, original audio data chop up means 703 for chopping up original audio data into AAUs, 32 frequency band mapping means 704 for performing frequency band mapping processing every AAU unit, psychoacoustic processing means 705 for performing psychoacoustic processing, quantizing and encoding means 706 for performing linear quantization and encoding, frame forming means 707 for adding encoded data with additive information to form compression data of one AAU, and compressed code buffer 708 for storing the compressed data formed in the frame forming means 707.
The audio compression is performed as follows. As shown in FIG. 17, audio data having the data amount corresponding to the reproduction time which is needed to reproduce the moving pictures (video data) compressed by the video compressing means are read out from the compression target file by the original audio reading means 702, and the data of one AAU (1152 samples in the case of MPEG audio layer 2) are chopped up from the read-in original audio data by the original audio data chop up means 703. The data thus chopped up are subjected to the following processing every AAU.
The 32 frequency band mapping means 704 divides an input signal into subband signals of 32 bands by a subband analyzing filter, and calculates a scale factor for each subband signal to normalize the dynamic range. The psychoacoustic processing means 705 performs a fast Fourier transform on the input signal and calculates a psychoacoustic masking by using the transform result to determine bit allocation to each subband. The quantizing and encoding means 706 performs the quantization and encoding processing according to the determined bit allocation. The frame forming means 707 adds a header and auxiliary information to the quantized and encoded subband signal, shapes a bit stream and stores it into the compressed code buffer 708.
The audio compression control means 701 controls the entire series of compression processing described above, and also suspends the compression processing at constant time intervals or after the processing time of one AAU and then resumes it.
FIG. 18 is a diagram showing the construction of the system processing means 14 shown in FIG. 11. As shown in FIG. 18, the system processing means 14 comprises a video data buffer 801 for storing compressed video data which are supplied from the video compression processing means 12, an audio data buffer 802 for storing compressed audio data which are supplied from the audio compression processing means 13, multiplexing means 804 for packetizing and packing the compressed video data and the compressed audio data, a time code counter 803 for adding time data to packs and packets, and a compressed video and audio code buffer 805.
As shown in FIG. 18, the system processing means 14 receives the compressed video code from the video compression processing means 12 and stores the compressed video code into the video data buffer 801. Further, it receives the compressed audio code from the audio compression processing means 13 and then stores the compressed audio code into the audio data buffer 802. The multiplexing means 804 chops up the compressed video code into packet size, adds a packet header and a pack header thereto, and further receives the reproduction time stamp from the time code counter 803 and adds the time stamp to the pack header and the packet header. Further, the multiplexing means 804 chops up the compressed audio code into packet size from the audio data, adds a packet header and a pack header, and further it receives the reproduction time stamp from the time code counter 803 and adds the time stamp to the pack header and the packet header. The packs thus formed are successively stored into the compressed video and audio code buffer 805.
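The packetizing and packing performed by the multiplexing means 804 may be sketched as follows. The sketch makes several simplifying assumptions: headers are represented as dictionaries rather than as MPEG bit fields, each pack carries exactly one packet, and one time stamp value is used for all packs formed in a single call. The names `packetize`, `multiplex`, `packet_size` and `time_stamp` are hypothetical.

```python
def packetize(kind, data, packet_size, time_stamp):
    """Chop a compressed code into packet-size pieces and add a packet header to each."""
    return [
        {"header": {"kind": kind, "time_stamp": time_stamp},
         "payload": data[offset:offset + packet_size]}
        for offset in range(0, len(data), packet_size)
    ]


def multiplex(video_code, audio_code, packet_size, time_stamp):
    """Form packs from the compressed video code and the compressed audio code."""
    packets = (packetize("video", video_code, packet_size, time_stamp)
               + packetize("audio", audio_code, packet_size, time_stamp))
    # Each packet is wrapped in a pack whose header also carries the time stamp.
    return [{"header": {"time_stamp": time_stamp}, "packets": [packet]}
            for packet in packets]


if __name__ == "__main__":
    for pack in multiplex(b"VVVVV", b"AAA", packet_size=2, time_stamp=90000):
        print(pack)
```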
The operation of the video and audio compressing device will be described hereunder with reference to FIG. 11.
First, when the start of the compression is instructed through the input means 3 by an operator, the video and audio data compression means 5 is invoked by the device control means 1. When the video and audio compressing means 5 is invoked, the control is shifted through the I/F processing means 10 to the processing selection means 11.
The processing selection means 11 first invokes the video compression processing means 12. The video compression processing means 12 opens the compression target file indicated by the operator on the compression target file device 15, and starts the compression processing of the video data (moving pictures) in units of several frames (one compression unit).
One compression unit of moving pictures corresponds to the GOP unit as described above or is set to a picture range from an I picture or P picture to the next I picture or P picture.
Next, when the control is returned to the processing selection means 11, the audio compression processing means 13 is invoked. The audio compression processing means 13 reads, from the compression target file on the compression target file device 15, audio data whose data amount corresponds to the reproduction time of the number of frames (one compression unit) processed in the video compression processing means 12, and starts the compression processing of the audio data (sounds). In the audio compression processing, as in the video compression processing, the processing is returned to the processing selection means 11 each time the processing of one AAU is completed. When the last AAU in the compression unit of video data is completed, the audio compression processing means 13 notifies the processing selection means 11 that the system processing should be performed next.
The audio compression processing may be more efficient when it is performed every AAU, and usually the audio compression processing is performed for the maximum number of AAUs within the time corresponding to one video compression unit. Further, residual audio data amounting to less than one AAU are compressed the next time the control is shifted to the audio compression processing means 13.
Next, when the processing is shifted from the audio compression processing means 13 to the processing selection means 11 for the last time in the video compression unit, the system processing means 14 is invoked. The system processing means 14 receives the compressed and encoded data of one compression unit, which are generated by the video compression processing means 12 for one picture compression unit and by the audio compression processing means 13 for the several audio compression units contained in one picture compression unit, packetizes and packs them so that the compressed video and audio codes are synchronized with each other, and then writes them into the compressed data file.
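The number of AAUs processed per video compression unit, and the carrying over of the residual audio data, can be worked through with the following sketch. One AAU is taken as 1152 samples (MPEG audio layer 2, as stated above); the frame rate, sampling frequency and number of frames per compression unit used in the example are assumptions chosen only for illustration, and the function name `aau_schedule` is hypothetical.

```python
SAMPLES_PER_AAU = 1152  # MPEG audio layer 2


def aau_schedule(frames_per_unit, frame_rate, sample_rate, units):
    """Return, per video compression unit, (complete AAUs compressed, samples carried over)."""
    schedule = []
    carry = 0          # residual samples smaller than one AAU
    read_so_far = 0    # samples read from the compression target file so far
    for k in range(1, units + 1):
        # Samples covering the reproduction time of the first k compression units.
        total = k * frames_per_unit * sample_rate // frame_rate
        available = carry + (total - read_so_far)
        read_so_far = total
        aaus, carry = divmod(available, SAMPLES_PER_AAU)
        schedule.append((aaus, carry))
    return schedule


if __name__ == "__main__":
    # Assumed example: 15 frames per compression unit, 30 frames/s, 44.1 kHz audio.
    for unit, (aaus, carry) in enumerate(aau_schedule(15, 30, 44100, 8), start=1):
        print(f"unit {unit}: {aaus} AAUs, {carry} samples carried over")
```

Under these assumed figures each compression unit supplies 22050 samples, which is 19 complete AAUs with 162 samples left over. The residue grows by 162 samples per unit until, in the eighth unit, enough has accumulated for a twentieth AAU.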
The video data (moving pictures) compression process, the audio data (sounds) compression process, and the system process are performed every compression unit as described above.
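The order in which the processing selection means 11 invokes the three kinds of processing may be sketched as a generator that hands control back after each step. This is only an illustration of the sequence described above: the generator name `process_selector` and its parameters are hypothetical, a fixed number of AAUs per compression unit is assumed, and the actual compression work is replaced by labels.

```python
def process_selector(num_units, aaus_per_unit):
    """Yield the processing steps in the order they are invoked.

    Control returns to the caller after every step, which corresponds to the
    processing being returned to the processing selection means.
    """
    for unit in range(num_units):
        yield ("video", unit)            # video compression of one compression unit
        for aau in range(aaus_per_unit):
            yield ("audio", unit, aau)   # audio compression, returned after every AAU
        yield ("system", unit)           # packetizing and packing of the unit


if __name__ == "__main__":
    for step in process_selector(num_units=2, aaus_per_unit=3):
        print(step)
```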
In the above-described video and audio compressing device shown in FIG. 11, however, it is scarcely considered that the compression function of video and audio data is incorporated into another application having no compression function of video and audio data. Therefore, the video and audio compressing device has such a problem that it is difficult to incorporate the video and audio data compression function into another application. This is because if the video and audio data compression function should be incorporated into another application having no video and audio data compression function, expert knowledge is needed to form video, audio and system data according to the method as described above in the video and audio compressing device which is described with reference to FIG. 11, etc.
FIG. 12 is a diagram showing the processing flow in the device shown in FIG. 11 when another application 2 in FIG. 11 is selected as the foreground application. In this case, the other application can be operated smoothly by frequently shifting the processing from the I/F processing means 10 to the device control means 1. However, the video and audio compressing means 5 shown in FIG. 11 is constructed as one application, and thus if only the compression processing portion is separated from the application and incorporated in another application, complicated control is needed and thus expert knowledge is also needed.