The present invention relates to a video and audio coding method, a coding apparatus, and a coding program recording medium and, more particularly, to a coding method, a coding apparatus, and a coding program recording medium in which video, audio, or video and audio are captured and coded under software control using general-purpose computer resources.
Techniques for digitizing analog video or audio to obtain digital video or audio data have been widely developed because digital data are easy to record, transmit, edit, and reproduce. An advantage of digitization is that data can be compressed easily, and compressive coding is particularly important for recording and transmission. International standards have been established for compressive coding, among which the MPEG (Moving Picture Experts Group) standard is well known as a general-purpose digital standard that can handle video and audio.
In addition, as semiconductor devices such as those in computers and VLSI circuits have become faster and cheaper, inexpensive personal computers called multimedia personal computers have come onto the market. As a result, playback of compressively coded digital video and audio, which conventionally required added decoding hardware, can now easily be realized in software on personal computers. Video and audio are also delivered over the Internet, and MPEG-coded video and audio data are widely used.
As for coding to produce coded video or audio data, however, software processing is difficult on a personal computer, and special hardware must be added. Coding can be performed in software after the video and audio have been recorded as files, but the conversion takes several times as long as the duration of the input video and audio, which is unattractive to users.
To enable ordinary personal computer users to capture video, including moving pictures, or audio and produce coded data, it is desirable to capture moving pictures or audio with a capture board or sound board and to perform real-time coding in software, and such software must be developed to keep pace with increasing hardware speed.
Prior art apparatuses that perform “A. video coding”, “B. audio coding”, and “C. video and audio coding” are described below as examples of the current state of video, audio, and video and audio coding.
A. PRIOR ART VIDEO CODING APPARATUS
Video, including moving pictures and still pictures, can be digitized and captured into a computer in real time and coded in real time by using a personal computer expansion card that codes video in real time according to MPEG, the international standard for moving picture compression.
FIG. 58 is a block diagram illustrating the structure of a video coding apparatus realized in a computer that includes such special hardware. As shown in the figure, the prior art video coding apparatus comprises a coding section 5001 and a coding parameter decision means 5002; it receives video as input picture data and outputs coded video data. The coding section 5001 includes a DCT (discrete cosine transform) processing means 5003, a quantization means 5004, a variable length coding means 5005, a bit stream generating means 5006, an inverse quantization means 5007, an inverse DCT processing means 5008, and a prediction picture generating means 5009.
In the figure, the coding section 5001 receives, as input picture data, video data consisting of a series of digitized still pictures, codes them according to the set coding parameters, and outputs coded data. Each still picture constituting the input picture data is referred to as a frame picture. The coding parameters, which indicate the coding type and the resolution, are supplied by the coding parameter decision means 5002 described below.
The coding parameter decision means 5002 decides the coding type, which indicates intra-frame coding or inter-frame coding, and the resolution, and outputs them to the coding section 5001.
In the coding section 5001, the DCT processing means 5003 performs DCT processing on the input picture data and outputs the resulting DCT data. The quantization means 5004 quantizes the DCT data and outputs quantized data. The variable length coding means 5005 performs variable length coding on the quantized data to produce compressively coded variable-length data. The variable-length coded data is input to the bit stream generating means 5006, which outputs the coded data of the video coding apparatus as a bit stream that can be transmitted and recorded.
The inverse quantization means 5007 inversely quantizes the quantized data output from the quantization means 5004 and outputs inversely quantized data. The inverse DCT processing means 5008 performs inverse DCT processing on the inversely quantized data and outputs the resulting inverse DCT data, which is input to the prediction picture generating means 5009 and output as prediction picture data. When coding uses a prediction picture in accordance with the coding parameters, the difference data between the prediction picture data and the input picture data is input to the DCT processing means 5003, so that inter-frame coding is performed in the coding section 5001.
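The DCT, quantization, inverse quantization, and inverse DCT stages described above can be illustrated with a minimal pure-Python sketch. It assumes the orthonormal DCT-II on 8×8 blocks and a uniform quantization step of 16 chosen arbitrarily for illustration; real MPEG coders use weighting matrices and a per-macroblock quantizer scale, which are omitted here.

```python
import math

N = 8  # block size: the DCT stage works on 8x8-pixel blocks


def _scale(k):
    # Orthonormal DCT-II scaling factor
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # DCT-II basis function cos((2i + 1) k pi / 2N)
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an 8x8 block (8 rows of 8 numbers)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT, undoing dct2."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    """Quantization: divide each coefficient by the quantization step and round."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Inverse quantization: multiply each level by the quantization step."""
    return [[q * step for q in row] for row in levels]


flat = [[128] * N for _ in range(N)]   # uniform grey block
q = quantize(dct2(flat), 16)           # only the DC level is non-zero
rec = idct2(dequantize(q, 16))         # reconstructed block, about 128 everywhere
```

A flat block concentrates all of its energy in the DC coefficient, which is why the quantized AC levels are all zero and compress well in the following variable length coding stage.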
The video coding operation of the video coding apparatus constructed as above is described below.
Prior to coding, the coding parameter decision means 5002 decides the coding parameters, including the coding type and resolution, and outputs them to the coding section 5001.
Generally, compressive coding includes intra-frame coding, in which the still picture of one frame (one screen) is compressed by removing redundancy based on spatial (intra-frame) correlation, and inter-frame coding, in which still pictures of consecutive frames that are close in time are compressed by removing redundancy based on temporal (inter-frame) correlation.
The prior art video coding apparatus basically performs intra-frame coding. By also performing inter-frame coding, it obtains coded data with a high compression ratio. However, inter-frame coding requires producing prediction pictures by decoding, or by motion detection and motion compensation processing, and obtaining the difference between the prediction picture and the picture to be coded. This processing increases the burden on the apparatus. The prediction picture for inter-frame coding is generated by one of the following methods: forward prediction based on previously processed data, backward prediction based on subsequently processed data, or bidirectional prediction, in which forward prediction, backward prediction, or both are used. Hereinafter, intra-frame coding, forward predictive coding, and bidirectionally predictive coding (including backward predictive coding) are denoted by “I”, “P”, and “B”, respectively.
The resolution of a picture is generally expressed as the number of pixels in the horizontal and vertical directions of a screen, such as “320×240” or “160×120”. A high resolution, i.e., many pixels per screen, provides data with high playback picture quality. However, the amount of data to be processed increases, which increases the processing burden.
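The relation between the “I”, “P”, and “B” coding types and their prediction references can be sketched as follows. This is an illustration under simplifying assumptions: frames are in display order, each P frame is predicted from the nearest earlier I or P frame, each B frame from the nearest earlier and later I or P frames, and motion compensation is ignored. The function names are hypothetical.

```python
def coding_types(num_frames, pattern="IBBPBB"):
    """Assign a coding type to each frame (display order) by repeating a pattern."""
    return [pattern[i % len(pattern)] for i in range(num_frames)]


def reference_frames(types):
    """Return (forward_ref, backward_ref) frame indices for each frame.

    I: no reference.  P: forward prediction from the previous I/P frame.
    B: bidirectional prediction from the previous and next I/P frames.
    None marks a reference that does not exist.
    """
    anchors = [i for i, t in enumerate(types) if t in ("I", "P")]
    refs = []
    for i, t in enumerate(types):
        prev = max((a for a in anchors if a < i), default=None)
        nxt = min((a for a in anchors if a > i), default=None)
        if t == "I":
            refs.append((None, None))
        elif t == "P":
            refs.append((prev, None))
        else:  # "B"
            refs.append((prev, nxt))
    return refs


types = coding_types(7)          # ['I', 'B', 'B', 'P', 'B', 'B', 'I']
refs = reference_frames(types)   # frame 4 (B) is predicted from frames 3 and 6
```

Because a B frame needs its later reference frame, that reference must be coded first, so the coding order differs from the display order; this is part of the extra burden that inter-frame coding places on the apparatus.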
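As a rough illustration of how resolution affects the amount of data per frame, the pixel counts of the two resolutions mentioned above can be compared. Treating the pixel ratio as a proxy for the ratio of processing work is an assumption made only for this sketch.

```python
def pixels(resolution):
    """Number of pixels in one frame of the given (width, height) resolution."""
    width, height = resolution
    return width * height


def relative_load(res_a, res_b):
    """Pixel ratio between two resolutions, used here as a rough proxy
    for the ratio of per-frame processing work (an assumption)."""
    return pixels(res_a) / pixels(res_b)


print(pixels((320, 240)))                      # 76800
print(pixels((160, 120)))                      # 19200
print(relative_load((320, 240), (160, 120)))   # 4.0
```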
To conform to the MPEG standard, data must be input, output, or transferred at a given transfer rate, and coded data must be output while satisfying this rate. For video, the transfer rate is generally expressed as a frame rate, i.e., the number of frames per second.
It is therefore desirable to set the parameters, taking into account the processing capability of the video coding apparatus, so that captured video is processed in real time at the required frame rate while coded data with high playback picture quality (high resolution) and as high a compression ratio as possible is obtained.
In the prior art video coding apparatus, the coding parameters are considered to be preset with these factors taken into account. The coding parameter decision means 5002 holds these preset parameters and outputs them to the coding section 5001 for coding. As for the coding type among the coding parameters, a method of deciding the coding type based on information about the input video, such as a “scene change”, is disclosed in “Picture coding apparatus” (Japanese Patent Application Hei. No. 8-98185).
The resolution parameter input to the coding section 5001 is supplied to the DCT processing means 5003 and used in its processing. The coding type parameter controls switching of the input to the DCT processing means 5003 between the input picture data itself and the difference data between the input picture data and the prediction picture.
The DCT processing means 5003 performs DCT processing on the input frame picture or the difference data at the resolution supplied by the coding parameter decision means 5002 and outputs the resulting DCT data. In the DCT processing, the data to be processed is divided into blocks of 8×8 pixels, and a two-dimensional DCT is performed on each block. The quantization means 5004 quantizes the DCT data using a given value and outputs quantized data. Quantization is generally performed by dividing by the quantization step (the given value). The variable length coding means 5005 performs variable length coding on the quantized data and outputs variable-length coded data. In variable length coding, the codes with the fewest bits are allocated to the data values with the highest frequency of occurrence, which reduces the amount of data. The bit stream generating means 5006 generates a bit stream from the variable-length coded data output from the variable length coding means 5005 and outputs it as the output of the video coding apparatus.
In inter-frame coding, the following operation is performed. The inverse quantization means 5007 inversely quantizes the quantized data output from the quantization means 5004 and outputs the resulting inversely quantized data. The inverse DCT processing means 5008 performs two-dimensional inverse DCT processing on the inversely quantized data for each 8×8-pixel block divided by the DCT processing means 5003 and outputs the resulting inverse DCT data. The prediction picture generating means 5009 generates and outputs a prediction picture based on the inverse DCT data. The difference data between the input picture data and the prediction picture is then input to the DCT processing means 5003.
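The principle of giving the shortest codes to the most frequent values can be sketched with a Huffman code built from observed frequencies. Note that MPEG itself uses fixed variable length code tables defined by the standard (applied to run-level pairs), so this adaptive construction is a swapped-in illustration of the principle, not the standard's procedure.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free code in which frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # a single distinct symbol still needs 1 bit
        return {next(iter(freq)): "0"}
    order = count()                        # tie-breaker so dicts are never compared
    heap = [(f, next(order), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]


levels = [0] * 10 + [1] * 3 + [2] * 2 + [3]   # quantized values, mostly zero
code = huffman_code(levels)
bits = "".join(code[v] for v in levels)        # the most frequent value (0) gets 1 bit
```

With these 16 values, the coded output is 25 bits, compared with 32 bits for a fixed 2-bit code.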
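The reason the encoder contains inverse quantization and a prediction picture generator is that its prediction must match what a decoder can rebuild from the coded data alone. A minimal sketch of this closed loop follows, assuming frames given as flat lists of pixel values and a hypothetical quantization step; the DCT and motion compensation are omitted so that only the feedback loop is shown.

```python
def encode_sequence(frames, step):
    """Closed-loop predictive coding of frames given as flat lists of pixel values.

    The first frame is coded against an all-zero prediction (effectively intra);
    each later frame codes the quantized difference from the *reconstructed*
    previous frame, which is the same prediction picture a decoder can build.
    """
    prediction = [0] * len(frames[0])
    coded, recon = [], []
    for frame in frames:
        diff = [x - p for x, p in zip(frame, prediction)]   # difference data
        q = [round(d / step) for d in diff]                 # quantization
        # inverse quantization plus prediction gives the reconstructed frame
        rec = [p + level * step for p, level in zip(prediction, q)]
        coded.append(q)
        recon.append(rec)
        prediction = rec                                    # next prediction picture
    return coded, recon


def decode_sequence(coded, step):
    """Decoder counterpart: rebuilds the same reconstructions from the coded data."""
    prediction = [0] * len(coded[0])
    out = []
    for q in coded:
        rec = [p + level * step for p, level in zip(prediction, q)]
        out.append(rec)
        prediction = rec
    return out


frames = [[100, 100, 100], [104, 100, 96], [108, 100, 92]]
coded, recon = encode_sequence(frames, step=4)
```

If the encoder predicted from the original frames instead of the reconstructions, quantization errors would accumulate in the decoder, which only ever sees the reconstructions.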
B. PRIOR ART AUDIO CODING APPARATUS
An audio coding method based on the subsampling coding method conforming to the MPEG Audio system is employed for coding audio with a wide frequency band, such as human voice, music, natural sounds, and various sound effects.
In high-performance multimedia personal computers, audio captured with the sound board that comes as standard equipment can be coded in real time.
A first audio coding apparatus according to a prior art in which input audio is coded according to the subsampling coding method will be described.
In addition, for audio coding conforming to MPEG1 Audio, there is a method that applies psychoacoustic analysis.
Generally, an encoder conforming to MPEG1 Audio uses a psychoacoustic model to decide the priority with which bits are allocated to each subband, taking into account the limits of human hearing and masking effects. This enables high-efficiency coding adapted to the static and dynamic characteristics of human hearing, but it does not affect the data format of the MPEG1 Audio standard, so MPEG1 Audio coded data can be produced without it. Moreover, since psychoacoustic analysis imposes a large processing burden, as described later, omitting it, as in the first example, significantly reduces the processing burden on the CPU. Because psychoacoustic analysis is not applied, however, the playback sound quality is degraded.
Audio coding that applies psychoacoustic analysis is described as a second example of prior art audio coding.
B-1 PRIOR ART AUDIO CODING APPARATUS
FIG. 59 is a block diagram illustrating the structure of an audio coding apparatus according to a first example of the prior art. As shown in the figure, the audio coding apparatus comprises an audio input unit 2551, an input audio sampling unit 2553, a subsampling unit 2555, a coding bit allocation unit 2556, a quantization unit 2557, a coding unit 2558, and a coded data recording unit 2559.
In the figure, the audio input unit 2551 inputs the audio to be coded, generally from a microphone or a line input. The input audio sampling unit 2553, realized by an input function and a control program, samples the audio input from the audio input unit 2551. The subsampling unit 2555 subsamples the sampled data. The coding bit allocation unit 2556 allocates coding bits to each subband divided by the subsampling unit 2555. The quantization unit 2557 performs quantization according to the number of coding bits allocated by the coding bit allocation unit 2556. The coding unit 2558 outputs the quantization values output from the quantization unit 2557 as coded audio data. The units 2555 to 2558 are realized by the CPU, main memory, and a program of a computer. The coded data recording unit 2559 is realized by a storage device, such as a magnetic storage device, and its control program.
FIG. 60 is a flowchart of the prior art audio coding method. FIG. 61 is a diagram for explaining sampling. FIGS. 62 and 63 are diagrams for explaining subsampling.
The operation of the first prior art audio coding apparatus is described below with reference to FIGS. 59 to 63, following the flow in FIG. 60.
In step 1 of FIG. 60, the input audio sampling unit 2553 samples the input audio signal at a sampling frequency fs to obtain sampled data. As shown in FIG. 61, the input audio is represented by a graph of sound pressure versus time, and it is sampled once every sampling cycle ts. As shown in the figure, the sampling cycle and the sampling frequency are reciprocals of each other: fs = 1/ts.
From step 2 of FIG. 60 onward, processing is performed chiefly in software under CPU control. In step 2, the sampled data is subsampled into M frequency bands. FIG. 62 illustrates the division of the input audio data into 12 subbands; as shown in the figure, 12 subband signals, from the subband 0 signal BPF0 to the subband 11 signal BPF11, are produced. FIG. 63 illustrates the 12 subband signals resulting from the subsampling. Unlike FIG. 61, this figure represents sound pressure as a function of frequency rather than time.
MPEG audio defines layers 1 to 3. In the order 1→2→3, the playback sound quality and the required hardware performance increase, and the hardware scale becomes larger. In audio coding conforming to layer 1, the number p of input audio samples subsampled at a time is p = 32: 512 samples, including the 32 target samples, are divided into 32 subbands, and audio data for each subband is output.
The M subband signals resulting from the subsampling in step 2 are passed from the subsampling unit 2555 to the quantization unit 2557.
In step 3, the coding bit allocation unit 2556 allocates coding bits to the M subband signals. In step 4, the quantization unit 2557 quantizes, for each subband, the signal data passed from the subsampling unit 2555 according to the number of coding bits allocated by the coding bit allocation unit 2556 to obtain quantization values. In step 5, the coding unit 2558 codes and outputs the quantization values, and the resulting coded data is recorded by the coded data recording unit 2559.
Steps 1 to 5 are repeated while audio is being input. Audio is input continuously and processed in real time, so that coded data is output and recorded. When audio input ends, coding is completed.
The coded data stored in the storage device is preserved as MPEG playback data. Alternatively, instead of being recorded and stored, the coded data may be transmitted over a network and used.
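Step 1 can be illustrated by sampling a continuous-time signal once every ts = 1/fs seconds. The 1 kHz test tone and the 48 kHz sampling frequency below are assumptions chosen for illustration.

```python
import math


def sample(signal, fs, duration):
    """Sample a continuous-time signal every ts = 1/fs seconds."""
    ts = 1.0 / fs                          # sampling cycle is the reciprocal of fs
    n = int(round(duration * fs))          # number of samples in the interval
    return [signal(k * ts) for k in range(n)]


def tone(t):
    # Hypothetical 1 kHz test tone standing in for the input audio
    return math.sin(2 * math.pi * 1000 * t)


samples = sample(tone, 48000, 0.01)        # 10 ms at 48 kHz gives 480 samples
```

At 48 kHz, one period of the 1 kHz tone spans 48 samples, so sample 12 falls on the peak of the sine wave.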
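The idea of step 2, splitting one block of samples into M frequency bands, can be sketched with a naive DFT whose bins are grouped into equal-width bands. This is a simplified stand-in chosen for readability: MPEG audio actually uses a polyphase filter bank with window coefficients defined in the standard, which this sketch does not reproduce.

```python
import cmath
import math


def band_energies(block, m):
    """Energy of one block of samples in each of m equal-width frequency bands.

    A naive DFT stands in for the subband filter bank: DFT bins 0 .. n/2 - 1
    (covering 0 .. fs/2) are grouped into m bands of equal width.
    """
    n = len(block)
    half = n // 2
    spectrum = [sum(block[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
                for b in range(half)]
    width = half // m                      # DFT bins per band
    return [sum(abs(spectrum[b]) ** 2 for b in range(i * width, (i + 1) * width))
            for i in range(m)]


# 48-sample block containing a tone at DFT bin 5; with 12 bands of 2 bins each,
# all of its energy should fall in band 2 (bins 4 and 5).
block = [math.cos(2 * math.pi * 5 * k / 48) for k in range(48)]
energies = band_energies(block, 12)
```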
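Steps 3 and 4 can be sketched as a greedy bit allocation followed by uniform quantization within each subband. The allocation rule (give the next bit to the subband whose error, approximated as peak / 2**bits, is largest) and the quantizer are simplified assumptions without psychoacoustic weighting; they are not the exact MPEG layer 1 tables.

```python
def allocate_bits(peaks, total_bits, max_bits=15):
    """Greedy bit allocation: repeatedly give one more bit to the subband whose
    quantization error (approximated as peak / 2**bits) is currently largest."""
    bits = [0] * len(peaks)
    for _ in range(total_bits):
        candidates = [i for i in range(len(peaks)) if bits[i] < max_bits]
        if not candidates:
            break
        i = max(candidates, key=lambda j: peaks[j] / 2 ** bits[j])
        bits[i] += 1
    return bits


def quantize_band(samples, bits, scale):
    """Map samples in [-scale, scale] to integer codes 0 .. 2**bits - 1."""
    if bits == 0:
        return []                          # no bits allocated: nothing is transmitted
    levels = 2 ** bits
    codes = []
    for s in samples:
        x = (s / scale + 1.0) / 2.0        # normalize to [0, 1]
        codes.append(min(levels - 1, max(0, int(x * levels))))
    return codes


bits = allocate_bits([8.0, 4.0, 1.0], 6)          # louder subbands receive more bits
codes = quantize_band([-1.0, 0.0, 1.0], 2, 1.0)   # 2-bit codes for one subband
```

With peaks 8, 4, and 1, the six available bits are allocated as [4, 2, 0], so the quietest subband is not transmitted at all.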
B-2 PRIOR ART AUDIO CODING APPARATUS
FIG. 64 is a block diagram illustrating the structure of a second prior art audio coding apparatus. As shown in the figure, the second prior art coding apparatus comprises an audio input unit 2651, an input audio sampling unit 2653, a subsampling unit 2655, a quantization unit 2657, a coding unit 2658, a coded data recording unit 2659, an FFT (Fast Fourier Transform) unit 2660, a psychoacoustic analysis unit 2661, and a coding bit allocation unit 2662. The apparatus has the structure of the first apparatus with the FFT unit 2660 and the psychoacoustic analysis unit 2661 added.
In the figure, the FFT unit 2660 performs FFT processing on the signals. The psychoacoustic analysis unit 2661 compares the signals processed by the FFT unit 2660 with the minimum audible limit and analyzes masking effects. The coding bit allocation unit 2662 allocates coding bits based on the analysis by the psychoacoustic analysis unit 2661 so that more coding bits are allocated to audible signals. The audio input unit 2651, the input audio sampling unit 2653, the subsampling unit 2655, the quantization unit 2657, the coding unit 2658, and the coded data recording unit 2659 are identical to those of the first example and will not be described again.
FIG. 65 is a flowchart of MPEG1 Audio coding. FIG. 66 is a diagram illustrating the minimum audible limit. The operation of the second prior art audio coding apparatus is described with reference to FIGS. 64 to 66.
Steps 1 and 2 in FIG. 65 are performed as in the first example, yielding M subband signals; suppose, for example, that M = 32. As in the first example, the subband signals are passed from the subsampling unit 2655 to the quantization unit 2657.
In step 3, the FFT unit 2660 divides the sampled input audio data into L subbands by FFT processing and passes the resulting signals to the psychoacoustic analysis unit 2661, which analyzes the L signals. For example, in layer 1 of MPEG audio, 512 samples are used, and the FFT unit 2660 divides them into L = 256 subbands. In layer 2, 1024 samples are used to output 512 subbands, which increases the processing burden.
The psychoacoustic analysis unit 2661 compares each subband signal with the minimum audible limit shown in FIG. 66, which indicates the level below which sound is inaudible. FIG. 66 shows a division into 32 subbands. If the number of subbands is increased (to 256), the curve of the minimum audible limit remains unchanged, and the horizontal axis (subband) is simply subdivided more finely over the same range shown in FIG. 66.
No bits are allocated in subsequent steps to subbands that the psychoacoustic analysis unit 2661 has judged to be below the minimum audible limit, so more bits are allocated to the other subbands.
Regarding human hearing, a masking phenomenon is known in which a relatively quiet sound, i.e., a signal of low sound pressure, cannot be heard when a loud sound, i.e., a signal of high sound pressure, is present close to it in frequency or in time. The psychoacoustic analysis unit 2661 examines the relation between each subband signal and the signals close to it to detect signals that are masked (inaudible) due to this phenomenon.
No bits are allocated in subsequent steps to the signals detected here, so more bits are allocated to the other subbands.
In step 5 of the flow in FIG. 65, the coding bit allocation unit 2662 allocates coding bits based on the analysis by the psychoacoustic analysis unit 2661. Here, bits are allocated to the M subbands based on the analysis of the L subbands. Thus, no bits are allocated to signals that are inaudible or barely audible to humans, and more bits are allocated to audible signals.
From step 6 onward, processing is performed as in the first example, and steps 1 to 7 are repeated while audio is being input, so that the input audio is coded.
Since more coding bits are thus allocated to audible sound, MPEG Audio coding that applies psychoacoustic analysis yields coded audio data with high playback sound quality.
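The comparison with the minimum audible limit can be sketched using Terhardt's well-known approximation of the absolute threshold of hearing. Two assumptions apply: the subband levels are already expressed in dB SPL (a real encoder must assume a mapping from digital full scale to sound pressure), and each subband is represented by its centre frequency.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def audible_subbands(levels_db, fs, m):
    """Indices of the m equal-width subbands whose level exceeds the threshold
    in quiet at the band centre; the remaining subbands receive no bits."""
    width = (fs / 2) / m                   # bandwidth of one subband in Hz
    audible = []
    for i, level in enumerate(levels_db):
        centre = (i + 0.5) * width
        if level > threshold_in_quiet_db(centre):
            audible.append(i)
    return audible


# 32 subbands at fs = 48 kHz, all at a hypothetical 20 dB SPL: the high bands,
# where hearing is least sensitive, fall below the threshold and are dropped.
audible = audible_subbands([20.0] * 32, 48000, 32)
```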
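Masking detection can be illustrated by a deliberately crude rule: a subband is treated as masked when some neighbour within `spread` subbands is louder by at least `margin_db`. Both parameters are hypothetical; real psychoacoustic models use spreading functions, tonality estimates, and critical-band scales instead.

```python
def masked_subbands(levels_db, spread=1, margin_db=20.0):
    """Flag subbands whose level is at least margin_db below some neighbour
    within `spread` subbands (a crude stand-in for frequency masking)."""
    masked = []
    n = len(levels_db)
    for i, level in enumerate(levels_db):
        neighbours = [levels_db[j]
                      for j in range(max(0, i - spread), min(n, i + spread + 1))
                      if j != i]
        if neighbours and max(neighbours) - level >= margin_db:
            masked.append(i)
    return masked


# Subband 0 (30 dB) sits next to an 80 dB subband and subband 3 (10 dB) next to
# a 70 dB subband, so both are treated as masked and would receive no bits.
masked = masked_subbands([30.0, 80.0, 70.0, 10.0])
```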
C. PRIOR ART VIDEO AND AUDIO CODING APPARATUS
FIG. 67 schematically shows a prior art video and audio coding apparatus. As shown in the figure, the prior art video and audio coding apparatus comprises a video camera 2701, an audio capture unit 2702, an audio coding unit 2703, a video capture unit 2704, and a video coding unit 2705.
As shown in the figure, the apparatus outputs coded audio information and coded video information, which are transmitted or recorded as required.
In the figure, the video camera 2701 captures video and audio information, separates it into analog audio information and analog video information, and outputs them. The audio capture unit 2702 receives the analog audio information output from the video camera 2701 and outputs it as digital pro-audio information consisting of discrete digital data. The audio coding unit 2703 compressively codes the pro-audio information and outputs coded audio information. The video capture unit 2704 receives the analog video information output from the video camera 2701 and outputs digital pro-video information consisting of discrete digital data representing a number of still pictures per unit time. The video coding unit 2705 receives the pro-video information output from the video capture unit 2704, compressively codes it, and outputs coded video information.
The operation of capturing and coding video and audio in real time in the prior art video and audio coding apparatus constructed as above is described below.
The video camera 2701 captures video and audio information, separates it into analog audio information and analog video information, and outputs them.
The analog audio information is input to the audio capture unit 2702, which performs analog-to-digital conversion to produce digital pro-audio information, which is output to the audio coding unit 2703. The analog video information is input to the video capture unit 2704, which performs analog-to-digital conversion to produce digital pro-video information comprising plural still pictures, which is output to the video coding unit 2705.
The audio coding unit 2703 codes the pro-audio information and outputs coded audio information. The video coding unit 2705 codes the pro-video information and outputs coded video information.
While video and audio are being captured, digitization and coding are performed by the audio capture unit 2702, the audio coding unit 2703, the video capture unit 2704, and the video coding unit 2705. On completion of capturing video and audio, digitization and coding are completed.
As shown in the prior art examples A to C, in the video coding apparatus, the audio coding apparatus, and the video and audio coding apparatus according to the prior art, video, audio, or video and audio are captured and coded in real time, and coded video data, coded audio data, or coded video data and coded audio data are output, to be recorded or transmitted.
A. PROBLEM OF PRIOR ART VIDEO CODING
However, implementing the real-time video coding apparatus of prior art A as coding software that runs on a general-purpose computer system such as a personal computer (PC) causes the following problems, because the software may be executed on hardware of various performance levels in various environments (peripheral equipment or network environment).
For example, suppose that the real-time video coding apparatus is implemented as application software running on a PC, which processes captured input video in real time and codes it according to the MPEG1 standard at a resolution of 320×240, and that the repeated coding-type pattern “IBBPBB” is selected, where “I” denotes intra-frame coding and “P” and “B” denote inter-frame coding.
If the software is executed on relatively high-performance hardware, for example with a CPU operating at 166 MHz, assume that 6 frame pictures are processed in 6/30 second according to the coding-type pattern “IBBPBB” described above. In this case, video can be coded in real time at 30 frames/sec.
On the other hand, if the software is executed on low-performance hardware, for example with a CPU operating at 100 MHz, the coding cannot be completed within 6/30 second, resulting in coded data with a low frame rate. When the frame rate of the coded result is lower than 30 frames/sec, the motion of the video played back from the coded result is not smooth, so satisfactory coding cannot be achieved.
When such software is executed as one task on a multitask operating system, the same problem occurs even in a relatively high-performance hardware environment if another application, such as a word processor, is executed as another task or an interrupt occurs.
In addition, the same coding may be executed smoothly at a resolution of 320×240 but not at 640×400, where the processing speed is insufficient and the frame rate drops.
The cases described so far involve insufficient hardware performance. In other cases, however, high-performance hardware is not fully utilized.
For example, suppose that input video is captured and processed in real time on hardware with a CPU operating at 166 MHz and is coded according to the MPEG1 standard at a resolution of 320×240 using only the “I” coding type. Since one frame picture can be processed in 1/30 sec, video can be coded in real time at 30 frames/sec.
If the same software is executed on higher-performance hardware, for example with a control device (processor, CPU) operating at 200 MHz, one “I”-type frame picture can be processed in less than 1/30 sec, so the hardware performance is not fully utilized. Since a high-performance control device increases cost, such a video coding apparatus does not provide good cost-performance.
In this case, if coding is performed using the “P” or “B” type in addition to the “I” type, coded data with a higher compression ratio at the same picture quality can be obtained. Using only the “I” type therefore wastes device resources and produces coded data with a low compression ratio.
The same applies when more computer resources (allocation of CPU time) than expected are available on a multitask operating system, or when coding is performed at a resolution of 160×120, which is lower than 320×240.
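The real-time condition above can be expressed as a simple budget check: one repetition of the six-frame “IBBPBB” pattern must be coded within 6/30 second, otherwise the sustainable frame rate drops below 30 frames/sec. The per-frame coding times in the example are hypothetical measurements, not figures from the text.

```python
def achievable_fps(frame_times_s, target_fps=30.0):
    """Frame rate that coding can sustain, given the measured coding time of
    each frame in one repetition of the coding-type pattern (capped at target)."""
    pattern_time = sum(frame_times_s)                # time to code one pattern
    budget = len(frame_times_s) / target_fps         # e.g. 6 frames -> 6/30 s
    if pattern_time <= budget:
        return target_fps                            # real-time coding is possible
    return len(frame_times_s) / pattern_time         # frame rate drops


fast = achievable_fps([0.03] * 6)   # 0.18 s per pattern: fits in 0.2 s, 30 frames/sec
slow = achievable_fps([0.05] * 6)   # 0.30 s per pattern: about 20 frames/sec
```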
B. PROBLEM OF PRIOR ART AUDIO CODING
In the first example of the prior art audio coding method, audio can be captured and coded in real time by software processing on a multimedia personal computer with a sound board.
However, this is possible only when a device with sufficient performance for real-time coding is used, such as an LSI designed specifically for the purpose or a carefully selected high-performance processor. With a low-performance control device (processor), the data is recorded as a file partway through the processing and the recorded data is processed afterward, which takes several times as long as real time.
When the subsampling coding used in MPEG Audio is executed in software on the CPU and audio is input and processed in real time, whether coding is possible depends on the hardware environment in which the software is executed, that is, on CPU performance. For example, real-time coding cannot be performed at a coding level adapted to the CPU performance.
The audio coding apparatus constructed as above is designed to input audio and code it in real time at a given rate. In a general-purpose personal computer or the like, however, the CPU processing capability available for coding is reduced by other tasks under multitask processing or by interrupts, so audio coding cannot be performed as initially set, and this situation is difficult to handle.
As shown in the second example B-2, subsampling coding that uses psychoacoustic analysis yields coded data with high playback sound quality by allocating bits according to human hearing characteristics.
However, dividing the signal into many subbands and converting and comparing the divided signals imposes a considerable burden; psychoacoustic analysis roughly doubles the processing burden. Therefore, with psychoacoustic analysis, it is difficult for a standard personal computer to capture and process audio in real time. As a result, high-performance hardware such as a dedicated processor or board must be added, or coding must be performed over a longer time after recording to a file, without real-time processing.
C. PROBLEM OF PRIOR ART VIDEO AND AUDIO CODING
As should be appreciated from the foregoing, in the prior art video and audio coding apparatus, pro-audio information (digital audio information), and pro-video information (digital video information) are directly input to corresponding coding units, respectively, to be performed coding therein. Therefore, the audio coding unit and the video coding unit each requires capability of processing the pro-audio information and the pro-video information with reliability according to the MPEG standard, for example. For example, when the audio coding unit inputs audio information (1 sample=1 byte) at a sampling frequency 48 KHzit requires reliable capability of coding audio information of 48 Kbyte per/sec. When the video coding unit inputs video information (320xc3x97240, one pixel=2 byte, 30 fps), it requires reliable capability of coding video information of 4.6 Mbyte per/sec.
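The data rates quoted above follow directly from the sampling and frame parameters and can be checked with a few lines of arithmetic. The helper names are hypothetical.

```python
def audio_rate(fs_hz, bytes_per_sample, channels=1):
    """Uncompressed audio data rate in bytes per second."""
    return fs_hz * bytes_per_sample * channels


def video_rate(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


print(audio_rate(48_000, 1))         # 48000 bytes/s (48 kbyte/s)
print(video_rate(320, 240, 2, 30))   # 4608000 bytes/s (about 4.6 Mbyte/s)
```

The video stream is therefore about 96 times larger than the audio stream, which is why video coding dominates CPU time when both run as tasks on the same processor.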
Therefore, in the past, the audio coding unit and the video coding unit operate independently an use specified hardware with which coding is performed with reliability, thereby coding of video and audio is implemented. It is extremely difficult to realize coding of audio and video as a software program which operates on a multi task operating system using a general-use CPU without using a specified hardware.
This is because each coding unit operates as a task and another task uses some operating time of CPU when another software (resident program in which communication processing is performed) also functions as another task on the multi task operating system, when coding is stopped. Therefore, the coding software cannot always process audio or video fully. As a result, it is not necessarily possible to obtain preferable coded result with no trouble such as discontinuity of audio or video.
Further, apart from “another task”, there is another problem in processing both video and audio. Since video coding and audio coding are always processed as separate tasks on the multitask operating system, they affect each other. Moreover, video is nonuniform: the still pictures constituting the video information change from moment to moment. Some portions of a video sequence are extremely difficult to code (compress) and may take a long time to code. In this case, even if no other software is running on the operating system at all, the video coding unit consumes considerable CPU time and processing in the audio coding unit is delayed, so that the only coded result obtained has audio discontinuities.
There is still another problem. When the video and audio coding apparatus consists of software executed on a general-purpose computer, as in cases A and B, the software may be executed on computer systems with widely varying hardware capability. Therefore, apart from the problem that the computing capability allocated to coding is temporarily reduced regardless of the average capability available for coding audio and video, video and audio coding may not be performable with the initial values assumed in the software design if the hardware lacks sufficient computing capability. In this case, unless the computing capability consumed by video coding is reduced quickly to match the system, a satisfactory coded result cannot be obtained, and audio discontinuities occur in playback.
Of course, the video coding result may also have defects due to the effect of audio coding. In general, however, the amount of video data in a fixed time period exceeds that of audio data, and missing audio data affects playback more noticeably than missing video data, so the problem in audio coding is more important. Audio discontinuity must therefore be avoided.
As shown in A to C, when video, audio, or video and audio are captured and coded in real time by executing coding software on a general-purpose computer such as a personal computer, the following problems arise.
(1) The performance of the hardware on which the software is executed is important. A satisfactory coded result cannot be obtained when hardware performance is low, and the apparatus resources may go unused when hardware performance is high.
(2) When the software is executed on a multitask operating system, other tasks affect the task performed by a coding unit. The share of the apparatus resources taken by other tasks affects coding in effectively the same way as the high or low hardware performance of case (1).
(3) When both video and audio are processed, video coding and audio coding affect each other as other tasks do.
Accordingly, it is an object of the present invention to provide a video coding method in which video is captured and coded in real time, wherein coding parameters including resolution or coding type are set appropriately and apparatus resources are utilized in accordance with the basic capability of the computer executing the coding method, so that a satisfactory coded result can be obtained.
It is another object of the present invention to provide an audio coding method in which audio is captured and coded in real time, wherein coding is controlled and apparatus resources are utilized in accordance with the basic capability of the computer executing the coding method, so that a satisfactory coded result can be obtained.
It is still another object of the present invention to provide an audio coding method in which audio is captured and coded in real time, wherein coding is controlled and apparatus resources are utilized in accordance with the capability, at that point in time, of the computer executing the coding method, so that a satisfactory coded result can be obtained.
It is a further object of the present invention to provide an audio coding method in which audio is captured and coded in real time, wherein processing that substitutes for psychoacoustic analysis is executed and apparatus resources are utilized in accordance with the basic capability of the computer executing the coding method, so that a satisfactory coded result can be obtained.
It is a still further object of the present invention to provide a video and audio coding method in which video and audio are captured and coded in real time, wherein video coding is controlled and apparatus resources are utilized in accordance with the basic capability of the computer executing the coding method, so that a satisfactory coded result without audio discontinuity can be obtained.
It is a still further object of the present invention to provide a video and audio coding method in which video and audio are captured and coded in real time, wherein video coding is controlled and apparatus resources are utilized in accordance with the capability, at that point in time, of the computer executing the coding method, so that a satisfactory coded result without audio discontinuity can be obtained.
It is another object of the present invention to provide a video coding apparatus, an audio coding apparatus, and a video and audio coding apparatus which execute the video coding method, the audio coding method, and the video and audio coding method, respectively.
It is still another object of the present invention to provide recording media which record a video coding program, an audio coding program, and a video and audio coding program by which the video coding method, the audio coding method, and the video and audio coding method, respectively, can be realized.
The present invention is directed to video, audio, or video and audio coding in which the processing capability of the coding apparatus executing the coding is obtained as an indicator value, and the processing conditions of the coding are set, or the coding is controlled, on the basis of that indicator value.
Other objects and advantages of the invention will become apparent from the detailed description that follows.
The detailed description and specific embodiments described are provided only for illustration, since various additions and modifications within the spirit and scope of the invention will be apparent to those skilled in the art from the detailed description.
According to a first aspect of the present invention, a method of coding video comprises the steps of: coding, according to coding parameters, one or more items of still picture information of pro-video information, which consists of a plurality of items of still picture information in which video is digitized; and deciding one or more of the coding parameters based on one or more of the resolution of the pro-video information, the frame rate required for reproducing the coded data resulting from the coding, the processing performance indicating the processing capability of the coding apparatus performing the video coding step, and one or more coding parameters affecting the amount of processing or coding in the video coding step.
According to a second aspect of the present invention, the method of coding video of the first aspect further comprises a processing capability decision step of deciding the processing capability of the apparatus executing the video coding step and outputting the decision result.
According to a third aspect of the present invention, in the method of coding video of the first aspect, the coding parameters include one or more of the resolution used in coding the pro-video information, a coding type indicating intra-frame coding or predictive coding, and a detection range for detecting the motion vectors used in predictive coding.
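To make the first to third aspects concrete, the sketch below picks coding parameters (resolution, intra-frame or predictive coding, motion-vector search range) so that the estimated work per frame fits the time available at the required frame rate. It is a minimal Python illustration, not the patented procedure: the cost model, the figure of 50 operations per pixel, the `ops_per_sec` performance indicator, and the candidate list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingParams:
    width: int
    height: int
    coding_type: str   # "I" = intra-frame coding, "P" = predictive coding
    search_range: int  # motion-vector search range in pixels (0 for intra-frame)


def estimated_ops_per_frame(p: CodingParams) -> int:
    """Hypothetical cost model: a fixed cost per pixel for intra-frame coding,
    plus, for predictive coding, a full-search motion cost proportional to the
    number of candidate vectors, (2 * search_range + 1) ** 2."""
    pixels = p.width * p.height
    base_ops_per_pixel = 50
    if p.coding_type == "I":
        return pixels * base_ops_per_pixel
    return pixels * (base_ops_per_pixel + (2 * p.search_range + 1) ** 2)


def decide_params(candidates: list[CodingParams], ops_per_sec: float,
                  frame_rate: float) -> CodingParams:
    """Pick the first candidate (ordered most to least demanding) whose
    estimated cost fits in the time available per frame."""
    budget = ops_per_sec / frame_rate
    for p in candidates:
        if estimated_ops_per_frame(p) <= budget:
            return p
    return candidates[-1]  # nothing fits: fall back to the cheapest setting


CANDIDATES = [
    CodingParams(320, 240, "P", 16),
    CodingParams(320, 240, "P", 8),
    CodingParams(320, 240, "I", 0),
    CodingParams(160, 120, "I", 0),
]
```

With a hypothetical performance of 1e9 operations per second at 30 fps, `decide_params(CANDIDATES, 1e9, 30)` rejects the 16-pixel search and returns the 320×240 predictive setting with an 8-pixel range; at 1e8 operations per second it falls back to 160×120 intra-frame coding.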
According to a fourth aspect of the present invention, in the method of coding video of the first aspect, the decision in the processing capability decision step is made on the basis of the type of control unit included in the video coding apparatus.
According to a fifth aspect of the present invention, in the method of coding video of the second aspect, the decision in the processing capability decision step is made on the basis of the time required for coding in the coding step.
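The fifth aspect bases the capability decision on the time the coding step actually takes. A minimal sketch, assuming a hypothetical `code_frame` callable stands in for the coding step and that capability is expressed as the ratio of achievable to required frame rate (the text fixes neither):

```python
import time
from typing import Callable, Sequence


def measure_capability(code_frame: Callable[[bytes], bytes],
                       sample_frames: Sequence[bytes],
                       required_fps: float,
                       clock: Callable[[], float] = time.perf_counter) -> float:
    """Time the coding of sample frames and return achievable_fps / required_fps.
    A value of 1.0 or more means the current settings keep up in real time."""
    start = clock()
    for frame in sample_frames:
        code_frame(frame)  # hypothetical stand-in for the video coding step
    elapsed = clock() - start
    if elapsed <= 0:
        return float("inf")  # too fast to measure with this clock
    achievable_fps = len(sample_frames) / elapsed
    return achievable_fps / required_fps
```

The `clock` parameter is injectable only so the measurement can be exercised deterministically; by default it uses `time.perf_counter`.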
According to a sixth aspect of the present invention, in the method of coding video, the processing capability decision step further comprises: a video buffering step in which the input pro-video information is temporarily stored, the series of still picture information constituting the pro-video information being stored sequentially and read out in the coding step, and the coded still picture information being discarded sequentially; and a frame rate control step in which the storing of the series of still picture information in the video buffering step is controlled so as to be performed at a prescribed frame rate decided on the basis of the given frame rate; the decision being made on the basis of the amount of pro-video information temporarily stored in the video buffering step.
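The sixth aspect can be pictured as a frame queue whose fill level serves as the capability indicator. The sketch assumes the prescribed frame rate is reached by keeping every k-th captured frame, and uses hypothetical high and low watermarks (8 and 2 frames); neither rule is specified in the text.

```python
from collections import deque


class VideoBuffer:
    """Hypothetical sketch: frames are stored at a prescribed rate, read out by
    the coder and dropped from the buffer, and the fill level is used to judge
    processing capability."""

    def __init__(self, capture_fps: float, prescribed_fps: float,
                 high_water: int = 8, low_water: int = 2):
        self.frames = deque()
        self.keep_every = max(1, round(capture_fps / prescribed_fps))
        self._count = 0
        self.high_water = high_water
        self.low_water = low_water

    def on_capture(self, frame) -> None:
        # Frame rate control: store only every keep_every-th captured frame.
        if self._count % self.keep_every == 0:
            self.frames.append(frame)
        self._count += 1

    def next_for_coding(self):
        # The coder takes the oldest frame; it is removed, so coded frames are not kept.
        return self.frames.popleft() if self.frames else None

    def capability(self) -> str:
        n = len(self.frames)
        if n >= self.high_water:
            return "insufficient"  # coder is falling behind the prescribed rate
        if n <= self.low_water:
            return "surplus"       # coder keeps up with room to spare
        return "adequate"
```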
According to a seventh aspect of the present invention, an audio coding method in which audio is coded by a subsampling coding method executes the steps of: storing a set frequency fs, used as a value for coding processing, and a conversion constant n; inputting audio to be coded; forming sampled audio data using a sampling frequency determined on the basis of the stored set frequency fs; assuming that the number of sampled audio data obtained with the set frequency fs as the sampling frequency is m and the number of data determined on the basis of the conversion constant n is m′, outputting converted audio data which consists of m pieces of audio data and contains m′ pieces of sampled audio data; subsampling the converted audio data to obtain M subband signals; allocating coding bits only to those subband signals below the limit frequency fs/2n obtained from the stored set frequency fs and the conversion constant n; performing quantization according to the allocated coding bits; outputting the quantized data as coded data; and recording the output coded data.
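The limit frequency fs/2n follows from the conversion: the converted data carries only one independent sample in every n, so its usable bandwidth is fs/2n rather than fs/2, and subbands above that limit need no bits. The sketch assumes M = 32 equal-width subbands, as in the MPEG-1 audio filter bank; the text itself leaves M open, and the function names are illustrative.

```python
def limit_frequency(fs: float, n: int) -> float:
    """Upper edge of the band that is actually coded: fs / 2n."""
    return fs / (2 * n)


def subbands_to_code(n: int, M: int = 32) -> list[int]:
    """Indices of the subbands lying entirely below fs/2n, assuming M
    equal-width subbands spanning 0..fs/2 (each fs/2M wide).
    Subband j ends at (j + 1) * fs/2M, and (j + 1) * fs/2M <= fs/2n
    simplifies to (j + 1) * n <= M, so fs cancels out."""
    return [j for j in range(M) if (j + 1) * n <= M]
```

For fs = 48 kHz and n = 2, the limit is 12 kHz and only the lower 16 of 32 subbands are allocated bits.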
According to an eighth aspect of the present invention, in the audio coding method of the seventh aspect, in the input audio sampling step, m pieces of sampled audio data are formed by sampling the input audio with the stored set frequency fs as the sampling frequency; and in the audio data conversion step, pieces of sampled audio data are extracted from the m pieces of sampled audio data at intervals of (n−1) pieces of data, and (n−1) pieces of audio data are inserted between adjacent pieces of the extracted, sampled audio data to form m pieces of converted audio data.
According to a ninth aspect of the present invention, in the audio coding method of the eighth aspect, in the audio data conversion step, the converted audio data is formed of contiguous groups of n pieces of the extracted, sampled audio data.
According to a tenth aspect of the present invention, in the audio coding method of the seventh aspect, in the input audio sampling step, m/n pieces of sampled audio data are formed by sampling the input audio at the sampling frequency fs/n obtained from the stored set frequency fs and the conversion constant n; and in the audio data conversion step, (n−1) pieces of audio data are inserted between adjoining pieces of the sampled audio data to convert the sampled audio data into m pieces of converted audio data.
According to an eleventh aspect of the present invention, in the audio coding method of the tenth aspect, in the audio data conversion step, converted audio data comprising contiguous groups of n pieces of sampled audio data is formed from the m/n pieces of sampled audio data.
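The two conversion routes (eighth/ninth and tenth/eleventh aspects) both hand the subsampling stage m samples at the nominal rate fs; they differ in whether the input is captured at fs and thinned, or captured at fs/n. A sketch in which the inserted pieces are copies of the retained sample, as in the ninth and eleventh aspects; the function names are illustrative.

```python
def convert_from_full_rate(samples: list[int], n: int) -> list[int]:
    """Eighth/ninth aspects: keep one sample in every n (skipping n - 1) and
    fill the n - 1 positions after it with copies, so the output keeps length m."""
    out: list[int] = []
    for i in range(0, len(samples), n):
        out.extend([samples[i]] * n)
    return out[:len(samples)]  # trim a final partial group to exactly m samples


def convert_from_reduced_rate(samples: list[int], n: int) -> list[int]:
    """Tenth/eleventh aspects: the input was sampled at fs/n (m/n samples);
    repeating each sample n times yields m converted samples."""
    return [s for s in samples for _ in range(n)]
```

Both functions return identical output for the same underlying signal, e.g. `[1, 1, 3, 3, 5, 5]` from either `[1, 2, 3, 4, 5, 6]` at fs or `[1, 3, 5]` at fs/2.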
According to a twelfth aspect of the present invention, the audio coding method of the seventh aspect further executes the steps of: temporarily storing the sampled audio data in an input buffer; and comparing the amount of data in the input buffer with a predetermined value and, based on the result of the comparison, changing the value of the conversion constant n stored in the register; wherein, in the input audio sampling step, the sampled audio data is written into the input buffer, and in the audio data conversion step, sampled audio data is read from the input buffer and subjected to the conversion processing described above.
According to a 13th aspect of the present invention, the audio coding method of the seventh aspect further executes a coded data supervising step in which the amount of coded data output per unit time in the coding step is compared with a predetermined value and, based on the result of the comparison, the value of the conversion constant n stored in the register is changed.
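The 12th and 13th aspects adjust n from two different observations. The sketch below assumes that a larger n means less coding work (fewer subbands below fs/2n receive bits) and uses hypothetical thresholds, bounds (1 ≤ n ≤ 4), and a 0.9 tolerance band; the text specifies only that a comparison with a predetermined value drives the change.

```python
def adjust_n_by_buffer(n: int, buffered: int, limit: int,
                       n_min: int = 1, n_max: int = 4) -> int:
    """12th aspect (sketch): a backlog above `limit` means coding is falling
    behind, so raise n; a backlog below half of `limit` means there is spare
    capacity, so lower n again; otherwise keep n."""
    if buffered > limit:
        return min(n + 1, n_max)
    if buffered < limit // 2:
        return max(n - 1, n_min)
    return n


def adjust_n_by_output(n: int, coded_per_sec: float, expected_per_sec: float,
                       n_min: int = 1, n_max: int = 4, tolerance: float = 0.9) -> int:
    """13th aspect (sketch): coded output well below the expected rate means the
    coder cannot keep up, so raise n; output at the expected rate allows lowering
    n; in the band between, keep n to avoid oscillating."""
    if coded_per_sec < tolerance * expected_per_sec:
        return min(n + 1, n_max)
    if coded_per_sec >= expected_per_sec:
        return max(n - 1, n_min)
    return n
```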
According to a 14th aspect of the present invention, a method of coding audio in which audio is coded by subsampling coding comprises the steps of: storing a control constant used in the coding; sampling input audio and outputting sampled data; subsampling the sampled data obtained in the sampling step and outputting subband signal data; allocating coding bits to the subband signal data obtained in the subsampling step; quantizing the subband signal data according to the coding bit allocation and outputting quantized values; a coding step of outputting coded data on the basis of the quantized values obtained in the quantizing step; and controlling data processing in the subsampling step, the coding bit allocation step, the quantizing step, and the coding step.
According to a 15th aspect of the present invention, in the method of coding audio of the 14th aspect, the control constant storing step includes storing a unit period decision constant k in a unit period decision constant register as the control constant, and the coding control step includes: assuming that the number of sampled data subjected to one subsampling operation in the subsampling step is p and that the time corresponding to p pieces of sampled data is a unit period, deciding for every p pieces of sampled data, on the basis of the stored unit period decision constant, whether the corresponding unit period is a coding period; when the unit period is decided to be a coding period, performing control so that the sampled data in that unit period is output to the subsampling step; and when the unit period is decided not to be a coding period, performing control so that stored fixed coded data is output as the coded data in the coding step.
According to a 16th aspect of the present invention, in the method of coding audio of the 15th aspect, in the decision control step, denoting the i-th unit period by ti, the unit period ti is decided to be a coding period when i = n×k + 1 (k: unit period decision constant; n: an arbitrary integer).
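Under the 16th aspect, with k = 3 the coding periods are t1, t4, t7, and so on, so only one unit period in k is actually subsampled and coded. A sketch of the 15th and 16th aspects; `encode` and the content of the fixed coded data are hypothetical placeholders.

```python
def is_coding_period(i: int, k: int) -> bool:
    """16th aspect: unit period t_i (i = 1, 2, 3, ...) is a coding period when
    i = n*k + 1 for some integer n, i.e. when (i - 1) is a multiple of k."""
    return (i - 1) % k == 0


def code_unit_periods(blocks, k, encode, fixed_coded_data):
    """15th aspect (sketch): each block holds the p samples of one unit period.
    Coding periods go through the hypothetical `encode` function; every other
    period yields the stored fixed coded data instead."""
    for i, block in enumerate(blocks, start=1):
        yield encode(block) if is_coding_period(i, k) else fixed_coded_data
```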
According to a 17th aspect of the present invention, in the method of coding audio of the 14th aspect, the control constant storing step includes storing an operation decision constant q in an operation decision constant register as the control constant, and the coding control step includes an operation stopping step in which the operation in the subsampling step is controlled so as to stop partway.
According to an 18th aspect of the present invention, in the method of coding audio of the 17th aspect, the operation stopping step includes performing control so that the operation of a basic low-pass filter in the subsampling step is stopped partway, at the stages at both ends of the filter.
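The 17th and 18th aspects shorten the low-pass (prototype) filter computation. The sketch assumes q is the number of taps skipped at each end of the filter, which the text does not state; skipping them saves 2q multiply-accumulate operations per filter evaluation, at the cost of a small filter error because the end coefficients of a prototype low-pass window are typically small.

```python
def truncated_fir(window: list[float], coeffs: list[float], q: int) -> float:
    """Evaluate sum(c * x) over the filter taps, skipping q taps at each end."""
    if len(window) != len(coeffs):
        raise ValueError("window and coeffs must have the same length")
    taps = len(coeffs)
    if 2 * q >= taps:
        raise ValueError("q too large: no taps left to evaluate")
    # Only the central taps q .. taps - q - 1 are multiplied and accumulated.
    return sum(c * x for c, x in zip(coeffs[q:taps - q], window[q:taps - q]))
```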
According to a 19th aspect of the present invention, in the method of coding audio of the 14th aspect, the control constant storing step includes storing a subband selecting constant r in a subband selecting register as the control constant, and the coding control step includes a subband reducing step in which the coding bit allocation step and the quantizing step are performed only on data selected, on the basis of the stored subband selecting constant r, from the subband signal data output in the subsampling step.
According to a 20th aspect of the present invention, in the method of coding audio of the 19th aspect, the subband reducing step includes selecting subband signal data from the M pieces of subband signal data output in the subsampling step while skipping r pieces of subband signal data at a time (r: subband selecting constant).
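A sketch of the 20th aspect, assuming "skipping r pieces" means one subband is kept and the next r are skipped, starting from subband 0 (the text does not fix the starting point):

```python
def select_subbands(M: int, r: int) -> list[int]:
    """Keep one subband, skip the next r, and repeat, so only about
    M / (r + 1) subbands go through bit allocation and quantization."""
    return list(range(0, M, r + 1))
```

With M = 32 and r = 1, only the 16 even-numbered subbands are bit-allocated and quantized; r = 0 keeps every subband.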
According to a 21st aspect of the present invention, the method of coding audio of the 14th aspect further comprises a processing status supervising step in which the status of data processing in audio coding is obtained and the value of the stored control constant is changed according to the obtained status.
According to a 22nd aspect of the present invention, in the method of coding audio of the 21st aspect, the processing status supervising step includes an audio buffering step in which sampled data is temporarily stored in an input buffer, and an input supervising step in which the amount of data held in the input buffer is compared with a preset value and the control constant is changed on the basis of the comparison result.
According to a 23rd aspect of the present invention, in the method of coding audio of the 21st aspect, the processing status supervising step includes a coding supervising step in which the amount of coded data output per unit time in the coding step is compared with a preset value and the value of the control constant is changed on the basis of the comparison result.
According to a 24th aspect of the present invention, an audio coding method in which pro-audio information obtained by digitizing audio is coded by a subsampling coding method comprises: a step of sampling input audio to output sampled data; a step of subsampling the sampled data to output subband signal data; a step of allocating coding bits to the subband signal data; a step of controlling the bit allocation in the coding bit allocation step by an alternative psychoacoustic analysis control system; a step of quantizing the subband signal data according to the coding bit allocation to output quantized values; and a step of outputting coded data on the basis of the quantized values.
According to a 25th aspect of the present invention, in the audio coding method of the 24th aspect, the bit allocation control step comprises a sequential bit allocation step of allocating coding bits to the subband signal data in an order of bit allocation previously specified by the alternative psychoacoustic analysis control system.
According to a 26th aspect of the present invention, in the audio coding method of the 24th aspect, the bit allocation control step comprises a subband output adaptive bit allocation step of allocating coding bits to the subband signal data based on the weighting of each subband predetermined by the alternative psychoacoustic analysis control system and on the output level of each subband signal data.
According to a 27th aspect of the present invention, in the audio coding method of the 24th aspect, the bit allocation control step comprises an improved subband output adaptive bit allocation step of allocating coding bits to the subband signal data according to the weights of the subbands previously specified by the alternative psychoacoustic analysis control system, weights corresponding to the numbers of bits allocated to the respective subbands, and the output levels of the respective subband signal data.
According to a 28th aspect of the present invention, in the audio coding method of the 24th aspect, the bit allocation control step comprises a minimum audible limit comparing step of comparing the subband signal data with the minimum audible limit and performing control so that no bits are allocated to subbands that do not reach the minimum audible limit and the bit allocation to the other subbands is increased.
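The bit allocation alternatives of the 25th to 28th aspects avoid a full psychoacoustic model. A simplified sketch, assuming subband levels and minimum audible limits in dB, one bit at a time as the allocation unit, and the common rule of thumb that each quantizer bit buys about 6.02 dB of signal-to-noise ratio as the bit-count weight of the 27th aspect; all of these, and the function names, are assumptions rather than the patented tables.

```python
def allocate_in_fixed_order(order: list[int], M: int, budget: int) -> list[int]:
    """25th aspect (sketch): `order` lists subband indices, one entry per bit,
    in a sequence fixed in advance; bits are handed out along it until the
    budget is spent."""
    bits = [0] * M
    for j in order[:budget]:
        bits[j] += 1
    return bits


def allocate_weighted(levels_db, weights_db, min_audible_db, budget,
                      max_bits=15, db_per_bit=6.02):
    """26th-28th aspects (sketch): subbands whose level is below the minimum
    audible limit receive no bits (28th); each remaining bit goes to the
    subband with the highest priority = level + subband weight (26th),
    reduced by about 6.02 dB per bit already allocated (27th)."""
    M = len(levels_db)
    bits = [0] * M
    audible = [j for j in range(M) if levels_db[j] >= min_audible_db[j]]
    for _ in range(budget):
        candidates = [j for j in audible if bits[j] < max_bits]
        if not candidates:
            break  # every audible subband is already at max_bits
        best = max(candidates,
                   key=lambda j: levels_db[j] + weights_db[j] - db_per_bit * bits[j])
        bits[best] += 1
    return bits
```

For levels of 60, 40, 10 and 50 dB with a 20 dB audible limit and four bits to allocate, `allocate_weighted` gives [3, 0, 0, 1]: the 10 dB subband is never considered, and the bits concentrate on the loudest subbands.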
According to a 29th aspect of the present invention, a method for coding video and audio information in which part or all of the coding processes share a common computer resource comprises the steps of: temporarily buffering the pro-audio signals when video and audio information is input as pro-video signals, composed of plural items of still picture information each representing a still picture taken per unit time, and pro-audio signals representing audio information; reading out and coding the buffered pro-audio signals and outputting coded audio information; evaluating the processing performance of the video and audio information coding using coding-load criterion information indicating how heavy the load of coding video information is; controlling the pro-video signal coding described below based on the result of the processing performance evaluating step; coding the pro-video signals composed of still pictures according to the controlling step; and outputting coded video information.
According to a 30th aspect of the present invention, in the method for coding video and audio information according to the 29th aspect, the coding load evaluating step includes the steps of: when the pro-video signals composed of plural items of still picture information are output, obtaining coding-load evaluation information based on the coding-load criterion information and the total size of the pro-audio signals stacked in the pro-audio signal buffering step; comparing the coding-load evaluation information with a predetermined coding-load limit; and outputting the pro-video signals to the pro-video signal coding step when the coding-load evaluation information reaches the coding-load limit, or discarding the pro-video signals when it does not reach the limit.
According to a 31st aspect of the present invention, in the method for coding video and audio information according to the 29th aspect, when analog video information is input and video resolution information is output, the analog video information is converted into pro-video information composed of plural discrete digital pixel signals and comprising plural items of still picture information with the resolution specified by the video resolution information, and the pro-video information is output to be processed in the video coding step; the coding load evaluation step includes obtaining coding-load evaluation information based on the total size of the pro-audio signals stacked in the pro-audio signal buffering step and the coding-load criterion information indicating how heavy the load of coding video information is, obtaining picture resolution information indicating the resolution of the pictures for coding video information, and outputting the picture resolution information; and the video coding step includes, on receiving the picture resolution information, coding the pro-video signals at the indicated picture resolution and outputting coded video information.
According to a 32nd aspect of the present invention, in the method for coding video and audio information according to the 29th aspect, the processing performance evaluating step simply outputs coding-load evaluation information to the pro-video signal coding step, which codes the pro-video signals to the extent of a size calculated from the coding-load evaluation information and outputs coded video information.
According to a 33rd aspect of the present invention, in the method for coding video and audio information according to the 29th aspect, the pro-audio signal buffering step and the buffered pro-audio signal coding step are replaced by the steps of reading out the pro-audio signals stacked in the pro-audio signal buffering step, calculating the total size of the pro-audio signals read out and outputting it as a processed audio signal size, coding the pro-audio signals, and outputting coded audio information; and the processing performance evaluating step and the pro-video signal coding controlling step are replaced by the steps of obtaining a pro-audio signal input size based on the elapsed time and the size of pro-audio signals input per unit time, obtaining a predictive audio signal buffer size as the difference between the pro-audio signal input size and the processed audio signal size, and obtaining the coding-load evaluation information using the predictive audio signal buffer size.
According to a 34th aspect of the present invention, in the method for coding video and audio information according to the 29th aspect, the following steps are performed: when the pro-video signals are input, obtaining a pro-audio signal input size based on the elapsed time and the size of pro-audio signals input per unit time; obtaining a processed audio signal size based on the total size of the coded audio information output by the buffered pro-audio signal coding step; obtaining a predictive audio signal buffer size as the difference between the pro-audio signal input size and the processed audio signal size; and obtaining the coding-load evaluation information using the predictive audio signal buffer size.
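The predictive audio signal buffer size of the 33rd and 34th aspects can be computed from a clock instead of by inspecting the buffer. In the sketch, the frame-dropping policy in `should_code_frame` and the byte units are assumptions (the text only says the evaluation information is derived from this size); for the 34th aspect, `processed_bytes` would be derived from the size of the coded audio output.

```python
def predictive_audio_buffer_size(elapsed_sec: float, input_bytes_per_sec: float,
                                 processed_bytes: float) -> float:
    """Audio input so far (elapsed time x input rate) minus the audio already
    processed gives the expected backlog in the audio buffer."""
    input_size = elapsed_sec * input_bytes_per_sec
    return max(0.0, input_size - processed_bytes)


def should_code_frame(backlog_bytes: float, limit_bytes: float) -> bool:
    """Hypothetical policy: code the next video frame only while the predicted
    audio backlog stays below the limit; otherwise drop the frame so that
    audio coding can catch up and audio discontinuity is avoided."""
    return backlog_bytes < limit_bytes
```

For example, after 2 s of 48 Kbyte/s input with 80,000 bytes processed, the predicted backlog is 16,000 bytes.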
According to a 35th aspect of the present invention, in the method for coding video and audio information of the 29th aspect, variations in the decision result of the coding load evaluating step are monitored and the coding-load criterion information is set in accordance with those variations.
According to a 36th aspect of the present invention, a video coding apparatus for coding video comprises: a video coding means for coding one or more items of still picture information of pro-video information consisting of plural items of still picture information in which video is digitized; and a coding parameter decision means for deciding, on the basis of a given frame rate, coding parameters which determine the amount of processing of the coding means, one or more resolutions being one coding parameter and one or more coding types, among coding types including intra-frame coding, forward-predictive coding, backward-predictive coding, and bidirectional-predictive coding, being another coding parameter.
According to a 37th aspect of the present invention, an audio coding apparatus for coding audio by subsampling coding comprises: a register for storing a set frequency fs and a conversion constant n used in coding; an audio input means for inputting audio to be coded; an input audio sampling means for producing sampled audio data using a sampling frequency decided on the basis of the stored set frequency fs; an audio data conversion means for outputting converted audio data, where, assuming that the number of sampled audio data obtained using the set frequency fs as the sampling frequency is m and the number determined on the basis of the conversion constant n is m′, the converted audio data consists of m pieces of audio data including m′ pieces of sampled audio data; a subsampling means for obtaining M subband signals by subsampling the converted audio data; a coding bit allocation means for allocating coding bits only to those subband signals whose frequency is not higher than a limit frequency, the limit frequency being fs/2n obtained from the stored set frequency fs and conversion constant n; a quantization means for performing quantization on the basis of the allocated coding bits; a coding means for outputting the quantized data as coded data; and a coded data recording means for recording the output coded data.
According to a 38th aspect of the present invention, an audio coding apparatus for coding audio by subsampling coding comprises: a control constant storing means for storing a control constant used in the coding; a sampling means for sampling input audio and outputting sampled data; a subsampling means for subsampling the sampled data obtained by the sampling means and outputting subband signal data; a coding bit allocation means for allocating coding bits to the subband signal data obtained by the subsampling means; a quantization means for quantizing the subband signal data according to the allocation of the coding bits and outputting quantized values; a coding means for outputting coded data on the basis of the quantized values obtained by the quantization means; and a coding control means for controlling data processing in the subsampling means, the coding bit allocation means, the quantization means, and the coding means.
According to a 39th aspect of the present invention, an audio coding apparatus for coding audio by subsampling coding comprises: a sampling means for sampling input audio and outputting sampled data; a subsampling means for subsampling the sampled data obtained by the sampling means and outputting subband signal data; a coding bit allocation means for allocating coding bits to the subband signal data obtained by the subsampling means; a bit allocation control means for controlling allocation in the coding bit allocation means by an alternative psychoacoustic analysis control method; a quantization means for quantizing the subband signal data according to the allocation of the coding bits and outputting quantized values; and a coding means for outputting coded data on the basis of the quantized values obtained by the quantization means.
According to a 40th aspect of the present invention, a video and audio coding apparatus which codes video and audio using a common computer resource for part or all of the processing comprises: an audio buffering means for temporarily storing pro-audio information on input of video and audio information consisting of pro-video information, which consists of a plurality of items of still picture information representing still pictures per unit time, and pro-audio information representing audio; an audio coding means for reading the pro-audio information stored in the audio buffering means, coding the read pro-audio information, and outputting coded audio information; a coding load evaluation means for deciding the processing capability of the video and audio coding apparatus using coding-load criterion information representing the degree of load and controlling output of the pro-video information to a video coding means; and the video coding means, for coding still picture information constituting the pro-video information upon its input and outputting coded video information according to control by the coding load evaluation means.
According to a 41st aspect of the present invention, a recording medium for recording a video coding program which codes video records a coding program which executes: a step of coding, according to coding parameters, one or more items of still picture information of pro-video information consisting of plural items of still picture information in which video is digitized; and a step of deciding, on the basis of a given frame rate, coding parameters which determine the amount of processing in the coding step, one or more resolutions being one coding parameter and one or more coding types, among coding types including intra-frame coding, forward-predictive coding, backward-predictive coding, and bidirectional-predictive coding, being another coding parameter.
According to a 42nd aspect of the present invention, a recording medium for recording an audio coding program which codes audio by a subsampling method records a coding program which executes: a storing step for storing a set frequency fs and a conversion constant n used in coding; an audio input step for inputting audio to be coded; an input audio sampling step for producing sampled audio data using a sampling frequency decided on the basis of the stored set frequency fs; an audio data conversion step for outputting converted audio data, where, assuming that the number of sampled audio data obtained using the set frequency fs as the sampling frequency is m and the number determined on the basis of the conversion constant n is m′, the converted audio data consists of m pieces of audio data including m′ pieces of sampled audio data; a subsampling step for obtaining M subband signals by subsampling the converted audio data; a coding bit allocation step for allocating coding bits only to those subband signals whose frequency is not higher than a limit frequency, the limit frequency being fs/2n obtained from the stored set frequency fs and conversion constant n; a quantizing step for performing quantization on the basis of the allocated coding bits; a coding step for outputting the quantized data as coded data; and a coded data recording step for recording the output coded data.
According to a 43rd aspect of the present invention, there is provided a recording medium on which is recorded an audio coding program for coding audio by a subsampling method, the program executing: a step of storing a control constant used in the coding; a sampling step of sampling input audio and outputting sampled data; a subsampling step of subsampling the sampled data obtained in the sampling step and outputting subband signal data; a coding bit allocation step of allocating coding bits to the subband signal data obtained in the subsampling step; a quantizing step of quantizing the subband signal data according to the coding bit allocation and outputting quantized values; a coding step of outputting coded data on the basis of the quantized values obtained in the quantizing step; and a step of controlling data processing in the subsampling step, the coding bit allocation step, the quantizing step, and the coding step.
According to a 44th aspect of the present invention, there is provided a recording medium on which is recorded an audio coding program for coding audio by a subsampling method, the program executing: a sampling step of sampling input audio and outputting sampled data; a subsampling step of subsampling the sampled data obtained in the sampling step and outputting subband signal data; a coding bit allocation step of allocating coding bits to the subband signal data obtained in the subsampling step; a bit allocation control step of controlling the allocation in the coding bit allocation step using a control method that serves as an alternative to psychoacoustic analysis; a quantizing step of quantizing the subband signal data according to the allocation of the coding bits and outputting quantized values; and a coding step of outputting coded data on the basis of the quantized values obtained in the quantizing step.
According to a 45th aspect of the present invention, there is provided a recording medium on which is recorded a video and audio coding program that codes video and audio using a common computer resource for part or all of the processing, the program executing: an audio buffering step of temporarily storing pro-audio information upon input of video and audio information, the video and audio information consisting of pro-video information, which consists of a plurality of items of still picture information each representing a still picture per unit of time, and pro-audio information representing audio; an audio coding step of reading the pro-audio information stored in the audio buffering step, coding it, and outputting coded audio information; a coding load evaluation step of determining the processing capability of the video and audio coding apparatus using coding load criterion information that represents a degree of load, and controlling output of the pro-video information to the video coding step; and a video coding step of coding still picture information constituting the pro-video information upon its input and outputting coded video information, under the control of the coding load evaluation step.
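The interplay between the audio buffering, audio coding, and coding load evaluation steps can be illustrated with a small simulation of one shared CPU. It assumes, for illustration only, that during each frame interval the audio buffered for that interval is coded first (so audio is never dropped), that the coding load criterion is a maximum allowed lag behind real time, and that per-picture video coding times are known in advance; `simulate_coding` and its parameters are hypothetical.

```python
def simulate_coding(video_costs_ms, frame_interval_ms, audio_cost_ms, max_lag_ms):
    """Simulate one shared CPU coding both audio and video.

    video_costs_ms: assumed time the video coder needs for each still picture.
    frame_interval_ms: time between still pictures (1000 / frame rate).
    audio_cost_ms: assumed time to code the audio buffered during one interval.
    max_lag_ms: assumed coding load criterion; a picture is passed to the video
        coder only if the CPU is at most this far behind real time.
    Returns (indices of coded pictures, number of audio blocks coded).
    """
    busy_until = 0  # time at which the shared CPU becomes free
    coded_frames = []
    audio_blocks = 0
    for i, cost in enumerate(video_costs_ms):
        arrival = i * frame_interval_ms
        # Audio is read from the buffer and always coded, so it is never lost.
        busy_until = max(arrival, busy_until) + audio_cost_ms
        audio_blocks += 1
        # Coding load evaluation: skip this picture if coding has fallen too far behind.
        if busy_until - arrival > max_lag_ms:
            continue
        coded_frames.append(i)
        busy_until += cost
    return coded_frames, audio_blocks
```

With a 33 ms frame interval, 3 ms of audio work per interval, and a 10 ms lag limit, pictures costing 20 ms each are all coded, while pictures costing 50 ms each are coded only every other interval; in both cases every audio block is coded.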