The present invention relates to methods for compressing video and audio at low bit rates in general and to methods for compressing a sub-sampled MPEG video and audio in particular.
A single channel audio signal is considered, in the art, a single dimension function of time, while a video signal is considered a two dimensional function of time. In the art, video and audio are each sampled separately but, generally, simultaneously, since they are usually related. Accordingly, video and audio have to be played back and displayed synchronously.
Methods for compressing digital video and audio signals, as well as decompressing the compressed digital code, are known in the art. According to a family of standards known as the Moving Picture Experts Group (MPEG) standards, such as ISO/IEC 11172 (MPEG-1) and ISO/IEC 13818 (MPEG-2), each frame or field of the original video signal can be compressed into one of three main types of pictures. It is noted that a picture in MPEG can be either a video frame or a video field.
A first type is an intra-coded picture (I-frame) which contains all of the information needed to produce a single original picture.
A second type is a predictive picture (P-frame) which includes information for producing an original video frame, based on a previous reference frame. A reference frame is an adjacent I-frame or P-frame. The size of a P-frame is typically smaller than the size of an I-frame. A third type is a bi-directional predictive picture (B-frame) which includes information for producing an original video frame, based on either the previous reference frame, the next reference frame or both. The size of a B-frame is typically smaller than the size of a P-frame.
Sub-sampling refers to sampling a given signal, audio or video, at a rate considerably lower than the optimal one, which is usually predetermined in a given standard.
For example, the human eye is not likely to detect a single frame in a visual signal which is updated 24 times or more, in a second. The human eye regards such a visual signal as continuous motion. Thus, a video sampling rate of at least 24 video samples (frames) per second provides fluent video motion.
Similarly, the human ear cannot detect high audio frequencies. Thus, a sampling rate of at least 30 KHz is likely to provide an audio signal which cannot be distinguished from the original by the human ear.
Compression standards such as MPEG are usually restricted to working according to a predetermined closed list of sampling rates in video as well as audio.
For example, MPEG operates according to a video sampling rate of, generally, 25 samples (frames) per second (when operating according to a broadcasting standard such as PAL) or, alternatively, according to a video sampling rate of, generally, 29.97 samples (frames) per second (when operating according to a broadcasting standard such as NTSC). In the context of this application 30 frames per second refers to 29.97 frames per second and is used for convenience only.
MPEG audio compression can be applied to signals which are sampled at 32 KHz, 44.1 KHz and 48 KHz. MPEG-2 allows, in addition, sampling rates of 16 KHz and 22.05 KHz.
Given a set of sampling and compression parameters, lowering the bit-rate produced by the encoder degrades the quality. Methods for maximizing the ratio between quality and bit-rate for low bit-rate MPEG applications are known in the art.
One method known in the art is applicable to video compression. The method reduces the bit-rate without affecting the quality of compressed frames and is particularly suited to compressing video with little or no motion. According to the method, the signal is sub-sampled before compression and therefore some of the frames are not compressed.
According to the method, a video signal is sub-sampled according to a predetermined or dynamic duty cycle.
Were this signal to be presented to an encoder, the duration of the stream at a standard video decoder would be a fraction of the original duration. To overcome this, according to this method, the MPEG encoder is instructed to use IP encoding (no B-frames) and the stream that is produced is edited after compression. A P-frame is inserted in the stream in place of each discarded frame. These P-frames specify that all of the information for the frame exists in the previous reference frame in the stream and are therefore relatively small. It will be noted that this method requires editing of the compressed stream. Those skilled in the art will appreciate that the edited stream will contain a complete frame set. Moreover, the stream will be smaller than a stream that is produced by a conventional encoder that is presented with a signal from which frames were discarded and replaced by duplication of the previous frame before encoding.
It will be noted that this method is not specified for audio compression.
Reference is made to FIG. 1, which is a schematic illustration of a video signal and sub-sampled compressed video, known in the art.
Video signal 1 includes fifteen original frames referenced 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 and 40. Video signal 1 is provided according to the NTSC standard. The NTSC standard determines a frame rate of approximately 30 frames per second. Thus, video signal 1 represents one half of a second according to the NTSC standard.
According to the prior art, in a first stage, half of the original frames (every other frame) are digitized and compressed so as to produce a frame-set 50A. In the present example, original frames 14, 18, 22, 26, 30, 34 and 38 are not digitized.
Frame-set 50A is an MPEG partial representation of video signal 1, compressed according to a sub-sampling rate of half. Frame-set 50A includes I-frames 52A and 72A and P-frames 56A, 60A, 64A, 68A, 76A and 80A. I-frames 52A and 72A are compressed representations of original frames 12 and 32. P-frames 56A, 60A, 64A, 68A, 76A and 80A are compressed representations of original frames 16, 20, 24, 28, 36 and 40.
It will be appreciated by those skilled in the art that if frame-set 50A were provided to a standard MPEG decoder, the decoder would play it, frame by frame, at a rate of 30 frames per second. Thus, frame-set 50A, which includes 8 frames, will be played for a period of time of about one quarter of a second.
The time period spanned between original frames 12 and 40 is about half a second and so should be the time period determined by I-frame 52A and P-frame 80A. In reality, a decoder presents each frame for 1/30 of a second and thus, the actual time period which elapses between the displaying of I-frame 52A and P-frame 80A is about one quarter of a second.
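The timing mismatch described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the nominal rate of 30 frames per second used in this example; the function name playback_seconds is hypothetical and is introduced here for illustration only.

```python
from fractions import Fraction

FRAME_RATE = 30  # nominal NTSC rate used in the text (stands in for 29.97)


def playback_seconds(frame_count, frame_rate=FRAME_RATE):
    """Time a standard decoder needs to present frame_count frames."""
    return Fraction(frame_count, frame_rate)


original = playback_seconds(15)    # the fifteen original frames 12 to 40
subsampled = playback_seconds(8)   # the eight frames of frame-set 50A

print("original signal:", float(original), "seconds")
print("frame-set 50A:  ", float(subsampled), "seconds")
```

The exact fractions are 1/2 and 4/15 of a second, which correspond to the "half a second" and "about one quarter of a second" figures given above.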
To overcome this problem, a second stage is performed in which a compressing controller edits the stream and adds, after each of the compressed frames, a string of bits which represents a P-frame, relating to the adjacent previous reference frame, so as to transform frame-set 50A into frame-set 50B.
Frame-set 50B includes, in addition to the frames of frame-set 50A, P-frames 52B, 56B, 60B, 64B, 68B, 72B and 76B.
Accordingly, frame-set 50B now has the same number of frames as the original video signal 1. A decoder, decoding frame-set 50B, will present frame-set 50B in half a second, since it includes 15 frames wherein each is displayed for 1/30 of a second.
At first, the decoder decodes I-frame 52A and provides it for display. Then, the decoder decodes P-frame 52B, which is a prediction that the present frame is identical to the previous one and so, the decoder provides frame 52A for display again. Accordingly, each of the frames originating in frame-set 50A is provided for display twice when decoding frame-set 50B.
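The second stage of the prior art method can be modelled as a simple list transformation. The sketch below is illustrative only: frames are represented as (picture type, original frame number) pairs rather than real MPEG bit strings, and the names pad_with_repeat_frames, displayed_frames and the "P-repeat" label are hypothetical. Following the listing of frame-set 50B above, no repeat frame is added after the last frame.

```python
def pad_with_repeat_frames(frame_set):
    """Insert a repeat P-frame after every frame except the last one."""
    padded = []
    for position, frame in enumerate(frame_set):
        padded.append(frame)
        if position < len(frame_set) - 1:
            # The inserted frame only says "same as the previous reference".
            padded.append(("P-repeat", frame[1]))
    return padded


def displayed_frames(stream):
    """Original frame number shown in each 1/30 second display slot."""
    return [source for _, source in stream]


# Frame-set 50A: pictures 52A to 80A and the original frames they encode.
frame_set_50a = [("I", 12), ("P", 16), ("P", 20), ("P", 24),
                 ("P", 28), ("I", 32), ("P", 36), ("P", 40)]

frame_set_50b = pad_with_repeat_frames(frame_set_50a)

print(len(frame_set_50b))               # number of frames after editing
print(displayed_frames(frame_set_50b))  # what the viewer sees, slot by slot
```

In this model the edited stream regains the original frame count, while every inserted entry still occupies space in the stream.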
The disadvantages of this method are as follows:
According to the MPEG standard, the size of a P-frame that contains no information other than a reference to another frame is around 100 bits of storage area which, as will be appreciated by those skilled in the art, can accumulate into a considerable amount of storage area.
It is therefore clear that although this prior art method stores and provides half of the visual information, it uses more than half of the storage area required to store the entire MPEG video, thus failing to decrease the bit-rate by the sub-sampling factor. Though only half of the information is present, more than half of the bandwidth is required for the compressed stream.
Furthermore, the prior art method duplicates each previous adjacent reference frame. Therefore, it can only use I-frames and P-frames as a source, because they are the types of frames which are defined in the standard as reference frames. A B-frame cannot be a reference frame and as such, it cannot be used as a source for duplication. Hence, this method cannot make any use of B-frames in the first stage of creating frame-set 50A. It will be appreciated that the full compression capabilities of MPEG-1 are not utilized according to these methods.
Additionally, the method is not applicable to audio compression. The MPEG audio compression techniques do not allow editing as described above for video.
Moreover, the method is only applicable to MPEG video compression or to other compression techniques that have syntactic elements such as P-frames. Such elements are required to represent frames by specifying reference frames of which they are duplicates.
It is an object of the present invention to provide a novel system for producing low bit-rate MPEG streams using sub-sampling, which overcomes the disadvantages of the prior art.
Referring to the disadvantages of the prior art:
The system decreases the bit-rate required to encode sub-sampled video streams by the sub-sampling factor.
Furthermore, the system does not preclude the encoding of B-frames during the video encoding process.
Additionally, the system is applicable to audio signals as well as video signals.
Moreover, the system is applicable to any compression technique that supports time stamps to synchronize decoded audio and video.
It is another object of the present invention to provide a method for operating the system. The method includes the following steps:
Sampling the given signals, according to a predetermined or dynamic duty cycle, so as to provide a plurality of digitized samples;
Encoding the digitized samples, so as to produce encoded samples; and
Attaching a presentation time stamp to a selection of the encoded samples wherein each selected encoded sample is to be reproduced at a point in time determined by the presentation time stamp attached thereto.
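The three steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the encoder is a placeholder, the names EncodedSample, sample, encode and attach_pts are hypothetical, and every encoded sample is stamped although the method only requires a selection. The sketch assumes the 90 KHz time stamp clock used by MPEG systems and the NTSC frame rate of 30000/1001 frames per second.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

PTS_CLOCK_HZ = 90_000                 # MPEG time stamps count 90 KHz ticks
FRAME_RATE = Fraction(30_000, 1001)   # NTSC, approximately 29.97 frames/s


@dataclass
class EncodedSample:
    source_index: int           # position of the sample in the original signal
    payload: bytes              # stand-in for real encoder output
    pts: Optional[int] = None   # presentation time stamp, in 90 KHz ticks


def sample(signal, keep):
    """Step 1: keep only the samples whose index satisfies keep(index)."""
    return [(i, s) for i, s in enumerate(signal) if keep(i)]


def encode(digitized):
    """Step 2: placeholder encoder; a real system would call an MPEG encoder."""
    return [EncodedSample(i, repr(s).encode()) for i, s in digitized]


def attach_pts(encoded):
    """Step 3: stamp each sample with the time of its original position."""
    ticks_per_frame = Fraction(PTS_CLOCK_HZ) / FRAME_RATE  # 3003 ticks
    for item in encoded:
        item.pts = int(item.source_index * ticks_per_frame)
    return encoded


signal = list(range(12, 41, 2))  # fifteen frames labelled 12 to 40
stream = attach_pts(encode(sample(signal, lambda i: i % 2 == 0)))

for item in stream:
    print(signal[item.source_index], item.pts)
```

Because each retained sample carries the time of its original position, the time stamps advance by two frame periods per encoded frame, and no filler frames are needed to preserve the duration of the signal.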
The step of encoding can be performed according to MPEG compression, or any other similar compression method.
According to one aspect of the invention, at least one of the given signals is a video signal. According to another aspect of the invention, at least one of the given signals is an audio signal.
The duty cycle is given by $\frac{K}{N}$,
wherein N is the number of detected samples in a given cycle and K is the number of selected samples in the given cycle.
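A fixed duty cycle of K selected samples out of every N detected samples can be sketched as below. The text does not specify which K samples within a cycle are selected, so this example assumes the first K of each cycle; the function name select_by_duty_cycle is hypothetical.

```python
def select_by_duty_cycle(samples, k, n):
    """Keep the first k samples out of every cycle of n detected samples."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    # Assumption: the selected samples are the first k positions of a cycle.
    return [s for i, s in enumerate(samples) if i % n < k]


frames = list(range(12, 41, 2))            # fifteen frames labelled 12 to 40
half = select_by_duty_cycle(frames, 1, 2)  # duty cycle of 1/2
two_thirds = select_by_duty_cycle(frames, 2, 3)

print(half)
print(two_thirds)
```

A dynamic duty cycle could be modelled by changing k and n between cycles, which this sketch does not attempt.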
A method of the invention is also operable using encoders which receive the sample for encoding together with the presentation time stamp and so produce frames which already include presentation time stamps.
In accordance with another aspect of the invention, there is thus provided a system for providing a sub-sampled compressed signal. The system includes at least one sampling unit, for sampling at least one signal so as to provide at least one sampled stream, at least one encoding unit, wherein each of the encoding units is associated with and connected to a selected one of the sampling units, a controller and a multiplexor.
The controller is connected to the sampling units and to the encoding units, and the multiplexor is connected to the encoding units and to the controller.
Each of the encoding units encodes a sampled signal, so as to produce an encoded stream which includes a plurality of encoded frames. The controller provides a presentation time stamp to each of the encoded frames. Finally, the multiplexor multiplexes the encoded streams.
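The role of the multiplexor can be illustrated with a small sketch that interleaves already time-stamped streams. It is an assumption-laden model only: frames are (time stamp, stream name, label) tuples, the time stamp values are hypothetical, and ordering purely by presentation time stamp is a simplification, since a practical multiplexor also has to respect decoding order and buffer constraints. The function name multiplex is hypothetical.

```python
import heapq


def multiplex(*streams):
    """Interleave encoded streams in order of presentation time stamp."""
    # Each input stream is assumed to be sorted by time stamp already.
    return list(heapq.merge(*streams, key=lambda frame: frame[0]))


# Hypothetical time stamps in 90 KHz ticks, as provided by the controller.
video = [(0, "video", "I"), (6006, "video", "P"), (12012, "video", "P")]
audio = [(0, "audio", "a0"), (4320, "audio", "a1"), (8640, "audio", "a2")]

muxed = multiplex(video, audio)

for pts, stream_name, label in muxed:
    print(pts, stream_name, label)
```

Since heapq.merge is stable, frames with equal time stamps keep the order in which their streams were passed to multiplex.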