Encoding of multiple videos has been considered in two major areas: transmission and recording. Transmission of multiple videos is mainly applied in broadcasting and delivery applications, while recording of multiple videos is usually applied to surveillance applications. Although video recording is common in many consumer electronics products, such video recorders typically deal with the encoding of a single video, rather than multiple concurrent videos.
In television broadcasting applications, it is common practice to encode multiple videos, such that the encoded video bitstreams can be transmitted together over a single channel having a fixed bandwidth. For instance, given N programs and a total channel bandwidth of 45 Mbps, which is common for satellite links, the problem is to encode the N programs with an overall maximum quality, and multiplex them onto the single channel. Because the bandwidth is fixed and the complexity of each program varies, each of the programs is encoded at a variable bit-rate (VBR). In this way, a near-constant distortion can be maintained across all programs. Thus, more complex portions of one video can be allocated more bits, by decreasing the bits allocated for less complex portions of other videos that are concurrently encoded.
The encoding process described above is referred to as statistical multiplexing. Techniques associated with this process are described by Haskell in “Multiplexing of Variable Rate Encoded Streams,” IEEE Transactions on Circuits and Systems for Video Technology, 1994; by Wang et al. in “Multi-Program Video Coding with Joint Rate Control”; in U.S. Pat. No. 6,091,455, “Statistical Multiplexer for Recording Video,” issued to Yang on Jul. 18, 2000; in U.S. Pat. No. 6,195,388, “Apparatus and Method for Encoding Multiple Video Programs,” issued to Choi et al. on Feb. 27, 2001; and in references included therein.
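The idea of shifting bits from less complex programs to more complex ones under a fixed channel rate can be illustrated with a minimal sketch. The proportional rule and the function name `allocate_bits` are assumptions chosen for illustration; they are not taken from any of the cited references, which use their own rate-control methods.

```python
def allocate_bits(complexities, channel_bps):
    """Split a fixed channel rate across programs in proportion to a
    per-program complexity measure (hypothetical proportional rule)."""
    total = sum(complexities)
    if total <= 0:
        # No usable complexity information: fall back to an equal split.
        return [channel_bps / len(complexities)] * len(complexities)
    return [channel_bps * c / total for c in complexities]


# Three programs sharing a 45 Mbps channel; complexity values are made up.
rates = allocate_bits([1.0, 3.0, 2.0], 45_000_000)
```

The rates always sum to the channel rate, so giving more bits to one program necessarily takes bits from the others, which is the trade-off that statistical multiplexing manages.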
Along similar lines, Sun and Vetro have described the encoding of multiple objects in a scene subject to a fixed bandwidth constraint in U.S. Pat. No. 5,969,764, issued on Oct. 19, 1999. That method allocates bits to each object. In U.S. patent application Ser. No. 09/579,889, “Method for encoding and transcoding multiple video objects with variable temporal resolution,” filed by Vetro et al. on May 26, 2000, a method to satisfy a total bandwidth constraint with each object in a scene having a different temporal rate is described. There, the method minimizes composition artifacts that occur when multiple objects in a scene are encoded at different temporal rates.
The above prior art methods encode multiple videos or objects subject to a total bandwidth constraint of a single transmission channel.
In the prior art, resource constraints other than bandwidth have also been considered in the processing of multiple videos. See, for example, U.S. Pat. No. 6,052,384, “Using a Receiver Model to Multiplex Variable Bit-Rate Streams Having Timing Constraints,” issued to Huang et al. on Apr. 18, 2000, which describes techniques to determine the output bit rate of each stream so that neither the queue for the bitstream in the multiplexer nor the buffer in a decoder overflows or underflows. The rates are determined using timing information that is read from the bitstream, and a receiver model that considers the operation of the decoder buffer.
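The kind of constraint such a receiver model enforces can be illustrated with a simplified decoder-buffer check. This is a toy constant-rate model written for illustration only; the function `buffer_ok` and its parameters are hypothetical and do not reproduce the method of Huang et al.

```python
def buffer_ok(frame_bits, bits_per_interval, buffer_size, initial_fullness):
    """Return True if a decoder buffer neither underflows nor overflows.

    Toy model: at each frame interval the decoder removes one coded frame,
    then the channel delivers `bits_per_interval` bits into the buffer.
    """
    fullness = initial_fullness
    for bits in frame_bits:
        if bits > fullness:
            return False  # underflow: frame not fully received at decode time
        fullness -= bits
        fullness += bits_per_interval
        if fullness > buffer_size:
            return False  # overflow: arriving bits exceed buffer capacity
    return True
```

A multiplexer could run such a check for each candidate output rate and reject rates that would violate either bound.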
Transcoding of multiple videos subject to timing constraints, both delay and processing, is described in U.S. Pat. No. 6,275,536, “Implementation architectures of a multi-channel MPEG video transcoder using multiple programmable processors,” issued to Chen et al. on Aug. 14, 2001. Input bitstreams are first partitioned into processing units. In one architecture, the processing units are split into different substreams, each substream with its own queue, and each substream is then processed in a corresponding branch. In a second architecture, the processing units are assigned to any available processor from a common queue. Independent processing units are processed concurrently according to a queuing system model to minimize an average processing time. In contrast to the first architecture, which is a parallel process of multiple branches, the second architecture is a single branch feeding multiple processors.
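The difference between the two architectures can be illustrated with a toy scheduling model. The sketch below is an assumption-laden simplification, not the implementation of Chen et al.: unit costs are made-up numbers, units are treated as fully independent, the common queue is filled in round-robin order, and the metric is the completion time of the whole batch rather than the average processing time.

```python
import heapq


def per_stream_branches(streams):
    """Architecture 1 (toy): one branch per substream, each with its own
    queue. The batch finishes when the slowest branch finishes."""
    return max(sum(units) for units in streams)


def common_queue(streams, num_processors):
    """Architecture 2 (toy): all units share one queue, and the next unit
    goes to whichever processor becomes free first."""
    # Interleave units round-robin to build the common queue.
    queue = []
    for i in range(max(len(s) for s in streams)):
        for s in streams:
            if i < len(s):
                queue.append(s[i])
    free_at = [0.0] * num_processors  # min-heap of processor free times
    heapq.heapify(free_at)
    for cost in queue:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# One heavy stream and one light stream, two processors.
streams = [[4, 4, 4], [1, 1, 1]]
t_branches = per_stream_branches(streams)
t_common = common_queue(streams, 2)
```

With unbalanced streams, the dedicated-branch arrangement leaves one processor idle while the other works through the heavy stream; the common queue spreads the load across both.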
Similar to Chen et al., U.S. Pat. No. 6,008,848, “Video Compression Using Multiple Computing Agents,” issued to Tiwari et al. on Dec. 28, 1999, also describes a system and method that uses multiple processors. In contrast, Tiwari applies to the encoding of a single video and describes techniques to achieve the encoding using coarse-grain parallelism effected by multiple processors or compressing agents.
FIG. 1 shows a general system model for encoding multiple videos for a surveillance application. Cameras 101 acquire videos 102 for a video recorder 110. Typically, the recorder 110 compresses the videos. The compressed videos are then stored in a memory 120. Later, a video player 130 can play the stored videos.
FIG. 2 shows the details of the recorder 200. The acquired videos 102 are sent to a high-speed switch 210. The switch samples the analog video signals in time. The samples are fed to a decoder 220, and the digitized images are encoded by a still-image-encoder 230 to yield compressed images. A memory controller 240 writes the compressed images to allocated space in the memory 120. The stored video can be played back later.
The main problem with the recorder of FIG. 2 is that it encodes still images, which does not exploit any temporal redundancy in the videos. As a result, the memory 120 needs to be very large if weeks or months of surveillance videos are stored. It should be noted that a large portion of surveillance videos, particularly those taken at night, are of totally static scenes. Significant events are rare.
FIG. 3 shows an obvious solution to the above problem. In this scheme, there is one encoding channel for each video. In the encoding channel, the video is first NTSC decoded 220. Due to the predictive encoding used in video coders, such as MPEG, a frame memory 310 is maintained for each video encoder 320 to store reference pictures. The input frames from the different cameras 101 are then encoded separately and the results are written to the memory 120 using the memory controller 240. The temporal rate of the input frames can be controlled by uniformly sampling with a fixed period T 301. This sampling makes better use of the memory 120. The main drawback of that scheme is that the video encoders 320 are not fully utilized, considering that they are typically designed to handle full-rate video. Also, the many decoders and encoders increase the cost of the system.
In U.S. Pat. No. 6,314,137, “Video data compression system, video recording/playback system, and video data compression encoding method,” issued to Ono et al. on Nov. 6, 2001, a system and method that overcomes the above drawbacks is described, as shown in FIG. 4. There, a single video encoder 420 is used to encode all of the videos. The digitized video frames from each camera input 101 are sub-sampled with period T and buffered in the respective frame memories 310. In order to achieve the predictive encoding with the single encoder 420, a series of input video frames corresponding to one camera input are fed into the video encoder so that predictive coding from frames of the same camera input can be made successively. The Group of Pictures (GOP) structure in MPEG coding allows such independent units to be formed. In that way, the memory controller 240 becomes a GOP selector, and the GOPs from each camera input are time-multiplexed into the encoder according to the controller 410. With that scheme, a single bitstream for all camera inputs is produced. To identify the portions of the videos that correspond to a given camera, a camera identifier 401 is multiplexed into the encoded bitstream.
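The time-multiplexing of GOPs into a single encoder can be sketched as follows. This is only an illustration of the general idea: the round-robin selection order, the function name `multiplex_gops`, and the use of strings as stand-ins for frames are all assumptions, and the actual selection policy of the controller in Ono et al. is not reproduced here.

```python
from collections import deque


def multiplex_gops(camera_frames, gop_size):
    """Return a list of (camera_id, gop) pairs, taking one full GOP from
    each camera buffer in turn (hypothetical round-robin policy)."""
    buffers = {cam: deque(frames) for cam, frames in camera_frames.items()}
    out = []
    # Keep going while at least one camera has a complete GOP buffered.
    while any(len(buf) >= gop_size for buf in buffers.values()):
        for cam, buf in buffers.items():
            if len(buf) >= gop_size:
                gop = [buf.popleft() for _ in range(gop_size)]
                # The camera id tags the GOP so the portions of the single
                # output stream can later be attributed to a camera.
                out.append((cam, gop))
    return out


schedule = multiplex_gops(
    {0: ["a0", "a1", "a2", "a3"], 1: ["b0", "b1", "b2", "b3"]},
    gop_size=2,
)
```

Note that each camera buffer must hold a complete GOP before that GOP can be handed to the encoder.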
With the above solution, one GOP's worth of data must be stored in each of the frame memories, which is a much larger requirement than that of the system of FIG. 3, which requires only 1 or 2 reference pictures at most. Therefore, although there is a significant saving in encoding hardware, memory requirements are still large and expensive. This drawback cannot be overcome by simply sampling the input video more aggressively. Although this reduces the temporal resolution of the video, the same number of frames per GOP still needs to be buffered. Only a shorter GOP period would reduce the memory requirements, but this would imply more frequent intra-frame coding, which means less coding efficiency. In the most extreme case, a GOP period of 1 would degenerate to the still image coding system shown in FIG. 2.
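The size of the gap can be made concrete with a rough calculation. The figures used below (720x480 frames in 4:2:0 format, a GOP of 15 frames, 16 cameras) are illustrative assumptions and do not come from the cited patent; the helper names are likewise hypothetical.

```python
def frame_bytes(width, height):
    """Bytes for one uncompressed 4:2:0 frame at 8 bits per sample:
    one luma byte plus half a chroma byte per pixel."""
    return width * height * 3 // 2


def gop_buffer_bytes(num_cameras, gop_length, width, height):
    """Frame memory needed when every camera buffers one full GOP."""
    return num_cameras * gop_length * frame_bytes(width, height)


def reference_buffer_bytes(num_cameras, num_refs, width, height):
    """Frame memory needed when every camera keeps only reference pictures."""
    return num_cameras * num_refs * frame_bytes(width, height)


# Assumed configuration: 16 cameras, 720x480, GOP of 15, 2 reference pictures.
gop_mem = gop_buffer_bytes(16, 15, 720, 480)       # 124,416,000 bytes
ref_mem = reference_buffer_bytes(16, 2, 720, 480)  # 16,588,800 bytes
```

Under these assumptions, buffering whole GOPs takes 7.5 times the memory of keeping two reference pictures per camera. The sampling period T does not appear in either formula, which reflects the point that more aggressive sampling does not shrink the GOP buffer.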
High memory requirements are just one drawback of the system in FIG. 4. The problem becomes proportionately worse when the system is scaled to a higher number of videos.
Therefore, it is desired to provide a system and method for concurrently encoding multiple videos.