1. Field of the Invention
This invention relates to methods and systems for efficient video compression by recording various state signals of a camera, which include the luminous intensity, the frame index, and the movement, zooming state, aperture state, focus state and identification number of the camera. In particular, the camera records these state signals along with video and audio signals, and video compression algorithms utilize such state signals to predict the current image from previous reconstructed images.
2. Description of the Related Art
Recent advancements in digital technology make it possible to record video signals in digital formats. Most video signals can be viewed as a sequence of still images, each of which is called a frame. For typical video signals, there are 25–30 frames per second. Sometimes, video signals may be represented as a sequence of fields. Since fields are created by dividing each frame into a set of two interlaced fields, the idea and teaching of the present invention also apply to field-based video signals. Although the following description is given mainly in terms of frame-based video signals, it is emphasized that the teaching of the present invention can be applied to both frame-based and field-based video signals. Sometimes, the term “image” will be used, and it may be interpreted as either a frame or a field.
For standard-definition television, there are several hundred thousand pixels in each frame, and color video signals have three channels. Thus, the bandwidth of digital video signals can be very large. In order to store such a large amount of video data in digital formats, video compression techniques must be employed.
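The size of the problem can be illustrated with a short calculation. The following is a minimal sketch under assumed values: the 720 x 480 frame size and the 8 bits per sample are assumptions chosen for illustration and are not stated in the text, which gives only the number of channels and the frame rate range.

```python
# Assumed values for illustration; the text only says "several hundred
# thousand pixels" per frame and "three channels".
WIDTH, HEIGHT = 720, 480      # assumed standard-definition frame size
CHANNELS = 3                  # three color channels, as stated in the text
BITS_PER_SAMPLE = 8           # assumed sample depth
FRAMES_PER_SECOND = 30        # upper end of the 25 to 30 range in the text


def raw_bit_rate(width, height, channels, bits_per_sample, fps):
    """Return the uncompressed bit rate in bits per second."""
    return width * height * channels * bits_per_sample * fps


pixels_per_frame = WIDTH * HEIGHT
bits_per_second = raw_bit_rate(
    WIDTH, HEIGHT, CHANNELS, BITS_PER_SAMPLE, FRAMES_PER_SECOND
)

print(pixels_per_frame)        # 345600
print(bits_per_second / 1e6)   # 248.832 (megabits per second)
```

Under these assumptions, one second of uncompressed video occupies roughly 31 megabytes, which is why compression is needed before storage.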
Most video compression algorithms try to reduce spatial, spectral and temporal redundancies in video signals. The spatial redundancy is a redundancy within a frame and the temporal redundancy is a redundancy among successive frames. In general, the compression algorithms that have been proposed to reduce the spatial redundancy within a frame utilize transform coding, quantization and variable length coding. Two of the most widely used transforms are the discrete cosine transform, which is extensively used in JPEG and MPEG, and the wavelet transform. Some of the most widely used variable length coding algorithms include Huffman coding and arithmetic coding. Because of the importance of reducing spatial redundancy, numerous coding algorithms have been proposed for still images.
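The transform coding and quantization steps mentioned above can be sketched as follows. This is a minimal illustration, not the coder of any particular standard: it applies an orthonormal one-dimensional discrete cosine transform to a single row of pixels, whereas image coders typically apply the transform along both rows and columns of a block. The pixel values and the quantization step of 10 are hypothetical, and the variable length coding stage is omitted.

```python
import math


def _scale(k, n):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
              for i, x in enumerate(samples))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def quantize(coeffs, step):
    """Map each coefficient to an integer level (the lossy step)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Map integer levels back to approximate coefficient values."""
    return [q * step for q in levels]


row = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel row
STEP = 10                                # hypothetical quantization step

levels = quantize(dct_1d(row), STEP)     # what would be entropy coded
reconstructed = idct_1d(dequantize(levels, STEP))
```

For smooth image content most of the energy lands in the first few coefficients, so many of the quantized levels are zero or small, which is what makes the subsequent variable length coding effective.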
Since there are 25–30 frames per second for typical video signals, successive frames in video signals are highly correlated. In other words, successive frames are very similar. In particular, if there is no moving object, successive frames will be identical assuming that the camera states, which include its zooming state, focus state, aperture state and the position of the camera, are unchanged and that the surrounding light condition remains the same. If there is a moving object, successive frames will be different due to the motion of the moving object. However, if the motion of the moving object can be estimated, one can predict the location of the moving object in the current frame from previous reconstructed frames. Then, the difference image between the current image and the predicted image is computed and transmitted instead of transmitting the original image. The operation to predict the current image from the previous reconstructed images using motion vectors is called motion compensation and is a key element in video compression algorithms. A block diagram of a typical video encoder utilizing the motion compensation is shown in FIG. 1, where DCT 100 represents the discrete cosine transform, Q 101 quantization, VLC 102 variable length coding, Q−1 103 inverse quantization, and IDCT 104 the inverse discrete cosine transform. If the prediction is good, the pixel values of the difference image will be very small and the difference image can be very efficiently encoded, resulting in a significant reduction in data size. Thus, the key idea in reducing the temporal redundancy in video signals is to estimate motion vectors between successive frames and to use the information to make a good prediction of the following image. In practice, the motion estimation can be done in both the forward direction and backward direction. Due to their importance in video coding, numerous motion estimation and compensation algorithms have been proposed. 
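The prediction loop of FIG. 1 can be sketched in simplified form. The sketch below is an illustration under stated assumptions, not the encoder of the figure: it omits the transform and the motion vectors (prediction is the previous reconstructed frame unchanged), treats each frame as a short list of pixel values, and starts from an all-zero reference. The frame data and the step size are hypothetical. It shows why prediction is made from previous reconstructed frames: the encoder rebuilds each frame exactly as the decoder will, so the two stay in step.

```python
def quantize(values, step):
    """Map each value to an integer level (the lossy step)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Map integer levels back to approximate values."""
    return [q * step for q in levels]


def encode(frames, step):
    """Transmit only the quantized difference from the previous
    reconstructed frame (zero-motion prediction, assumed for brevity)."""
    reference = [0] * len(frames[0])   # assumed all-zero starting reference
    stream = []
    for frame in frames:
        residual = [c - p for c, p in zip(frame, reference)]
        levels = quantize(residual, step)
        stream.append(levels)
        # Rebuild the frame the same way the decoder will.
        decoded_residual = dequantize(levels, step)
        reference = [p + r for p, r in zip(reference, decoded_residual)]
    return stream


def decode(stream, step):
    """Mirror of the encoder's reconstruction loop."""
    reference = [0] * len(stream[0])
    frames = []
    for levels in stream:
        decoded_residual = dequantize(levels, step)
        reference = [p + r for p, r in zip(reference, decoded_residual)]
        frames.append(reference)
    return frames


# Three hypothetical, nearly identical frames of four pixels each.
frames = [[100, 102, 98, 101], [101, 103, 99, 102], [101, 104, 99, 103]]
stream = encode(frames, step=4)
decoded = decode(stream, step=4)
```

In this example the first frame produces large levels because nothing is available to predict it from, while the later frames produce levels close to zero, which corresponds to the small difference image described above.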
One of the most widely used motion estimation algorithms is the block matching algorithm. In the block matching algorithm, a frame is divided into a number of blocks and the motion estimation is performed for each block. However, there are many problems with the current motion estimation and compensation algorithms. First of all, an accurate estimation of the motion of moving objects is a very difficult task. Furthermore, the motion estimation is a very time-consuming process, consuming a significant portion of the processor power.
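The block matching algorithm can be sketched as an exhaustive search. The following is a minimal illustration, assuming the sum of absolute differences as the matching criterion and a small square search window; both are common choices but are assumptions here, as are the function names and the test frames. The nested loops over every candidate displacement also show why motion estimation is time-consuming.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_vector(cur, ref, bx, by, size, search):
    """Try every displacement within +/- `search` pixels and return
    (dx, dy, cost) for the best match that lies inside the reference."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside_x = 0 <= bx + dx and bx + dx + size <= width
            inside_y = 0 <= by + dy and by + dy + size <= height
            if not (inside_x and inside_y):
                continue   # candidate block falls outside the reference
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def make_frame(width, height, obj_x, obj_y):
    """Hypothetical test frame: black background with a bright 2x2 object."""
    frame = [[0] * width for _ in range(height)]
    for y in range(obj_y, obj_y + 2):
        for x in range(obj_x, obj_x + 2):
            frame[y][x] = 200
    return frame


reference = make_frame(8, 8, 3, 4)   # object at (3, 4) in the previous frame
current = make_frame(8, 8, 5, 5)     # object has moved by (+2, +1)

dx, dy, cost = best_vector(current, reference, bx=4, by=4, size=4, search=2)
print(dx, dy, cost)   # -2 -1 0
```

The returned vector points from the block in the current frame back to its best match in the reference frame, so an object that moved by (+2, +1) yields the vector (-2, -1) with zero residual.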
In general, there are many factors that cause differences in successive frames. Obviously, if there is a moving object, successive frames will be different. Sometimes, an object of interest may be moving toward or away from the camera, thereby resulting in differences in successive frames. However, there are other factors, too. For instance, if the camera is panned, successive frames will be different. If the zooming or aperture states are changed, successive frames will change accordingly. In addition, a change in the surrounding light conditions also causes differences in successive frames. Since there are so many factors that make successive frames different, it is very difficult to estimate motion vectors accurately. However, if information on those various states of the camera is available, the motion estimation can be performed more easily and accurately. In other words, if information on the various states of the camera is available, this information can be effectively used in predicting the current image from previous reconstructed images. Fortunately, the information on the movement, zooming state, focus state and aperture state of the camera and the information on luminous intensity can be readily obtained and recorded.
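As one illustration of how recorded state information might help prediction, the sketch below uses the recorded luminous intensity of two frames to adjust the reference frame before the difference image is computed. This is a hypothetical example and not a method stated in the text: the linear gain model, the function names, the intensity values and the pixel data are all assumptions.

```python
def compensate_luminance(reference, ref_intensity, cur_intensity):
    """Scale reference pixels by the ratio of the recorded luminous
    intensities (assumed linear model), clipping to the 8-bit range."""
    gain = cur_intensity / ref_intensity
    return [[min(255, round(p * gain)) for p in row] for row in reference]


def residual_energy(current, prediction):
    """Sum of squared differences between the current and predicted frames."""
    return sum(
        (c - p) ** 2
        for cur_row, pred_row in zip(current, prediction)
        for c, p in zip(cur_row, pred_row)
    )


# Hypothetical 2x2 frames: the same scene, recorded 10% brighter.
reference = [[100, 120], [80, 60]]
current = [[110, 132], [88, 66]]

plain = residual_energy(current, reference)
compensated = residual_energy(
    current,
    compensate_luminance(reference, ref_intensity=1000.0, cur_intensity=1100.0),
)
print(plain, compensated)   # 344 0
```

Without the recorded intensity, the brightness change appears as a nonzero difference at every pixel; with it, the difference image in this idealized case vanishes. Recorded pan or zoom information could be used in a similar spirit, for example to choose a starting point for the motion search.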
A typical motion picture is produced by editing parts from video signals taken by a number of cameras. Quite often, video signals from several cameras are alternately concatenated. Generally, at such a boundary, it is of no use to try to predict the first frame after the boundary from the reconstructed frames immediately before it, because those frames were taken by a different camera. In this case, most video compression algorithms give up trying to predict the current frame from the previous reconstructed frames and just transmit the first frame without any motion compensation. Transmitting the original image without motion compensation, however, significantly increases the data size. If video signals from several cameras are alternately concatenated, the first frame after the boundary can instead be accurately predicted from some of the earlier frames that were taken by the same camera. For instance, in FIG. 7, the first frame 170 of VIDEO 3 can be predicted from the last frame 171 of VIDEO 1 and the first frame 172 of VIDEO 4 from the last frame 173 of VIDEO 2. Thus, if one can determine which frames were taken by the same camera, such information will be very useful for predicting the first frame after the boundary 175. For this purpose, the present invention records a frame index and a camera identification number for each frame. In other words, according to the teaching of the present invention, a different camera identification number is assigned to each camera and each camera records the camera identification number and the frame index for each frame.
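The use of the camera identification number to choose a reference frame across an edit boundary can be sketched as follows. This is a minimal illustration under assumptions: the `FrameTag` record, the `pick_reference` function and the sample sequence are hypothetical, and the example assumes, as FIG. 7 suggests, that VIDEO 1 and VIDEO 3 come from one camera and VIDEO 2 from another.

```python
from dataclasses import dataclass


@dataclass
class FrameTag:
    position: int      # position of the frame in the edited sequence
    camera_id: int     # camera identification number recorded with the frame
    frame_index: int   # frame index recorded by that camera


def pick_reference(history, current):
    """Return the most recent earlier frame taken by the same camera,
    or None if that camera has not appeared yet."""
    for tag in reversed(history):
        if tag.camera_id == current.camera_id:
            return tag
    return None


# Edited sequence: three frames from camera 1 (VIDEO 1), then two frames
# from camera 2 (VIDEO 2), then the first frame of VIDEO 3 from camera 1.
history = [
    FrameTag(position=0, camera_id=1, frame_index=0),
    FrameTag(position=1, camera_id=1, frame_index=1),
    FrameTag(position=2, camera_id=1, frame_index=2),
    FrameTag(position=3, camera_id=2, frame_index=0),
    FrameTag(position=4, camera_id=2, frame_index=1),
]
first_of_video_3 = FrameTag(position=5, camera_id=1, frame_index=3)

ref = pick_reference(history, first_of_video_3)
print(ref.position, ref.frame_index)   # 2 2
```

Here the frames immediately before the boundary belong to camera 2, so the search skips them and returns the last frame of VIDEO 1. When no earlier frame from the same camera exists, the function returns None and the frame would have to be coded without prediction.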
Therefore, it is an object of the present invention to provide a video camera that has means to record the frame index, camera identification number, movement, zooming state, focus state, aperture state of the camera, and the luminous intensity along with audio and video signals. Another object of the present invention is to develop video compression algorithms that use such information for efficient video compression.