Among complete systems for capturing and displaying moving pictures, the National Television System Committee (NTSC) system has been adopted in the United States of America and Japan, and the Phase Alternating Line (PAL) system has been adopted in Europe. In recent years, the High Definition (HD) system, which has a greater number of pixels per frame, has become increasingly prevalent. In these moving picture systems, about 25 to 60 pictures are displayed per second. Because these pictures are displayed in rapid succession, the filmed subject appears to the human eye to move on the display.
Moreover, as the Internet has become more pervasive, it has become more commonplace to distribute moving picture data over the Internet, and for individual users to receive them and display them on a display device, such as the display of a personal computer (PC) or the like. In order to distribute data in such a manner, compression is performed since moving picture data are generally massive. Common compression systems include Moving Picture Experts Group Phase 2 (MPEG-2) (ISO/IEC 13818-2) and Moving Picture Experts Group Phase 4 (MPEG-4) (ISO/IEC 14496-2). An outline of MPEG-2 is disclosed in, for example, “Point Zukaishiki, Saishin MPEG Kyoukasho (The Point-Illustrated Newest MPEG Textbook)” written by Hiroshi Fujiwara, ISBN 4-7561-0247-6, and the like.
A detailed description of MPEG-4 can be found in, for example, “MPEG-4 no Subete (Everything About MPEG-4)” published by Kogyo Chosakai Publishing Co., Ltd., ISBN 4-7693-1167-2. Because the data rate that can be processed, that is, displayed, differs from one user's display device to another, the MPEG-4 standard also includes provisions accommodating such differences. Specifically, it defines a system in which moving picture data is divided into several layers to be compressed and transmitted (or stored). This is generally referred to as scalability, which is described in detail below.
On the data transmitting end, a low resolution moving picture is first generated from an original high resolution moving picture, and the generated low resolution moving picture is compressed and transmitted in a first layer as “low resolution compressed moving picture data.” Moreover, the difference data between the original high resolution moving picture and the low resolution moving picture generated from it is compressed and transmitted in a second layer as “compressed difference data between the original high resolution moving picture and the low resolution moving picture.”
On the data receiving end, when the processable data rate is low, only the first layer, that is, the “low resolution compressed moving picture data,” is received, and the data of the second layer is ignored. The picture display device decompresses the received first layer data and is thus able to display a low resolution moving picture.
On the other hand, when the processable data rate on the receiving end is high and data can be received at a high data rate, both the data in the first layer and the data in the second layer, that is, both the “low resolution compressed moving picture data” and the “compressed difference data between the original high resolution moving picture and the low resolution moving picture,” are received.
On the receiving end, which has received the data, first, the data in the first layer is decompressed, and the low resolution moving picture is restored. Then, the data in the second layer is decompressed, and the difference data between the “original high resolution moving picture” and the “low resolution moving picture” is restored. Furthermore, the “high resolution moving picture” is restored based on the restored “low resolution moving picture” and the “difference data between the ‘original high resolution moving picture’ and the ‘low resolution moving picture.’” By having the restored “high resolution moving picture” displayed, users are able to enjoy the original high resolution moving picture.
Thus, by performing the appropriate process depending on whether the processable data rate of the display device is low or high, it becomes possible to watch either a low resolution moving picture or a high resolution moving picture. The description above pertains to scalability in the spatial direction, that is, scalability in terms of the number of pixels constituting a frame.
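The two-layer spatial arrangement described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the MPEG-4 standard: the low resolution picture is made by simply keeping every second pixel, the difference data is taken against a nearest-neighbour enlargement of that picture, and the compression and decompression steps are omitted entirely. The function names are hypothetical.

```python
def downsample(frame, factor=2):
    """Keep every `factor`-th pixel in both directions (assumed low resolution filter)."""
    return [row[::factor] for row in frame[::factor]]


def upsample(frame, factor, height, width):
    """Nearest-neighbour enlargement of a low resolution frame to height x width."""
    return [[frame[y // factor][x // factor] for x in range(width)]
            for y in range(height)]


def encode_spatial(frame, factor=2):
    """Return (first layer, second layer) for one frame, without compression."""
    height, width = len(frame), len(frame[0])
    base = downsample(frame, factor)                    # first layer: low resolution picture
    predicted = upsample(base, factor, height, width)
    residual = [[frame[y][x] - predicted[y][x] for x in range(width)]
                for y in range(height)]                 # second layer: difference data
    return base, residual


def decode_spatial(base, residual, factor=2):
    """Restore the high resolution frame from both layers."""
    height, width = len(residual), len(residual[0])
    predicted = upsample(base, factor, height, width)
    return [[predicted[y][x] + residual[y][x] for x in range(width)]
            for y in range(height)]


original = [
    [10, 12, 14, 16],
    [11, 13, 15, 17],
    [20, 22, 24, 26],
    [21, 23, 25, 27],
]
base, residual = encode_spatial(original)
restored = decode_spatial(base, residual)
```

A receiving end with a low processable data rate would use only `base`, while a receiving end with a high processable data rate would use both `base` and `residual` to obtain `restored`.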
Temporal scalability is also standardized. On the data transmitting end, a low frame rate moving picture is first generated from an original high frame rate moving picture by decimation, and the generated low frame rate moving picture is compressed and is then transmitted as “low frame rate compressed moving picture data” in a first layer. Moreover, the difference data between the original high frame rate moving picture and the low frame rate moving picture, namely the frames that were decimated, is compressed and is then transmitted as “compressed difference data between the original high frame rate moving picture and the low frame rate moving picture” in a second layer.
On the receiving end, when the processable data rate of the data receiving end is low, only the data in the first layer, that is, the “low frame rate compressed moving picture data,” is received, and the data in the second layer is ignored. Moreover, the data in the first layer, that is, the low frame rate compressed moving picture data, is decompressed, thereby making it possible to display the low frame rate moving picture.
On the other hand, when the processable data rate of the data receiving end is high and a large amount of data can be received, both the data in the first layer, that is, the “low frame rate compressed moving picture data,” and the data in the second layer, that is, the “compressed difference data between the original high frame rate moving picture and the low frame rate moving picture,” are received.
Next, the data in the first layer is decompressed, and the low frame rate moving picture is restored. Then, the data in the second layer is decompressed, and the difference data between the “original high frame rate moving picture” and the “low frame rate moving picture” is restored. Furthermore, the “high frame rate moving picture” is restored based on the restored “low frame rate moving picture” and the “difference data between the ‘original high frame rate moving picture’ and the ‘low frame rate moving picture.’” By having the restored “high frame rate moving picture” displayed, users are able to enjoy the original high frame rate moving picture. Thus, by performing the appropriate process depending on whether the processable frame rate of the display device is low or high, it becomes possible to watch either a low frame rate moving picture or a high frame rate moving picture.
In addition to temporal and spatial scalability, systems for realizing other kinds of scalability, such as SNR scalability, content scalability and the like, have already been proposed. For example, there is a system that uses quantization errors in an enhancement layer, and a system that uses, in an enhancement layer, object information which does not exist in a base layer. Moreover, it is possible to use temporal, spatial and other scalabilities with various modifications, and also to use them in combination as deemed appropriate (see for example, Japanese Patent Application Publication Number 1999-266457, International Publication Number WO97/28507, and the like, the contents of which are hereby incorporated by reference).
The present invention relates particularly to a system including at least an existing version or any modified version of temporal scalability. Accordingly, scalability in the temporal direction will be further described with concrete examples. It is assumed in this example, for purposes of illustration, that an original high frame rate moving picture is a moving picture taken at a frame rate of 120 frames/sec. Naturally, each picture is taken in 1/120 of a second. In other words, the exposure time of each picture is 1/120 of a second. A low frame rate moving picture is generated from this moving picture. For example, three out of every four consecutive pictures are discarded, and only the remaining one picture is kept. Thus, a moving picture of 30 frames/sec can be generated. This process is described below using FIG. 1.
In FIG. 1, the horizontal axis indicates time (sec). A picture group 101 indicates a group of frames of an original high frame rate moving picture. FIG. 1 shows thirteen picture frame data 101-1 to 101-13 of the picture group 101. The picture frame data 101-1 is the data of the first picture of the moving picture. As shown in FIG. 1, the picture frame data 101-1 is the data of a picture generated through exposure during the first 1/120 of a second (from the 0th second to the 1/120th second in FIG. 1). The picture frame data 101-2 is the data of the second picture, which is generated through exposure during the subsequent 1/120 of a second (from the 1/120th second to the 2/120th second in FIG. 1). The picture frame data 101-3 and the subsequent picture frame data are generated in a similar manner, and are consecutive high frame rate picture frame data whose exposure time is set at 1/120 of a second.
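The exposure intervals described for FIG. 1 follow a simple pattern: the k-th picture frame data 101-k is exposed from the (k−1)/120th second to the k/120th second. The following is a minimal sketch of that relationship. The function name `exposure_window` is hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def exposure_window(k, frame_rate=120):
    """Return (start, end), in seconds, of the exposure of picture frame data 101-k.

    k counts from 1, matching the numbering 101-1, 101-2, ... used for FIG. 1.
    """
    if k < 1:
        raise ValueError("picture frame numbers start at 1")
    period = Fraction(1, frame_rate)  # exposure time of one picture
    return (k - 1) * period, k * period


first = exposure_window(1)    # from the 0th second to the 1/120th second
second = exposure_window(2)   # from the 1/120th second to the 2/120th second
```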
Moreover, in FIG. 1, a picture group 102 is a group of frames of a low frame rate moving picture produced by decimating pictures from the picture group 101. Picture frame data 102-1 is the first picture frame data of this moving picture, and is exactly the same as the picture frame data 101-1 of the picture group 101. Picture frame data 102-2 is the second picture frame data, and is exactly the same as the picture frame data 101-5 in the picture group 101. The same applies to picture frame data 102-3 and the subsequent picture frame data, so that picture frame data 102-n is exactly the same as the picture frame data 101-(4×n−3) in the picture group 101.
In FIG. 1, a picture group 103 is a group of pictures that were decimated in generating the low frame rate picture group 102 from the high frame rate picture group 101. In other words, the picture group 103 includes the picture frame data 101-2, 101-3, 101-4, 101-6, 101-7, 101-8, . . . , 101-(4×n−2), 101-(4×n−1), 101-(4×n), and so forth.
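The index relationships among the picture groups 101, 102 and 103 can be checked with a short sketch. It works only on frame numbers, not on picture data, and the function names `split_layers` and `decimated_frame_rate` are hypothetical names chosen for illustration.

```python
def split_layers(num_frames, keep_every=4):
    """Split frame numbers 1..num_frames of picture group 101 into two groups.

    Returns (kept, decimated): the numbers forming picture group 102 and the
    numbers forming picture group 103, respectively.
    """
    numbers = range(1, num_frames + 1)
    kept = [k for k in numbers if (k - 1) % keep_every == 0]        # 101-(4n-3)
    decimated = [k for k in numbers if (k - 1) % keep_every != 0]   # all others
    return kept, decimated


def decimated_frame_rate(source_rate=120, keep_every=4):
    """Frame rate that remains when one picture out of `keep_every` is kept."""
    return source_rate / keep_every


group_102, group_103 = split_layers(13)
# group_102 lists the frames 101-1, 101-5, 101-9 and 101-13 shown in FIG. 1.
```

With the thirteen frames of FIG. 1, the n-th entry of `group_102` equals 4×n−3, and `decimated_frame_rate()` gives the 30 frames/sec of the example.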
On the data transmitting end, the low frame rate moving picture shown in the picture group 102 is compressed and transmitted as first layer data. Moreover, the picture group 103, that is, the picture group including the pictures that were decimated in generating the low frame rate picture group 102 from the high frame rate picture group 101, is compressed and transmitted as second layer data.
The processing on a picture data receiving end will be described. First, the processing on a picture data receiving end where displaying is performed by a display device having the capability to display only at 30 frames/sec will be described. In this case, only the data transmitted in the first layer, that is, the low frame rate moving picture shown as the picture group 102, is received and is decompressed to restore the low frame rate (30 frames/sec) moving picture shown as the picture group 102. Then, by displaying the restored moving picture, it becomes possible to display the low frame rate (30 frames/sec) moving picture.
Next, the processing on a picture data receiving end where displaying is performed by a display device having the capability to display at 120 frames/sec will be described. In this case, both the data transmitted in the first layer and the data transmitted in the second layer are received. In other words, both the low frame rate moving picture shown as the picture group 102, and the picture group 103 of the pictures that were decimated in generating the low frame rate picture group 102 from the high frame rate picture group 101 are received.
In processing the received data, the data transmitted in the first layer is decompressed and the low frame rate (30 frames/sec) moving picture shown as the picture group 102 is restored. Next, the data transmitted in the second layer is decompressed and the picture group 103 is restored. The restored moving pictures of the picture groups 102 and 103 are then combined to generate the picture group 101, which is displayed at a frame rate of 120 frames/sec. By the process described above, it becomes possible to display the moving picture at the high frame rate, that is, at a frame rate of 120 frames/sec.
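The combining step on the high frame rate receiving end can be sketched as an interleaving of the two restored picture groups. This is a minimal sketch assuming that both groups arrive already decompressed and in display order. The function name `merge_layers` is hypothetical, and the strings merely stand in for picture data.

```python
from itertools import islice


def merge_layers(base_frames, enhancement_frames, keep_every=4):
    """Rebuild picture group 101 from picture groups 102 and 103.

    Each first layer frame is followed by up to keep_every - 1 second layer
    frames, which restores the original display order.
    """
    merged = []
    remaining = iter(enhancement_frames)
    for frame in base_frames:
        merged.append(frame)
        merged.extend(islice(remaining, keep_every - 1))
    return merged


group_102 = ["101-1", "101-5", "101-9", "101-13"]
group_103 = ["101-2", "101-3", "101-4", "101-6", "101-7", "101-8",
             "101-10", "101-11", "101-12"]
group_101 = merge_layers(group_102, group_103)
```

A display device limited to 30 frames/sec would skip this step and display `group_102` directly.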
Thus, a user having only a display device whose processable frame rate is low may display the low frame rate moving picture, while a user having a display device whose processable frame rate is high may display the high frame rate moving picture. When data is transmitted in a hierarchical structure having temporal scalability, even a user having a display device of an inferior frame rate can display a moving picture, albeit with a lower frame rate. This is one advantage of transmitting data with a hierarchical structure having temporal scalability.