In recent years, the age of multimedia has arrived, in which audio, video, and other data are handled in an integrated manner, and the conventional information media, i.e., means for conveying information to people such as newspapers, magazines, television, radio, and telephones, have come to be treated as objects of multimedia. Generally, “multimedia” means representing not only characters but also diagrams, speech, and especially images simultaneously and in relation to each other. In order to handle the conventional information media as objects of multimedia, it is necessary to convert the information of the media into a digital format.
When the data quantity of each information medium described above is estimated as a quantity of digital data, in the case of characters, the data quantity for each character is 1 to 2 bytes. However, in the case of speech, the data quantity is 64 kbits per second (quality for telecommunication) and, in the case of a moving picture, it is more than 100 Mbits per second (quality for current television broadcasting). Thus, for information media such as television, it is not practical to process such massive data as is in a digital format. For example, although visual telephones have already been put to practical use over ISDN (Integrated Services Digital Network) having a transmission rate of 64 kbps to 1.5 Mbps, it is impossible to transmit the image of a television camera as is over ISDN.
As a result, data compression technologies are demanded. In the case of visual telephones, a moving picture compression technology standardized as H.261 by ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) is employed. Further, according to the data compression technology of MPEG1, it is possible to record image data, together with audio data, on an ordinary music CD (compact disc).
MPEG (Moving Picture Experts Group) is an international standard relating to a technology for compressing and expanding an image signal corresponding to a moving picture, and MPEG1 is a standard for compressing moving picture data to 1.5 Mbps, i.e., compressing data of a television signal to about 1/100. Since the transmission rate to which MPEG1 is directed is limited to about 1.5 Mbps, MPEG2 capable of compressing moving picture data to 2˜15 Mbps has been standardized to meet the demand for higher image quality.
In the image signal compression and expansion technologies according to MPEG1 and MPEG2 which have already been put to practical use, only a fixed frame rate is basically employed, namely, intervals between image display timings of the respective frames are regular. As a result, there are only several kinds of frame rates, and in MPEG2 a frame rate designated by a flag (frame rate code) which is transmitted with coded data is selected from plural frame rates (frame rate values) with reference to a table shown in FIG. 13.
Under the existing circumstances, standardization of MPEG4 is now being advanced by the working group responsible for the standardization of MPEG1 and MPEG2 (ISO/IEC JTC1/SC29/WG11). MPEG4 enables coding and signal operation in object units, and realizes new functions required in the age of multimedia. MPEG4 originally aimed at standardization of image processing at a low bit rate, but the scope of the standardization has now been extended to more versatile image processing, including high-bit-rate image processing adaptable to an interlaced image.
Also in MPEG4, if a table similar to the table for MPEG2 (refer to FIG. 13) is added at the beginning of a video object layer (corresponding to a video sequence of MPEG2), frame rates can be expressed according to the table. In MPEG4, however, since image signals over a broad range, from a low-bit-rate image signal to a high-quality, high-bit-rate image signal, are processed, the number of frame rates required is practically unlimited. Therefore, it is difficult to specify frame rates by the use of a table.
As a result, MPEG4 employs a data structure in which frame display time data are inserted in each frame, in order to deal with an almost unlimited number of fixed frame rates and, furthermore, to process an image having variable intervals between the image display timings or decoding timings of the respective frames.
FIG. 14 shows a data structure of a conventional coded image signal 200.
The coded image signal 200 corresponds to one image (in MPEG4, a series of frames constituting an image corresponding to one object) and includes a header H at the beginning. The header H is followed by code sequences Sa0, Sa1, Sa2, . . . , San corresponding to frames F(0), F(1), F(2), . . . , F(n), respectively, which code sequences are arranged according to priority for transmission (transmission order). Here, “n” is the number indicating the transmission order of data of each frame in the frame sequence corresponding to one image, and n of the head frame is 0.
In this example, at the beginnings of the code sequences Sa0, Sa1, Sa2, . . . , San of the respective frames, display time data Dt0, Dt1, Dt2, . . . , Dtn indicating the display timings of the frames are arranged. The respective display time data are followed by coded image data Cg0, Cg1, Cg2, . . . , Cgn.
Since each of the display time data indicates a time relative to a reference time, the quantity of data required for expressing this display time, i.e., the bit number of the display time data, increases as the number of the frames constituting the image increases.
Further, at the decoding end of the coded image signal, according to the display time data Dt0-Dtn inserted in the code sequences Sa0-San corresponding to the respective frames, image display of each frame is carried out at the time indicated by the display time data.
FIG. 15 shows the transmission order and the display order of the coded image data corresponding to each frame in the series of frames. As described above, “n” indicates the transmission order, and “n′” indicates the display order (n′ of the head frame is 0). Further, frames F(n) (F(0)˜F(18)) are arranged based on the order of frames in the data structure shown in FIG. 14 (transmission order). The frames F(n) arranged in the transmission order are rearranged according to the display order of the frames as shown by arrows in FIG. 15, resulting in frames F′(n′) (F′(0)˜F′(18)) arranged in the display order. Accordingly, a frame F(n) and a frame F′(n′) related to each other with an arrow are identical. For example, the frames F(0), F(1), F(2), and F(3) are identical to the frames F′(0), F′(3), F′(1), and F′(2), respectively.
Amongst the frames F(n) (F(0)˜F(18)) arranged in the transmission order, the frames F(0) and F(13) are I (Intra-picture) frames (hereinafter also referred to as I-VOP), the frames F(1), F(4), F(7), F(10), and F(16) are P (Predictive-picture) frames (hereinafter also referred to as P-VOP), and the frames F(2), F(3), F(5), F(6), F(8), F(9), F(11), F(12), F(14), F(15), F(17), and F(18) are B (Bidirectionally predictive picture) frames (hereinafter also referred to as B-VOP).
When the frames F(n) (F(0)˜F(18)) arranged in the transmission order (IPBBPBBPBBPBBIBBPBB) are rearranged in the display order (IBBPBBPBBPBBPBBIBBP), this display order n′ is represented by frame numbers B(n) (B(0)˜B(18)) corresponding to the respective frames F(n). That is, the frame numbers B(n) represent the numbers n′ indicating the display order. To be specific, as shown in FIG. 15, B(0)=0, B(1)=3, . . . , B(17)=16, B(18)=17. Accordingly, the image display cycle L of the I-VOPs is 15, and the image display cycle M of the VOPs including both of the I-VOPs and the P-VOPs is 3.
The frame number B(n)=n′ is represented by the following formulae (1)˜(3) using n.
B(n)=n=0 (n=0)  (1)
B(n)=n+M−1 (n=M×i+1)  (2)
B(n)=n−1 (when n is other than the above values)  (3)
wherein i and M are integers not less than 0 (0, 1, 2, . . . ).
The first I-VOP satisfies the condition (n=0), the I-VOPs other than the first I-VOP and the P-VOPs satisfy the condition (n=M×i+1), and the B-VOPs satisfy the condition (when n is other than the above values).
Formulae (1)˜(3) define the relationship B(n)=n′ between the display order n′ and the transmission order n in the case where the code sequences of the frames corresponding to the respective I-VOPs, P-VOPs, and B-VOPs are transmitted periodically. In cases other than the above, the display order n′ and the transmission order n are correlated one to one by a relational expression or a method other than formulae (1)˜(3).
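The mapping defined by formulae (1)˜(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `display_index` and the restriction to M not less than 1 are not part of the original description (M=0 would make condition (2) degenerate).

```python
def display_index(n: int, m: int) -> int:
    """Return B(n), the display order n' of the frame sent n-th.

    Assumes the periodic I/P/B arrangement covered by formulae (1)-(3);
    m is the display cycle M of the I-VOPs and P-VOPs (assumed >= 1 here).
    """
    if n < 0 or m < 1:
        raise ValueError("n must be >= 0 and m must be >= 1")
    if n == 0:
        return 0              # formula (1): the first I-VOP
    if (n - 1) % m == 0:
        return n + m - 1      # formula (2): n = M*i + 1 (other I-VOPs, P-VOPs)
    return n - 1              # formula (3): B-VOPs


# Frames F(0)..F(18) of FIG. 15 with M = 3.
order = [display_index(n, 3) for n in range(19)]
print(order)
```

With M=3 the result begins 0, 3, 1, 2 and ends 18, 16, 17, which agrees with the values B(1)=3, B(17)=16, and B(18)=17 given for FIG. 15.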
FIG. 16 is a diagram for explaining an example of an image display method wherein the intervals of the image display timings of the respective frames are variable.
In the figure, t′(n′) (t′(1), t′(2), t′(3), t′(4), . . . ) indicates the interval between the time at which image display of the frame F′(n′−1) is performed and the time at which image display of the frame F′(n′) is performed, and h′(1), h′(2), and h′(3) indicate the times for image display of the frames F′(1), F′(2), and F′(3), respectively, with the time h′(0) for image display of the frame F′(0) as a reference. Further, h(n) (h(1), h(2), h(3), h(4), . . . ) indicates the time for image display of the frame F(n) (F(1), F(2), F(3), F(4), . . . ) with the time h′(0) for image display of the frame F(0)=F′(0) as a reference. Accordingly, the display time h′(n′) of the frame F′(n′) arranged in the display order is expressed by h′(n′)=h′(n′−1)+t′(n′), and h′(0)=0.
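The recurrence h′(n′)=h′(n′−1)+t′(n′) with h′(0)=0 is a running sum of the display intervals. The following Python sketch illustrates this; the interval values (in milliseconds) and the function names are hypothetical and chosen only for the example.

```python
from itertools import accumulate


def display_times_in_display_order(intervals):
    """Return [h'(0), h'(1), ...] given [t'(1), t'(2), ...], with h'(0) = 0."""
    return [0, *accumulate(intervals)]


def display_times_in_transmission_order(h_prime, display_indices):
    """Return [h(0), h(1), ...], where h(n) = h'(B(n))."""
    return [h_prime[n_prime] for n_prime in display_indices]


# Hypothetical variable intervals t'(1), t'(2), t'(3) in milliseconds.
intervals_ms = [40, 20, 60]
h_prime = display_times_in_display_order(intervals_ms)

# B(0)..B(3) = 0, 3, 1, 2 as in FIG. 15.
h = display_times_in_transmission_order(h_prime, [0, 3, 1, 2])
print(h_prime, h)
```

Here h′ is [0, 40, 60, 120], and the times in transmission order are [0, 120, 40, 60]: the frame F(1) sent second is displayed last among the four.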
Next, decoding and image display of the coded image signal having the data structure shown in FIG. 14 will be briefly described using FIG. 16.
At the decoding end, when the coded image signal 200 shown in FIG. 14 is input, the coded image data Cg0, Cg1, Cg2, . . . of the respective frames F(0), F(1), F(2), . . . as the constituents of the coded image signal 200 are decoded, and the images corresponding to the frames F(0), F(1), F(2), . . . are displayed at the image display times h(0), h(1), h(2), . . . based on the display time data Dt0, Dt1, Dt2, . . . of the respective frames.
In this way, even when the intervals between the image display timings of the respective frames (image display cycle) of the coded image signal are not fixed, i.e., are variable, the coded image signal is decoded at the decoding end and displayed at a prescribed timing.
When the intervals between the image display timings of the respective frames of the coded image signal are fixed, as in the case where the intervals are variable, the images corresponding to the frames F(0), F(1), F(2), . . . are displayed at the image display times h(0), h(1), h(2), . . . based on the display time data Dt0, Dt1, Dt2, . . . of the respective frames.
Incidentally, when a frame rate (the number of frames displayed per second) is expressed simply with k bits (k: natural number), a frequency used for television broadcasting, for example, 29.97 Hz (to be exact, 30000/1001 Hz), cannot be expressed.
As a result, such a frame rate is expressed as follows. That is, a prescribed time interval (1 modulo time), for example, one second, is divided into N (N: natural number) to obtain a sub-unit time (1/N) and, using this as a unit of time (1 time tick), the display time of each frame is expressed for both of the image having a variable frame rate and the image having a fixed frame rate.
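As a worked illustration of this tick-based representation, the sketch below expresses the 30000/1001 Hz frame rate exactly. The choice of N=30000 and of 1001 ticks per frame is an assumption made for the example, as is the helper name `seconds_from_ticks`; the text above fixes neither.

```python
from fractions import Fraction


def seconds_from_ticks(y: int, n: int) -> Fraction:
    """Return y sub-unit times of 1/n second each, as an exact fraction."""
    if n <= 0:
        raise ValueError("n must be a natural number")
    return Fraction(y, n)


N = 30000               # assumed number of divisions of one second
TICKS_PER_FRAME = 1001  # assumed frame interval, in ticks of 1/N second

frame_interval = seconds_from_ticks(TICKS_PER_FRAME, N)
frame_rate = 1 / frame_interval  # exact rational frame rate

print(frame_rate, float(frame_rate))
```

The rate comes out as the exact fraction 30000/1001, which is not an integer and therefore cannot be written as a plain integer number of frames per second, whereas the pair (N, y) represents it without rounding.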
To be specific, as shown in FIG. 17(a), the display time of each of the images VOP0, VOP1, VOP2, and VOP3 corresponding to the frames F′(0), F′(1), F′(2), and F′(3) arranged in the display order is expressed by y (VOP rate increment) pieces of 1/N (sub-unit time) with a time X as a reference, that is, it is expressed by y/N. For the images VOP0, VOP1, VOP2, and VOP3, y is defined as follows: y=y′0, y=y′1, y=y′2, and y=y′3, respectively.
FIG. 17(c) shows a coded image signal 200a having a data structure in which the image display timings of the respective frames are expressed by using the sub-unit time (1/N sec) and y.
The coded image signal 200a includes a header H containing sub-unit time data Dk that indicates N (natural number) for obtaining the sub-unit time, and the header H is followed by code sequences Sbn (Sb0, Sb1, Sb2, . . . ) corresponding to the respective frames F(n) (F(0), F(1), F(2), . . . ). Each code sequence Sbn contains display cycle multiplier data Dyn (Dy0, Dy1, Dy2, . . . ) indicating the display time h(n) (h(0), h(1), h(2), . . . ), which is expressed as the number y of sub-unit times (1/N) counted from the time X as a reference.
In FIG. 17(c), Cgn (Cg0, Cg1, Cg2, . . . ) are coded image data corresponding to the respective frames F(n) (F(0), F(1), F(2), . . . ).
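The data structure of the coded image signal 200a can be modeled roughly as follows. This is a minimal sketch, not the actual bit stream syntax: the class and field names, the placeholder payload bytes, and the values N=30000 and y=0, 3003, 1001, 2002 are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass
class CodeSequence:
    y: int              # display cycle multiplier data Dyn
    coded_image: bytes  # coded image data Cgn (placeholder payload here)


@dataclass
class CodedImageSignal:
    n: int                         # sub-unit time data Dk carried in header H
    sequences: List[CodeSequence]  # Sb0, Sb1, ... in transmission order


def decode_display_times(signal, reference=Fraction(0)):
    """Return h(n) = X + y/N for each code sequence, in transmission order."""
    return [reference + Fraction(seq.y, signal.n) for seq in signal.sequences]


# Hypothetical stream: I-VOP, P-VOP, then two B-VOPs, as sent.
signal = CodedImageSignal(
    n=30000,
    sequences=[
        CodeSequence(0, b"I"),
        CodeSequence(3003, b"P"),
        CodeSequence(1001, b"B"),
        CodeSequence(2002, b"B"),
    ],
)
times = decode_display_times(signal)
display_order = sorted(range(len(times)), key=times.__getitem__)
print(display_order)
```

Sorting the transmission indices by decoded display time gives [0, 2, 3, 1], i.e., the two B-VOPs are displayed before the P-VOP that was transmitted ahead of them.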
However, when the image VOP0 is an I-VOP (I frame), the VOP1 and VOP2 are B-VOPs (B frames), and the VOP3 is a P-VOP (P frame) as shown in FIG. 17(b), then, in the bit stream of the coded image signal 200a shown in FIG. 17(c), the P-VOP (VOP3) and the B-VOP (VOP1) are arranged as the code sequences of the frames F(1) and F(2) which follow the code sequence of the frame F(0) corresponding to the I-VOP (VOP0).
A description is now given of the drawbacks of the image signal data structures described with respect to FIGS. 14-16.
As described above, in a coded image signal obtained by coding an image signal having a fixed interval T of frame display timings, the image display timing h(n) of each frame is expressed by h(n)=n′×T, wherein n′ is the number indicating the order of display, and n′=B(n).
In other words, when the coded image signal having the fixed frame-display interval T (i.e., a coded signal of an image having a fixed frame rate) is decoded for display, if the period T (the fixed display interval) is detectable at the decoding end, the display time h(n) of the n-th frame F(n) in the transmission order can be uniquely determined by multiplying the display interval T by n′ (=B(n)). Nevertheless, when decoding the coded image signal, there is no choice but to perform a complicated display process using the display time data Dtn (Dt0, Dt1, Dt2, . . . ) inserted in the code sequences corresponding to the respective frames F(n) (F(0), F(1), F(2), . . . ) as shown in FIG. 14.
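The relation h(n)=n′×T shows why per-frame display time data are redundant at a fixed frame rate: once T is known, every display time follows from the display order alone. A small Python sketch, assuming an interval T of 1001/30000 second and the display indices B(0)˜B(3) of FIG. 15 (the function name is hypothetical):

```python
from fractions import Fraction


def fixed_rate_display_times(display_indices, interval):
    """Return h(n) = B(n) * T for each frame in transmission order."""
    return [n_prime * interval for n_prime in display_indices]


T = Fraction(1001, 30000)  # assumed fixed display interval in seconds
B = [0, 3, 1, 2]           # B(0)..B(3) as in FIG. 15

times = fixed_rate_display_times(B, T)
print(times)
```

No per-frame time stamp is consulted here; the only inputs are T and the display order.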
Next, a description is given of the drawbacks of the image signal data structures described with respect to FIGS. 17(a)-17(c).
As described above, in the image signal data structure proposed by the current MPEG4, even when the frame rate is fixed, the value of the frame rate cannot be known unless several frames are decoded and, therefore, it is difficult to simplify the circuit structure for implementing the actual decoding process.
This problem will be briefly described. Assume that the VOP0 is an I-VOP (I frame), the VOP1 and the VOP2 are B-VOPs (B frames), and the VOP3 is a P-VOP (P frame) as shown in FIG. 17(b). In the bit stream of the coded image signal 200a shown in FIG. 17(c), the frame F(0) corresponding to the I-VOP (I frame) is followed by the frame F(1) corresponding to the P-VOP (P frame) and then by the frame F(2) corresponding to the B-VOP (B frame). Consequently, the frame display cycle (1 fixed VOP increment), i.e., the interval between the display timing of the I-VOP and the display timing of the following B-VOP (B frame), cannot be known until the frame F(2) corresponding to the B-VOP (B frame) is transmitted.
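The delay described above can be reproduced with a short Python sketch. It assumes a decoder that simply tracks the smallest gap between the display times received so far; that strategy, the function name, and the tick values (0, 3003, 1001, 2002 for the I, P, B, B frames in transmission order) are illustrative assumptions rather than part of the described decoder.

```python
def smallest_gap_so_far(display_ticks):
    """For each received frame, return the smallest gap between the display
    times seen up to that point (None while fewer than two frames are known)."""
    seen = []
    estimates = []
    for y in display_ticks:
        seen.append(y)
        ordered = sorted(seen)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        estimates.append(min(gaps) if gaps else None)
    return estimates


# Display times in ticks, listed in transmission order: I, P, B, B.
ticks_in_transmission_order = [0, 3003, 1001, 2002]
estimates = smallest_gap_so_far(ticks_in_transmission_order)
print(estimates)
```

The estimates are [None, 3003, 1001, 1001]: after the I-VOP and the P-VOP the apparent interval is three frame periods, and the true interval of 1001 ticks appears only once the first B-VOP, F(2), has arrived.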