In recent years, distribution of images under low-bitrate environments, such as the Internet, mobile terminals, and the like, is becoming widespread. As image compression coding methods to be employed under these low-bitrate environments, there are H.263 standardized by ITU-T, MPEG-4 standardized by ISO/IEC, and the like.
MPEG-4 is able to perform coding at a variable frame rate (hereinafter referred to as “variable-frame-rate coding”), in contrast to MPEG-2 which has been adopted for DVD and become widespread. Variable-frame-rate coding is a coding scheme which permits setting of display time intervals of arbitrary frames, and this coding scheme is generally employed when optimum coding is carried out according to the complexity of a video signal in video signal compression, the property of the video signal (e.g., presence or absence of motion), and the like.
FIG. 10 shows an example of coded data at a variable frame rate. In FIG. 10, a video signal of 30 Hz is coded such that the first half is coded at 15 Hz by skipping every other frame while the second half is coded at 10 Hz by skipping two of every three frames.
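The skipping pattern described for FIG. 10 can be sketched as follows. This is only an illustration: the function name, the 12-frame input length, the split point, and the 0-based frame numbers are assumptions and are not taken from the figure.

```python
def select_frames(total_frames, split, first_step=2, second_step=3):
    """Return the 0-based frame numbers kept from a 30 Hz source.

    Frames before `split` keep one of every `first_step` frames (15 Hz);
    frames from `split` onward keep one of every `second_step` frames (10 Hz).
    """
    kept = list(range(0, split, first_step))
    kept += list(range(split, total_frames, second_step))
    return kept


kept = select_frames(total_frames=12, split=6)
# Frame-to-frame intervals, in units of the 30 Hz source period.
intervals = [b - a for a, b in zip(kept, kept[1:])]
print(kept)       # [0, 2, 4, 6, 9]
print(intervals)  # [2, 2, 2, 3]
```

The interval grows from 2 to 3 source periods at the point where the frame rate drops from 15 Hz to 10 Hz.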
Hereinafter, a coding process at a variable frame rate as shown in FIG. 10 will be described with reference to FIGS. 11 to 16.
Initially, the construction of a conventional data coding apparatus will be described with reference to FIG. 11.
FIG. 11 is a block diagram illustrating the construction of the conventional data coding apparatus.
In FIG. 11, the conventional data coding apparatus is provided with an image capture means 81 for capturing an image inputted from a camera; a coding judgement means 82 for judging whether a frame signal of the captured image is to be coded or not; a coding means 83 for coding the captured data; an MP4 file coding means 85 for converting the data coded by the coding means 83 into a file format standard based on MPEG-4 (hereinafter referred to as “MP4”); and a recording medium 86 on which the MP4 file is recorded.
Hereinafter, a description will be given of a sequential processing operation of the conventional data coding apparatus having the above-mentioned construction.
In the data coding apparatus, initially, the image capture means 81 converts an image supplied from the camera into an image frame signal (hereinafter, referred to as “frame”), and outputs the frame together with the frame number which increments at the frame rate of the camera (e.g., 30 Hz) from when the image capture is started, to the coding judgement means 82.
Then, the coding judgement means 82 calculates a time stamp of the inputted frame on the basis of the frame number and frame information of the camera, which has previously been given, and judges whether the inputted frame should be coded or not, on the basis of the time stamp, the total amount of data which have already been coded, and the output bit rate.
The criteria of judgement as to whether the inputted frame should be coded or not depend on various conditions of the data coding apparatus (e.g., whether the bit rate is variable or fixed, whether the buffer model defined in the standard should be respected or ignored, which is more important between the image quality of the inputted frame and the frame rate, etc.), and the applications of the data coding apparatus. However, specific criteria for judgement are not described here.
When the coding judgement means 82 judges that the inputted frame should be coded, the inputted frame and the time stamp thereof are outputted to the coding means 83, wherein coding of the frame is carried out. The time stamp of the frame is subjected to scale conversion or offset processing, and converted into an MP4 time stamp to be recorded in the MP4 file.
On the other hand, when the coding judgement means 82 judges that the inputted frame should not be coded, the frame and its time stamp are not outputted to the coding means 83 but are discarded, and the apparatus waits for a next frame to be inputted from the image capture means 81.
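The time stamp handling described above (a time stamp derived from the frame number and the camera frame rate, followed by scale conversion or offset processing) might look like the sketch below. The timescale of 600 ticks per second and the zero offset are assumed values chosen for illustration; the text does not specify them, and the function names are hypothetical.

```python
from fractions import Fraction

CAMERA_HZ = 30        # camera frame rate used in the example above
MP4_TIMESCALE = 600   # assumed ticks per second (not given in the text)
MP4_OFFSET = 0        # assumed offset in ticks (not given in the text)


def frame_time_stamp(frame_number, camera_hz=CAMERA_HZ):
    """Time stamp in seconds of a captured frame, kept exact as a fraction."""
    return Fraction(frame_number, camera_hz)


def to_mp4_time_stamp(ts_seconds, timescale=MP4_TIMESCALE, offset=MP4_OFFSET):
    """Scale conversion followed by offset processing."""
    return int(ts_seconds * timescale) + offset


print(to_mp4_time_stamp(frame_time_stamp(3)))  # 60
```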
Generally, the above-mentioned data coding process by the coding means 83 takes much time. Further, depending on the characteristics of the image of the inputted frame, there may be cases where the coding process is not completed by the time a next frame captured by the image capture means 81 is outputted to the coding judgement means 82. In this case, the next captured frame is discarded in the coding judgement means 82. Since the discarded frame is not outputted to the coding means 83, the coded data corresponding to the frame is skipped.
Then, the coded data obtained by the coding means 83 and the amount of coded data are outputted to the MP4 file coding means 85. Further, the amount of coded data is also outputted to the coding judgement means 82. In the coding judgement means 82, the amount of coded data is used as a criterion for “bit rate fixed” or “buffer model respected”, among the above-mentioned criteria for judgement as to whether the inputted frame should be coded or not.
In the MP4 file coding means 85, the coded data and the amount of coded data are converted into an MP4 file and recorded on the recording medium 86.
Hereinafter, the structure of the MP4 file will be described with reference to FIGS. 12(a) and 12(b).
FIG. 12(a) shows an example of coded data at a variable frame rate, and FIG. 12(b) shows the structure of an MP4 file corresponding to the variable-frame-rate coded data.
The MP4 file is composed of plural Atoms obtained by tabulating information of each frame inputted, such as “Sample Size Atom” where the sizes of coded data are stored, “Sample-To-TimeStamp Atom” where the display intervals are stored, “Movie Data Atom” where the coded data are stored, and the like.
First of all, in the Sample-To-TimeStamp Atom, a pair of number-of-frames “num” and frame interval “dur” are described for each section of a fixed frame rate, successively from the head of the coded data, and the number of sections is described before the table comprising the numbers-of-frames “num” and the frame intervals “dur”. Further, in the Sample Size Atom, the sizes of the respective frames are described in the order of the frame numbers, and the number of items (i.e., the total number of frames) is described before the table comprising the frame sizes. In the Movie Data Atom, data of the respective samples are successively stored.
For example, it is assumed that the frames captured by the image capture means 81 are coded by the coding means 83 to be variable-frame-rate coded data having three different frame rates as shown in FIG. 12(a). In this case, as shown in FIG. 12(b), in the Sample Size Atom, the number-of-items “9”, and the sizes of the coded data of the respective frames from the 1st sample to the 9th sample are described in the order of frame numbers. In the Movie Data Atom, the data of the samples from the 1st sample to the 9th sample are successively stored. In the Sample-To-TimeStamp Atom, number-of-items “3”, number-of-frames “num=3” and frame interval “dur=2” for section 1, “num=2” and “dur=3” for section 2, and “num=4” and “dur=1” for section 3 are described since the interval of the first three frames of the above-mentioned coded data is “2”, the interval of the next two frames is “3”, and the interval of the next four frames is “1”. Although, in the above description, one sample corresponds to one frame for simplification, the data storage method into each Atom is the same as described above even when 1024 samples correspond to one frame.
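The example tables above can be reproduced with a short sketch. The dictionary layout, the function name, and the byte counts in `sizes` are assumptions made for illustration; only the frame intervals and the resulting (num, dur) pairs come from the example.

```python
from itertools import groupby


def build_tables(sizes, durations):
    """Build simplified Sample Size and Sample-To-TimeStamp tables.

    `sizes` holds one coded-data size per sample, and `durations` holds the
    frame interval of each sample, both in sample order.
    """
    assert len(sizes) == len(durations)
    # One (num, dur) pair per run of equal frame intervals.
    stts = [(len(list(run)), dur) for dur, run in groupby(durations)]
    return {
        "stsz": {"count": len(sizes), "sizes": list(sizes)},
        "stts": {"count": len(stts), "entries": stts},
    }


sizes = [1200, 300, 310, 290, 305, 280, 295, 300, 285]  # made-up byte counts
durations = [2, 2, 2, 3, 3, 1, 1, 1, 1]                 # intervals of the 9 samples
tables = build_tables(sizes, durations)
print(tables["stts"])  # {'count': 3, 'entries': [(3, 2), (2, 3), (4, 1)]}
```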
Next, a description will be given of the MP4 file coding means 85 which converts the coded data into the MP4 file as described above, with reference to FIGS. 13 and 14.
FIG. 13 is a block diagram illustrating an example of the construction of the conventional MP4 file coding means 85, and FIG. 14 is a flowchart illustrating a sequential processing operation of the conventional MP4 file coding means 85.
Initially, the construction of the MP4 file coding means 85 will be described with reference to FIG. 13.
The MP4 file coding means 85 receives coded data, the amount of coded data, and an MP4 time stamp to be used when the coded data is recorded as an MP4 file. The MP4 file coding means 85 is provided with a coded data temporary storage means 41 for temporarily holding the inputted coded data and the amount of coded data; a mdat atom formation means 43 for temporarily forming Movie Data Atom which is a data area where the coded data in the MP4 file is stored, on a mdat atom temporary storage means 46, by using the coded data supplied from the coded data temporary storage means 41; a stsz atom formation means 44 for forming Sample Size Atom which is a data area where the coded data amount in the MP4 file is recorded, on a stsz atom temporary storage means 47, by using the coded data amount supplied from the coded data temporary storage means 41; a stts atom formation means 45 for forming Sample-To-TimeStamp Atom which is a data area where the frame-to-frame time intervals in the MP4 file are recorded, on a stts atom temporary storage means 48, by using the MP4 time stamp supplied from the outside; and a temporary data coupling means 49 for coupling the respective Atoms which are separately formed on the respective temporary storage means 46˜48 by the respective formation means 43˜45, after inputting of the coded data to be recorded is completed. In the case where no MP4 time stamp is supplied from the outside to the MP4 file coding means 85, the MP4 file coding means 85 is provided with an MP4 time stamp reading means 42 which reads the time stamp of the coded data stored in the coded data temporary storage means 41, and subjects the time stamp to scale conversion or offset processing to convert it into an MP4 time stamp which is to be recorded in the MP4 file.
Although there are various methods of forming Movie Data Atom by the mdat atom formation means 43 according to the purposes, since the formation methods are insignificant in describing the present invention, specific descriptions thereof will be omitted. Furthermore, as for the Sample Size Atom formation method by the stsz atom formation means 44, the amounts of coded data are arranged from the beginning. The Sample-To-TimeStamp Atom formation method by the stts atom formation means 45 will be described later.
When inputting of all coded data to be recorded is completed, a stream end signal is inputted to the MP4 file coding means 85, whereby each of the atom formation means 43˜45 performs processing for completing each Atom, such as writing of the number of table items. Thereafter, the respective Atoms are rearranged in appropriate positions by the temporary data coupling means 49, and the created MP4 file is outputted to the recording medium 86.
Hereinafter, a description will be given of the Sample-To-TimeStamp Atom formation method by the stts atom formation means 45 in the MP4 file coding means 85, with reference to a flowchart shown in FIG. 14.
When data processing is started in the data coding apparatus, the MP4 file coding means 85 performs initialization (step S1). In step S1, “in” indicates the total number of frames inputted to the MP4 file coding means 85, “index” indicates the section number at a certain point of time in the Sample-To-TimeStamp Atom, “n” indicates the number of unprocessed frames, i.e., the number of frames which have not yet been entered in the Sample-To-TimeStamp Atom, among the frames inputted to the data coding apparatus, and “Tp” indicates the MP4 time stamp of just-previous coded data.
When the coded data, the amount of coded data, and the MP4 time stamp are inputted to the MP4 file coding means 85, the MP4 time stamp is set at “Ts”, the number-of-input-frames “in” is incremented by 1, and the number-of-unprocessed-frames “n” is incremented by 1 (step S2).
Next, it is judged whether the number-of-input-frames “in” is in≧3 or not (step S3). When in≦2, the process goes to step S6. When in≧3, the process goes to step S4. The reason for this branch is as follows. Since the Sample-To-TimeStamp Atom describes the frame-to-frame time intervals, at least two frames are required. Further, since the Sample-To-TimeStamp Atom describes the number of frames having the same frame interval, writing of the first item cannot be performed unless there are at least three frames.
When in≦2, the frame interval d is calculated, and the MP4 time stamp Ts of the currently inputted frame is recorded as the MP4 time stamp Tp of the just-previous frame (step S6). Thereafter, the processes in steps S2˜S6 are repeated until a stream end signal indicating the end of coded data input is inputted to the stts atom formation means 45 (step S7).
In the above-mentioned repetition, when in≧3 in step S3, it is judged whether or not the frame interval d (=Ts−Tp) between the current frame and the previous frame matches the frame interval d which has previously been calculated (step S4). When it is judged that there is no match, a table item is added to the Sample-To-TimeStamp Atom on the basis of the definition of the Sample-To-TimeStamp Atom (step S5). That is, number-of-frames num=n−2 and frame interval dur=d are added to the Sample-To-TimeStamp Atom. After the addition of the item to the Sample-To-TimeStamp Atom, the section number “index” is incremented by 1 and the number-of-unprocessed-frames “n” is set to “n=2”. These two frames are the first frame (just-previous MP4 time stamp Tp) and the second frame (current MP4 time stamp Ts) in the section having the latest frame interval d (=Ts−Tp).
When a stream end signal is inputted to the stts atom formation means 45 (step S7), the stts atom formation means 45 performs the processing for completing the Sample-To-TimeStamp Atom, such as writing of the number of table items.
That is, as already described with respect to step S5, at the point of time of step S7, the frame interval information relating to two frames is temporarily stored in the stts atom formation means 45, and it is not added to the table items of the Sample-To-TimeStamp Atom. Accordingly, when the stream end signal is inputted, processing for these two frames must be performed. When the number-of-input-frames “in” is smaller than 2, although it cannot happen usually, an operation different from the usual operation should be carried out.
Hereinafter, a description will be given of the case where the number-of-input-frames “in” is 0 or 1 when the stream end signal is inputted, which cannot happen usually.
When the number-of-input-frames “in” is 0 (in=0), i.e., when no frame is inputted, the process goes to step S9. In this case, of course there is no necessity of adding an item to the Sample-to-TimeStamp Atom. Then, the section number “index” is set at 0 (step S9), and the value, 0, is written in the Sample-To-TimeStamp Atom as the number of items (step S12).
When the number-of-input-frames “in” is 1 (in=1), number-of-frames “num=1” and frame interval “dur=du” (du: arbitrary number) are added and, further, the section number “index” is set at 1 (step S9), and thereafter, the value, 1, is written as the number of items in the Sample-To-TimeStamp Atom (step S12). Since the frame interval “dur” is not defined unless there are two frames, the frame interval “du” to be added as the frame interval “dur” in step S9 is not a significant value but an arbitrarily decided value.
When the number-of-input-frames “in” is equal to or larger than 2 (in≧2), which is the normal state, since n pieces of frames remain unprocessed, these frames are added to the Sample-To-TimeStamp Atom. That is, an item corresponding to number-of-frames “num=n” and frame interval “dur=d” is added to the Sample-To-TimeStamp Atom and, further, the section number “index” is incremented by 1 (step S11). Thereafter, the value of the “index” incremented by 1 is written as the number of items in the Sample-To-TimeStamp Atom (step S12). Thus, the Sample-To-TimeStamp Atom table formation process is completed.
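The formation procedure of FIG. 14 as described above can be sketched as a small incremental builder. This is an illustrative reading of the flowchart, not the apparatus itself: the class and method names are hypothetical, `arbitrary_dur` stands in for the arbitrary value “du”, and the section number “index” is represented by the length of the entry list.

```python
class SttsBuilder:
    """Incrementally forms (num, dur) items from MP4 time stamps."""

    def __init__(self, arbitrary_dur=1):
        self.entries = []       # items of the table: (num, dur)
        self.frames_in = 0      # "in": total number of frames inputted
        self.n = 0              # "n": frames not yet entered in the table
        self.tp = None          # "Tp": MP4 time stamp of the previous frame
        self.d = None           # "d": frame interval calculated previously
        self.arbitrary_dur = arbitrary_dur  # "du" for the one-frame case

    def add_frame(self, ts):
        # Step S2: count the new frame.
        self.frames_in += 1
        self.n += 1
        # Steps S3 to S5: from the third frame on, close the section
        # when the new interval differs from the previous one.
        if self.frames_in >= 3 and ts - self.tp != self.d:
            self.entries.append((self.n - 2, self.d))
            self.n = 2
        # Step S6: update the interval and the previous time stamp.
        if self.frames_in >= 2:
            self.d = ts - self.tp
        self.tp = ts

    def finish(self):
        # Stream end: flush what is still held, then report the item count.
        if self.frames_in == 1:
            self.entries.append((1, self.arbitrary_dur))
        elif self.frames_in >= 2:
            self.entries.append((self.n, self.d))
        return len(self.entries), self.entries


builder = SttsBuilder()
for ts in [0, 2, 4, 6, 9, 12, 13, 14, 15]:
    builder.add_frame(ts)
count, entries = builder.finish()
print(count, entries)  # 3 [(3, 2), (2, 3), (4, 1)]
```

With these nine time stamps the sketch yields the three sections of the FIG. 12(b) example. Note that the last item is only written in `finish()`, which reflects the point made above that two frames are always held back until the stream end signal arrives.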
Next, a description will be given of the case where MPEG-4 data are transmitted from a base station to a data recording apparatus according to RTP (Real-time Transport Protocol), with reference to FIG. 17.
FIG. 17 is a block diagram illustrating the construction of a conventional data recording apparatus as a mobile terminal, and MPEG-4 data from a transmitter are transmitted employing RTP.
Hereinafter, the conventional data recording apparatus will be described.
The data recording apparatus comprises an RTP receiver 91, an RTP reception buffer 92, an RTP decoder 93, an MP4 file encoder 95, and a recording medium 96. First of all, the RTP receiver 91 receives, from the RTP transmitter 90 as a base station, MPEG-4 coded data which are divided into units of video packets and stored in RTP packets, and the RTP receiver 91 stores the RTP packets into the reception buffer 92.
The above-mentioned video packets are units of data into which a frame is divided. Even when a video packet is lost or an error occurs in a video packet, other video packets can be normally decoded. Further, each RTP packet is given an RTP time stamp, which is set to a value obtained by adding a random offset to the coded data time stamp of the coded data stored in the RTP packet. Usually, when one frame of coded data is divided into plural RTP packets, these RTP packets are given RTP time stamps of the same value. Further, these RTP packets are given sequence numbers, and the receiving end, i.e., the data recording apparatus, can confirm packet disappearance by checking the continuity of the sequence numbers.
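The two checks described above, grouping RTP packets into frames by their common RTP time stamp and detecting lost packets from gaps in the sequence numbers, might be sketched as follows. The packet representation as `(sequence_number, rtp_time_stamp)` pairs and the function names are assumptions for illustration, and the sketch further assumes that packets arrive in order and without duplicates.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def missing_between(prev_seq, cur_seq):
    """Number of sequence numbers skipped between two consecutive arrivals."""
    return (cur_seq - prev_seq - 1) % SEQ_MOD


def group_frames(packets):
    """Group packets by RTP time stamp and count lost packets.

    `packets` is a list of (sequence_number, rtp_time_stamp) pairs in
    arrival order (assumed in-order, no duplicates).
    """
    frames = {}
    lost = 0
    prev_seq = None
    for seq, ts in packets:
        if prev_seq is not None:
            lost += missing_between(prev_seq, seq)
        frames.setdefault(ts, []).append(seq)
        prev_seq = seq
    return frames, lost


packets = [(65534, 100), (65535, 100), (0, 190), (3, 280)]
frames, lost = group_frames(packets)
print(frames)  # {100: [65534, 65535], 190: [0], 280: [3]}
print(lost)    # 2
```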
The RTP decoder 93 receives at least one RTP packet having the same time stamp from the reception buffer 92, and restores the RTP packet to MPEG-4 data by removing an RTP header from the RTP packet. The MP4 file encoder 95 converts the MPEG-4 data from the RTP decoder 93 into an MP4 file, and stores it in the recording medium 96. Information required for conversion into the MP4 file, such as the MP4 time stamp, the frame size, and the like, can be obtained from the RTP time stamp added to the RTP packet and the size, in the RTP decoder 93.
When MPEG-4 data are transmitted from the RTP transmitter 90 as a base station to the data recording apparatus as a mobile terminal by employing RTP, there may be cases where RTP packets from the RTP transmitter 90 do not reach the data recording apparatus. For example, when plural RTP packets, into which frames are divided, are transmitted from the RTP transmitter 90, if some of the RTP packets are lost or all RTP packets constituting a frame are lost, the data recording apparatus skips the data which cannot be normally received by the RTP receiver 91, and records only the data which are normally received by the RTP receiver 91, as an MP4 file on the recording medium 96.
The variable-frame-rate coding described above has the advantage over the fixed-frame-rate coding in that coding can be carried out according to the compression ratio of an image signal. On the other hand, since the variable-frame-rate coding has a degree of freedom in the frame rate, when variable-frame-rate coded data are converted into the MP4 file, the table size of the Sample-To-TimeStamp Atom in the MP4 file, where the frame intervals are recorded, depends on the coded data. Therefore, the table size cannot be assumed, and data restoration becomes difficult when an abnormal condition occurs in the apparatus. Further, the volume of data processing required when the coded data recorded as an MP4 file are decoded from the MP4 file, is increased.
To be specific, in the conventional data coding apparatus shown in FIG. 11, when the MP4 file coding means 85 converts the variable-frame-rate coded data into the MP4 file, a new item must be stored in the Sample-To-TimeStamp Atom, every time the frame interval of the coded data changes, according to the result of judgement by the coding judgement means 82. Thereby, in the MP4 file, the table size of the Sample-To-TimeStamp Atom increases as well as the Sample Size Atom and the Movie Data Atom and, therefore, more capacity of the recording medium 86 is required as compared with the case where fixed-frame-rate coded data are recorded as an MP4 file. The increase in the table size of the Sample-To-TimeStamp Atom is evident from the comparison between the MP4 file structure in the case where the fixed-frame-rate coded data are converted into the MP4 file (FIG. 15) and the MP4 file structure of the variable-frame-rate coded data (FIG. 12). That is, when the coded data are based on the fixed frame rate as shown in FIG. 15, as the frame interval is always dur=1, the number of items on the table of the Sample-to-TimeStamp Atom is one, and the table size does not change.
Further, in the above-described Sample-To-TimeStamp Atom formation method, when the variable-frame-rate coded data are converted into the MP4 file, the table of the Sample-To-TimeStamp Atom is increased according to the result of judgement by the coding judgement means 82 for every input frame, in contrast to the Sample Size Atom or Movie Data Atom formation method. Therefore, it is impossible to estimate the final table size when recording of the MP4 file of the coded data is started.
Furthermore, in the conventional data coding apparatus, when the variable-frame-rate coded data are converted into the MP4 file by the MP4 file coding means 85, the Sample Size Atom and the Movie Data Atom are formed while writing the data successively in the stsz atom temporary storage means 47 and the mdat atom temporary storage means 46 for each unit of coded data (one frame). Therefore, even when an abnormal condition occurs in the data coding apparatus, data restoration is possible. However, when the Sample-To-TimeStamp Atom is formed, since the item of the table is determined at the time when the frame rate changes, the data on the table of the Sample-To-TimeStamp Atom cannot be restored completely if an abnormal condition occurs in the data coding apparatus.
Furthermore, as the table size of the Sample-To-TimeStamp Atom in the MP4 file is increased, the volume of processing required when reproducing the data recorded on the recording medium 86 is increased, as described below.
First of all, the Sample-To-TimeStamp Atom is interpreted, and the sample number is calculated from the MP4 time stamp or the MP4 time stamp is calculated from the sample number.
Hereinafter, a description will be given of the process of obtaining a sample number N corresponding to a given MP4 time stamp (T) from the Sample-To-TimeStamp Atom, with reference to FIG. 16.
FIG. 16 is a flowchart illustrating a sequential flow of the process of searching for a sample corresponding to a given MP4 time stamp by using the Sample-To-TimeStamp Atom.
Initially, an MP4 time stamp T is set (step S21), and the section number “index”, the MP4 time stamp “T0” of the first frame in the section indicated by the section number “index”, and the number-of-frames “N0” included in the section indicated by the section number “index” are initialized (step S22). Then, the number-of-frames “num” and the frame interval “dur”, which are included in the section indicated by the current index, are extracted from the Sample-To-TimeStamp Atom (step S23), and furthermore, an end time Te of the section indicated by the current index is calculated (step S24). The end time Te of the section indicated by the current index is obtained by T0+num*dur.
Then, the end time Te of the section indicated by the current index is compared with the MP4 time stamp T to judge whether the MP4 time stamp is included in the section or not (step S25).
When T<Te in step S25, it is judged that the MP4 time stamp is included in the section indicated by the current “index”, and the sample number of the MP4 time stamp is decided (step S27). The sample number is decided as follows. Since the time up to the section indicated by the current “index” is the MP4 time stamp T0 of the first frame in the section indicated by the section number “index”, the time up to the frame indicated by the MP4 time stamp T is T−T0. Further, since the frame interval of this section is “dur”, the frame of the sample indicated by the MP4 time stamp T is the {(T−T0)/dur}-th frame from the beginning of this section. Accordingly, the sample number N of the frame is N0+(T−T0)/dur.
On the other hand, when T≧Te in step S25, it is judged that the MP4 time stamp T is not included in the section indicated by the current “index”, and the process moves on to the next item on the table of the Sample-To-TimeStamp Atom (step S26). That is, the end time Te of the section obtained in step S24 is set as the start time T0 of the next section, and the sum of the number-of-items “N0” up to this section and the number-of-frames “num” of this section is set as the number-of-items “N0” until the next section, and the section number “index” is incremented by 1. Thereafter, the process returns to step S23, and the above-mentioned processes are repeated until it is judged that T<Te in step S25. When “index” becomes equal to or larger than the number of items in step S26, it means that the frame indicated by the MP4 time stamp has not been detected.
As described above, when the number of items on the table of the Sample-To-TimeStamp Atom increases, the processes in steps S23 to S26 must be repeated until a section including the MP4 time stamp T is detected, whereby the time and effort for detecting the sample number N of the frame indicated by the MP4 time stamp T are significantly increased. Further, also in the process of obtaining the MP4 time stamp T from the sample number N, which is the inverse of the above-mentioned process, the volume of processing increases in proportion to the number of items on the Sample-To-TimeStamp Atom.
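The search of FIG. 16 and its inverse can be sketched as a linear scan over the table items. The initial values T0=0 and N0=0, the 0-based sample numbers, the integer division, and the function names are assumptions made for this sketch; the text only states that these variables are initialized.

```python
def sample_for_time(entries, t):
    """Return the 0-based sample number for MP4 time stamp `t`, or None."""
    t0 = 0  # start time of the current section (assumed initial value)
    n0 = 0  # number of samples before the current section
    for num, dur in entries:            # step S23
        te = t0 + num * dur             # step S24: end time of the section
        if t < te:                      # step S25
            return n0 + (t - t0) // dur  # step S27
        t0, n0 = te, n0 + num           # step S26: move to the next item
    return None                         # no section contains `t`


def time_for_sample(entries, n):
    """Inverse lookup: MP4 time stamp of 0-based sample `n`, or None."""
    t0 = 0
    n0 = 0
    for num, dur in entries:
        if n < n0 + num:
            return t0 + (n - n0) * dur
        t0, n0 = t0 + num * dur, n0 + num
    return None


entries = [(3, 2), (2, 3), (4, 1)]
print(sample_for_time(entries, 7))   # 3
print(sample_for_time(entries, 13))  # 6
print(time_for_sample(entries, 6))   # 13
```

Both functions visit the items one by one from the head of the table, so the number of loop iterations grows with the number of items that precede the target section.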
Furthermore, since packet retransmission is not performed in the data transmission using RTP shown in FIG. 17, packet delays are not accumulated and, therefore, this data transmission is suitable for real-time transmission. However, if a delay or temporary cut-off occurs in the network, some packets might be lost before reaching the data recording apparatus at the receiving end. In view of such property of RTP, in the data recording apparatus which records, as an MP4 file, the MPEG-4 data received from the RTP transmitter 90 at the base station by using RTP, there is a great possibility that the transmitted coded data may be lost. When the transmitted data are lost, the frame rate of the received coded data changes frequently, resulting in a considerable increase in the number of table items of the Sample-To-TimeStamp Atom which holds the frame-to-frame display intervals.
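The effect of lost frames on the number of table items can be illustrated with a small count. The time stamps and the set of lost frames below are made up for the example, and the function name is hypothetical; the count follows the rule that a new item is needed every time the frame interval changes.

```python
def count_stts_items(timestamps):
    """Number of Sample-To-TimeStamp items needed for the given time stamps."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not intervals:
        # Zero frames need no item; a single frame needs one item.
        return len(timestamps)
    changes = sum(1 for a, b in zip(intervals, intervals[1:]) if a != b)
    return 1 + changes


sent = list(range(12))                       # fixed frame rate, interval 1
lost = {3, 7, 8}                             # made-up frames lost in transit
received = [t for t in sent if t not in lost]

print(count_stts_items(sent))      # 1
print(count_stts_items(received))  # 5
```

In this made-up case, losing three frames out of twelve turns a single-item table into a five-item table.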
Furthermore, assuming that the above-described data coding apparatus or data recording apparatus is mounted on a mobile terminal which is significantly restricted by its physical size or power consumption or the capacity of the recording medium, the above-described problems, i.e., the increase in the data size of the Sample-To-TimeStamp Atom, the difficulty in completely restoring the data of the Sample-To-TimeStamp Atom, and the considerable volume of processing when reproducing the data, will lead to various problems.