In recent years, with the progress in information compression technology, a digital video/audio service providing video information and audio information by digital signals has been put to practical use for broadcasting media, such as terrestrial broadcasting, satellite broadcasting, and CATV.
Under the existing circumstances, as a compressive coding method for the next generation, an object coding method has attracted attention. This object coding method does not uniformly compress the whole image, i.e., video data corresponding to a single image, but compresses video data corresponding to a single image in units of the individual objects constituting the image, while paying attention to the contents of the image.
When video data corresponding to a single image is subjected to the compressive coding in object units, compressed (coded) video data is separable corresponding to the respective objects, whereby a specific object in the image can be extracted or replaced.
Meanwhile, as a method of implementing a data transmission format for making the best use of the object coding method, a method of multiplexing compressed video data, audio data, and other digital data is discussed.
There is MPEG4 as an international standard of a method of multiplexing data compressed by the object coding method (ISO/IEC JTC1/SC29/WG11 N1483, "System Working Draft", November 1996). Hereinafter, a description is given of the data multiplexing method based on MPEG4 and a method for reproducing the multiplexed data, with reference to figures.
FIG. 18 is a diagram for explaining the object coding method. In the figure, reference numeral 120 designates a scene (an image) in a series of images obtained from video data with audio. This scene 120 is composed of a plurality of objects (sub-images) forming a hierarchical structure. To be specific, the scene 120 is composed of three objects: a background image (background) 121, a moving object 122 that moves in the background, and a background audio 123 attendant on the background. The moving object 122 is composed of four objects: a first wheel 124, a second wheel 125, a body 126, and a moving object audio 127 attendant on the moving object. Further, the object of body 126 is composed of two objects: a window 128 and the other part 129. In the hierarchical structure, the objects 121 to 123 belong to the uppermost first layer L1, the objects 124 to 127 belong to the second layer L2 lower than the first layer L1, and the objects 128 and 129 belong to the third layer L3 lower than the second layer L2.
In the object coding method, scene data corresponding to the scene 120 are compressively coded in units of the lowermost objects constituting the scene 120. In other words, scene data corresponding to the scene 120 are compressively coded for each of the objects 121, 123, 124, 125, 127, 128 and 129.
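The selection of coding units described above can be illustrated by a short sketch. The following Python fragment is illustrative only: the dictionary layout and the function name lowermost_objects are assumptions introduced here and are not part of the MPEG4 draft. It models the hierarchy of FIG. 18 and lists the objects that have no components, which are the units subjected to compressive coding.

```python
# Hypothetical model of the scene 120 of FIG. 18: each key is the reference
# numeral of an object, each value lists the objects it is composed of.
SCENE_TREE = {
    120: [121, 122, 123],       # scene: background, moving object, background audio
    122: [124, 125, 126, 127],  # moving object: two wheels, body, moving object audio
    126: [128, 129],            # body: window, the other part
}


def lowermost_objects(node, tree=SCENE_TREE):
    """Return, in ascending order, the objects under `node` that have no components."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(lowermost_objects(child, tree))
    return sorted(leaves)


coded_units = lowermost_objects(120)
print(coded_units)
```

The objects 122 and 126 do not appear in the result because they are composed of lower-layer objects.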
FIG. 19 is a diagram for explaining a data structure for transmitting coded data corresponding to the respective objects mentioned above, which is obtained by performing object coding to the scene data of the scene 120.
In FIG. 19, MEg shows a multiplexed bit stream having a prescribed format, obtained by multiplexing coded data of the respective objects and auxiliary data. This multiplexed bit stream MEg is transmitted as coded data corresponding to the scene data.
The multiplexed bit stream MEg is partitioned into plural packets in prescribed units, i.e., each packet having a prescribed number of bytes, and coded data of the respective objects are allocated to the packets having their own values (SLC=1, 2, . . . ) as logical channels (LC).
To be specific, in the multiplexed bit stream MEg shown in FIG. 19, coded video data of object [1] is allocated to packets Pa3 and Pa6 having a logical channel SLC=3, coded video data of object [2] is allocated to packets Pa5 and Pa7 having a logical channel SLC=4, and coded audio data of object [3] is allocated to a packet Pa4 having a logical channel SLC=5. Information relating to the byte number of each packet when multiplexed, the logical channel LC of each packet, and the packet transmission order is allocated as control information to a packet having another logical channel (not shown) for transmission.
The objects [1] and [2] are the background image 121 and the moving object 122 shown in FIG. 18, respectively, and the object [3] is the background audio 123 shown in FIG. 18.
In the multiplexed bit stream MEg, allocated to the packet Pa1 of logical channel SLC=1 is information relating to a scene composition method for regenerating the scene composed of the respective objects (composition stream), and allocated to the packet Pa2 of logical channel SLC=2 is information showing how the coded data of the respective objects are multiplexed (stream association table).
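The allocation of packets to logical channels can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the tuple layout, the placeholder payloads, and the function name demultiplex are hypothetical, and the packet carrying object [3] is assumed to be Pa4. It regroups the packets of FIG. 19 by their logical channel value in transmission order.

```python
from collections import defaultdict

# (packet name, logical channel SLC, payload). The payload bytes are
# placeholders standing in for real coded data.
PACKETS = [
    ("Pa1", 1, b"composition-stream"),
    ("Pa2", 2, b"stream-association-table"),
    ("Pa3", 3, b"obj1-part1"),
    ("Pa4", 5, b"obj3-audio"),
    ("Pa5", 4, b"obj2-part1"),
    ("Pa6", 3, b"obj1-part2"),
    ("Pa7", 4, b"obj2-part2"),
]


def demultiplex(packets):
    """Concatenate packet payloads per logical channel, keeping transmission order."""
    channels = defaultdict(list)
    for _name, slc, payload in packets:
        channels[slc].append(payload)
    return {slc: b"".join(parts) for slc, parts in channels.items()}


streams = demultiplex(PACKETS)
print(streams[3])  # payloads of Pa3 and Pa6, i.e. the stream of object [1]
```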
Accordingly, when a plurality of coded data obtained by object coding are multiplexed and transmitted, with the coded data of the respective objects, the composition stream showing the structure of a scene composed of the objects and the stream association table showing the correlation of the transmitted streams (each stream being a series of coded data corresponding to each object) are transmitted simultaneously.
FIG. 20 is a diagram for explaining a scene description according to the composition stream, illustrating a description SD corresponding to the single image (scene) 120 shown in FIG. 18.
In the scene description SD according to the composition stream, the image 120 is shown by Scene 140, and the fact that the image 120 shown by Scene 140 is composed of the background image 121, the moving object 122, and the background audio 123 is shown by Video(1) 141, Node(1) 142, and Audio(1) 143, respectively. Here, Scene 140, Video(1) 141, Node(1) 142, and Audio(1) 143 are descriptors describing the image 120, the background image 121, the moving object 122, and the background audio 123 shown in FIG. 18, respectively.
Further, in the scene description SD, the fact that the moving object 122 shown by Node(1) 142 is composed of the first wheel 124, the second wheel 125, the body 126, and the moving object audio 127 is shown by Video(2) 144, Video(3) 145, Node(2) 146, and Audio(2) 147, respectively, which are descriptors corresponding to these objects.
Further, the fact that the body 126 shown by Node(2) 146 is composed of the window 128 and the other part 129 is shown by Video(4) 148 and Video(5) 149, respectively, which are descriptors corresponding to these objects.
Each of the descriptors is given a stream index (stream id) for identifying a stream corresponding to coded data of each object in the multiplexed bit stream MEg. To be specific, as shown in FIG. 20, stream indices Sid=1 to Sid=5 are given to the descriptors 141 to 145, respectively, and stream indices Sid=6, Sid=7, and Sid=8 are given to the descriptors 148, 149, and 147, respectively. Sid is a specific number of each stream id.
Accordingly, the kinds of objects of which a scene is composed can be seen from the scene description SD according to the composition stream. However, the scene description SD according to the composition stream does not describe how the coded data corresponding to the respective objects are multiplexed in the actual multiplexed bit stream MEg.
FIG. 21 is a diagram for explaining the stream association table AT.
The stream association table AT shows the relationship between the stream corresponding to coded data of each object (i.e., a series of coded data corresponding to each object) and the logical channel (LC) specifying each packet which is the partition unit of coded data when multiplexed. To be specific, on this table AT, the stream indices (id) of the respective streams, the logical channel values (LC) corresponding to the respective streams, and the logical channel values (LC) corresponding to upper streams of the respective streams are correlated with each other. Here, the logical channel LC corresponding to the upper stream of the streams (Sid=1 to 3) corresponding to the objects 121 to 123 of the first layer L1 is the logical channel LC (SLC=2) of the packet Pa2 to which the stream association table is allocated.
Accordingly, with reference to this table AT, the logical channel LC corresponding to each stream and the logical channel LC of its upper stream (host stream) can be specified.
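A lookup on the table AT can be sketched as follows. This Python fragment is illustrative only and rests on stated assumptions: the class Entry and the function lower_streams are hypothetical names, the rows for Sid=1 to 3 follow the logical channels given for FIG. 19, and the rows for Sid=4 and 5 use assumed logical channel values because the contents of FIG. 21 are not reproduced in the text.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    lc: int        # logical channel carrying the stream
    upper_lc: int  # logical channel of the upper (host) stream


# stream id (Sid) -> table entry
ASSOCIATION_TABLE = {
    1: Entry(lc=3, upper_lc=2),  # background image 121
    2: Entry(lc=4, upper_lc=2),  # moving object 122
    3: Entry(lc=5, upper_lc=2),  # background audio 123
    4: Entry(lc=6, upper_lc=4),  # first wheel 124 (LC value assumed)
    5: Entry(lc=7, upper_lc=4),  # second wheel 125 (LC value assumed)
}


def lower_streams(table, lc):
    """Return the stream ids whose upper stream is carried on logical channel `lc`."""
    return sorted(sid for sid, entry in table.items() if entry.upper_lc == lc)


print(ASSOCIATION_TABLE[2])                 # LC and upper LC of stream Sid=2
print(lower_streams(ASSOCIATION_TABLE, 4))  # streams hosted by the stream on LC 4
```

Each entry records only its upper stream, so finding the lower streams of a given stream requires scanning the whole table.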
As described above, since the stream indices (Sid) are added to the descriptors 141 to 145 and 147 to 149 of the respective objects in the scene description SD according to the composition stream shown in FIG. 20, the respective objects can be identified by the stream indices (Sid) from the composition stream and, therefore, the composition stream can be correlated with the stream association table shown in FIG. 21.
As described above, the multiplexed bit stream MEg includes the composition stream and the stream association table together with the coded data corresponding to the respective objects. Therefore, when the coded data of the respective objects are reproduced by decoding according to the multiplexed bit stream MEg, it is possible to extract or retrieve coded data of a specific object designated according to the composition stream and the stream association table. This enables, for example, editing of the objects 121 to 129 constituting the scene 120 on the reproduction end.
In the multiplexed bit stream format according to the prior art object coding, the scene description is expressed as information (composition stream) separated from information relating to the multiplexed state of the respective coded data and the logical channels corresponding to the respective streams (stream association table). The reason is as follows. In order to realize exchange of the contents of streams corresponding to the respective objects and to facilitate interface between the multiplexed bit stream and applications treating this multiplexed bit stream without changing the scene composition (i.e., the hierarchical structure of the objects constituting a scene), the structure for multiplexing, which depends on the physical layer of the multiplexed bit stream, must be separated from main information (coded data) included in the multiplexed bit stream.
However, the multiplexed bit stream format according to the prior art has the following drawbacks.
A great advantage of object coding resides in that it enables extraction of coded data of a specific object from the multiplexed bit stream, and retrieval of a specific object from a database containing the multiplexed bit stream.
However, in order to recognize coded data of individual objects from the multiplexed bit stream MEg of the above-mentioned structure, a complicated procedure is required as follows. For example, to recognize coded data of lower-layer objects from plural objects having a hierarchical structure, initially, the scene description according to the composition stream included in the multiplexed bit stream MEg is interpreted to find an object corresponding to a node, and a stream corresponding to a lower object being a component of the object (node) is specified. Then, the stream association table AT is interpreted and, according to the stream id of the specified stream, a logical channel LC corresponding to the stream id is found. Thereby, coded data of the specified object can be extracted from the multiplexed bit stream MEg.
Furthermore, since the hierarchical relationship of the streams corresponding to the respective objects can be seen from the stream association table AT, it is possible to infer coded data of a specific object according to the stream association table AT alone, but this inference takes time and is not reliable.
That is, on the stream association table AT, information relating to objects as nodes is not clearly defined. In addition, since this table AT does not show the type of stream corresponding to coded data (for example, whether a stream corresponds to video data or audio data), other information such as the composition stream should be referred to. Further, for each stream, only its upper stream is known from the table AT. So, it is impossible to determine uniquely which streams make up the coded data of each object, and interpretation takes time.
For example, in the scene description SD according to the composition stream shown in FIG. 20, although Node(2) 146 corresponding to the object 126 exists, a stream corresponding to Node(2) does not exist. So, on the stream association table AT shown in FIG. 21, an entry corresponding to Node(2) (i.e., stream id, LC corresponding to the stream, and LC corresponding to the stream's upper-layer stream) does not exist.
Accordingly, in order to extract the object 126 corresponding to Node(2), initially, stream indices (id) corresponding to the lower-layer objects 128 and 129 of Node(2) 146 must be decided on the basis of the scene description SD according to the composition stream (refer to FIG. 20) and, thereafter, the logical channels (LC) of packets containing the streams having the decided stream indices must be defined on the stream association table AT (refer to FIG. 21).
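The two-step procedure described above can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the data layout and the function names stream_ids and logical_channels are hypothetical, the scene descriptor is assumed to carry no stream id, and the logical channel values for Sid=4 to 8 are assumed, since only those for Sid=1 to 3 are given in the text.

```python
# Scene description SD of FIG. 20:
# descriptor -> (stream id, or None when no stream exists, component descriptors)
SCENE_DESCRIPTION = {
    "Scene":    (None, ["Video(1)", "Node(1)", "Audio(1)"]),
    "Video(1)": (1, []),
    "Node(1)":  (2, ["Video(2)", "Video(3)", "Node(2)", "Audio(2)"]),
    "Audio(1)": (3, []),
    "Video(2)": (4, []),
    "Video(3)": (5, []),
    "Node(2)":  (None, ["Video(4)", "Video(5)"]),  # no stream of its own
    "Video(4)": (6, []),
    "Video(5)": (7, []),
    "Audio(2)": (8, []),
}

# Stream association table reduced to stream id -> logical channel.
# Values for Sid=4 to 8 are assumed for illustration.
STREAM_TO_LC = {1: 3, 2: 4, 3: 5, 4: 6, 5: 7, 6: 8, 7: 9, 8: 10}


def stream_ids(descriptor, sd=SCENE_DESCRIPTION):
    """Step 1: collect the stream ids of a descriptor and its lower-layer descriptors."""
    sid, children = sd[descriptor]
    ids = [] if sid is None else [sid]
    for child in children:
        ids.extend(stream_ids(child, sd))
    return ids


def logical_channels(descriptor):
    """Step 2: look up the logical channel of each collected stream id."""
    return [STREAM_TO_LC[sid] for sid in stream_ids(descriptor)]


print(stream_ids("Node(2)"))        # stream ids found in the scene description
print(logical_channels("Node(2)"))  # logical channels found in the table
```

Both structures must be consulted for every extraction, and the first step walks the hierarchy below the requested node.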
Further, there is a case where coded data corresponding to plural objects are transmitted without being multiplexed in a particular transmission medium, such as a computer network (the Internet). In this case, the bit stream has a data structure including no logical channels, and does not include the stream association table.
In this case, detection of a specific object from the bit stream is carried out by interpreting the hierarchical structure of the objects on the basis of the scene description SD according to the composition stream. However, when the number of the objects increases considerably, it requires a lot of time to interpret the hierarchical structure of the objects on the basis of the composition stream, resulting in poor controllability in replacement or editing of objects in a scene.