The present invention relates to an image output apparatus and an image reproduction method. More particularly, the present invention relates to a reproduction process for decoding and compositing encoded video object data corresponding to a plurality of objects composing a predetermined image (scene), to reproduce video data corresponding to the predetermined scene.
The present invention also relates to an object composition apparatus and an object composition method and, more particularly, to an object composition process for compositing object data corresponding to video data of respective objects according to auxiliary information relating to a composite image and the respective objects.
Moreover, the present invention relates to a data storage medium which contains a program for implementing the reproduction process by software and a program for implementing the object composition process by software.
In recent years, we have entered the age of "multimedia", which handles audio, video, and other data in an integrated manner. Conventional information media such as newspapers, magazines, television, telephones, and radio have been adopted as subjects of multimedia. In general, multimedia represents graphics, speech, and especially images, as well as characters, in relation to each other. In order to handle the conventional information media as subjects of multimedia, it is essential that their information be represented in a digital format.
Consider each information medium in terms of the quantity of digital information it requires. For example, characters require 1-2 bytes per character, audio requires 64 kbits per second (telecommunication quality), and a moving image requires 100 Mbits or more per second (current television broadcasting quality). Hence, it is not practical to handle such an enormous amount of data as it is in a digital format. For example, although visual telephones have already been put to practical use over ISDN (Integrated Services Digital Network), which accommodates transmission rates ranging from 64 kbps to 1.5 Mbps, video data from a television camera cannot be sent directly over ISDN.
Accordingly, there is a demand for information compression techniques. In the case of visual telephones, a moving image compression technique according to the H.261 or H.263 standard, internationally standardized by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector), is employed. Also, with an information compression technique conforming to the MPEG (Moving Picture Experts Group) 1 standard, audio and video information can be recorded on a normal music CD (compact disc).
MPEG is an international standard for compression of moving image data (image signals corresponding to moving images). According to the MPEG1 standard, moving image data is compressed to 1.5 Mbps, that is, a TV signal is compressed to about 1/100 of its original size. While the transmission rate according to the MPEG1 standard is restricted to about 1.5 Mbps, according to MPEG2, which was standardized to meet demands for higher image quality, moving image data is compressed to 2-15 Mbps.
Currently, MPEG4 is being standardized by the group (ISO/IEC JTC1/SC29/WG11) that also standardized MPEG1 and MPEG2. The compression technique (object coding scheme) according to MPEG4 enables encoding and signal operations for each of the objects composing a scene (one frame image), as well as new functions required for multimedia. A reference for MPEG4 is "ISO/IEC 14496-1 MPEG-4 Systems, Final Committee Draft, May 15, 1998".
Commonly, in a coding scheme for moving pictures, a moving picture is handled as a series of still pictures (frames), and video data is compressively encoded frame by frame. In the object coding scheme according to MPEG4, on the other hand, an image having a specific shape (foreground), a background, and the like included in a frame are each treated as one object, and the video data corresponding to the frame (one frame image) is handled separately for each video object corresponding to an object. This object coding scheme enables a compressive coding process suited to each object, and thereby improves the data compression rate of the video data in one frame. In addition, in this object coding scheme, information indicating the placement of the respective objects on one frame and the like is handled independently of the object data, which makes the object data easier to process and edit.
In the object coding scheme of the MPEG4 international standard, encoded video data corresponding to a plurality of objects is decoded and composited to provide reproduced data corresponding to a composite image (reproduced scene), which is then displayed.
Encoded video data corresponding to the respective objects is packetized and transmitted. Specifically, the encoded video object data is divided into code sequences of appropriate lengths, and additional information such as a header is added to each code sequence to form a packet for transmission.
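As a rough illustration of compositing decoded objects into one frame using placement information, the following Python sketch draws each decoded object onto a copy of the background wherever its shape mask is set. The pixel representation (lists of integers), the boolean shape mask, the `(row, col)` placement tuple, and the `composite` function are hypothetical simplifications chosen for illustration; they are not MPEG4 data structures.

```python
from typing import List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]
# Hypothetical placed object: (pixels, shape mask, (top row, left column)).
Placed = Tuple[Image, Mask, Tuple[int, int]]


def composite(background: Image, objects: List[Placed]) -> Image:
    """Return a new frame with each object drawn over a non-empty background.

    Objects are drawn in list order, so later objects appear in front.
    Pixels outside an object's shape mask or outside the frame are ignored.
    """
    frame = [row[:] for row in background]  # leave the background untouched
    height, width = len(frame), len(frame[0])
    for pixels, mask, (top, left) in objects:
        for r, (pixel_row, mask_row) in enumerate(zip(pixels, mask)):
            for c, (value, inside) in enumerate(zip(pixel_row, mask_row)):
                y, x = top + r, left + c
                if inside and 0 <= y < height and 0 <= x < width:
                    frame[y][x] = value
    return frame


background = [[0] * 4 for _ in range(3)]
foreground = ([[5, 6], [7, 8]], [[True, False], [True, True]], (1, 2))
print(composite(background, [foreground]))
# [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 7, 8]]
```

Because the placement is passed separately from the pixel data, the same decoded object can be moved within the frame without re-encoding it, which mirrors the separation of placement information from object data described above.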
According to MPEG4, encoded video object data corresponding to a plurality of objects composing a scene is packetized and multiplexed, and transmitted as a bit stream.
FIG. 16(a) shows a data structure of this multiplexed bit stream. A multiplexed bit stream Bs includes, for example, packets P(n), P(n+1), and P(n+2). The packet P(n) comprises a header H(n) and a data part D(n). The packet P(n+1) comprises a header H(n+1) and a data part D(n+1). The packet P(n+2) comprises a header H(n+2) and a data part D(n+2).
The data part of each packet contains a code sequence constituting the corresponding encoded video object data, and its header contains identification information identifying the content of the data stored in the data part, or time management information used for decoding and reproducing the data.
Time management information is added to each access unit, which is the unit to be decoded. The time management information is called a "time stamp", and includes a DTS (Decoding Time Stamp) as time management information for decoding and a CTS (Composition Time Stamp) as time management information for composition. Usually a single time stamp (the CTS) is sufficient, because the time for the other process can be calculated from it. Note, however, that a DTS must also be added to each frame when the order in which frames are reproduced (composited and displayed) differs from the order in which they are decoded. An access unit is equivalent to one frame for video data and to one audio frame for audio data.
Whenever a data part of a packet contains head data of an access unit, a corresponding packet header contains a time stamp for the access unit.
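The packet and time-stamp rules above can be sketched as follows. This is a minimal Python illustration under assumed simplifications: the hypothetical `Packet` header carries only an object identifier and, when its data part begins an access unit, that unit's time stamp; actual MPEG-4 Systems packet syntax is considerably richer. The hypothetical helper `needs_dts` reflects the rule that a DTS is required only when the reproduction order differs from the decoding order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Packet:
    object_id: int             # hypothetical: which object the data belongs to
    time_stamp: Optional[int]  # present only when the data part starts an access unit
    data: bytes


@dataclass
class AccessUnit:
    time_stamp: int
    data: bytearray = field(default_factory=bytearray)


def reassemble(stream: List[Packet]) -> Dict[int, List[AccessUnit]]:
    """Rebuild access units (frames) per object from a multiplexed packet list."""
    units: Dict[int, List[AccessUnit]] = {}
    for packet in stream:
        per_object = units.setdefault(packet.object_id, [])
        if packet.time_stamp is not None:
            # This packet holds the head of an access unit, so its header
            # carries the unit's time stamp: start a new unit.
            per_object.append(AccessUnit(packet.time_stamp))
        if per_object:
            per_object[-1].data.extend(packet.data)
        # Continuation data seen before any unit head is discarded.
    return units


def needs_dts(cts_in_decoding_order: List[int]) -> bool:
    """True when composition time stamps, listed in decoding order, are not
    non-decreasing, i.e. frames are reproduced in a different order."""
    pairs = zip(cts_in_decoding_order, cts_in_decoding_order[1:])
    return any(earlier > later for earlier, later in pairs)
```

For example, packets for objects 1 and 2 interleaved as `[Packet(1, 0, b"ab"), Packet(2, 0, b"x"), Packet(1, None, b"cd"), Packet(1, 100, b"ef")]` yield two access units for object 1 (time stamps 0 and 100) and one for object 2.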
FIG. 16(b) shows a portion (frame data) Fd1 corresponding to one frame, as the access unit, of the encoded video object data corresponding to the first object, and FIG. 16(c) shows a portion (frame data) Fd2 corresponding to one frame of the encoded video object data corresponding to the second object. The frame data Fd1 and Fd2 are called VOPs (Video Object Planes), and time stamps Ts1 and Ts2 are added to their respective headers.
Conventionally, one image composition method for compositing plural object data to display one scene is a technique termed "CGD" (Computational Graceful Degradation) (hereinafter referred to as the CGD method).
One example of the CGD method estimates the decoding ability of a decoder that decodes encoded object data and outputs decoded data, and reduces the number of steps in the decoding process so that decoding is completed by the time the decoded data should be output. Another example adds priority information to respective frames and adaptively reduces the number of steps (operation amount) in the decoding process, frame by frame or packet by packet, according to the priority information and the processing ability of the image composition apparatus.
In these methods, according to the processing ability of the decoder, the decoding process for the encoded video object data is completed by the time when the decoded data should be output, and decoded data corresponding to respective objects is composited and the resulting reproduced data corresponding to one scene is output. Therefore, these methods are effective in performing control so that the load on the image output apparatus will not exceed its processing ability.
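The first CGD variant, in which the number of decoding steps is cut so that decoding finishes by the output time, can be sketched as a simple budget calculation. The cost model (a frame consisting of `full_steps` equal-cost steps, each estimated to take `ms_per_step` on this decoder) and the function `steps_to_run` are assumptions made for illustration, not the procedure of any particular CGD proposal.

```python
def steps_to_run(now_ms: float, output_time_ms: float,
                 ms_per_step: float, full_steps: int, min_steps: int = 1) -> int:
    """Number of decoding steps to perform for the next frame.

    Runs all `full_steps` when the estimated time fits before the output time;
    otherwise runs only as many steps as the remaining time allows, but never
    fewer than `min_steps`.
    """
    budget_ms = output_time_ms - now_ms
    if budget_ms <= 0:
        return min_steps  # already late: do the minimum amount of work
    affordable = int(budget_ms // ms_per_step)
    return max(min_steps, min(full_steps, affordable))


print(steps_to_run(0, 100, 5, 10))  # 10: full decoding fits
print(steps_to_run(0, 40, 5, 10))   # 8: decoding is degraded to meet the deadline
```

The second variant could reuse the same calculation with `full_steps` or `min_steps` scaled by each frame's priority; how priorities map to step counts is not specified here.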
FIG. 17 is a block diagram showing the structure of an image output apparatus which performs such load control. FIG. 17 shows an image output apparatus 1150, which is adapted to receive a bit stream Bs (encoded data) supplied through a transmission line of a predetermined network N, extract data corresponding to a desired object (encoded video object data) from the bit stream, decode and composite the data, and output reproduced data corresponding to a desired scene (composite image). The image output apparatus 1150 controls the decoding of the data according to the traffic on the transmission line.
The image output apparatus 1150 includes first data receiving means 1151a and second data receiving means 1151b. The first data receiving means 1151a selects packets of the first object from a multiplexed bit stream Bs received through the transmission line on the network N, outputs encoded video object data E1 and the time stamp Ts1 for this object in each access unit (frame), and outputs the data transmission rate Ds1 on the transmission line. The second data receiving means 1151b selects packets of the second object from the multiplexed bit stream Bs received through the transmission line on the network N, outputs encoded video object data E2 and the time stamp Ts2 for this object in each access unit (frame), and outputs the data transmission rate Ds2 on the transmission line. Each of the data receiving means 1151a and 1151b includes a separator 1151 for separating data from the multiplexed bit stream Bs. The multiplexed bit stream including packets of the first object may be different from the multiplexed bit stream including packets of the second object.
The image output apparatus 1150 further includes a first decoder 1152a for decoding the encoded video object data E1 according to the time stamp Ts1 and the data transmission rate Ds1 and outputting decoded data D1 corresponding to the first object, a second decoder 1152b for decoding the encoded video object data E2 according to the time stamp Ts2 and the data transmission rate Ds2 and outputting decoded data D2 corresponding to the second object, and video data composition means 1153 for compositing these decoded data and outputting composite data Cd corresponding to a desired scene. Each of the decoders 1152a and 1152b decodes the respective frames normally when the data transmission rate on the transmission line, and hence the rate at which the encoded video object data is input, is low, and decodes the respective frames with a reduced operation amount when the data transmission rate, and hence the input rate, is high.
The image output apparatus 1150 still further includes a buffer 1154 for storing the composite data Cd output from the video data composition means 1153 at predetermined timing, image display means 1155 which reads data Bd from the buffer 1154 according to information DTr indicating predetermined display timing (scheduled display time) and outputs the read data as reproduced data Rd to a display 1150a, and control means 1156 which determines an image display period according to the processing ability of the image output apparatus 1150 and outputs the information DTr indicating the scheduled display time according to the image display period.
In the image output apparatus 1150 so constructed, when the bit stream Bs including packets storing the encoded video object data corresponding to the first and second objects is input, the first and second data receiving means 1151a and 1151b select packets of the corresponding objects, and output the encoded video object data E1 and the time stamp Ts1, and the encoded video object data E2 and the time stamp Ts2, to the first and second decoders 1152a and 1152b, respectively, frame by frame. In this case, the first and second data receiving means 1151a and 1151b detect the data transmission rates Ds1 and Ds2, on the transmission line, of the bit streams including the packets corresponding to the respective objects, and output the information Ds1 and Ds2 to the first and second decoders 1152a and 1152b, respectively.
The decoders 1152a and 1152b decode the encoded video object data E1 and E2 frame by frame, at decoding processing times determined by the time stamps Ts1 and Ts2, and output decoded data D1 and D2, respectively. These decoding processes are controlled according to the data transmission rates Ds1 and Ds2, respectively. Specifically, the decoders 1152a and 1152b decode respective frames normally when a data transmission rate on the transmission line is low and a rate at which the encoded video object data is input is low, and decode the respective frames with reduced operation amount when a data transmission rate on the transmission line is high and a rate at which the encoded video object data is input is high.
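The rate-dependent behaviour of the decoders 1152a and 1152b can be sketched as a two-level rule. The switch-over threshold, the reduced fraction of the full operation amount, and the class `RateControlledDecoder` are hypothetical parameters chosen for illustration; the text states only that decoding is normal at low rates and reduced at high rates.

```python
from dataclasses import dataclass


@dataclass
class RateControlledDecoder:
    threshold_bps: float           # hypothetical rate above which work is reduced
    reduced_fraction: float = 0.5  # hypothetical share of the full operation amount

    def operation_amount(self, rate_bps: float, full_amount: int) -> int:
        """Operation amount to spend on the next frame, given the measured rate."""
        if rate_bps <= self.threshold_bps:
            return full_amount  # low input rate: decode the frame normally
        # High input rate: reduce the work so decoding keeps up with the input.
        return max(1, int(full_amount * self.reduced_fraction))


decoder = RateControlledDecoder(threshold_bps=1_000_000.0)
print(decoder.operation_amount(500_000, 100))    # 100
print(decoder.operation_amount(2_000_000, 100))  # 50
```

A real decoder would probably grade the reduction more finely than two levels; the single threshold is kept only to make the rule in the text explicit.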
When the respective decoded data D1 and D2 are input to the video data composition means 1153, the composition means 1153 generates composite data Cd corresponding to the desired scene and outputs the composite data Cd to the buffer 1154. The image display means 1155 then reads the data Bd stored in the buffer 1154 according to the information DTr indicating the scheduled display time from the control means 1156, and outputs reproduced data Rd corresponding to the desired scene to the display 1150a. The display 1150a thereby displays the image corresponding to the scene based on the reproduced data Rd.
FIG. 18 is a block diagram for explaining another structure of the image output apparatus using the CGD method. Turning to FIG. 18, there is shown an image output apparatus 1160 which is adapted to control the decoding process according to operation load and processing time of the decoders.
The image output apparatus 1160, like the image output apparatus 1150, includes a separator 1161 for selecting packets of the first and second objects from a multiplexed bit stream Bs which has been received through a transmission line of a network N, and extracting encoded video object data E1 and E2 and the corresponding time stamps Ts1 and Ts2 of the respective objects.
The image output apparatus 1160 further includes a first decoder 1161a for decoding the encoded video object data E1 according to the time stamp Ts1 and a decoding control signal Cn1 and outputting decoded data D1 corresponding to the first object, a second decoder 1161b for decoding the encoded video object data E2 according to the time stamp Ts2 and a decoding control signal Cn2 and outputting decoded data D2 corresponding to the second object, video data composition means 1163 for compositing these decoded data and outputting composite data Cd corresponding to the desired scene, and decoding amount estimation means 1162a and 1162b for obtaining the operation loads and data processing times of the respective decoders 1161a and 1161b from monitor signals Dm1 and Dm2 and controlling the respective decoders 1161a and 1161b by means of the decoding control signals Cn1 and Cn2, respectively. The estimation means 1162a and 1162b control the respective decoders 1161a and 1161b so that the operation amount per unit time in their decoding processes is small when the operation loads placed on them are high or the time required for processing a predetermined amount of data is long, and large when the operation loads are low or that time is short.
The image output apparatus 1160, like the image output apparatus 1150, further includes a buffer 1164 for storing the composite data Cd output from the video data composition means 1163, image display means 1165 which reads data Bd from the buffer 1164 according to information DTr indicating predetermined display timing (scheduled display time) and outputs read data as reproduced data Rd to a display unit 1160a, and control means 1166 which determines an image display period according to processing ability of the image output apparatus 1160 and outputs the information DTr indicating the scheduled display time according to the image display period.
In the image output apparatus 1160 so constructed, when the bit stream Bs including packets storing the encoded video object data corresponding to the first and second objects is input through the transmission line on the network N, the separator 1161 selects packets of the corresponding objects and outputs the encoded video object data E1 and the time stamp Ts1, and the encoded video object data E2 and the time stamp Ts2 to the first and second decoders 1161a and 1161b, respectively, frame by frame.
The decoders 1161a and 1161b decode the encoded video object data E1 and E2 frame by frame, at decoding times determined by the time stamps Ts1 and Ts2, and output decoded data D1 and D2, respectively. At this time, the first and second estimation means 1162a and 1162b measure the operation loads and the processing times according to the monitor signals Dm1 and Dm2 and output the control signals Cn1 and Cn2 according to the measured operation loads and processing times, to the decoders 1161a and 1161b, respectively. Thereby, the respective decoders 1161a and 1161b are controlled in such a way that the operation amount per unit time in the decoding processes of the respective decoders 1161a and 1161b is small when operation loads placed on them are high or time required for processing a predetermined amount of data is long and the operation amount is large when the operation loads placed on them are low or the time is short.
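The feedback performed by the estimation means 1162a and 1162b can be sketched as a step-wise controller driven by the two monitored quantities. The target processing time, the load thresholds (0.9 and 0.5), the step size, and the class `DecodingAmountEstimator` are all hypothetical; the text specifies only the direction of the adjustment, not a control law.

```python
class DecodingAmountEstimator:
    """Sketch of decoding amount estimation means with a hypothetical control law."""

    def __init__(self, target_ms: float, min_amount: int, max_amount: int, step: int):
        self.target_ms = target_ms  # desired time for a fixed amount of data
        self.min_amount = min_amount
        self.max_amount = max_amount
        self.step = step
        self.amount = max_amount    # start with full (normal) decoding

    def update(self, measured_ms: float, load: float) -> int:
        """Return the control signal (operation amount) for the next unit.

        measured_ms: time the decoder took for a fixed amount of data (monitor signal)
        load: decoder utilisation in [0, 1] (monitor signal)
        """
        if measured_ms > self.target_ms or load > 0.9:
            # Heavily loaded or too slow: reduce the operation amount.
            self.amount = max(self.min_amount, self.amount - self.step)
        elif measured_ms < self.target_ms and load < 0.5:
            # Lightly loaded and fast: allow more operations again.
            self.amount = min(self.max_amount, self.amount + self.step)
        return self.amount


estimator = DecodingAmountEstimator(target_ms=33.0, min_amount=10, max_amount=100, step=10)
print(estimator.update(50.0, 0.95))  # 90
print(estimator.update(10.0, 0.20))  # 100
```

Between the two thresholds the amount is left unchanged, which keeps the control signal from oscillating when the decoder is running close to its target.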
When the respective decoded data D1 and D2 are input to the video data composition means 1163, the composition means 1163 generates composite data Cd corresponding to the desired scene and outputs the composite data Cd to the buffer 1164. The image display means 1165 then reads the data Bd stored in the buffer 1164 according to the information DTr indicating the scheduled display time from the control means 1166, and outputs reproduced data Rd corresponding to the desired scene to the display 1160a. The display 1160a thereby displays the image corresponding to the scene based on the reproduced data Rd.
Next, a case will be described in which the plurality of objects composing the scene includes an object whose object data is repeatedly reproduced.
In the above-described object coding scheme, auxiliary information (program information) including composition information with which the scene (frame) is recomposed of the plurality of objects and side information relating to display of the respective objects are used when the object data is composited and a composite image is reproduced and displayed. Also, when processing or editing the object data, the program information as well as the respective objects is used.
The composition information is information including the above placement information of the respective objects. According to MPEG4, a scene description language similar to VRML (reference: ISO/IEC 14772-1, Virtual Reality Modeling Language, 1997) is being standardized as the composition information, and object descriptors OD are being standardized as the side information of the respective objects.
The following describes one scene (one frame image) composed of a plurality of objects, and the composition information (scene description data) represented in the scene description language.
FIG. 27(a) shows a scene of a series of images (moving picture) obtained from video data accompanied by audio data, FIG. 27(b) shows a hierarchical structure of objects which compose the scene, and FIG. 27(c) shows scene description corresponding to the scene.
As shown in FIG. 27(a), a scene 20, as one frame image of a moving picture, is composed of a plurality of objects (small images) in a hierarchical structure. The scene 20 is composed of a background object 21 corresponding to a background image, an audio object 22 corresponding to background music, a moving object 23 corresponding to an object moving over the background, a character object 26 corresponding to a logo ("Let's start") displayed on the background image, and first and second wheel objects 24 and 25 corresponding to the front and rear wheels of the moving object.
The scene 20 is one node, to which the background object 21 and the audio object 22 belong. The background object 21 is also one node, to which the moving object 23 and the character object 26 belong. Further, the moving object 23 is one node, to which the first and second wheel objects 24 and 25 belong.
The scene description (composition information) according to MPEG4 describes how the scene is composed of the respective objects. The hierarchical structure of the scene 20 is represented by scene description SD shown in FIG. 27(c).
"2D object" A1 shows that the video object 21 and the audio object 22 are included in a first layer, and that a second layer indicated by "2D object" A2 exists. "2D object" A2 shows that the text object 26 and the video object 23 are included in the second layer, and that a third layer indicated by "2D object" A3 exists. "2D object" A3 shows that the video objects 24 and 25 are included in the third layer. "2D object" A1-A3 also indicate that the objects included in the first to third layers are two-dimensional objects.
The scene description SD also contains object descriptor identifiers (e.g., OD ID=10) for identifying the objects belonging to the respective layers, and detailed information CI1-CI5, such as flags (e.g., Loop=TRUE) each indicating whether or not the corresponding object is repeatedly reproduced.
FIG. 28 illustrates detailed information of a part of the scene description (see FIG. 27(c)). This description shows that the scene 20 includes a two-dimensional video object whose object descriptor ID (OD_ID) is 10 and a two-dimensional video object whose object descriptor ID is 20. Since "LOOP=TRUE" is set as the LOOP flag in the node corresponding to the two-dimensional object (OD_ID=10), this object is repeatedly reproduced. Since "LOOP=FALSE" is set as the LOOP flag in the node corresponding to the two-dimensional object (OD_ID=20), this object is reproduced normally rather than repeatedly. In repeated reproduction, after the data of the last frame of an object is reproduced, the data of its first frame is reproduced again.
Although in this scene description the objects in the corresponding nodes are identified by object descriptor IDs (OD_ID), they may instead be specified by URLs (uniform resource locators). In that case too, each LOOP flag indicates whether or not the corresponding object is repeatedly reproduced.
FIGS. 29(a) and 29(b) show object descriptors standardized as the side information. Here, an object descriptor OD24 corresponds to the video object 24 identified by the object descriptor ID (OD_ID=10) (see FIG. 29(a)), and an object descriptor OD21 corresponds to the video object 21 identified by the object descriptor ID (OD_ID=20) (see FIG. 29(b)).
In each object descriptor, a CU (composition unit) duration is used as information indicating the frame updating period of the corresponding object. The CU duration means that one frame image of the corresponding object should be updated once every CU duration.
For instance, the CU duration of the video object 24 ("Composition Unit Duration=100", see FIG. 29(a)) indicates a frame updating period of 100 milliseconds (msec), and the CU duration of the video object 21 ("Composition Unit Duration=80", see FIG. 29(b)) indicates a frame updating period of 80 msec.
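The layered structure of the scene 20 can be mirrored by a small tree, and listing the children level by level recovers the three layers described above. The `Node` class, the node labels, and the `layers` function are hypothetical; they model only the parent/child relationships, not the MPEG4 scene description syntax.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def layers(root: Node) -> List[List[str]]:
    """Names of the objects in each layer below the root, top layer first."""
    result: List[List[str]] = []
    level = root.children
    while level:
        result.append([node.name for node in level])
        # The next layer consists of all children of the current layer.
        level = [child for node in level for child in node.children]
    return result


scene = Node("scene 20", [
    Node("background 21", [
        Node("moving 23", [Node("wheel 24"), Node("wheel 25")]),
        Node("character 26"),
    ]),
    Node("audio 22"),
])
for depth, names in enumerate(layers(scene), start=1):
    print(depth, names)
# 1 ['background 21', 'audio 22']
# 2 ['moving 23', 'character 26']
# 3 ['wheel 24', 'wheel 25']
```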
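Taken together, the LOOP flag and the CU duration determine which frame of an object should be visible at a given time. The sketch below assumes that frame k is valid during the interval from k times the CU duration up to (k+1) times the CU duration, and that a non-looping object keeps showing its last frame after it ends; the second point is an assumption, since the text does not say what is shown after a normally reproduced object finishes. The function name `frame_index` is hypothetical.

```python
def frame_index(t_ms: int, n_frames: int, cu_duration_ms: int, loop: bool) -> int:
    """Index of the object frame that should be visible at time t_ms.

    loop=True models LOOP=TRUE: after the last frame, playback wraps to the first.
    loop=False models LOOP=FALSE; holding the last frame is an assumption.
    """
    k = t_ms // cu_duration_ms  # number of completed CU durations
    if loop:
        return k % n_frames
    return min(k, n_frames - 1)


# Object 24: four frames, Composition Unit Duration=100, LOOP=TRUE.
print([frame_index(t, 4, 100, True) for t in range(0, 800, 100)])
# [0, 1, 2, 3, 0, 1, 2, 3]
```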
In the conventional object composition apparatus, video object data and the corresponding composition information and side information are input to a section comprising the video data composition means 1153, the buffer 1154, the image display means 1155, and the control means 1156 of the image output apparatus shown in FIG. 17. The locations at which the respective objects are to be displayed on a frame, and whether or not the data of each object is repeatedly reproduced, are obtained from the composition information, and the frame updating period and the like are obtained from the side information.
Then, the object data is composited (the frame of the composite image is updated) according to the frame updating periods of the individual objects composing the scene.
However, the conventional apparatuses have the following problems.
The conventional CGD methods, namely the method of controlling the decoding process according to the traffic on the transmission line, performed by the image output apparatus 1150 shown in FIG. 17, and the method of controlling the decoding process according to the loads or processing times of the decoders, performed by the image output apparatus 1160 shown in FIG. 18, do not take into account the loads placed on the video data composition means and the image display means in the stages following the decoders, or the time required for processing by them.
For this reason, when the frame rate of the input video object data is high, an image output apparatus with high processing ability can decode, composite, and display all the frames normally, whereas an image output apparatus with low processing ability is sometimes unable to complete all the processing needed to display the encoded video object data by the time at which the image of each frame is to be displayed, which is determined by the processing ability of that apparatus.
Therefore, an image output apparatus with high processing ability can output video data at the appropriate time, whereas an image output apparatus with low processing ability cannot output video data at the appropriate time (scheduled display time). As a result, video data output later than the scheduled display time loses synchronization with audio data output at the scheduled display time, which is undesirable for viewers.
In addition, in the conventional object composition apparatus, a composite image cannot be displayed properly when the plurality of objects composing the scene includes objects with different frame updating periods.
As mentioned previously, in the object coding scheme according to MPEG4, a frame updating period is set for each of the objects composing the scene, so the object composition apparatus can update the frame of the composite image at timings based on the frame updating periods of all the objects. However, updating the frame of the composite image so that the frame updating periods of all the objects are satisfied imposes an enormous processing load on the composition apparatus.
FIG. 30 shows, with 0 msec as the reference, the timings at which the frames of three objects Ob1-Ob3 are updated and the timings at which the frame of a composite image Cs composed of these objects is updated according to the frame updating periods of the respective objects. The frame updating period of the object Ob1 is 100 msec, that of the object Ob2 is 90 msec, and that of the object Ob3 is 95 msec.
In this case, the frame of the composite image Cs is updated three times at intervals of 5 msec from 90 msec, three times at intervals of 10 msec from 180 msec, three times at intervals of 15 msec from 270 msec, and three times at intervals of 20 msec from 360 msec.
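The update pattern described above (three updates 5 msec apart starting at 90 msec, then 10, 15, and 20 msec apart starting at 180, 270, and 360 msec) follows from taking the union of every object's update times. The sketch below, with the hypothetical function `composite_update_times`, reproduces it for periods of 100, 90, and 95 msec over the first 400 msec.

```python
from typing import Iterable, List


def composite_update_times(periods_ms: Iterable[int], horizon_ms: int) -> List[int]:
    """Times in (0, horizon_ms] at which at least one object's frame changes."""
    times = set()
    for period in periods_ms:
        times.update(range(period, horizon_ms + 1, period))
    return sorted(times)


times = composite_update_times([100, 90, 95], 400)
print(times)
# [90, 95, 100, 180, 190, 200, 270, 285, 300, 360, 380, 400]
print([later - earlier for earlier, later in zip(times, times[1:])])
# [5, 5, 80, 10, 10, 70, 15, 15, 60, 20, 20]
```

Twelve composite updates in 400 msec, some only 5 msec apart, illustrates why honouring every object's period drives up the composition workload.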
Frame updating performed according to the frame updating periods of all the objects thus imposes an enormous processing load on the composition apparatus, because the frame is updated many times within a short time.
Accordingly, in the conventional object composition apparatus, the frame updating period of the composite image is determined according to processing ability of the composition apparatus, and the frame of the composite image is updated according to the determined frame updating period.
In this composition process, the frame updating period of the composite image does not always match the frame updating period obtained from the side information corresponding to each of the objects composing the scene. For this reason, the composite image (reproduced scene) including the plurality of objects is not displayed correctly.
This problem will be discussed below.
The object 24 (OD_ID=10) is composed of four frames (frames A-D), and its frame updating period is 100 msec, as shown in FIG. 31(a). This frame updating period is described in the object descriptor OD24 shown in FIG. 29(a) as "Composition Unit Duration=100".
FIGS. 31(b)-31(e) show how the object data of the object 24 is displayed when it is processed by the following object composition processes [1]-[4], each of which updates the frame of the composite image at a different period. Suppose that the frame updating periods of the composite image in the object composition processes [1], [2], [3], and [4] are 100 msec, 200 msec, 300 msec, and 400 msec, respectively.
As shown in FIG. 31(b), when the frame updating period is 100 msec, it matches the frame updating period of the object 24, and therefore the image of the object 24 is displayed properly.
On the other hand, as shown in FIGS. 31(c)-31(e), when the frame updating period of the composite image is longer than 100 msec, some of the frames of the object 24 are skipped.
When the frame updating period of the composite image is 200 msec, two of the four frames of the object 24, i.e., B and D, are skipped. When it is 400 msec, three of the four frames, i.e., B-D, are skipped. When it is 300 msec, the four frames of the object 24 cannot be reproduced in the correct order.
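The frame sequences above can be checked by sampling the looping object at each composite-image update. At update i, which occurs at i times the composite period, the object is showing frame floor(t / 100) modulo 4. The function `displayed_frames` is a hypothetical helper that assumes both the object and the composite image start at 0 msec.

```python
from typing import List, Sequence


def displayed_frames(frames: Sequence[str], object_period_ms: int,
                     composite_period_ms: int, updates: int) -> List[str]:
    """Frames of a looping object that appear at each composite-image update."""
    return [frames[(i * composite_period_ms // object_period_ms) % len(frames)]
            for i in range(updates)]


for period in (100, 200, 300, 400):
    print(period, "".join(displayed_frames("ABCD", 100, period, 4)))
# 100 ABCD
# 200 ACAC
# 300 ADCB
# 400 AAAA
```

With a 300 msec period every frame eventually appears, but in the order A, D, C, B, which is the disorder described for the repeatedly reproduced object.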
Such frame skipping.occurs in the object normally reproduced. The object normally reproduced is preferably displayed according to the frame updating period of the object with some of the frames of the object skipped, which makes viewers less displeased with the composite image. This is because the object normally reproduced needs to be synchronized with another audio object or video object.
However, as for the object repeatedly reproduced, disorder of the display period of the object itself adversely affects the image, and consequently, skipping becomes problematic.
The present invention is directed to solving the above problem, and it is an object of the present invention to provide an image output apparatus and an image reproduction method which are capable of reproducing encoded video object data appropriately depending on its data processing ability, and outputting reproduced data for image display at scheduled display time determined by the data processing ability, and a data storage medium which contains a program for making a computer perform processing according to the image reproduction method.
It is another object of the present invention to provide an object composition apparatus and an object composition method which are capable of displaying a composite image composed of a plurality of objects, i.e., a reproduced scene, preferably without significantly increasing operation load in reproduction when the plurality of objects composing the scene includes an object corresponding to object data which is repeatedly reproduced, and a data storage medium for storing a program making a computer perform processing according to the object composition method.
Other objects and advantages of the invention will become apparent from the detailed description that follows. The detailed description and specific embodiments described are provided only for illustration since various additions and modifications within the spirit and scope of the invention will be apparent to those skilled in the art from the detailed description.
According to a 1st aspect of the present invention, there is provided an image output apparatus which receives encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, and decodes and composites the encoded video object data, to output reproduced data used for displaying the predetermined image, and the apparatus comprises: decoders for decoding the encoded video object data corresponding to respective objects and outputting a plurality of decoded data; video data composition means for compositing the plurality of decoded data to generate composite data corresponding to a frame, frame by frame; a buffer for storing the composite data of a predetermined number of frames; image display means for selecting composite data corresponding to a specified frame according to a result of comparison between set display time of respective composite data stored in the buffer and scheduled display time determined by display process ability, and outputting the selected composite data as the reproduced data; and means for determining a video data composition period, which determines a composition period of a composition process according to the result of comparison between the set display time and the scheduled display time and outputs composition period information, wherein the video data composition means perform the composition process according to the composition period indicated by the composition period information. Thereby, reproduction is performed while maintaining synchronization between audio data and video data, irrespective of processing ability of the image output apparatus.
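The selection performed by the image display means and the feedback performed by the means for determining a video data composition period might be sketched as follows. This is an illustrative sketch only: the `CompositeFrame` type, the rule of picking the newest frame whose set display time has been reached, and the fixed 10 msec adjustment step are assumptions, not the policy specified by the invention.

```python
from dataclasses import dataclass


@dataclass
class CompositeFrame:
    set_display_time_ms: int  # display time set in the stream for this frame
    data: str                 # placeholder for the composited picture


def select_frame(buffer, scheduled_time_ms):
    """Return the newest buffered frame whose set display time has been reached.

    Hypothetical rule: frames that are due but older than the newest due frame
    are skipped, so the output does not fall further behind.
    """
    due = [f for f in buffer if f.set_display_time_ms <= scheduled_time_ms]
    return max(due, key=lambda f: f.set_display_time_ms) if due else None


def next_composition_period(current_ms, set_time_ms, scheduled_time_ms,
                            nominal_ms, step_ms=10):
    """Adjust the composition period from the set/scheduled time comparison.

    If the display runs late (scheduled time after set time), composing every
    frame wastes work, so the period is lengthened; otherwise it is shortened
    back toward the nominal period.
    """
    if scheduled_time_ms > set_time_ms:
        return current_ms + step_ms
    return max(nominal_ms, current_ms - step_ms)


buffer = [CompositeFrame(0, "f0"), CompositeFrame(100, "f1"), CompositeFrame(200, "f2")]
print(select_frame(buffer, 150).data)               # f1
print(next_composition_period(100, 200, 250, 100))  # 110: display is 50 ms late
```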
According to a 2nd aspect of the present invention, there is provided an image output apparatus which receives encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, and decodes and composites the encoded video object data, to output reproduced data used for displaying the predetermined image, and the apparatus comprises: decoders for decoding the encoded video object data corresponding to respective objects and outputting a plurality of decoded data; video data composition means for compositing the plurality of decoded data to generate composite data corresponding to a frame, frame by frame; a buffer for storing the composite data of a predetermined number of frames; image display means which selects composite data corresponding to a specified frame according to a result of comparison between set display time of respective composite data stored in the buffer and scheduled display time determined by display process ability, and outputs the selected composite data as the reproduced data; and means for determining a number of frames to-be-decoded, which determines the number of frames of the respective objects which are to be decoded by the decoders per unit time, according to the result of comparison between the set display time and the scheduled display time, and outputs information indicating the number of frames to-be-decoded of the respective objects, wherein the decoders respectively perform decoding such that decoding amount of the respective objects per unit time corresponds to the number of frames per unit time according to the information indicating the number of frames to-be-decoded.
Thereby, decoded data is generated according to processing ability of a circuit which composites the decoded data or outputs composite data to the display, whereby an image is reproduced appropriately according to processing ability of the image output apparatus.
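One way to derive the number of frames to-be-decoded from the comparison of set and scheduled display times is to scale each object's nominal frame rate by how much content time the display actually manages to present per unit of real time. The sketch below assumes this proportional rule, integer frames per second, and a floor of one frame per second; the function and parameter names are hypothetical.

```python
def frames_to_decode(nominal_fps, set_advance_ms, scheduled_advance_ms, min_fps=1):
    """Per-object decoding targets from the set/scheduled display time comparison.

    nominal_fps: dict mapping object id -> frames per second in the stream.
    set_advance_ms: how far the set display time advanced over a window.
    scheduled_advance_ms: how far the scheduled display time advanced over it.
    """
    targets = {}
    for obj, fps in nominal_fps.items():
        if scheduled_advance_ms <= set_advance_ms:
            targets[obj] = fps  # display keeps up: decode every frame
        else:
            # Display is slower than the content: decode proportionally fewer
            # frames. Integer arithmetic avoids floating-point rounding surprises.
            targets[obj] = max(min_fps, fps * set_advance_ms // scheduled_advance_ms)
    return targets


print(frames_to_decode({"background": 30, "speaker": 15}, 1000, 1500))
```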
According to a 3rd aspect of the present invention, in the image output apparatus of the 2nd aspect, the decoders, when changing the number of frames to-be-decoded per unit time according to the information indicating the number of frames to-be-decoded of the respective objects, determine frames to be decoded and frames to be dropped which are not decoded, according to types of encoding processes performed for the encoded video object data. Therefore, when reducing the number of frames to-be-decoded, frames which are not to be decoded are sequentially selected starting from the frame which affects an image quality least, and thereby an image is reproduced appropriately according to processing ability while suppressing degradation of the image quality.
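For MPEG-style streams, the frame that affects image quality least is usually a B-frame, because no other frame is predicted from it, while I-frames anchor every prediction. A minimal sketch of such a drop order follows, assuming a group of pictures given in display order in which B-frames are never used as references (true of MPEG-1 and MPEG-2, but not necessarily of every codec); `choose_frames_to_drop` is a hypothetical name.

```python
def choose_frames_to_drop(frame_types, n_drop):
    """Return sorted indices of frames to skip, least damaging first.

    frame_types: sequence such as "IBBPBBPBB" (one letter per frame, display order).
    Drop order: all B-frames first, then P-frames from the last one backwards
    (a later P-frame has fewer frames predicted from it); I-frames are kept.
    """
    b_frames = [i for i, t in enumerate(frame_types) if t == "B"]
    p_frames = [i for i, t in enumerate(frame_types) if t == "P"]
    drop_order = b_frames + p_frames[::-1]
    return sorted(drop_order[:n_drop])


print(choose_frames_to_drop("IBBPBBPBB", 3))  # [1, 2, 4]
```

Dropping P-frames only after every B-frame is gone, and from the end of the group backwards, ensures that no frame still being decoded refers to a frame that was skipped.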
According to a 4th aspect of the present invention, there is provided an image output apparatus which receives encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, and decodes and composites the encoded video object data, to output reproduced data used for displaying the predetermined image, and the apparatus comprises: decoders for decoding the encoded video object data corresponding to respective objects and outputting a plurality of decoded data; video data composition means for compositing the plurality of decoded data to generate composite data corresponding to a frame, frame by frame; a buffer for storing the composite data of a predetermined number of frames; image display means which selects composite data corresponding to a specified frame according to a result of comparison between set display time of respective composite data stored in the buffer and scheduled display time determined by display process ability, and outputs the selected composite data as the reproduced data; and means for determining a number of frames to-be-decoded, which determines the number of frames of respective objects which are to be decoded by the decoders per unit time according to waiting time before the video data composition means writes the composite data to the buffer, and outputs information indicating the number of frames to-be-decoded of the respective objects, wherein the decoders respectively perform decoding such that decoding amount of the respective objects per unit time corresponds to the number of frames per unit time according to the information indicating the number of frames to-be-decoded.
Thereby, decoded data is generated according to processing ability of a circuit which composites the decoded data or outputs composite data to the display, whereby an image is reproduced appropriately according to processing ability of the image output apparatus.
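Waiting time before the composite data can be written to the buffer is a direct symptom of the output side being slower than decoding and composition. The sketch below turns that symptom into a decoding target for one object with a simple step-down/step-up rule; the thresholds (50% and 10% of a frame interval), the one-frame step, and the name `adjust_frames_by_wait` are assumptions chosen for illustration.

```python
def adjust_frames_by_wait(current_fps, nominal_fps, wait_ms, frame_interval_ms,
                          high=0.5, low=0.1):
    """Return a new frames-per-second target for one object's decoder."""
    wait_fraction = wait_ms / frame_interval_ms  # share of each frame slot spent blocked
    if wait_fraction > high:
        # The output stage is the bottleneck: decode fewer frames.
        return max(1, current_fps - 1)
    if wait_fraction < low:
        # Little waiting: recover toward the stream's nominal rate.
        return min(nominal_fps, current_fps + 1)
    return current_fps


print(adjust_frames_by_wait(30, 30, 20, 33))  # 29
```

Stepping by one frame at a time keeps the decoding rate from oscillating when the waiting time fluctuates around a threshold.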
According to a 5th aspect of the present invention, in the image output apparatus of the 4th aspect, the decoders, when changing the number of frames to-be-decoded per unit time according to the information indicating the number of frames to-be-decoded of the respective objects, determine frames to be decoded and frames to be dropped which are not decoded, according to types of encoding processes performed for the encoded video object data. Therefore, the effects of the 3rd aspect are achieved.
According to a 6th aspect of the present invention, there is provided an image output apparatus which receives encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, and decodes and composites the encoded video object data, to output reproduced data used for displaying the predetermined image, and the apparatus comprises: decoders for decoding the encoded video object data corresponding to respective objects and outputting a plurality of decoded data; video data composition means for compositing the plurality of decoded data to generate composite data corresponding to a frame, frame by frame; a buffer for storing the composite data of a predetermined number of frames; image display means which selects composite data corresponding to a specified frame according to a result of comparison between set display time of respective composite data stored in the buffer and scheduled display time determined by display process ability, and outputs the selected composite data as the reproduced data; and means for determining a number of frames to-be-decoded, which determines the number of frames of respective objects which are to be decoded by the decoders per unit time according to the result of comparison between the set display time and the scheduled display time, and waiting time before the video data composition means writes the composite data to the buffer, and outputs information indicating the number of frames to-be-decoded of the respective objects, wherein the decoders respectively perform decoding such that decoding amount of the respective objects per unit time corresponds to the number of frames per unit time according to the information indicating the number of frames to-be-decoded. 
Thereby, decoded data is generated according to processing ability of a circuit which composites the decoded data or outputs composite data to the display, whereby an image is reproduced appropriately according to processing ability of the image output apparatus.
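When both indicators are available, a conservative choice is to follow whichever one asks for fewer frames. The sketch below combines a proportional rule for the display-time lag with a threshold rule for the waiting time and takes the smaller of the two estimates for one object; both rules, the 50% threshold, the one-frame step, and the name `combined_frames_to_decode` are assumptions for illustration.

```python
def combined_frames_to_decode(nominal_fps, current_fps, set_advance_ms,
                              scheduled_advance_ms, wait_ms, frame_interval_ms):
    """Decoding target for one object using both display lag and buffer wait."""
    # Estimate from the set/scheduled display time comparison.
    if scheduled_advance_ms > set_advance_ms:
        by_lag = max(1, nominal_fps * set_advance_ms // scheduled_advance_ms)
    else:
        by_lag = nominal_fps
    # Estimate from the time spent waiting to write into the buffer.
    if wait_ms > 0.5 * frame_interval_ms:
        by_wait = max(1, current_fps - 1)
    else:
        by_wait = min(nominal_fps, current_fps + 1)
    # The more conservative (smaller) estimate wins.
    return min(by_lag, by_wait)


print(combined_frames_to_decode(30, 30, 1000, 1500, 0, 33))  # 20
```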
According to a 7th aspect of the present invention, in the image output apparatus of the 6th aspect, the decoders, when changing the number of frames to-be-decoded per unit time according to the information indicating the number of frames to-be-decoded of the respective objects, determine frames to be decoded and frames to be dropped which are not decoded, according to types of encoding processes performed for the encoded video object data. Therefore, the effects of the 3rd aspect are achieved.
According to an 8th aspect of the present invention, there is provided an image reproduction method which decodes and composites encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, to output reproduced data used for displaying the predetermined image, and the method comprises: a video data composition step for compositing decoded data obtained by decoding the encoded video object data corresponding to respective objects to generate composite data corresponding to a frame, frame by frame; a buffering step for storing the composite data of a predetermined number of frames in a buffer; a reproduced data output step for selecting composite data corresponding to a specified frame according to a result of comparison between set display time of respective composite data stored in the buffer and scheduled display time determined by display process ability, and outputting the selected composite data as the reproduced data; and a video data composition period determination step for determining a composition period of a composition process according to the result of comparison between the set display time and the scheduled display time and outputting composition period information, and in the video data composition step, the composition process is performed according to the composition period indicated by the composition period information. Therefore, the effects of the 1st aspect are achieved.
According to a 9th aspect of the present invention, there is provided an image reproduction method which decodes and composites encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, to output reproduced data used for displaying the predetermined image, and the method comprises: a decoding step for decoding the encoded video object data corresponding to respective objects and outputting decoded data corresponding to the respective objects; a composition step for compositing the decoded data corresponding to the respective objects to generate composite data and storing the composite data of a predetermined number of frames in a buffer; a reproduced data output step for selecting composite data corresponding to a specified frame according to a result of comparison between set display time of the composite data of respective frames stored in the buffer and scheduled display time determined by display process ability, and outputting the selected composite data as the reproduced data; and a determination step for determining a number of frames to-be-decoded, which determines the number of frames of the respective objects which are to be decoded by the decoders per unit time, according to the result of comparison between the set display time and the scheduled display time, and outputs information indicating the number of frames to-be-decoded of the respective objects, and in the decoding step, the encoded video object data is decoded such that decoding amount of the respective objects per unit time corresponds to the number of frames per unit time according to the information indicating the number of frames to-be-decoded. Therefore, the effects of the 2nd aspect are achieved.
According to a 10th aspect of the present invention, there is provided an image reproduction method which decodes and composites encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data respectively corresponding to individual objects composing a predetermined image, to output reproduced data used for displaying the predetermined image, and the method comprises: a decoding step for decoding the encoded video object data corresponding to respective objects and outputting decoded data corresponding to the respective objects; a composition step for compositing the decoded data corresponding to the respective objects to generate composite data and storing the composite data of a predetermined number of frames in a buffer; a reproduced data output step for selecting composite data corresponding to a specified frame according to a result of comparison between set display time of the composite data of respective frames stored in the buffer and scheduled display time determined by display process ability, and outputting the selected composite data as reproduced data; and a determination step for determining a number of frames to-be-decoded, which determines the number of frames of respective objects which are to be decoded by the decoders per unit time, according to waiting time before the composite data is written to the buffer, and outputs information indicating the number of frames to-be-decoded of the respective objects, and in the decoding step, the encoded video object data is decoded such that decoding amount of the respective objects per unit time corresponds to the number of frames per unit time according to the information indicating the number of frames to-be-decoded. Therefore, the effects of the 4th aspect are achieved.
According to an 11th aspect of the present invention, there is provided an image reproduction method which decodes and composites encoded video object data with display time set in each frame as a display process unit, which is obtained by encoding video object data corresponding to individual objects composing a predetermined image, to output reproduced data used for displaying the predetermined image, and the method comprises: a decoding step for decoding the encoded video object data corresponding to respective objects and outputting a plurality of decoded data corresponding to the respective objects; a composition step for compositing the plurality of decoded data to generate composite data and storing the composite data of a predetermined number of frames in a buffer; a reproduced data output step for selecting composite data corresponding to a specified frame according to a result of comparison between set display time of the composite data of respective frames stored in the buffer and scheduled display time determined by display process ability, and outputting the selected composite data as the reproduced data; and a determination step for determining a number of frames to-be-decoded, which determines the number of frames of respective objects which are to be decoded by the decoders per unit time, according to the result of comparison between the set display time and the scheduled display time, and waiting time before the composite data is written to the buffer, and outputs information indicating the number of frames to-be-decoded of the respective objects, and in the decoding step, the encoded video object data is decoded such that decoding amount of the respective objects per unit time corresponds to the number of frames per unit time according to the information indicating the number of frames to-be-decoded. Therefore, the effects of the 6th aspect are achieved.
According to a 12th aspect of the present invention, there is provided a data storage medium for storing a program which makes a computer perform processing of video data, the program being an image reproduction program for making the computer reproduce data according to an image reproduction method of the 8th aspect. Therefore, outputting video data according to the reproduction method of the 8th aspect is implemented by using a computer.
According to a 13th aspect of the present invention, there is provided a data storage medium for storing a program which makes a computer perform processing of video data, the program being an image reproduction program for making the computer reproduce data according to an image reproduction method of the 9th aspect. Therefore, the effects of the 12th aspect are achieved.
According to a 14th aspect of the present invention, there is provided a data storage medium for storing a program which makes a computer perform processing of video data, the program being an image reproduction program for making the computer reproduce data according to an image reproduction method of the 10th aspect. Therefore, the effects of the 12th aspect are achieved.
According to a 15th aspect of the present invention, there is provided a data storage medium for storing a program which makes a computer perform processing of video data, the program being an image reproduction program for making the computer reproduce data according to an image reproduction method of the 11th aspect. Therefore, the effects of the 12th aspect are achieved.
According to a 16th aspect of the present invention, there is provided an image output apparatus which receives a plurality of video object data respectively corresponding to a plurality of objects composing a predetermined image, composites the plurality of video object data and outputs reproduced data used for displaying the predetermined image, and the apparatus comprises: object composition means for compositing the plurality of video object data with reference to object period information indicating periods according to which frames of respective objects are updated and composite image period information indicating a period according to which a frame of a composite image is updated; and period information changing means for changing one of the object period information and the composite image period information so that a corresponding frame updating period has a value according to a control signal, wherein the object composition means composites the plurality of video object data with reference to changed period information which has replaced corresponding period information before change. Therefore, for the composite image composed of the plurality of objects, the frame updating period is set according to processing ability of the object composition apparatus, while for individual objects composing the scene, the frame updating period is set according to the display method of the video object data.
According to a 17th aspect of the present invention, there is provided an image reproduction method which receives a plurality of video object data respectively corresponding to a plurality of objects composing a predetermined image, composites the plurality of video object data and outputs reproduced data used for displaying the predetermined image, and the method comprises: a composition period determination step for determining a period according to which the plurality of video object data is composited, with reference to object period information indicating periods according to which frames of respective objects are updated and composite image period information indicating a period according to which a frame of a composite image is updated; and a period information changing step for changing one of the object period information and the composite image period information so that a corresponding frame updating period has a value according to a control signal, wherein in the composition period determination step, changed period information which has replaced corresponding period information before change, is referred to. Therefore, the effects of the 16th aspect are achieved.
According to an 18th aspect of the present invention, there is provided a data storage medium for storing a program which makes a computer perform processing of video data, the program being an image reproduction program for making the computer reproduce data according to an image reproduction method of the 17th aspect. Therefore, this processing is implemented by using a general computer.
According to a 19th aspect of the present invention, there is provided an object composition apparatus for compositing video object data respectively corresponding to individual objects composing a predetermined image according to auxiliary information associated with the predetermined image; and the apparatus comprises: program information storage means for storing program information as the auxiliary information, including object period information of respective objects indicating periods according to which frames of the respective objects are updated and display method information indicating methods for displaying the video object data; decision means for deciding a method of displaying the video object data object by object, according to the program information stored in the program information storage means and outputting a decision signal indicating a decision result; period information updating means which receives the decision signal and performs an information updating process in which object period information of a target object included in the program information on which decision has been made is changed, according to the decision result; and composition means for compositing the video object data corresponding to the respective objects periodically by using the program information which has been subjected to the information updating process. Therefore, for the individual objects, the frame updating periods are set according to the display method of the video object data, and the composite image including the object corresponding to the video object data which is repeatedly reproduced is preferably displayed.
According to a 20th aspect of the present invention, in the object composition apparatus of the 19th aspect, the program information includes composition information used for compositing the video object data corresponding to the individual objects to reproduce the predetermined image, the display method information being included in the composition information, the program information storage means includes a composition information memory for storing the composition information included in the program information, and the decision means receives the composition information stored in the composition information memory and decides the method for displaying the video object data object by object, according to the display method information included in the composition information. Therefore, for the individual objects, the frame updating periods are set according to the display method of the video object data by using the program information.
According to a 21st aspect of the present invention, in the object composition apparatus of the 19th aspect, the program information includes side information associated with the individual objects, the display method information being included in the side information, the program information storage means includes a side information memory for storing the side information of the respective objects included in the program information, and the decision means receives the side information stored in the side information memory and decides the method for displaying the video object data object by object, according to the display method information included in the side information. Therefore, for the individual objects, the frame updating periods are set according to the display method of the video object data by using the side information.
According to a 22nd aspect of the present invention, in the object composition apparatus of the 19th aspect, the display method information of each object which is included in the program information is a flag indicating whether or not video object data corresponding to a frame of a corresponding object needs to be repeatedly reproduced. Therefore, depending upon whether or not video object data corresponding to each object needs to be repeatedly reproduced, the frame updating period is set, and the composite image including the object corresponding to video object data which is repeatedly reproduced, is preferably displayed.
According to a 23rd aspect of the present invention, in the object composition apparatus of the 19th aspect, the display method information of each object which is included in the program information is a flag indicating whether or not object period information of a corresponding object can be changed. Therefore, an appropriate frame updating period is set for each object whose object period information can be changed, and the composite image including the object corresponding to the object data which is repeatedly reproduced is preferably displayed.
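The decision made from the display method information can be pictured with a small data model of the program information. In this sketch the `ObjectInfo` fields and the `decide_targets` function are hypothetical; `repeat` stands for the flag of the 22nd aspect and `changeable` for the flag of the 23rd aspect. Only the objects selected here would have their object period information updated, while normally reproduced objects keep their own period so that they stay synchronized with audio.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    name: str
    period_ms: int            # object period information (frame updating period)
    repeat: bool = False      # flag: frames are repeatedly reproduced
    changeable: bool = False  # flag: object period information may be changed


def decide_targets(objects, flag="repeat"):
    """Return the names of objects whose period information is to be updated."""
    if flag not in ("repeat", "changeable"):
        raise ValueError("flag must be 'repeat' or 'changeable'")
    return [obj.name for obj in objects if getattr(obj, flag)]


scene = [
    ObjectInfo("news_video", 33),
    ObjectInfo("animated_logo", 100, repeat=True),
    ObjectInfo("ticker", 250, changeable=True),
]
print(decide_targets(scene))                # ['animated_logo']
print(decide_targets(scene, "changeable"))  # ['ticker']
```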
According to a 24th aspect of the present invention, in the object composition apparatus of the 19th aspect, the period information updating means updates the object period information of the target object such that a value of the frame updating period of the target object becomes an integer multiple of a composition period of the composition means for compositing the video object data. Therefore, the composite image is preferably displayed while suppressing skipping of frames.
According to a 25th aspect of the present invention, in the object composition apparatus of the 24th aspect, the period information updating means updates the object period information of the target object such that the value of the frame updating period of the target object becomes a value of (the composition period × 1), when the value of the object period information is not larger than the value of the composition period. Therefore, the composite image is preferably displayed while suppressing skipping of frames.
According to a 26th aspect of the present invention, in the object composition apparatus of the 24th aspect, the period information updating means updates the object period information of the target object such that the value of the frame updating period of the target object becomes a smallest value of integer multiples of the value of the composition period, which is not smaller than the value of the object period information, when the value of the object period information is larger than the value of the composition period. Therefore, the composite image is preferably displayed while suppressing skipping of frames.
According to a 27th aspect of the present invention, in the object composition apparatus of the 24th aspect, the period information updating means, when the value of the object period information is larger than the value of the composition period, calculates a first candidate value as a largest value of integer multiples of the value of the composition period, which is not larger than the value of the object period information and a second candidate value as a smallest value of the integer multiples of the composition period, which is not smaller than the value of the object period information, and updates the object period information of the target object such that the frame updating period of the target object has one of the first and second candidate values which is closer to the value of the object period information. Therefore, the composite image is preferably displayed while suppressing skipping of frames, and simultaneously, variation before and after updating the frame updating period of the object corresponding to the object data to be repeatedly reproduced, is suppressed.
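The updating rules of the 24th to 27th aspects reduce to a little integer arithmetic. In the sketch below, `updated_period` is a hypothetical helper; the text does not say which candidate to choose when both are equally close to the original period, so the tie rule used here (prefer the smaller candidate) is an assumption.

```python
def updated_period(object_period_ms, composition_period_ms, nearest=False):
    """Round a target object's period to an integer multiple of the composition period.

    nearest=False follows the smallest-multiple-not-below rule (26th aspect);
    nearest=True picks the closer of the two neighboring multiples (27th aspect).
    """
    T = composition_period_ms
    if object_period_ms <= T:
        return T                                  # 25th aspect: T x 1
    lower = (object_period_ms // T) * T           # largest multiple not above the period
    upper = -(-object_period_ms // T) * T         # smallest multiple not below the period
    if not nearest:
        return upper
    # Tie -> lower candidate (an assumption; the rule is not stated in the text).
    if object_period_ms - lower <= upper - object_period_ms:
        return lower
    return upper


print(updated_period(100, 300))                # 300
print(updated_period(230, 100))                # 300
print(updated_period(230, 100, nearest=True))  # 200
```

For example, an object updated every 100 msec and composited every 300 msec receives a 300 msec period, so each of its frames is shown once per composite update and in the correct order; an object period of 230 msec with a 100 msec composition period becomes 300 msec under the 26th aspect rule and 200 msec under the 27th aspect rule.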
According to a 28th aspect of the present invention, there is provided an object composition method for compositing video object data respectively corresponding to individual objects composing a predetermined image according to auxiliary information associated with the predetermined image; and the method comprises: a decision step for deciding a method of displaying video object data object by object, according to program information as the auxiliary information, including object period information of respective objects indicating periods according to which frames of the respective objects are updated and display method information indicating methods for displaying the video object data; a period information updating step in which object period information of a target object included in the program information on which decision has been made is changed according to a decision result; and a composition step for compositing the video object data corresponding to the respective objects periodically by using the program information which has been changed in the period information updating step. Therefore, the effects of the 19th aspect are achieved.
According to a 29th aspect of the present invention, in the object composition method of the 28th aspect, the display method information of each object which is included in the program information is a flag indicating whether or not video object data corresponding to a frame of a corresponding object needs to be repeatedly reproduced. Therefore, the effects of the 22nd aspect are achieved.
According to a 30th aspect of the present invention, in the object composition method of the 28th aspect, the display method information of each object which is included in the program information is a flag indicating whether or not object period information of a corresponding object can be changed. Therefore, the effects of the 23rd aspect are achieved.
According to a 31st aspect of the present invention, in the object composition method of the 28th aspect, in the period information updating step, the object period information of the target object is updated such that a value of the frame updating period of the target object becomes an integer multiple of a composition period of composition means for compositing the video object data. Therefore, the effects of the 24th aspect are achieved.
According to a 32nd aspect of the present invention, in the object composition method of the 31st aspect, in the period information updating step, the object period information of the target object is updated such that the value of the frame updating period of the target object becomes the value of (the composition period × 1) when the value of the object period information is not larger than the value of the composition period. Therefore, the effects of the 23rd aspect are achieved.
According to a 33rd aspect of the present invention, in the object composition method of the 31st aspect, in the period information updating step, the object period information of the target object is updated such that the value of the frame updating period of the target object becomes a smallest value of integer multiples of the composition period, which is not smaller than the value of the object period information, when the value of the object period information is larger than the value of the composition period. Therefore, the effects of the 26th aspect are achieved.
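Taken together, the 32nd and 33rd aspects amount to rounding the object period up to a multiple of the composition period, with a minimum of one composition period. A minimal Python sketch, assuming both periods are positive integers in the same time unit; the function name is hypothetical:

```python
def update_period_ceil(obj_period: int, comp_period: int) -> int:
    # Periods are assumed to be positive integers in a common time unit.
    if obj_period <= 0 or comp_period <= 0:
        raise ValueError("periods must be positive")
    if obj_period <= comp_period:
        # 32nd aspect: use the composition period x 1.
        return comp_period
    # 33rd aspect: smallest multiple of comp_period not smaller than obj_period.
    k = -(-obj_period // comp_period)  # integer ceiling division
    return k * comp_period
```

For a composition period of 40, a requested period of 30 becomes 40, 80 stays 80, and 100 becomes 120.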
According to a 34th aspect of the present invention, in the object composition method of the 31st aspect, in the period information updating step, when the value of the object period information is larger than the value of the composition period, a first candidate value as a largest value of integer multiples of the composition period which is not larger than the value of the object period information and a second candidate value as a smallest value of the integer multiples of the composition period, which is not smaller than the value of the object period information, are calculated and the object period information is updated such that the frame updating period of the target object has one of the first and second candidate values which is closer to the value of the object period information. Therefore, the effects of the 27th aspect are achieved.
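The nearest-candidate rule of the 34th aspect can be sketched the same way. Two points are assumptions, since the text leaves them open: when the object period does not exceed the composition period the sketch falls back to 1x the composition period (as in the 32nd aspect), and when both candidates are equally close it picks the smaller one. The function name is hypothetical.

```python
def update_period_nearest(obj_period: int, comp_period: int) -> int:
    # Periods are assumed to be positive integers in a common time unit.
    if obj_period <= 0 or comp_period <= 0:
        raise ValueError("periods must be positive")
    if obj_period <= comp_period:
        # Not covered by the 34th aspect; assumed fallback to 1x, as in the 32nd.
        return comp_period
    lower = (obj_period // comp_period) * comp_period    # first candidate
    upper = -(-obj_period // comp_period) * comp_period  # second candidate
    # Choose the candidate closer to obj_period; ties go to the smaller one
    # (an assumption, since the tie case is not specified).
    return lower if obj_period - lower <= upper - obj_period else upper
```

With a composition period of 40, a requested period of 90 becomes 80 and 110 becomes 120. Compared with the round-up rule of the 33rd aspect, this keeps the update rate closer to the requested one, at the cost of sometimes updating slightly faster than requested.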
According to a 35th aspect of the present invention, there is provided a data storage medium storing a program which makes a computer perform an object composition process for compositing video object data respectively corresponding to individual objects composing a predetermined image, according to auxiliary information associated with the predetermined image. The program comprises: a decision step for deciding, object by object, a method of displaying the video object data according to program information serving as the auxiliary information, the program information including object period information indicating the periods at which frames of the respective objects are updated and display method information indicating methods for displaying the video object data; a period information updating step for changing, according to the decision result, the object period information of a target object included in the program information; and a composition step for periodically compositing the video object data corresponding to the respective objects by using the program information changed in the period information updating step.
Therefore, when the composition process is performed by software, the frame updating period of each object is set according to the display method of its video object data, so that a composite image including an object whose video object data is repeatedly reproduced can be displayed properly.