1. Field of the Invention
The present invention relates to a user interface which uses scene description information, a scene description information generating device and method for generating scene description information, a scene description information converting device and method for converting scene description information, and a sending medium for sending scene description information and a recording medium for recording scene description information.
2. Description of the Related Art
There are contents described with scene description enabling interaction by user input, such as digital TV broadcasting and DVD (Digital Video/Versatile Disk), Internet home pages described with HyperText Markup Language (hereafter referred to as “HTML”) or the like, Binary Format for the Scene (hereafter referred to as “MPEG-4 BIFS”) which is a scene description format stipulated in ISO/IEC14496-1, Virtual Reality Modeling Language (hereafter referred to as “VRML”) which is stipulated in ISO/IEC14772, and so forth. The data of such contents will hereafter be referred to as “scene description”. Scene description also includes the data of audio, images, computer graphics, etc., used within the contents.
FIGS. 18 through 20 illustrate an example of scene description, taking VRML as an example.
FIG. 18 illustrates the contents of scene description. With VRML, scene description is text data such as shown in FIG. 18, and MPEG-4 BIFS scene description is such text data encoded into binary data. VRML and MPEG-4 BIFS scene description is represented by basic description units called nodes. In FIG. 18, nodes are underlined. Nodes are units for describing objects, the linkage relations of objects, and the like, and contain data called fields which describe the properties and attributes of the nodes. For example, the Transform node 302 in FIG. 18 is a node capable of specifying three-dimensional coordinate conversion, and the amount of parallel movement of the point of origin of the coordinates can be specified in the Translation field 303. There are also fields capable of specifying other nodes, so the configuration of the scene description has a tree configuration such as shown in FIG. 19. In FIG. 19, the ovals represent nodes, dotted lines between the nodes represent event propagation paths, and solid lines between the nodes represent the parent-child relations of the nodes. A node which represents a field of the parent node thereof is called a child node. For example, in the Transform node 302 in FIG. 18, there is a Children field 304 indicating a child node group which is subjected to coordinate conversion by the Transform node, and the TouchSensor node 305 and Shape node 306 are grouped as child nodes. A node which thus groups child nodes in a Children field is called a grouping node. In the case of VRML, a grouping node is a node defined in ISO/IEC14772-1 Section 4.6.5, and indicates a node having a field comprising a list of nodes.
That is, a grouping node has a field containing a list of child nodes. Each grouping node defines a positional space for its children. This positional space is defined relative to the positional space of the node of which the grouping node is itself a child. Such a node is referred to as a parent node. This means that coordinate conversions accumulate down the scene graph hierarchy. As defined in ISO/IEC14772-1 Section 4.6.5, there are special exceptions wherein the field name is not Children, but in the following description, the Children field will be understood to encompass such exceptions as well.
In order to position an object to be displayed within a scene, the node representing the object is grouped along with a node representing attributes, and further grouped with a node indicating positional location. The object which the Shape node 306 in FIG. 18 represents has applied thereto the parallel movement specified by the Transform node 302 which is the parent node thereof, and is thereby positioned in the scene. The scene description in FIG. 18 contains a Sphere node 307 representing a sphere, a Box node 312 representing a cube, a Cone node 317 representing a cone, and a Cylinder node 322 representing a cylinder, with the results of decoding and displaying the scene description being such as shown in FIG. 20.
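The tree configuration and the way parallel movement carries down from a parent Transform node to its children can be sketched as follows. This is only an illustrative sketch: the `Node` class, the `world_positions` helper, and the translation values are hypothetical and are not taken from FIG. 18.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """A scene description node; `children` plays the role of the Children field."""
    kind: str                              # e.g. "Transform", "Shape", "TouchSensor"
    translation: Vec3 = (0.0, 0.0, 0.0)    # only meaningful for Transform nodes
    children: List["Node"] = field(default_factory=list)


def world_positions(node: Node, origin: Vec3 = (0.0, 0.0, 0.0)) -> List[Tuple[str, Vec3]]:
    """Walk the tree, accumulating parent translations down to each Shape node."""
    if node.kind == "Transform":
        origin = tuple(o + t for o, t in zip(origin, node.translation))
    found = [(node.kind, origin)] if node.kind == "Shape" else []
    for child in node.children:
        found.extend(world_positions(child, origin))
    return found


# Hypothetical scene: a Transform grouping a sensor, a shape, and a nested Transform.
scene = Node("Transform", (1.0, 0.0, 0.0), [
    Node("TouchSensor"),
    Node("Shape"),
    Node("Transform", (0.0, 2.0, 0.0), [Node("Shape")]),
])
positions = world_positions(scene)
```

In this sketch the nested Shape ends up offset by the sum of both Transform translations, which is the sense in which the conversion of a parent node applies to everything grouped beneath it.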
The scene description may also contain user interaction. The ROUTE shown in FIG. 18 represents propagation of events. The ROUTE 323 indicates that in the event that the touchTime field of the TouchSensor node 305, to which an identifier called TOUCHS has been appropriated, changes, the value thereof is propagated as an event to the startTime field of the TimeSensor node 318, to which an identifier called TIMES has been appropriated. In the event that the user has selected the Shape node 306 which has been grouped with the Children field 304 of the Transform node 302 which is the parent node of the TouchSensor node 305, the TouchSensor node 305 outputs the selected time as a touchTime event. A sensor which is thus grouped by a grouping node with an attached Shape node, and which works with that Shape node, will be referred to as a Sensor node. A Sensor node is what ISO/IEC14772-1 Section 4.6.7.3 calls Pointing-device sensors, and attached Shape nodes are Shape nodes grouped with the parent node of a Sensor node. That is, a Pointing-device sensor, such as a TouchSensor, is for detecting a pointing event wherein the user clicks on a shape.
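The event propagation which a ROUTE expresses can be sketched as a table mapping a source node and field to destination nodes and fields. The `Node` and `Router` classes below are hypothetical helpers written for illustration only; they are not part of VRML or of any decoder described here, and the time value 12.5 is an arbitrary example.

```python
from typing import Any, Dict, List, Tuple


class Node:
    """A node identified by the name appropriated to it, holding field values."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.fields: Dict[str, Any] = {}


class Router:
    """Holds ROUTE declarations and forwards field changes along them."""
    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], List[Tuple[Node, str]]] = {}

    def route(self, src: Node, src_field: str, dst: Node, dst_field: str) -> None:
        """Declare that changes of src.src_field propagate to dst.dst_field."""
        self.routes.setdefault((src.name, src_field), []).append((dst, dst_field))

    def emit(self, src: Node, src_field: str, value: Any) -> None:
        """Set a field and propagate the value along every ROUTE leaving it."""
        src.fields[src_field] = value
        for dst, dst_field in self.routes.get((src.name, src_field), []):
            self.emit(dst, dst_field, value)


touchs = Node("TOUCHS")   # stands in for the TouchSensor node
times = Node("TIMES")     # stands in for the TimeSensor node
router = Router()
router.route(touchs, "touchTime", times, "startTime")

# The user selects the attached shape at (hypothetical) time 12.5.
router.emit(touchs, "touchTime", 12.5)
```

After the `emit` call, the startTime field of the TIMES node holds the time at which the shape was selected, mirroring the propagation described for the ROUTE 323.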
On the other hand, for one second from startTime, the TimeSensor node 318 outputs the elapsed time as a fraction_changed event. The fraction_changed event which represents the elapsed time output from the TimeSensor node 318 is propagated by the ROUTE 324 to the set_fraction field of the ColorInterpolator node 319 to which an identifier called COL has been appropriated. The ColorInterpolator node 319 has functions for linear interpolation of RGB color-space values. The key and keyValue fields of the ColorInterpolator node 319 represent that in the event that the value of the set_fraction field which is input is 0, event output of the RGB value [0 0 0] as value_changed is made, and that in the event that the value of the set_fraction field which is input is 1, event output of the RGB value [1 1 1] as value_changed is made. In the event that the value of the set_fraction field which is input is between 0 and 1, event output of a value obtained by linear interpolation of the RGB value between [0 0 0] and [1 1 1], as value_changed, is made. That is to say, in the event that the value of the set_fraction field which is input is 0.2, there is event output of the RGB value [0.2 0.2 0.2] as value_changed. The value value_changed resulting from the linear interpolation is propagated by the ROUTE 325 to the diffuseColor field of the Material node 314 to which has been appropriated an identifier called MAT. This diffuseColor represents the diffusion color of the object surface which the Shape node 311 to which the Material node 314 belongs represents. Event propagation by the above ROUTE 323, ROUTE 324, and ROUTE 325 realizes user interaction wherein the RGB values of a displayed cube change from [0 0 0] to [1 1 1] for one second immediately following the user selecting the displayed sphere.
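The linear interpolation performed on the set_fraction input can be illustrated with a short sketch. The function name `color_interpolate` and its signature are assumptions made for this example; only the key values 0 and 1 and the RGB values [0 0 0] and [1 1 1] come from the description above.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def color_interpolate(fraction: float, key: Sequence[float],
                      key_value: Sequence[RGB]) -> RGB:
    """Linearly interpolate RGB between the keyValue entries bracketing `fraction`.

    `key` is assumed to be sorted in ascending order, one entry per keyValue.
    """
    if fraction <= key[0]:
        return key_value[0]
    if fraction >= key[-1]:
        return key_value[-1]
    for i in range(len(key) - 1):
        k0, k1 = key[i], key[i + 1]
        if k0 <= fraction < k1:
            t = (fraction - k0) / (k1 - k0)   # position within this key interval
            return tuple(a + t * (b - a)
                         for a, b in zip(key_value[i], key_value[i + 1]))
    raise ValueError("key must be sorted in ascending order")


key = [0.0, 1.0]
key_value = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
value_changed = color_interpolate(0.2, key, key_value)
```

With the key and keyValue settings described above, an input of 0.2 yields the RGB value [0.2 0.2 0.2], and inputs of 0 and 1 yield the two end colors unchanged.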
FIG. 21 shows an example of a system for viewing and listening to scene descriptions of contents described with a scene description method enabling interaction by user input, such as digital TV broadcasting and DVD, Internet home pages described with HTML, MPEG-4 BIFS, VRML, and so forth.
The server C01 takes the scene description C00 as input, and in the event that the server itself comprises a decoding device for scene description, the scene description C00 is decoded and displayed on the display terminal C13. Examples of the server C01 include a scene description re-distributing device or home server, digital TV broadcast set-top box, personal computer, and so forth. Normally, a user input device C09 such as a mouse or keyboard is used to enable user input for a scene description containing user interaction. There are also cases wherein scene description is distributed to an external remote terminal C07. At this time, the remote terminal C07 may not necessarily have sufficient decoding capabilities and display capabilities for the scene description, and there is also the problem that sufficient sending capacity may not be secured for distribution.
The remote terminal C07 may have capabilities as a user input device. In such cases, the user input information C11 which has been input on the remote terminal C07 is transmitted to the server C01, reflected in the decoding of the scene description at the server C01, and consequently the decoded scene C12 which reflects the user input is also displayed on the display terminal C13.
FIG. 22 shows the configuration of a user interface system comprising the remote terminal having user input capabilities shown in FIG. 21.
In the event that the server D01 comprises a scene description decoding device D04, the scene description input D00 is decoded and the decoded scene D12 is displayed on the display terminal D13. On the other hand, the server D01 transmits the scene description D00 to the remote terminal D07 via the transmitting/receiving device D06. The scene description D00 may be temporarily stored in the scene description storing device D05.
The remote terminal D07 receives the scene description D00 with the transmitting/receiving device D06b, decodes with the scene description decoding device D04b, and displays with the display device D08. The scene description D00 may be temporarily stored in the scene description storing device D05b. In the event that the remote terminal D07 has user input functions, the remote terminal D07 accepts user input D10 from the user input device D09, and sends this as user input information D11 representing user-selected position and the like to the scene description decoding device D04b. The scene description decoding device D04b decodes the scene description D00 based on the user input information D11, thereby displaying decoded results reflecting the user input on the display device D08. On the other hand, the remote terminal D07 may transmit the user input information D11 to the server D01 via the transmitting/receiving device D06b. In the event that the server D01 comprises a scene description decoding device D04, the scene description decoding device D04 of the server D01 decodes the scene description D00 based on the user input information D11, thereby displaying decoded scene D12 reflecting the user input D10 on the display terminal D13.
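The flow of user input information between the remote terminal and the server can be sketched roughly as below. All class and method names are hypothetical stand-ins chosen for this illustration; the "decoding" is reduced to recording which inputs were applied, and the transmitting/receiving devices are reduced to a direct method call.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class UserInputInfo:
    """Stand-in for user input information D11 (e.g. a user-selected position)."""
    position: Tuple[int, int]


@dataclass
class SceneDecoder:
    """Stand-in for a scene description decoding device (D04 or D04b)."""
    scene_description: str
    applied_inputs: List[UserInputInfo] = field(default_factory=list)

    def decode(self, user_input: Optional[UserInputInfo] = None) -> str:
        """Return a placeholder 'decoded scene' reflecting the inputs so far."""
        if user_input is not None:
            self.applied_inputs.append(user_input)
        return f"{self.scene_description} after {len(self.applied_inputs)} input(s)"


@dataclass
class Server:
    decoder: SceneDecoder
    display_terminal: List[str] = field(default_factory=list)   # stand-in for D13

    def receive_input(self, info: UserInputInfo) -> None:
        """Reflect input received from the remote terminal in the server's decoding."""
        self.display_terminal.append(self.decoder.decode(info))


@dataclass
class RemoteTerminal:
    decoder: SceneDecoder
    server: Server
    display_device: List[str] = field(default_factory=list)     # stand-in for D08

    def on_user_input(self, position: Tuple[int, int]) -> UserInputInfo:
        """Decode locally with the input, then forward the same input to the server."""
        info = UserInputInfo(position)
        self.display_device.append(self.decoder.decode(info))
        self.server.receive_input(info)
        return info


server = Server(SceneDecoder("D00"))
remote = RemoteTerminal(SceneDecoder("D00"), server)
sent = remote.on_user_input((10, 20))
```

The point of the sketch is that the same user input information is applied by both decoding devices, so the display device of the remote terminal and the display terminal of the server both show results reflecting the input.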
With regard to viewing and listening to contents described with a scene description method enabling interaction by user input, such as digital TV and DVD, Internet home pages described with HTML and the like, MPEG-4 BIFS, VRML, and so forth, there is demand for arrangements wherein decoding and display can be performed on terminals with inferior decoding capabilities and display capabilities. At the time of re-distribution with low-capacity media, there has also been the problem that scene descriptions with large data could not be sent, or required sending media with large capacity.
Also, with regard to viewing and listening to contents comprising scene description containing user interaction, the user has had to operate an input device such as a mouse while facing the screen in the case of a TV, or to sit by the screen and operate a keyboard or mouse in the case of receiving with a PC. Accordingly, a user interface system could be conceived wherein all contents are displayed on the remote terminal for the user to make input on the remote terminal. However, a great deal of the contents usually is not directly related to user input, and transmitting all of this data to the remote terminal would necessitate great sending capacity, and would further require high decoding capabilities and display capabilities of the remote terminal.