Hypermedia defines connections, called hyperlinks, between media such as moving images, still images, audio, and text, so that one medium can be referenced from another or the media can reference each other. For example, on a homepage written in HTML that can be browsed over the Internet, text and still pictures are arranged, and links are defined throughout the text and the still pictures. When a link is specified, the relevant information at the link destination is displayed immediately. Because relevant information is accessed by directly specifying an expression of interest, the operation is easy and intuitive.
On the other hand, in hypermedia centered on moving images rather than on text and still pictures, a link is defined from an object appearing in the moving image, such as a person or a thing, to relevant content, such as text or a still picture, that explains the object. When a viewer specifies the object, the relevant content is displayed. To define a link between the spatio-temporal region of an object appearing in the moving image and its relevant content, data expressing the spatio-temporal region of the object in the moving image (object area data) is required.
As the object area data, it is possible to use a series of mask images having two or more values, the arbitrary shape coding of MPEG-4, the method explained in patent document 1 (JP-A-2000-285253), which describes the trajectories of feature points of a figure, the method explained in patent document 2 (JP-A-2001-111996), and the like. In addition to the object area data, realizing hypermedia centered on moving images requires data (operation information) describing operations such as displaying other relevant content when an object is specified. Such data other than the moving image itself is called metadata.
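As an illustration only, the relationship between object area data and operation information can be sketched with the simplest of the representations listed above, a series of mask images. The sketch is not taken from either cited patent document. The names `OperationInfo`, `Metadata`, and `hit_test`, and the convention that pixel value 0 means background and any other value is an object ID, are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed encoding: a mask image is a grid of integers,
# 0 = background, any other value = ID of the object covering that pixel.
Mask = List[List[int]]


@dataclass
class OperationInfo:
    """Hypothetical operation information for one object."""
    action: str  # e.g. "show_text"
    target: str  # e.g. an identifier of the relevant content


@dataclass
class Metadata:
    """Hypothetical container for the data other than the moving image."""
    masks: Dict[int, Mask]                # frame number -> mask image
    operations: Dict[int, OperationInfo]  # object ID -> operation information


def hit_test(meta: Metadata, frame: int, x: int, y: int) -> Optional[OperationInfo]:
    """Return the operation linked to the object at (x, y) in the given frame.

    Returns None when the frame has no mask, the point lies outside the
    mask, the point is background, or the object has no operation defined.
    """
    mask = meta.masks.get(frame)
    if mask is None:
        return None
    if not (0 <= y < len(mask) and 0 <= x < len(mask[y])):
        return None
    object_id = mask[y][x]
    if object_id == 0:
        return None
    return meta.operations.get(object_id)
```

A player built along these lines would call `hit_test` with the current frame number and the position specified by the viewer, then carry out the returned operation. A per-frame mask is easy to query but large, which is one reason more compact descriptions of the spatio-temporal region are of interest.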
One method of providing a moving image and metadata to a viewer is to produce a recording medium (video CD, DVD, etc.) on which both the moving image and the metadata are recorded. Alternatively, to provide metadata for a moving image that the viewer already owns on a video CD or a DVD, the metadata alone may be downloaded from a network or delivered by streaming. Further, both the moving image and the metadata may be delivered through a network. In these cases, it is desirable that the metadata have a format that uses a buffer efficiently, is suitable for random access, and is resistant to data loss in the network.
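One way to picture the random-access and loss-resistance requirements is to assume that the metadata is divided into self-contained units, each stamped with the time range of the moving image to which it applies. The following is a minimal sketch under that assumption and is not the format of the present invention. `AccessUnit` and `find_unit` are hypothetical names, and the units are assumed to be sorted by start time and not to overlap.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessUnit:
    """Hypothetical self-contained piece of metadata."""
    timestamp: float  # start time in the moving image, in seconds
    duration: float   # length of the covered interval, in seconds
    payload: bytes    # encoded metadata for that interval


def find_unit(units: List[AccessUnit], t: float) -> Optional[AccessUnit]:
    """Return the unit whose interval contains time t, or None.

    Assumes `units` is sorted by timestamp and the intervals do not overlap.
    A binary search over the start times gives random access without
    reading the units that precede the requested time.
    """
    starts = [u.timestamp for u in units]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        return None
    unit = units[i]
    if t < unit.timestamp + unit.duration:
        return unit
    return None  # t falls in a gap, e.g. where a unit was lost in the network
```

Because each unit in this sketch can be decoded without reference to its neighbours, a lost unit affects only its own interval, and units whose interval has already passed can be discarded from the buffer.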
When the moving image is switched frequently (for example, when moving images taken from plural camera angles are prepared and the viewer can freely select the camera angle, as in a multi-angle video of a DVD video), the metadata must be switched at high speed in step with the switching of the moving image.
Further, it must be possible to easily describe an object whose spatio-temporal region has a complicated shape.
For metadata that relates to a moving image owned by a viewer and is delivered to the viewer by streaming through a network, or for metadata that is owned by the viewer and reproduced, it is desirable that the spatio-temporal region information of a complicated object can be easily described.
The present invention has therefore been made to solve the above problems.