Recent increases in the storage capacity of video recording apparatuses and the widespread use of high-speed/broadband networks have created an environment that can provide the user with a huge amount of video images at one time. In order for the user to efficiently utilize the provided video data, a function is required that searches the video data for a desired piece of video or a partial video image (a shot or cut) at high speed and in a simple manner. As one such function, an approach has been proposed in which information related to the video image (hereinafter referred to as additional information) is prepared in advance and added to the video data, and the search is aided by this additional information. With this function, it is possible to search a catalogue of video titles for a desired one and to provide the corresponding video data, or to prepare additional information representing the contents of the video images in greater detail, allowing the user to search based on the contents.
FIG. 17 shows an example of the additional information describing the contents of video images presented by certain video data, applicable to the prior art and to an embodiment of the present invention. As can be seen from FIG. 17, the additional information is represented by a tree-structure that corresponds to a logic structure of the video image.
Generally, a movie content is made up of one or a plurality of scenes, and each scene is made up of one or a plurality of sub-scenes (or shots), or of still other scenes; that is, the content has a hierarchy. For higher convenience in searching through video images, search information representing characteristics of each sub-scene is added to the sub-scene, and this information is described using a tree-structure expression such as that shown in FIG. 17 to express the hierarchy mentioned above. In the example of FIG. 17, the video as a whole consists of two scenes, which are made up of two sub-scenes and one sub-scene, respectively. For each sub-scene, related information of the sub-scene is described, in this example the time information (start time/length), the title, and the performer or performers appearing in the sub-scene. In the tree-structure, the video data as a whole is the root (root node) RN, and the elements such as the scenes and sub-scenes, as well as the time information, titles and performers, are the nodes N. Among the nodes N, the level of scenes corresponds to the first branch destination when viewed from the root node RN; hence, the scenes are the nodes N of the first level, and the sub-scenes are the nodes N of the second level. The data representing the time information, title and performers as the related information of each sub-scene correspond to the leaves L of the tree-structure.
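The hierarchy described above can be modeled with nested data structures. A minimal sketch in Python follows; the times, titles and exact leaf values are assumed placeholders for illustration, not values taken from FIG. 17:

```python
# Assumed sketch of the additional information of FIG. 17 as nested
# dictionaries: a video of two scenes, the first with two sub-scenes and
# the second with one. Times and titles are illustrative placeholders.
video = {
    "scenes": [
        {"sub_scenes": [
            {"time": {"start": "00:00:00", "length": "00:05:00"},
             "title": "Opening",
             "performers": ["Akira ABE", "Wataru WADA"]},
            {"time": {"start": "00:05:00", "length": "00:03:00"},
             "title": "Interview",
             "performers": ["Akira ABE"]},
        ]},
        {"sub_scenes": [
            {"time": {"start": "00:08:00", "length": "00:10:00"},
             "title": "Finale",
             "performers": ["Wataru WADA"]},
        ]},
    ],
}
```

Each list level of this structure corresponds to one level of nodes N, and the innermost values correspond to the leaves L.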
The information added as a leaf L will be hereinafter referred to as “element information” or simply “element” of the search information.
Here, in order to tell to which leaf L of the tree-structure each piece of element information corresponds, it is necessary to store the value of the element and, at the same time, the information representing the position of the element in the tree-structure. In other words, information specifying to which sub-scene of which scene the element belongs is required. In FIG. 17, the numbers “1” and “2” allotted to distinguish scenes, or sub-scenes within one scene, from each other, and the numbers “1”, “2” and “3” allotted to distinguish a plurality of performers within a sub-scene from each other, are pieces of information for uniquely identifying the positions of the elements in the tree-structure as described above; such information is referred to as “position information.” The position information is expressed in accordance with an order relation defined in the tree-structure. An example will be described in the following.
In the tree-structure, all nodes N positioned immediately below a given node N (or the root node RN) are assigned serial numbers 1, 2, . . . from left to right. The position information of a certain leaf L is given by listing the serial numbers allotted to the nodes N on the path from the root node RN down to that leaf L.
Referring to FIG. 17, the specific manner of generating the position information in accordance with the defined order will be described, taking the element “Akira ABE”, third from the left, as an example. When the tree is scanned from the upper level to the lower level, and within one level from left to right, the nodes N (as well as the root node RN and leaves L) appear one after another, and position information is allotted to each node in the order of appearance.
Starting from the first level, the scene to which the element “Akira ABE” belongs is the leftmost node N hanging from the root node RN of the video; therefore, the position information of the first level is “1”. On the second level, the sub-scene to which the element “Akira ABE” belongs is the first from the left among the nodes N hanging from the node N of scene “1” of the upper level; therefore, the position (order of appearance) of the second level is “1”. Next, on the third level, the leaf L of the element “Akira ABE” is the first from the left among the “performer” nodes hanging from sub-scene “1”; therefore, the position (order of appearance) among the “performer” elements of the third level is “1”. Thus, in this example, the position information of the element “Akira ABE” is, from the first to the third levels combined, (1,1,1). The position information of every leaf L can be uniquely expressed in this format.
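The traversal described above can be sketched as follows. The tree contents are assumed for illustration; `position_of` simply records the 1-based serial number at each level on the path to the first matching leaf:

```python
# Assumed tree contents; only the performer leaves are kept for brevity.
video = {
    "scenes": [
        {"sub_scenes": [
            {"performers": ["Akira ABE", "Wataru WADA"]},
            {"performers": ["Akira ABE"]},
        ]},
        {"sub_scenes": [
            {"performers": ["Wataru WADA"]},
        ]},
    ],
}

def position_of(tree, name):
    """Return the (scene, sub-scene, performer) position information of the
    first leaf matching `name`, using 1-based left-to-right serial numbers."""
    for i, scene in enumerate(tree["scenes"], start=1):
        for j, sub in enumerate(scene["sub_scenes"], start=1):
            for k, performer in enumerate(sub["performers"], start=1):
                if performer == name:
                    return (i, j, k)
    return None

position_of(video, "Akira ABE")  # → (1, 1, 1)
```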
The search proceeds in the following manner. Referring to FIG. 17, first, the sub-scene in which “Wataru WADA” appears is searched for. Specifically, the query “performer=Wataru WADA” is entered, and among the pieces of element information (leaves L) representing performers, one that matches the query “Wataru WADA” is searched for. When an element that satisfies the condition is found, a piece of information representing the sub-scene to which it belongs (“sub-scene 1 of scene 1”) is returned as the search result. The sub-scene is then reproduced with reference to the time information added as a leaf L of that sub-scene.
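The query step can be sketched as below. Each leaf is stored with its position information, and a performer match is resolved to the containing sub-scene by truncating the position to its first two levels; the leaf values are assumed for illustration:

```python
# Assumed (position, value) pairs for the performer leaves of the tree.
leaves = [
    ((1, 1, 1), "Akira ABE"),    # scene 1, sub-scene 1, performer 1
    ((1, 1, 2), "Wataru WADA"),  # scene 1, sub-scene 1, performer 2
    ((1, 2, 1), "Akira ABE"),    # scene 1, sub-scene 2, performer 1
    ((2, 1, 1), "Wataru WADA"),  # scene 2, sub-scene 1, performer 1
]

def find_sub_scenes(query):
    """Return the (scene, sub-scene) pairs whose performer leaf matches
    the query value; the first two levels of the position information
    identify the sub-scene to which the matching leaf belongs."""
    return [pos[:2] for pos, value in leaves if value == query]
```

With these assumed leaves, `find_sub_scenes("Wataru WADA")` yields the sub-scenes at positions (1, 1) and (2, 1), each of which could then be reproduced from its time information.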
FIG. 17 is a very simple example of additional information; in practice, the types of elements are not limited to these, and complicated video images have correspondingly complicated tree-structures. Therefore, the additional information generally has a large size. For this reason, it is often the case that the additional information is handled in a pre-divided form, and at the time of use, only the necessary piece of information is obtained and used.
FIG. 18 shows an example in which the tree-structure of the additional information shown in FIG. 17 is divided into easier-to-handle forms (divided sub-scene by sub-scene). Each divided tree is referred to as a “sub-tree” or a “fragment.” Further, in order to allow the “performer”-based search described above, a table such as that shown in FIG. 19 is prepared as auxiliary information. The table of FIG. 19 stores the values of all pieces of element information (performers) of the tree-structure, as well as pieces of fragment information representing which element information belongs to which sub-tree (fragment).
Here, the search for a video image using the additional information proceeds in the following manner. First, in response to a request, the system transmits the table shown in FIG. 19 to the user. The user searches through the table for a match of the query “Wataru WADA” and reads the corresponding fragment information. Then, the data of the fragment (“fragment 1”) indicated by the read fragment information is requested. In response to this request, the system transmits the data of “fragment 1”, which the user receives and uses. The sub-scene is reproduced with reference to the “time information” of the obtained fragment data.
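This two-step exchange can be sketched as follows. Both the table contents and the fragment store are hypothetical stand-ins for FIGS. 19 and 18, respectively:

```python
# Hypothetical auxiliary table in the spirit of FIG. 19, mapping each
# performer value to the fragment(s) whose sub-tree contains that leaf.
table = {
    "Akira ABE": ["fragment 1", "fragment 2"],
    "Wataru WADA": ["fragment 1"],
}

# Hypothetical fragment store standing in for the divided sub-trees of FIG. 18.
fragments = {
    "fragment 1": {"time": {"start": "00:00:00", "length": "00:05:00"},
                   "title": "Opening",
                   "performers": ["Akira ABE", "Wataru WADA"]},
}

def fetch_fragments(query):
    """Resolve a performer query to fragment data via the auxiliary table,
    mimicking the table-lookup-then-request exchange described in the text."""
    return [(fid, fragments[fid]) for fid in table.get(query, [])
            if fid in fragments]
```

The sub-scene would then be reproduced using the `"time"` entry of the returned fragment data.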
A specific example of the auxiliary information shown in FIG. 19 is the “Indexing” technique of the TV-Anytime Phase 1, Part 3-2 standard (ETSI TS 102 822-3-2).
Though information for identifying individual fragments is provided separately from the additional information of FIG. 17 in the examples of FIGS. 18 and 19, such information may be replaced by the position information included in the additional information of FIG. 17.
The table of FIG. 19 is always transmitted, whether it is actually used or not. Therefore, considering transmission efficiency, it is preferable to compress and encode the table before transmission. In this respect, Japanese Patent Laying-Open No. 2003-092757 (Patent Document 1) discloses, as a method of encoding a plurality of pieces of position information at a high compression rate, a method of differentially encoding pieces of position information having continuous values in order of magnitude. In this differential coding method, the position information of a certain leaf L (or node N) of the tree-structure is not described directly; instead, it is encoded using only the deviation (difference) between the position information of interest and the position information of the leaf L (or node N) of the immediately preceding order in the tree-structure. This enables more efficient coding using a smaller number of bits.
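The idea of coding only the deviation from the previous entry can be sketched as below. This is a simplified, assumed variant (common-prefix length plus the differing suffix), not the exact format of Patent Document 1:

```python
def encode(positions):
    """Differentially encode sorted position tuples: for each entry, store
    the length of the prefix shared with the previous entry together with
    the remaining suffix (an assumed, simplified scheme for illustration)."""
    out, prev = [], ()
    for pos in positions:
        common = 0
        while common < min(len(prev), len(pos)) and prev[common] == pos[common]:
            common += 1
        out.append((common, pos[common:]))
        prev = pos
    return out

def decode(encoded):
    """Reconstruct the original position tuples from the differential form."""
    out, prev = [], ()
    for common, suffix in encoded:
        pos = prev[:common] + suffix
        out.append(pos)
        prev = pos
    return out
```

For consecutive positions such as (1,1,1) and (1,1,2), only the prefix length 2 and the single differing value 2 need to be stored, which is the source of the bit savings.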
Patent Document 1: Japanese Patent Laying-Open No. 2003-092757