It is often necessary to read specific contents out of a bit stream in response to a query previously formulated by a user, or to determine whether certain contents are contained in the bit stream at all. Such a user-defined query can be formulated using a query language such as SQL (see reference [1]) or XPATH (see reference [2]). It is advantageous here if the entire bit stream does not have to be searched for the desired contents, and if instead the relevant information is stored in an indexing list or an indexing tree, so that only that part of the bit stream in which the indexing tree or the indexing list is stored has to be searched.
One problem of reading out data from a bit stream arises in the case of a document produced with the aid of the XML language (XML=Extensible Markup Language) and represented in the MPEG7 BiM format. With regard to the MPEG7 BiM format of an XML document, reference is made to [3] ISO/IEC 15938-1 Multimedia Content Description Interface—Part 1: Systems, Geneva 2002. Under this configuration the generated bit stream is subdivided into a plurality of units (access units) which consist of a plurality of fragments (fragment update units). The units are coded and where necessary sent in the form of an MPEG7 BiM stream to one or more recipients.
With regard to the querying of information from XML documents, a multiplicity of query languages are already known which permit a document to be searched for specific information. Reference may be made here by way of example to the already mentioned query language XPATH (see reference [2]). XPATH can be used to define selection criteria for filtering desired information within an XML document. The purpose of a query can be to assess whether a unit of the bit stream is important for the recipient. A query can also be used in a targeted manner to access specific desired information in the XML document. However, the MPEG7 coding method provides no mechanisms, during generation of the bit stream of an XML document, which enable random access to specific elements of the XML document. The MPEG7 bit stream must therefore be decoded before elements can be searched for. Decoding yields a document in XML format once again, which can then be searched by means of XPATH. The decoding and subsequent processing of an XML document in order to search for specific contents is therefore very time-consuming, and thus unacceptable for certain time-critical applications. Furthermore, the memory in the decoder may be limited, with the result that the bit stream cannot be fully decoded. In addition, the overhead involved in decoding is wasted if the XPATH query executed on the decoded XML document ends with a negative result.
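The decode-then-query procedure described above can be illustrated by the following minimal sketch. It assumes that the bit stream has already been decoded back into XML text; the BiM decoding step itself is not shown. The element names (Program, Title, Genre, Rating) and the function name query_decoded are hypothetical and serve only as an illustration. Python's xml.etree.ElementTree module is used here, which supports only a subset of XPATH.

```python
import xml.etree.ElementTree as ET

# Hypothetical result of decoding a bit stream back into XML text.
# The element names are illustrative assumptions, not taken from any standard.
DECODED_XML = """<Program>
  <Title>News</Title>
  <Genre>Information</Genre>
</Program>"""


def query_decoded(xml_text, path):
    """Parse the complete decoded document, then evaluate the path on it.

    The whole document has to be parsed before the query can run,
    even when the query matches nothing.
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


titles = query_decoded(DECODED_XML, "./Title")    # one matching element
missing = query_decoded(DECODED_XML, "./Rating")  # negative result
print(titles, missing)
```

In the second call the complete document is parsed although the query yields nothing, which corresponds to the wasted decoding overhead mentioned above.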
Within the framework of TV-Anytime (TVA), which is described in [4] TV-Anytime Specification Series S-3 on Metadata, Part B, Version 13, an index structure is used which permits random access to certain elements of a data fragment. The index structure consists of a plurality of parts and comprises what is referred to as a “key index list” in which all indexed paths of a document are stored. When a query is submitted, those paths are compared in turn with the query until a matching entry is found in the key index list. Based on the information that is stored in the key index list in relation to this entry, the positions in a description stream at which the indexed entry is present in coded form can be determined. As a result of using the key index list it is no longer necessary to decode irrelevant data fragments, so that less memory space is required during a query. However, the linear processing of the key index list is time-consuming and the transfer of all the indexed paths is laborious and resource-intensive.
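The linear processing of such a key index list can be sketched as follows. This is a simplified illustration and not the TV-Anytime data format: the list layout, the indexed paths, the stream positions and the function name lookup are all assumptions chosen for the example.

```python
# Hypothetical key index list: each entry pairs an indexed path with the
# positions in the description stream at which that path is present in
# coded form. Paths and positions are illustrative assumptions.
KEY_INDEX_LIST = [
    ("/Program/Title", [128, 4096]),
    ("/Program/Genre", [256]),
    ("/Program/Synopsis", [512, 8192]),
]


def lookup(query_path, key_index_list):
    """Compare the indexed paths in turn with the query.

    Returns the stream positions of the first matching entry (or None)
    together with the number of comparisons that were needed.
    """
    comparisons = 0
    for path, positions in key_index_list:
        comparisons += 1
        if path == query_path:
            return positions, comparisons
    return None, comparisons


hit_positions, hit_cost = lookup("/Program/Synopsis", KEY_INDEX_LIST)
miss_positions, miss_cost = lookup("/Program/Rating", KEY_INDEX_LIST)
print(hit_positions, hit_cost, miss_positions, miss_cost)
```

The number of comparisons grows with the length of the list, and a query for a path that is not indexed has to traverse every entry before the negative result is known.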
The document Lam S. W. et al., “Representing lexicons by modified trie for fast partial string matching”, Character Recognition Technologies, San Jose, 1-2 February 1993, Bellingham, SPIE, pages 229-237, describes a fast lexical search method wherein an input sequence can have both an indefinite length and also several non-specified letters.
The document Wong R. K. et al., “An XML repository for molecular sequence data”, Proceedings IEEE International Symposium on Bio Informatics and Biomedical Engineering, pages 35-42, describes a method wherein a large set of data can be efficiently searched with the aid of a “skip tree”.