In full-text indexing, as known from Internet search engines (e.g. AltaVista, HotBot or Infoseek), the entire text of the web pages held on various web servers is indexed automatically. The relevant information is gathered by “robots” or “spiders”, i.e. programs that work autonomously to “track down” resources on the Internet by following references (hyperlinks) from documents that are already known.
Each new document that such a robot finds is automatically indexed with keywords in the respective search engine's database. How this is done depends on the robot: some index only the HTML title or the first paragraphs of a document, while others work through the entire document and index every individual word. Most search engines do not store the collected documents as a full copy, since this requires enormous computing and storage resources, which currently only AltaVista undertakes. Normally, searching is based on an index table that records, in a yes/no structure, which words occur on a web page.
If a search service based on full-text indexing is queried for an arbitrary term, the search engine refers to all of the indexed documents that contain the search term. The service immediately outputs the URLs of the documents found in the form of hyperlinks, so that each document found can be opened and viewed directly.
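The yes/no index table and the lookup described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical toy corpus keyed by URL and simple whitespace tokenization; real search engines use considerably more elaborate index structures.

```python
# Hypothetical toy corpus (URL -> page text); not taken from any real search engine.
pages = {
    "http://example.org/a": "medical image retrieval with x-ray images",
    "http://example.org/b": "full text search engines index web pages",
    "http://example.org/c": "image archives store x-ray images",
}


def build_incidence_table(docs):
    """Return (vocabulary, table) where table[url][word] is True (yes) or False (no)."""
    vocabulary = sorted({w for text in docs.values() for w in text.lower().split()})
    table = {}
    for url, text in docs.items():
        words = set(text.lower().split())
        table[url] = {w: w in words for w in vocabulary}
    return vocabulary, table


def search(table, term):
    """Return, as a hit list, the URLs of all pages whose row has 'yes' for the term."""
    term = term.lower()
    return [url for url, row in table.items() if row.get(term, False)]


vocabulary, table = build_incidence_table(pages)
print(search(table, "x-ray"))  # ['http://example.org/a', 'http://example.org/c']
```

Each row of the table is one web page and each column one word, which is why the structure grows with the vocabulary; the inverted indices discussed further on store only the "yes" entries.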
Depending on the search term, the search engine may report not just one hit but several thousand. To simplify selection from such a large number of hits, most full-text search engines automatically weight the search results, which is known as “ranking”. The weighting is based on a mathematical method which, among other things, evaluates the relative frequency of the search term in the documents found. For many search services, the result list shows the hits with a percentage weighting, the most highly weighted documents being shown at the top of the list.
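The exact weighting formula is not specified here. The following minimal sketch assumes, purely for illustration, that each hit is scored by the relative frequency of the search term (its occurrences divided by the number of words in the document) and that the percentage is expressed relative to the best hit.

```python
from collections import Counter


def rank(docs, term):
    """Return [(doc_id, percent)] sorted by descending weight.

    Assumed weighting (illustrative only): relative frequency of `term`
    in the document, scaled so that the best hit gets 100 %.
    """
    term = term.lower()
    scores = {}
    for doc_id, text in docs.items():
        words = text.lower().split()
        if not words:
            continue
        relative_frequency = Counter(words)[term] / len(words)
        if relative_frequency > 0:
            scores[doc_id] = relative_frequency
    if not scores:
        return []
    best = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(doc_id, round(100 * score / best)) for doc_id, score in ranked]
```

For example, with `{"a": "image retrieval for image archives", "b": "text retrieval", "c": "an image"}` and the term `"image"`, document c (1 of 2 words) is listed first at 100 % and document a (2 of 5 words) at 80 %, while b is not a hit at all.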
A method for producing inverted indices for indexing full-text documents is described in the article “A survey of information retrieval and filtering methods” (technical report, information filtering project, University of Maryland, College Park, Md., 1996) by C. Faloutsos and D. Oard. The article distinguishes between indexing by a human user, semiautomatic methods and fully automatic methods. For unstructured or only slightly structured documents, the main difficulties for automatic indexing methods are recognizing the keywords and their context and excluding nonrelevant search terms (e.g. articles, pronouns, prepositions, conjunctions, interjections).
The same basic problems arise in alternative methods based, for example, on a vector model for grouping similar documents (clustering). In all cases, nonrelevant search terms are removed by means of “stop word lists” (“negative dictionaries”). Further problems in the automatic production of inverted indices for full-text documents are recognizing synonyms and the context in which a search term, or a combination of two or more search terms, occurs, and attributing declined nouns and adjectives and conjugated verbs to their common word stems.
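A minimal sketch of producing an inverted index with a stop word list (negative dictionary) and a crude reduction of inflected forms to common stems. The stop word list and the suffix rules are hypothetical toy assumptions; this is plain suffix stripping, not a real stemming algorithm such as Porter's, and it handles neither synonyms nor irregular forms.

```python
import re
from collections import defaultdict

# Hypothetical negative dictionary (stop word list), far from complete.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "is", "are", "with", "for", "to"}
# Hypothetical suffix rules, checked in this order.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each stem to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token in STOP_WORDS:
                continue  # nonrelevant search term
            index[stem(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```

With `{"d1": "The images are indexed", "d2": "Indexing an image archive"}`, "images"/"image" and "indexed"/"indexing" fall onto the common stems "image" and "index", and the stop words "the", "are" and "an" are not indexed.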
Conventional text-based image retrieval methods for binary-coded image files are normally based on a simple full-text search using suitable search terms. For this purpose, the content of each image file is described by a generally small set of keywords stored in an annotation file. The main drawback of this procedure is that complex image content is reduced to a few terms, which often describe it only inadequately. For example, language offers only very limited means for precisely describing patterns, topologies, surface structures etc.
For this reason, content-based image retrieval methods are needed that automatically extract the fundamental features of an image and use them as the descriptive basis for content-based searching for images stored in a digital image archive. Such methods can be used in numerous fields, e.g. in medical diagnosis, to compare parameters extracted from a patient's x-ray images with stored parameters of images of pathological tissue structures from an image database; in satellite remote sensing, to assess the effects of a pest infestation on the forests in a region; or in crime fighting, to identify perpetrators, e.g. by comparing electronically stored fingerprints with a suspect's fingerprints taken from a crime scene, or by comparing surveillance camera pictures with the faces stored in an electronic image archive of police criminal records.
These image databases manage large collections of images and allow searching for images that are similar to a reference image or that satisfy user-defined conditions. The main objective is to reduce the result set to a small number of suitable images, which are then displayed to the user.
An overview of existing image retrieval systems is given in the article “Study on non-text-based information retrieval—state of the art” (EU, ELPUB 106 study, 1996) by B. Lutes et al. and in the article “A review of content-based image retrieval systems” (Technical report jtap-054, University of Manchester, 2000) by C. C. Venters and M. Cooper. Known image retrieval systems that are still at the research stage include the QBIC, Surfimage and VisualSEEK systems described in the articles “Automatic and semi-automatic methods for image annotation and retrieval in QBIC” (Proc. of storage and retrieval for image and video databases III, pp. 24-35, 1995) by J. Ashley et al., “Surfimage: a flexible content-based image retrieval system” (Proc. of ACM Multimedia, 1998, pp. 339-344) by C. Nastar et al. and “VisualSEEK: a fully automated content-based image query system” (Proc. of ACM Multimedia, 1996, pp. 87-98) by J. R. Smith and S.-F. Chang.
Conventional image description and retrieval methods are normally labor-intensive and often fail to describe the image content adequately. For this reason, image databases today are expected to support content-based image retrieval. The standard approach is based on automatically extracting and comparing predefined features that can be derived directly from the raw data. In particular, these features capture properties of the image content such as dominant colors and their distribution, important shapes and textures, or the global image layout. They can be weighted and combined with one another in different ways, which yields an intermediate representation of the image data at a higher level of abstraction. The image retrieval systems developed in research differ in the methodological approach they pursue:

Color-based approach: the images to be indexed are divided into individual search spaces, and the colors identified in these search spaces are compared with the colors defined in a search query. This approach also covers the content-based retrieval of segmented images, in which images divided into individual segments are examined according to the color distributions in the respective segments. A grid of squares of selectable size is placed over an image. A color histogram is then used to determine the predominant color of each grid element, and the square in question is completely filled with this color. Adjacent grid elements of the same color are combined into one area. The image positions of the individual color areas, together with their color and size, are then stored in an annotation file.

Texture analysis: the image is divided into individual image objects, and the homogeneity and contrast level of the image are measured. Using the grid division, not only the colors but also significant features of the grid elements (e.g. contrast, two-dimensional nature, directionality), which are likewise used for indexing the image, are stored as values in an annotation file.

Edge-based approach: the light/dark transitions in an image, which normally arise wherever objects adjoin one another, are evaluated. To depict these contours, all edge points are first calculated using an edge detector. Once all edge points have been located, they are combined into closed contours, which are then matched to prescribable basic geometric shapes (e.g. triangles, squares, circles, ellipses). The information obtained in this way is stored in the annotation file.
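The grid-based color indexing of the color-based approach can be sketched as follows. This is a minimal sketch, assuming (hypothetically) that the image is already given as a 2-D list of quantized color labels; each grid cell receives its predominant color from a histogram, and 4-connected cells of the same color are merged into areas whose color, cell positions and size would go into the annotation file.

```python
from collections import Counter, deque


def dominant_color_grid(image, cell):
    """Place a grid of cell x cell squares over the image; keep each square's predominant color."""
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows, cell):
        grid_row = []
        for left in range(0, cols, cell):
            histogram = Counter(
                image[y][x]
                for y in range(top, min(top + cell, rows))
                for x in range(left, min(left + cell, cols))
            )
            grid_row.append(histogram.most_common(1)[0][0])
        grid.append(grid_row)
    return grid


def merge_color_areas(grid):
    """Combine 4-connected grid elements of the same color into areas (annotation entries)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = grid[r][c]
            queue, cells = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill over same-colored neighbors
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] and grid[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append({"color": color, "cells": sorted(cells), "size": len(cells)})
    return areas
```

The texture and edge-based approaches would replace the histogram step with texture measures or an edge detector; they are not shown here.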
The similarity between a query image Bi and each of J reference images Bj (1 ≤ j ≤ J) stored in a digital image archive is determined using a pattern recognition algorithm. It corresponds to a suitably defined distance measure dij between the image parameters extracted from the query image Bi, in the form of an N-dimensional feature vector xi, and the image parameters extracted from the stored reference images Bj, in the form of N-dimensional reference vectors mj. This distance is normally calculated using a “similarity function”, usually a modification of the well-known Minkowski metric, which is a generalization of the Euclidean distance. In its basic form, the squared Euclidean distance ||Δxij||₂² between the feature vector xi and the individual reference vectors mj in an N-dimensional feature space is used:
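$$d_{ij}^2 := d^2(\underline{x}_i, \underline{m}_j) = \left\|\Delta\underline{x}_{ij}\right\|_2^2 = \Delta\underline{x}_{ij}^{T}\,\Delta\underline{x}_{ij} = \sum_{n=0}^{N-1} \Delta x_{ij,n}^2 \quad \forall j, \quad \text{where} \tag{1a}$$

$$\Delta\underline{x}_{ij} := \underline{x}_i - \underline{m}_j \in \mathbb{R}^N. \tag{1b}$$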
A reference vector mj is frequently obtained by averaging the Mj stored feature vectors xqj of a class j, i.e. feature vectors that lie close to one another in the feature space and have been grouped by cluster formation:
$$\underline{m}_j := \frac{1}{M_j} \cdot \sum_{q=1}^{M_j} \underline{x}_{qj}. \tag{2}$$
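Equation (2) amounts to a component-wise mean. The sketch below assumes that cluster formation has already been carried out, i.e. the Mj feature vectors of class j are given as a list of equal-length lists; the clustering step itself is not shown.

```python
def class_mean(feature_vectors):
    """Reference vector m_j per Eq. (2): component-wise mean of the M_j vectors x_qj of class j."""
    if not feature_vectors:
        raise ValueError("class j must contain at least one feature vector")
    m = len(feature_vectors)   # M_j
    n = len(feature_vectors[0])  # N, the feature space dimension
    return [sum(x[k] for x in feature_vectors) / m for k in range(n)]
```

For example, `class_mean([[1.0, 2.0], [3.0, 6.0]])` yields `[2.0, 4.0]`.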
The result of the above method is a list of the J squared Euclidean distances, sorted in ascending order. The subscript index j of the first elements in this list refers to the reference images Bj from the image archive that are most similar to the query image Bi and that can then be presented to a user as hits.
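A minimal sketch of Eqs. (1a) and (1b) combined with the sorting step: it computes the squared distance for every stored reference vector and returns the image identifiers in ascending order of distance. The dictionary layout (hypothetical image identifier mapped to its reference vector) and the `hits` cut-off are assumptions for illustration.

```python
def squared_distance(x_i, m_j):
    """d_ij^2 = sum over n of (x_i[n] - m_j[n])**2, i.e. the inner product of delta with itself."""
    if len(x_i) != len(m_j):
        raise ValueError("feature and reference vectors must have the same dimension N")
    delta = [a - b for a, b in zip(x_i, m_j)]  # Eq. (1b)
    return sum(d * d for d in delta)           # Eq. (1a)


def rank_reference_images(x_i, references, hits=None):
    """Return [(image_id, d_ij^2)] in ascending order; the first entries are the best hits."""
    distances = [(image_id, squared_distance(x_i, m_j)) for image_id, m_j in references.items()]
    distances.sort(key=lambda item: item[1])
    return distances if hits is None else distances[:hits]
```

For a query vector `[0.0, 0.0]` and references `{"B1": [3.0, 4.0], "B2": [1.0, 1.0], "B3": [0.0, 2.0]}`, the sorted list is B2 (2.0), B3 (4.0), B1 (25.0).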
The decision as to which class a feature vector xi belongs to is made using a minimum distance classifier, which assigns the feature vector xi to a particular class k. For J classes with the reference vectors mj (1 ≤ j ≤ J), the J distance metrics d²ij must then be evaluated in accordance with the following decision rule:
$$\underline{x}_i \in \text{class } k \quad \text{when} \quad d_{ik}^2 = \min_j \left(d_{ij}^2\right), \quad \text{i.e.} \quad d^2(\underline{x}_i, \underline{m}_k) < d^2(\underline{x}_i, \underline{m}_j) \quad \forall j,\ j \neq k. \tag{3}$$
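The decision rule (3) corresponds to a nearest-mean classifier. In the sketch below, the class labels and the dictionary layout (class label mapped to reference vector mj) are hypothetical. Equation (3) uses a strict inequality, so a tie leaves the class undefined; Python's built-in `min` simply returns the first of the tied classes.

```python
def classify(x_i, reference_vectors):
    """Return the class k whose reference vector m_k minimizes d^2(x_i, m_j), Eq. (3)."""
    def d2(m_j):
        return sum((a - b) ** 2 for a, b in zip(x_i, m_j))

    # min() evaluates all J distance metrics and keeps the first smallest one.
    return min(reference_vectors, key=lambda j: d2(reference_vectors[j]))
```

For example, with `{"k1": [0.0, 0.0], "k2": [5.0, 5.0]}`, the feature vector `[1.0, 1.0]` is assigned to k1 and `[4.0, 4.0]` to k2.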
Since the features are already extracted when the images are stored in the database, this method achieves relatively short response times: at run time, only the distance metrics need to be calculated, which significantly reduces the overall time required for image retrieval. In addition, the method can easily be integrated into conventional database systems.
One drawback, however, is that most of the features extracted from the individual image files are highly abstract and therefore cannot be used by users without specialist knowledge. Since conventional static feature extraction algorithms normally deliver a large quantity of irrelevant information that is not needed for automatic pattern comparison, methods based on dynamic feature extraction, e.g. dynamic object searching using the wavelet transform, are increasingly used today for object searching, i.e. for search queries of the form
“Find all images Bj ∈ β containing the marked object X from the set β := {Bj | 1 ≤ j ≤ J} of images stored in a digital image database.”
Here, the user selects a particular image region, which is then analyzed and described by various features. This representation is then “shifted” over all of the reference images stored in an image archive and compared with the image portions underneath it. The other image regions and the object background are ignored, so that the search can concentrate on the selected image region.
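The text names dynamic object searching with the wavelet transform; the sketch below deliberately swaps in a simpler technique, exhaustive template matching with the sum of squared differences (SSD) on raw gray values, only to show the principle of shifting a selected region over each reference image and comparing it with the portion underneath. Representing images and the region as 2-D lists of numbers is an assumption.

```python
def best_match(region, image):
    """Slide the region over the image; return (ssd, top, left) of the best position, or None."""
    rh, rw = len(region), len(region[0])
    ih, iw = len(image), len(image[0])
    if rh > ih or rw > iw:
        return None  # region does not fit into this reference image
    best = None
    for top in range(ih - rh + 1):
        for left in range(iw - rw + 1):
            # Compare only the image portion underneath the region; the rest is ignored.
            ssd = sum(
                (image[top + y][left + x] - region[y][x]) ** 2
                for y in range(rh)
                for x in range(rw)
            )
            if best is None or ssd < best[0]:
                best = (ssd, top, left)
    return best


def search_archive(region, archive):
    """Rank archive images (id -> 2-D gray values) by their best local match, smallest SSD first."""
    results = []
    for image_id, image in archive.items():
        match = best_match(region, image)
        if match is not None:
            results.append((image_id, match))
    return sorted(results, key=lambda item: item[1][0])
```

A feature-based variant would compare wavelet coefficients of the region and of each window instead of raw gray values; the exhaustive double loop is kept here only for readability.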
Since an exemplary embodiment of the present invention is based on the method for automatically indexing multimedia data archives, which is preferably intended for use in medical text and image retrieval, two of the main communication standards used today for describing, storing, transferring and interpreting medical image data and the context information linked to them are briefly presented below: DICOM SR (“Digital Imaging and Communications in Medicine, Structured Reporting”) and HL7 (“Health Level Seven”).
The communication standard DICOM, Part 3 of which is described in detail in “Digital imaging and communications in medicine (DICOM)” (PS 3.3-2003, Rosslyn, Va.), is a standard for interchanging and managing medical image data and other related data. It was developed in the field of radiology and is to be supported as a standard in all other medical specialties in future.
A DICOM SR document comprises two parts: header data, including the DICOM-coded “Report title”, and the “Document Content Sequence”, which contains a medical data part coded on the basis of the SNOMED (Systematized Nomenclature of Medicine) standard. SNOMED is a description language with a thesaurus of more than 50 000 terms, used to code, index and retrieve data in patient records. The coding schemes used in this context include mnemonic, hierarchical, group-sequential, incremental and combination codes. Besides SNOMED, the DICOM standard uses numerous other coding schemes (e.g. ICD and LOINC).
The HL7 CDA standard, described in detail in the specialist article “HL7 Clinical Document Architecture Framework” (Release 1.0, 2000), is an international communication standard for interchanging, managing and integrating the data required for patient treatment.
In contrast to unstructured full-text documents, HL7 CDA and DICOM SR documents have an explicitly coded document structure, characterized for example by coded chapter and section names. For each data element for which an entry is provided, the associated context information can be read from a library file. Conventional methods for indexing full-text documents do not retain this context information, which reduces the accuracy of the search. As a result, conventional indexing offers no way of ensuring that a search query returns all of the documents relevant to it.
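The effect of discarding context can be illustrated with a minimal sketch that builds a plain inverted index and a context-qualified one side by side. The section names and report texts are hypothetical and are not actual DICOM SR or HL7 CDA codes.

```python
from collections import defaultdict

# Hypothetical structured reports: doc id -> list of (section name, text) entries.
# Section names are illustrative only, not actual DICOM SR or HL7 CDA codes.
reports = {
    "R1": [("History", "suspected fracture"), ("Findings", "normal bone structure")],
    "R2": [("Findings", "fracture of the left radius")],
}


def build_indices(docs):
    """Build a plain inverted index and one whose keys keep the section context."""
    plain = defaultdict(set)
    contextual = defaultdict(set)
    for doc_id, entries in docs.items():
        for section, text in entries:
            for word in text.lower().split():
                plain[word].add(doc_id)                  # context is lost
                contextual[(section, word)].add(doc_id)  # context is retained
    return plain, contextual


plain, contextual = build_indices(reports)
print(sorted(plain["fracture"]))                     # ['R1', 'R2']
print(sorted(contextual[("Findings", "fracture")]))  # ['R2']
```

A query for "fracture" as a finding thus matches only R2 in the context-qualified index, whereas the plain index also returns R1, where the term occurs only in the history section.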
Structured objects stored in the DICOM SR or HL7 CDA format do not themselves contain any image objects (i.e. header data and binary-coded image data), but rather “Unique Identifiers” (UIDs) that reference image data and other objects (such as biosignal data). DICOM SR, for example, uses UIDs to denote the type and the instance of referenced objects. Within the document, these UIDs appear in the context of further descriptive data, e.g. codes that specify an examination method more precisely. These metadata can be used to describe the “content” of a referenced object and also observations relating to it.
For structured SGML documents and object-oriented databases, extensions to the query language are known, as explained in the article “From Structured Documents to Novel Query Facilities” (SIGMOD Record, 23(2): 313-324, June 1994) by V. Christophides, S. Abiteboul, S. Cluet and M. Scholl. For XML documents, there is the query language XQuery, described in “XQuery 1.0: An XML Query Language” (W3C Working Draft, 2002), which is suitable for information retrieval applications and uses XPath, described in the specification “XML Path Language (XPath) Version 1.0” (W3C Recommendation, 1999), to address portions of an XML document. XPath can select document nodes according to various criteria, perform basic manipulations of character strings, Boolean values and node sets, and provides a simple function library that can be extended by user-defined functions. Without a suitable inverted index, however, such queries are limited to individual structured documents, and searching for relevant documents is inefficient.
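Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath (not full XPath 1.0, and not XQuery), but that subset is enough to illustrate path-based addressing within a single document. The element and attribute names below are hypothetical, not the actual CDA schema. Note that `search_collection` has to parse and scan every document for each query, which is the inefficiency an inverted index avoids.

```python
import xml.etree.ElementTree as ET

# Hypothetical CDA-like markup; element and attribute names are illustrative only.
REPORT = (
    "<report>"
    '<section code="history"><text>suspected fracture</text></section>'
    '<section code="findings"><text>fracture of the left radius</text></section>'
    "</report>"
)


def section_texts(xml_text, code):
    """Address the <text> children of all sections whose code attribute matches."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(f".//section[@code='{code}']/text")]


def search_collection(collection, code, word):
    """Naive search: parses and scans every document on every query (no inverted index)."""
    return [
        doc_id
        for doc_id, xml_text in collection.items()
        if any(word in (text or "").split() for text in section_texts(xml_text, code))
    ]
```

For example, `section_texts(REPORT, "findings")` returns `["fracture of the left radius"]`.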