Numerous applications require the annotation of documents with a fixed set of terms. Examples include video annotation (where the documents are, for example, key frames of a video) and library cataloging (where the documents are mainly books and magazines). Examples of annotation terms include “outdoors,” “face” and “monologue” for videos, and “antiquities,” “meteorology” and “fiction” for library catalogs.
Current annotation systems require the annotator to memorize and pick from a large (typically hierarchical) lexicon of terms. Not only is this a time-consuming process, but lexica also keep changing and growing over time, requiring the annotator to stay up to date. For example, the Library of Congress introduces close to 1,000 new or changed subject headings each week.
A different approach, mainly used for text documents, automatically or semi-automatically finds matching annotations. This is achieved via ontology-based text analysis and machine learning techniques; see, e.g., M. Erdmann et al., “From manual to semi-automatic semantic annotation: About ontology-based text annotation tools,” Proceedings of the COLING 2000 Workshop on Semantic Annotation and Intelligent Content, Luxembourg, August 2000. An example of such a system is the S-CREAM system, as described in S. Handschuh et al., “S-CREAM—Semi-automatic CREAtion of Metadata,” 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), 2002.
However, these techniques can have high annotation error rates that necessitate human supervision, since they do not use a knowledge base in making the annotation decision. On the other hand, approaches such as those described in C. A. Goble et al., “Describing and Classifying Multimedia Using the Description Logic GRAIL,” SPIE, 1996, annotate and retrieve documents using a well-defined description logic. Even though such approaches improve retrieval quality, they do not free the document repository maintainer from annotating the documents.
U.S. Pat. No. 6,397,181, entitled “Method and Apparatus for Voice Annotation and Retrieval of Multimedia Data,” describes transforming voice annotations into a word lattice and indexing the word lattice. Even though such an approach tries to simplify the annotation process, it focuses on the indexing process and does not try to match the voice annotations with a given set of allowed annotations.
Therefore, a need exists for improved document annotation techniques.