1. Field of the Invention
The present invention generally relates to (content) annotation lexicon development. Specifically, the present invention relates to a computer-implemented method, system and program product that analyzes content annotations to improve an annotation lexicon and its corresponding ontology.
2. Related Art
Content indexing/annotation is rapidly becoming a valuable resource in tracking and managing content (e.g., video broadcasts, audio broadcasts, Internet content, electronic mail messages, etc.). To annotate content, annotators (known in the art as ontologists) attach descriptive terms or concepts to the content. Such terms are typically drawn from an annotation lexicon. Unfortunately, in annotating content, annotators tend to ignore the most common terms. For example, few annotators consistently annotate the type of background that is present in a piece of content, instead focusing more on the foreground. Even then, the annotators tend to ignore terms that are almost always present, such as “people” or “people-action” types of terms such as “walking”. Conversely, when uncertain or frustrated, annotators tend to invent terms just so that they can annotate something. The result can be annotations that are unnecessarily long, such as “Fortieth anniversary of the Freedom Rides” or “Princess Diana car wreck”.
Additionally, there is a trade-off between the use of high-level terms of human value, such as “negotiating” or “planning”, and low-level machine-computable terms, such as “periodic texture” or “oscillating motion”. In general, annotators have little knowledge of machine capabilities, and system builders have little knowledge of what is most useful to people using content (e.g., videos) for their own (human) purposes. Existing approaches for lexicon creation depend heavily on heuristics, such as “if a term has more than a dozen sub-terms, then an intermediate term may be necessary.” The existing approaches themselves concede that there is no single correct class hierarchy for any given domain. Moreover, existing approaches fail to comment on the specialized properties of specifically visual terms. Furthermore, the existing approaches fail to provide any automated tools for refining or clustering the annotation lexicon. Given that most or all annotations are drawn from the annotation lexicon, continual development (e.g., improvement) thereof could greatly improve the quality of annotations.
In view of the foregoing, there exists a need for an approach that allows an annotation lexicon to be developed and/or improved.