The present invention relates to a dynamic taxonomy process for browsing and retrieving information in large heterogeneous databases.
Information retrieval on databases of this type (for example, those available on the Internet) is currently a slow task, sometimes impossible to carry out because of the enormous amount of data to be analyzed, and difficult to implement with the currently available tools. The following documents deal with the prior art in this field: Hearst M. et al: “Cat-a-cone: an interactive interface for specifying searches and viewing retrieval results using a large category hierarchy,” Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, US, New York, N.Y.: ACM, 1997, pages 246-255; EP-A-0 694 829 (XEROX Corp.); U.S. Pat. No. 5,644,740 (Kiuchi Itsuko); Gert Schmeltz Pedersen: “A browser for bibliographic information retrieval, based on an application of lattice theory,” Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, US, New York, ACM, vol. CONF., 16, 1993, pages 270-279; and Story G. et al: “The Rightpages image-based electronic library for alerting and browsing,” Computer, US, IEEE Computer Society, Long Beach, Calif., US, vol. 25, no. 9, 1 Sep. 1992, pages 17-25.
Dynamic taxonomies are a model to conceptually describe and access large heterogeneous information bases composed of texts, data, images and other multimedia documents.
As disclosed in Hearst, a dynamic taxonomy is basically an IS-A hierarchy of concepts, going from the most general (topmost) to the most specific. A concept may have several fathers. This hierarchy is a conceptual schema of the information base, i.e. the “intension”. Documents can be freely classified under different concepts at different levels of abstraction (this is the “extension”). A specific document is generally classified under several concepts.
Dynamic taxonomies enforce the IS-A relationship by containment: the documents classified under a concept C are the deep extension of C, i.e. the recursive union of all the documents classified under C and under each descendant C′ of C.
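The containment rule can be illustrated with a minimal Python sketch. The toy taxonomy, the document identifiers and the names `children`, `shallow` and `deep_extension` are hypothetical and chosen only for illustration; they are not the storage structures of the invention.

```python
from typing import Dict, Set

# Hypothetical intension: concept -> direct sons. "Italian painting" has two
# fathers ("Italy" and "Painting"), so the hierarchy is a DAG, not a tree.
children: Dict[str, Set[str]] = {
    "Art": {"Painting", "Sculpture"},
    "Italy": {"Italian painting"},
    "Painting": {"Italian painting"},
    "Sculpture": set(),
    "Italian painting": set(),
}

# Hypothetical extension: documents classified directly under each concept.
shallow: Dict[str, Set[str]] = {
    "Art": {"d1"},
    "Italy": {"d5"},
    "Painting": {"d2"},
    "Sculpture": {"d3"},
    "Italian painting": {"d4"},
}


def deep_extension(concept: str) -> Set[str]:
    """Recursive union of the documents under `concept` and its descendants."""
    docs = set(shallow.get(concept, set()))
    for son in children.get(concept, set()):
        docs |= deep_extension(son)
    return docs
```

In this toy data, document d4 belongs to the deep extension of "Art", "Painting" and "Italy", although it is classified directly only under "Italian painting".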
In a dynamic taxonomy, concepts can be composed through classical boolean operations. In addition, any set S of documents in the universe of discourse U (defined as the set of all documents classified in the taxonomy) can be represented by a reduced taxonomy. S may be synthesized either by boolean expressions on concepts or by any other retrieval method (e.g. “information retrieval”). The reduced taxonomy is derived from the original taxonomy by pruning the concepts (nodes) under which no document d in S is classified.
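As an illustration of the pruning step, the following Python sketch builds S from a boolean expression on two concepts and derives the reduced taxonomy. It assumes that "classified under" is read in the deep-extension sense; the toy data and the names `CHILDREN`, `SHALLOW`, `deep_extension` and `reduced_taxonomy` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy taxonomy (concept -> direct sons) and direct classification.
CHILDREN: Dict[str, Set[str]] = {
    "Art": {"Painting", "Sculpture"},
    "Italy": {"Italian painting"},
    "Painting": {"Italian painting"},
    "Sculpture": set(),
    "Italian painting": set(),
}
SHALLOW: Dict[str, Set[str]] = {
    "Art": {"d1"},
    "Italy": {"d5"},
    "Painting": {"d2"},
    "Sculpture": {"d3"},
    "Italian painting": {"d4"},
}


def deep_extension(concept: str) -> Set[str]:
    """Documents classified under `concept` or any of its descendants."""
    docs = set(SHALLOW.get(concept, set()))
    for son in CHILDREN.get(concept, set()):
        docs |= deep_extension(son)
    return docs


def reduced_taxonomy(S: Set[str]) -> Dict[str, Set[str]]:
    """Prune every concept whose deep extension contains no document of S."""
    kept = {c for c in CHILDREN if deep_extension(c) & S}
    return {c: {son for son in CHILDREN[c] if son in kept} for c in kept}


# S synthesized by a boolean expression on concepts: Italy AND Painting.
S = deep_extension("Italy") & deep_extension("Painting")
reduced = reduced_taxonomy(S)
```

Here S contains only d4, so "Sculpture" is pruned while the other four concepts remain in the reduced taxonomy. S could equally come from any other retrieval method, since `reduced_taxonomy` only needs a set of document identifiers.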
A new visual query/browsing approach is supported by dynamic taxonomies. The user is initially presented with the complete taxonomy. The user can then refine the result by selecting a subset of interest. Refinement is done by selecting concepts in the taxonomy and combining them through boolean operations. The user is then presented with a reduced taxonomy for the selected set of documents, which can be iteratively refined further.
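The browsing loop described above can be sketched as a small Python session object. This is only an assumed model of the interaction: the class `BrowsingSession`, its methods, the per-concept document counts and the restriction to AND and AND NOT refinements are illustrative choices, not features stated in the text.

```python
from typing import Dict, Set

# Hypothetical toy taxonomy (concept -> direct sons) and direct classification.
CHILDREN: Dict[str, Set[str]] = {
    "Art": {"Painting", "Sculpture"},
    "Italy": {"Italian painting"},
    "Painting": {"Italian painting"},
    "Sculpture": set(),
    "Italian painting": set(),
}
SHALLOW: Dict[str, Set[str]] = {
    "Art": {"d1"},
    "Italy": {"d5"},
    "Painting": {"d2"},
    "Sculpture": {"d3"},
    "Italian painting": {"d4"},
}


def deep_extension(concept: str) -> Set[str]:
    """Documents classified under `concept` or any of its descendants."""
    docs = set(SHALLOW.get(concept, set()))
    for son in CHILDREN.get(concept, set()):
        docs |= deep_extension(son)
    return docs


class BrowsingSession:
    """Tracks the current set of selected documents during browsing."""

    def __init__(self) -> None:
        # Initially the focus is the universe U, so the full taxonomy is shown.
        self.focus: Set[str] = set().union(*SHALLOW.values())

    def refine(self, concept: str, negate: bool = False) -> None:
        """Combine the focus with a concept: AND, or AND NOT if `negate`."""
        ext = deep_extension(concept)
        self.focus = self.focus - ext if negate else self.focus & ext

    def reduced_counts(self) -> Dict[str, int]:
        """Concepts of the reduced taxonomy with their document counts."""
        counts = {c: len(deep_extension(c) & self.focus) for c in CHILDREN}
        return {c: n for c, n in counts.items() if n > 0}
```

A session starts with all five concepts visible; after `refine("Italy")` the concept "Sculpture" disappears, and a further `refine("Painting")` leaves a single document under each remaining concept.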
The invention described here covers the following aspects of dynamic taxonomies:
1. additional operations;
2. abstract storage structures and operations on such structures for the intension and the extension;
3. physical storage structures, architecture and implementation of operations;
4. definition, use and implementation of virtual concepts;
5. definition, use and implementation of time-varying concepts;
6. binding a dynamic taxonomy to a database system;
7. using dynamic taxonomies to represent user profiles of interest, and implementation of user alerts for new interesting documents based on such profiles of interest.
The above and other objects and advantages of the invention, as will appear from the following description, are obtained by a dynamic taxonomy process as claimed in claim 1. Preferred embodiments and non-trivial variations of the present invention are claimed in the dependent claims.