1. Field of the Invention
The present invention relates to the field of information management. More specifically, the present invention, in an exemplary embodiment, relates to a system and method of operation for personalization of categorization of information and summaries of larger documents.
2. Description of the Related Art
Knowledge base management tools are similar to data-mining and search tools. Data mining focuses on so-called legacy data that is more transactional and financial in nature. For example, a massive database of sales figures can reveal trends or allow a user to “drill down” to specific territories, products, and customers. Search tools may ferret out information that is known or suspected to be present, but do not necessarily reveal anything inherent in the data. Knowledge base management tools go a step further by enabling a user to collect and organize information, search for what the user needs, and share the user's findings with others.
In the art, most data provision services (i.e., search or data retrieval services) provide a set interface to their data. A user may manipulate that interface only in a manner dictated by the provider. For example, a user wishing to access services such as the LEXIS/NEXIS™ service, provided by a division of Reed-Elsevier, Inc., or WESTLAW™, operated by the West Group, may use software that resides at a user computer and that accesses data resident on a CD-ROM or on a hard drive local to the user computer's environment, or data available via a data communications network such as the Internet. Other users may access those services directly through the Internet. Many information sources do not provide a user level taxonomy at all, relying instead, if on anything, on an Internet browser or other software to provide some user level utility to organize information. For example, once data are accessed, storage of the data is limited by the browser or operating system to folders, and searching within those folders is limited as well.
A problem with these access methods is that the means and ability to categorize the accessed data are dictated by the provider. For example, WESTLAW uses its proprietary headnotes taxonomy, which WESTLAW alone creates and maintains. Educational services such as ERIC provide a rigid classification system.
For many users, these taxonomies are either inadequate, because they cannot be tailored to the user's specific needs, or ignored, because they represent a view of categorization with which the user is unfamiliar and which the user is not inclined to learn or use. Moreover, given the capabilities of current searching methods, use of the provider supplied and imposed taxonomy is not as attractive for searching as it may once have been.
Many users would nonetheless benefit from an ability to categorize and organize data in a manner comfortable to that user, and perhaps to that user alone. Such a capability would aid that user in accessing the data, extracting information relevant to that user from that data, and later retrieving that information rapidly and cost-effectively. A problem with such user-specific categorization is that it may incur large support costs or be impracticably unwieldy.
As is known in the art, so-called knowledge bases may comprise an indexed, searchable set of queries or frequently asked questions (FAQs) coupled with a search engine. Some methods proposed in the prior art deal with mining generalized sequential patterns from large databases of raw data, taking into account user specified constraints such as taxonomies. U.S. Pat. No. 5,742,811 issued to Agrawal, et al. for “METHOD AND SYSTEM FOR MINING GENERALIZED SEQUENTIAL PATTERNS IN A LARGE DATABASE” is illustrative.
The prior art also contains much on database queries. U.S. Pat. No. 5,826,260 issued to Byrd, Jr., et al. for “INFORMATION RETRIEVAL SYSTEM AND METHOD FOR DISPLAYING AND ORDERING INFORMATION BASED ON QUERY ELEMENT CONTRIBUTION” is illustrative and teaches an information retrieval system wherein a query issued by a user is analyzed by a query engine into query elements. After the query has been evaluated against the document collections, a resulting hit list is presented to the user, e.g., as a table. The presented hit list displays not only an overall rank of a document but also a contribution of each query element to the rank of the document. The user can reorder the hit list by prioritizing the contribution of individual query elements to override the overall rank and by assigning additional weight(s) to those contributions. However, the prior art has not adequately addressed using queries as a method of capturing, condensing, and presenting raw data or its summarization according to a user defined, user configurable taxonomy.
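By way of illustration only, the reordering of a hit list by weighted query element contributions described above may be sketched as follows. The sketch is not taken from the Byrd reference; the `Hit` class, the `order_hits` function, the additive scoring, and the sample contribution values are all hypothetical and are assumed solely for purposes of explanation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hit:
    """One row of a hit list (hypothetical structure)."""
    doc_id: str
    # Contribution of each query element to this document's rank.
    contributions: Dict[str, float]

    def score(self, weights: Optional[Dict[str, float]] = None) -> float:
        # Assumption: the overall rank is the weighted sum of the
        # per-element contributions; unweighted elements count once.
        weights = weights or {}
        return sum(
            value * weights.get(element, 1.0)
            for element, value in self.contributions.items()
        )


def order_hits(hits: List[Hit],
               weights: Optional[Dict[str, float]] = None) -> List[Hit]:
    """Return the hits sorted from highest to lowest weighted score."""
    return sorted(hits, key=lambda hit: hit.score(weights), reverse=True)


# Made-up contributions for a two-element query.
hits = [
    Hit("doc-a", {"taxonomy": 0.6, "summary": 0.1}),
    Hit("doc-b", {"taxonomy": 0.2, "summary": 0.4}),
]

# Default ordering uses the overall rank only.
default_order = [hit.doc_id for hit in order_hits(hits)]

# The user assigns additional weight to the "summary" query element.
reweighted_order = [
    hit.doc_id for hit in order_hits(hits, {"summary": 3.0})
]
```

In this sketch, assigning additional weight to one query element moves the document in which that element contributes most to the top of the list, overriding the default ordering.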
In addition, the prior art teaches methods of summarization of raw data. U.S. Pat. No. 5,918,240 issued to Kupiec, et al. for “AUTOMATIC METHOD OF EXTRACTING SUMMARIZATION USING FEATURE PROBABILITIES” is illustrative. Kupiec teaches a method of automatically generating document extracts that makes use of feature value probabilities generated from a statistical analysis of manually generated summaries to extract the same set of sentences an expert might. The method is based upon an iterative approach. However, Kupiec does not disclose and does not teach, suggest, or motivate towards use of a user tailorable taxonomy when doing its summarization or association of the generated summarization with one or more elements in the taxonomy.
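By way of illustration only, sentence extraction driven by feature value probabilities may be sketched as follows. The sketch is not taken from the Kupiec reference. It assumes a naive-Bayes-style combination of independent binary features with add-one smoothing, and the feature names, training data, and function names are hypothetical.

```python
from collections import Counter


def train(labeled):
    """Estimate feature statistics from manually summarized documents.

    labeled: list of (features, in_summary) pairs, one per sentence,
    where features is a set of hypothetical binary feature names.
    """
    n_all = len(labeled)
    summary_sets = [feats for feats, chosen in labeled if chosen]
    n_sum = len(summary_sets)
    count_all = Counter(f for feats, _ in labeled for f in feats)
    count_sum = Counter(f for feats in summary_sets for f in feats)
    return {"n_all": n_all, "n_sum": n_sum,
            "count_all": count_all, "count_sum": count_sum}


def score(features, model):
    """Score proportional to P(sentence in summary | its features).

    Assumption: features are independent, and only the features that
    are present contribute a P(f | summary) / P(f) factor.
    """
    result = model["n_sum"] / model["n_all"]  # prior P(in summary)
    for f in features:
        p_given_summary = (model["count_sum"][f] + 1) / (model["n_sum"] + 2)
        p_overall = (model["count_all"][f] + 1) / (model["n_all"] + 2)
        result *= p_given_summary / p_overall
    return result


def extract(sentences, model, k):
    """Return the k highest-scoring sentences in document order."""
    ranked = sorted(sentences, key=lambda s: score(s[1], model), reverse=True)
    chosen = {text for text, _ in ranked[:k]}
    return [text for text, _ in sentences if text in chosen]


# Made-up training sentences: (features, was it in the manual summary?)
training = [
    ({"cue_phrase", "first_in_paragraph"}, True),
    ({"cue_phrase"}, True),
    ({"first_in_paragraph"}, False),
    (set(), False),
    (set(), False),
    ({"short"}, False),
]
model = train(training)

document = [
    ("We propose a new taxonomy.", {"first_in_paragraph"}),
    ("It was a rainy day.", {"short"}),
    ("In conclusion, the method works.", {"cue_phrase"}),
]
extract_result = extract(document, model, k=2)
```

Nothing in this sketch associates the extracted sentences with a user tailorable taxonomy, which is the gap noted above.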
Accordingly, there is a need for a user configurable interface comprising raw data, information summarized and derived from the raw data, and a user defined and maintained taxonomy to organize the information.