1. Field of the Invention
This invention relates generally to an improved system and method for aiding users in the development of a query string; and more specifically, to a system and method for allowing a user to control the development of a concept-based natural language search query by controlling the manner in which a hierarchical concept tree is structured and traversed.
2. Description of the Prior Art
Today's data processing systems are capable of storing large volumes of data. The density of storage devices continues to increase at the same time that the prices for such devices are falling. In addition, current networking capabilities allow multiple storage devices and file servers to be interconnected so that databases can be shared across systems. As a result, users of current computer systems are provided with access to an unprecedented amount of information.
The internet is a prime example of this information explosion. It is estimated that between 30 and 50 million pages are currently available to users of the internet. In addition to this publicly-available documentation, many users are also provided with other proprietary sources of information, such as those that are available via corporate intranet sites. Information may be obtained from still other sources, such as newsgroups.
To take advantage of the large amount of information made available by technological advances, the information must be readily accessible. Users must be able to locate and retrieve the documents that are needed in a timely manner. To do this, information retrieval systems must be developed that allow users to identify the best or most relevant information associated with a user request.
Many challenges exist when developing an information retrieval system that is capable of aiding users in finding meaningful information from a large body of electronic data. Oftentimes, the users of such systems are only familiar with general topics of interest, and are not able to specify the actual terminology used within the textual information that is relevant for a desired topic. Additionally, the user may not be familiar with the keyword descriptors used to index the documents. As a result, the user-provided query may be incomplete or inaccurate. Other factors, for example regional language dialects, may further influence the construction of a search query. For all of these reasons, a search may yield only a small number of documents that are actually relevant.
One way to improve search results is to provide a mechanism for automatically expanding a user-provided query string to include terms that do not appear in the query, but which may correspond to, or be associated with, the user-provided query terms. A system of this type is disclosed in U.S. Pat. No. 5,265,065 to Turtle. This patent describes a system in which words of a natural language input query are replaced with phrases from a database in a manner that expands the query. The expanded query is then utilized for information retrieval. The problem with many prior art systems of this type is that no opportunity is provided for the user to interactively participate in the query development process. If the search is unsuccessful, the user must simply re-execute the search with a different query string.
One method which does provide an iterative technique for allowing a user to interactively refine a query is provided in U.S. Pat. No. 5,278,980 to Pedersen et al. This patent describes a process whereby a user-provided query is developed into a search string that is further used to locate a list of matched phrases from a corpus of documents. Words from the returned phrases that are not included in the original query can be used to refine the query. This process can be repeated to retrieve documents that are increasingly focused on the desired topic. Although this system provides an opportunity for the user to exert control over the query development, the user intervention is only allowed after a search has been performed. This may waste processing resources, and does not provide any insight into the manner in which the user-provided query is developed into the actual search string.
Other types of search tools have been developed which seek to expand user-provided queries by employing context-based analysis techniques. These search tools analyze a phrase provided by a user in an attempt to “understand” the user's intent. The search tools place various search terms within the context of other search terms so that the concept behind the query can be determined. While these types of tools can result in the retrieval of more relevant documents, these tools have the above-mentioned disadvantage of not allowing a user to participate in the search development process. That is, these tools utilize predetermined algorithms that cannot be influenced by the user. Once the user provides an initial query string, additional query analysis and development is under the control of the tool, and the user is not provided any ability to control the algorithm or the lexicon employed in increasing the scope of the search. Thus, the number of irrelevant documents retrieved may actually be increased instead of decreased.
Another problem with existing search tools is that the user is not allowed to specify, with any degree of definiteness, the extent to which a query should be expanded. For example, a user may want to specify a given topic like “Mexican Cooking” for document retrieval. The user may further want the same search to return documents on growing peppers. Prior art search engines do not allow the user to exert control over the precise scope of the search so that documents concerning multiple related, yet distinct, topics can be retrieved using the same search.
Yet another drawback associated with prior art tools involves the limited visibility provided into the actual query development process. Because the user is not allowed to view the manner in which search expansion is accomplished, it is difficult to determine how a query string should be revised to retrieve more relevant documentation. If a search is unsuccessful, the user is left to guess as to how the query might be modified to obtain more meaningful results.
Still another problem with current document retrieval systems is that both the lexicon and the algorithm employed to expand a search are fixed. That is, the user is not allowed to modify the content or the arrangement of the index employed to develop a query. Thus, query development cannot be customized to account for professional terminology, business or company-specific acronyms, newly-coined expressions, and the like, that may be included in a particular user's corpus of documentation.
What is needed is a flexible search system that allows users to closely control the expansion of a search query. The system should also provide the capability for a user to modify the manner in which particular query expansions are performed. This capability should include the ability to modify both the system lexicon and the lexicon organization. The system should allow users to add user-specific terminology such as regional jargon, slang terms, foreign language representations, or any other particularized phrases included in the body of material to be searched.
It is a primary object of the current invention to provide an improved system for aiding users in developing search queries;
It is another object of the invention to provide a system that allows users to interactively control the manner in which a search query is expanded;
It is yet another object of the invention to provide a system wherein an iterative, interactive process is utilized to allow a user to expand a search query while exploring the lexicon of the system;
It is still another object of the invention to provide a system for cataloging search terms in a manner that is controllable by users;
It is yet another object of the invention to provide a system for aiding in the development of search queries, wherein the system has an expandable lexicon that may be modified by the user;
It is still another object of the invention to provide a system for aiding in the development of search queries wherein the scope of search expansion is controlled by the user;
It is another object of the invention to provide a system for aiding in the development of search queries wherein the extent of user interaction is controllable by the user;
It is yet another object of the invention to provide a system for aiding in the development of search queries wherein the extent to which a lexicon is used to expand a query is controlled by the user; and
It is yet another object of the invention to utilize an object-oriented repository to implement a hierarchical concept tree for use in interactive query development, and wherein a query developed by the system may be utilized to search other objects stored in the object-oriented repository.
Still other objects and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description of the preferred embodiment and the drawings, wherein only the preferred embodiment of the invention is shown, simply by way of illustration of the best mode contemplated for carrying out the invention.
The foregoing objects and other objects and advantages are provided in the current invention, which is a computer-implemented system and method for allowing users to interactively develop searches. The system utilizes a hierarchical concept tree stored in memory to develop a query. The nodes of the concept tree, which are grouped according to broad application areas called Application Domains, represent concepts that might describe any given search topic. Relationships are created between the concepts. The relationships, which exist as the branches of the hierarchical concept tree, represent the manner in which the concepts interrelate. A concept may be related to one or more “parent” concepts existing one level higher in the tree structure, or may be related to one or more “child” concepts existing one level lower in the tree structure. Concepts having the same parent are said to be “siblings”. The level of generality of the topics stored within the concept tree ranges from most general at the top of the tree structure, to progressively more specific at the lower levels of the tree structure.
Each concept stored as a node in the tree may be related laterally to one or more other nodes that store character strings, and that may be referred to as “word elements”. Each word element may further be related to one or more other nodes storing variants of the stored character string, including abbreviations, acronyms, foreign language translations, plural formats, possessive formats, and the like. These nodes may be referred to as “word variant elements”. The word and word variant elements comprise the lexicon of the system, and are used to develop a query.
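By way of illustration only, the concept, word element, and word variant element relationships described above might be modeled as in the following Python sketch. The class names, field names, and sample concepts are assumptions made for this example and do not appear in the specification; the word variant elements are simplified to plain strings.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through parent/child links
class WordElement:
    text: str                                           # the stored character string
    variants: list[str] = field(default_factory=list)   # word variant elements


@dataclass(eq=False)
class Concept:
    name: str
    domain: str                                          # the Application Domain grouping
    parents: list["Concept"] = field(default_factory=list, repr=False)
    children: list["Concept"] = field(default_factory=list, repr=False)
    words: list[WordElement] = field(default_factory=list)

    def add_child(self, child: "Concept") -> None:
        """Create a parent/child relationship (a branch of the tree)."""
        self.children.append(child)
        child.parents.append(self)

    def siblings(self) -> list["Concept"]:
        """Concepts that share at least one parent with this concept."""
        found = []
        for parent in self.parents:
            for other in parent.children:
                if other is not self and other not in found:
                    found.append(other)
        return found

    def lexicon(self) -> list[str]:
        """All word and word variant strings related to this concept."""
        terms = []
        for word in self.words:
            terms.append(word.text)
            terms.extend(word.variants)
        return terms


# Hypothetical sample data, general at the top and more specific below.
cooking = Concept("Cooking", domain="Food")
mexican = Concept("Mexican Cooking", domain="Food")
italian = Concept("Italian Cooking", domain="Food")
cooking.add_child(mexican)
cooking.add_child(italian)
mexican.words.append(WordElement("Mexican cooking", ["Tex-Mex", "cocina mexicana"]))
```

A concept may have more than one parent in this sketch, matching the description that a concept "may be related to one or more parent concepts".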
The system includes a user interface that allows for interactive traversal of the various relationships in the hierarchical tree structure. The word and word variant elements located during this traversal are added to a potential query string. Traversal of the tree structure begins by locating a user-provided character string in one of the word or word variant elements. Traversal of the tree continues so that related concepts are located, and further, so that all other word and word variants related to the located concepts are also located. The user is allowed to select or de-select each of the located concepts, words, or word variants for further inclusion in the query development process. Traversal of the hierarchical tree structure continues with the parents and children of the remaining initially-located concepts. After each additional level in the concept tree is traversed, the user is allowed to again specify selection or de-selection of any of the located concepts, words, or word variants. The user is also allowed to specify whether the search should be expanded to include parents, children, or siblings of a previously-located concept. This allows the user to expand query development to include concepts that would otherwise not be located during traversal of the parents and children of the initially-located concept. This iterative process continues until a selected number of levels in both the parent and child directions have been traversed in the hierarchical tree structure for the initially-located concepts. This query development method allows the user to view search terms as the terms are added to the query, and further allows control over search expansion in a manner which is not provided for in prior art systems. Because the user controls the concepts that will be included in the search query, a single search may include multiple related, yet distinct concepts.
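The level-by-level traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the tree is reduced to plain dictionaries, the user's select/de-select decision is stood in for by a hypothetical `keep` callback, and the optional sibling expansion is omitted for brevity. All function names and sample data are invented for the example.

```python
def locate(term, words):
    """Find concepts whose word or word variant strings match the user's term."""
    term = term.lower()
    return [c for c, strings in words.items()
            if term in (s.lower() for s in strings)]


def traverse(start, parents, children, levels, keep):
    """Walk `levels` steps up and down from the initially located concepts.

    `keep(concept)` models the user selecting (True) or de-selecting (False)
    each newly located concept; de-selected concepts are not expanded further.
    """
    seen = set()
    selected = []

    def offer(candidates):
        accepted = []
        for concept in candidates:
            if concept in seen:
                continue              # each concept is offered to the user once
            seen.add(concept)
            if keep(concept):
                selected.append(concept)
                accepted.append(concept)
        return accepted

    up = down = offer(start)
    for _ in range(levels):
        up = offer([p for c in up for p in parents.get(c, [])])
        down = offer([k for c in down for k in children.get(c, [])])
    return selected


# Hypothetical sample tree.
parents = {"Mexican Cooking": ["Cooking"], "Peppers": ["Mexican Cooking"],
           "Tortillas": ["Mexican Cooking"]}
children = {"Cooking": ["Mexican Cooking"],
            "Mexican Cooking": ["Peppers", "Tortillas"]}
words = {"Cooking": ["cooking", "cuisine"],
         "Mexican Cooking": ["Mexican cooking", "Tex-Mex"],
         "Peppers": ["pepper", "chile"],
         "Tortillas": ["tortilla"]}

start = locate("tex-mex", words)
result = traverse(start, parents, children, levels=1,
                  keep=lambda c: c != "Tortillas")
```

Here the user's string is first located in a word variant, one level of parents and children is then traversed, and the de-selected concept is left out of further query development.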
After traversal of the hierarchical tree structure has progressed to the extent specified by the user, the word and word variant strings related to the selected concepts may be formed into a query string that includes logical operators. The query string will be formatted as required by the search tools that will receive the query string. The query string is provided to manually or programmatically invoke any number of various tools used to perform a text or file search. These tools include text editors, web-based search engines, file management systems, or object management systems used to catalog and track the development of software constructs.
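As a hedged illustration of the formatting step, the sketch below joins the strings of each selected concept with one logical operator and joins the resulting groups with another. The quoting rule, the parenthesized layout, and the function names are assumptions chosen for the example; the actual format would depend on the search tool that receives the query string.

```python
def quote(term):
    """Quote multi-word terms so they are treated as a phrase (assumed syntax)."""
    return f'"{term}"' if " " in term else term


def build_query(groups, within="OR", between="AND"):
    """Combine per-concept term lists into one query string.

    `groups` holds one list of word and word variant strings per selected
    concept; empty groups are skipped.
    """
    clauses = []
    for terms in groups:
        if not terms:
            continue
        joined = f" {within} ".join(quote(t) for t in terms)
        clauses.append(f"({joined})" if len(terms) > 1 else joined)
    return f" {between} ".join(clauses)


query = build_query([["Mexican cooking", "Tex-Mex"], ["pepper", "chile"]])
# query == '("Mexican cooking" OR Tex-Mex) AND (pepper OR chile)'
```

Passing `between="OR"` instead would produce a single search covering multiple related, yet distinct, concepts.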
According to one embodiment of the invention, the hierarchical tree structure is implemented in an object-oriented repository. The developed query string is used to search other objects stored within the same object-oriented repository.
The current system allows the hierarchical concept tree structure used in query development to be viewed and modified by a user. A user may edit the contents of, and the relationships between, the concepts, words, and word variants. Relationships may be modified by moving any of the concept nodes to a different location in the hierarchical concept tree structure. Similarly, relationships may be modified or created between word variant and word elements, and word elements and concept elements. Alternatively, relationships between nodes may be deleted. This allows the user to closely control the manner in which query development will progress. This capability allows the user to tailor the hierarchical concept tree and the lexicon used in query development to specific needs. For example, the lexicon may be modified to include terms, acronyms, product names, and the like that may be unique to a particular company, profession, or line of business. Regional dialects or personal preferences may also be reflected in the terms included in the tree.
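The editing operations described above might look like the following sketch, which assumes a simple in-memory structure rather than the object-oriented repository of the invention. The `ConceptTree` class, its method names, and the cycle check are illustrative assumptions only.

```python
class ConceptTree:
    """Minimal editable concept tree: concept -> parents, concept -> words."""

    def __init__(self):
        self.parents = {}   # concept name -> set of parent concept names
        self.words = {}     # concept name -> {word: set of variant strings}

    def add_concept(self, name, parent=None):
        self.parents.setdefault(name, set())
        self.words.setdefault(name, {})
        if parent is not None:
            self.add_concept(parent)
            self.parents[name].add(parent)

    def children(self, name):
        return sorted(c for c, ps in self.parents.items() if name in ps)

    def ancestors(self, name):
        found, stack = set(), list(self.parents[name])
        while stack:
            parent = stack.pop()
            if parent not in found:
                found.add(parent)
                stack.extend(self.parents[parent])
        return found

    def move(self, name, old_parent, new_parent):
        """Re-parent a concept, refusing moves that would create a cycle."""
        if new_parent == name or name in self.ancestors(new_parent):
            raise ValueError("move would create a cycle")
        self.parents[name].discard(old_parent)
        self.parents[name].add(new_parent)

    def add_word(self, concept, word, variants=()):
        """Relate a word element, and any variants, to a concept."""
        self.words[concept].setdefault(word, set()).update(variants)


tree = ConceptTree()
tree.add_concept("Gardening")
tree.add_concept("Peppers", parent="Cooking")
tree.move("Peppers", "Cooking", "Gardening")            # relocate a concept node
tree.add_word("Peppers", "pepper", ["peppers", "chile"])  # extend the lexicon
```

The cycle check is a design choice of this sketch: it keeps the structure hierarchical after a user moves a node.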
According to one embodiment of the invention, a user is allowed to specify at the outset the number of levels of traversal that should occur within the hierarchical concept tree structure, with user intervention being required only at the termination of tree traversal. The user is then allowed to select or de-select various ones of the strings to be included in the query string, and to add logical operators to the string. This embodiment allows a user familiar with the hierarchical tree structure to develop a query string with only minimal user intervention.
According to yet another embodiment of the invention, query development may be fully automated by programmatically invoking traversal of the hierarchical concept tree structure with the selected parameters. All character strings that are located during traversal of the hierarchical concept tree are automatically formatted into a query string that may further include logical operators added using script commands.
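A programmatic invocation of this kind could be sketched as below. The function name `auto_query`, its parameters, the dictionary layout of the tree, and the quoting of multi-word strings are all assumptions made for illustration; the specification does not define a particular interface.

```python
def auto_query(term, tree, levels_up=1, levels_down=1, operator="OR"):
    """Traverse the tree with preset parameters and format every located string."""
    start = [name for name, node in tree.items()
             if term.lower() in (s.lower() for s in node["strings"])]
    found = list(start)

    def walk(frontier, link, levels):
        for _ in range(levels):
            nxt = [n for c in frontier for n in tree[c][link] if n not in found]
            frontier = list(dict.fromkeys(nxt))   # drop duplicates, keep order
            found.extend(frontier)

    walk(start, "parents", levels_up)
    walk(start, "children", levels_down)

    strings = dict.fromkeys(s for c in found for s in tree[c]["strings"])
    return f" {operator} ".join(f'"{s}"' if " " in s else s for s in strings)


# Hypothetical sample tree.
tree = {
    "Cooking": {"parents": [], "children": ["Mexican Cooking"],
                "strings": ["cooking", "cuisine"]},
    "Mexican Cooking": {"parents": ["Cooking"], "children": ["Peppers"],
                        "strings": ["Mexican cooking", "Tex-Mex"]},
    "Peppers": {"parents": ["Mexican Cooking"], "children": [],
                "strings": ["pepper", "chile"]},
}

narrow = auto_query("tex-mex", tree, levels_up=0, levels_down=1)
# narrow == '"Mexican cooking" OR Tex-Mex OR pepper OR chile'
```

Because no selection step occurs, every string located within the requested number of levels is included in the result.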
The current invention provides a system and method that allows a user to develop a concept-based search using a concept tree and lexicon that may be closely tailored to user needs. Furthermore, the level of user interaction during query development may be selected to meet user requirements, ranging from a high degree of user intervention to no intervention at all.
As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded to the extent of applicable law as illustrative in nature and not as restrictive.