This invention relates to a system and method for providing interactive dialogue and iterative search functions to find information among a network of servers and to display results depicting overall distribution and relationship of results.
While many search engines currently exist, it would nonetheless be advantageous to combine techniques known by subject-matter experts who manage queries manually and apply known data management techniques to develop an improved search engine. An arena where these opportunities are particularly evident is the world wide web where one of the most popular applications is a web search engine. Many search engines are available but provide similar and limited functionality.
A problem with the currently available search engines is that they fail to provide effective results for many search problems. The failures fall into a number of categories including, but not limited to, 1) formulating a query, 2) adequately displaying, manipulating and navigating through results, 3) determining what the user is actually looking for, and 4) remembering how to locate the results again. From the point of view of web site owners, search engines have another set of shortcomings including, but not limited to, 1) failure of metatags to provide sufficient information for an intelligent search approach, 2) failure of key words to formulate an intelligent search, and 3) the inherent requirement that web site owners maintain a large number of tags in inventory because of the ambiguity of language. Additionally, there are technical aspects of a search that can be improved upon. For example, it would be advantageous if redundant results could be eliminated, dead ends could be eliminated, and sources could be evaluated, tagged and screened. It would also be advantageous to provide search functions that correspond with known communities of users (or websites, etc.).
Search results, as they are presented today, are not obvious to users. Also, the results are not presented in such a fashion as to take advantage of the human's ability to sift through data visually and to determine relationships among displayed objects. Most systems do not disambiguate search terminology well enough to determine what the user meant when typing a query. Typically, users do not have to perform many steps to initiate a search task; they will usually enter a few key words and then request a search. As such, there is a need to combine iterative configurable query techniques with a lexical dictionary function. This combination is currently not available in search engines.
Web search engines do not provide user access to restructure aspects of the search from a graphical user interface. When a search is conducted and results are displayed, the decisions of the search engine are not displayed such that the user can manipulate the branches and navigate down the decision and results tree, changing the attributes and thereby finding slightly different results. The user is not provided with any information on what the extent of the results may be. The user is not afforded any opportunity to reconfigure the search or the results to display the relationship among the items returned.
Some search engines provide policies that attempt to order search results based upon closeness to the query and provide metrics to the user indicating closeness. Metrics are based upon popularity, word frequency, word relationships, position of word in title or body, metatags, links within a web site, links to a page or web site, physical attributes of the web site, etc.
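By way of illustration only, the following minimal sketch shows how several of the metrics listed above might be combined into a single closeness score. The weights, the Page structure, and the function names closeness and rank are assumptions made for this example; no particular search engine's policy is described.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    body: str
    inbound_links: int  # number of links pointing to this page


# Hypothetical weights: the text names the signals but not how they combine.
WEIGHTS = {"title": 3.0, "frequency": 1.0, "links": 0.5}


def closeness(query: str, page: Page) -> float:
    """Score a page against a query using three of the listed metrics."""
    terms = query.lower().split()
    title_words = page.title.lower().split()
    body_words = page.body.lower().split()

    # Position of word in title: count query terms that appear in the title.
    title_hits = sum(1 for t in terms if t in title_words)
    # Word frequency: share of body words that are query terms.
    frequency = sum(body_words.count(t) for t in terms) / max(len(body_words), 1)
    # Links to a page: capped at 100 and scaled to the range 0..1.
    links = min(page.inbound_links, 100) / 100

    return (WEIGHTS["title"] * title_hits
            + WEIGHTS["frequency"] * frequency
            + WEIGHTS["links"] * links)


def rank(query: str, pages: list) -> list:
    """Order pages from closest to least close."""
    return sorted(pages, key=lambda p: closeness(query, p), reverse=True)
```

A score of this kind can be shown to the user next to each result as the closeness metric; changing the weights changes the ordering policy.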
Further, in all known search engines, accurate prediction relies heavily upon the ability to analyze large amounts of data. This task is extremely difficult because of the sheer quantity of data involved and the complexity of the analyses that must be performed. The problem is exacerbated by the fact that the data often resides in multiple databases, each database having different internal file structures. Rarely is the relevant information explicitly stored in the databases. Rather, the important information exists only in the hidden relationships among items in the databases. Recently, artificial intelligence techniques have been employed to assist users in discovering these relationships and, in some cases, in automatically discovering the relationships.
Data mining is a process that uses specific techniques to find patterns in data, allowing a user to conduct a relatively broad search of large databases for relevant information that may not be explicitly stored in the databases. Typically, a user initially specifies a search phrase or strategy and the system then extracts patterns and relations corresponding to that strategy from the stored data. These extracted patterns and relations can be: (1) used by the user, or data analyst, to form a prediction model; (2) used to refine an existing model; and/or (3) organized into a summary of the target database. Such a search system permits searching across multiple databases.
There are two existing forms of data mining: top-down and bottom-up. Both forms are separately available on existing systems. Top-down systems are also referred to as "pattern validation," "verification-driven data mining" and "confirmatory analysis." This is a type of analysis that allows an analyst to express a piece of knowledge, validate or invalidate that knowledge, and obtain the reasons for the validation or invalidation. The validation step in a top-down analysis requires that data refuting the knowledge as well as data supporting the knowledge be considered. Bottom-up systems are also referred to as "data exploration." Bottom-up systems discover knowledge, generally in the form of patterns, in data.
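The distinction between the two forms can be sketched as follows. This is an illustrative example under stated assumptions, not a description of any existing system: records are assumed to be simple attribute dictionaries, and the names validate and explore, the confidence threshold, and the minimum support are hypothetical.

```python
from collections import Counter
from itertools import combinations


def validate(records, condition, conclusion, threshold=0.8):
    """Top-down: test a stated hypothesis 'condition implies conclusion'.

    Both supporting and refuting records are collected, so the reasons
    for the validation or invalidation can be inspected.
    """
    relevant = [r for r in records if condition(r)]
    supporting = [r for r in relevant if conclusion(r)]
    refuting = [r for r in relevant if not conclusion(r)]
    confidence = len(supporting) / len(relevant) if relevant else 0.0
    return {
        "valid": confidence >= threshold,
        "confidence": confidence,
        "supporting": supporting,
        "refuting": refuting,
    }


def explore(records, min_support=2):
    """Bottom-up: discover attribute-value pairs that frequently co-occur."""
    counts = Counter()
    for record in records:
        items = sorted(record.items())  # stable order so pairs compare equal
        for pair in combinations(items, 2):
            counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_support]
```

In the top-down call the analyst supplies the hypothesis; in the bottom-up call no hypothesis is supplied and the patterns emerge from the data.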
Existing systems rely on the specific interface associated with each database, which further limits a user's ability to dynamically interact with the system to create sets of rules and hypotheses that can be applied across several databases, each having separate structures. For large data problems, a single interface and a single data mining technique significantly inhibit a user's ability to identify all appropriate patterns and relations. The goal of performing such data mining is to generate a reliable predictive model that can be applied to data sets. Furthermore, existing systems require the user to collect and appropriately configure the relevant data, frequently from multiple and diverse data sources. Little or no guidance or support for this task is provided.
There is also a need for a system that permits a user to create a reliable predictive model using data mining across multiple and diverse databases.
There is a need to help the user collect the data from a number of servers, process the data to reveal the explicit and non-explicit information, manipulate the information, use it to refine a search, extract original documents, reformulate a query, and modify a component or policy of a query. This should be done with the knowledge base or personality of the user in mind, with text and graphical elements most likely to represent a model used by the Community of Interest to which the user belongs, such as investors. This should also be done in a manner so as to allow the user to borrow the knowledge of another experienced Community of Interest or of an expert in a given field. This should be done with or without sharing the identity of the user.
The results of the search should also be organized into a summary document to reveal the predictive model, sources, salient facts of the result, and links to resulting elements. This would help create content of a document about the subject search.
The results of the search should also be organized into groups, where all of the items in a group discuss similar topics. The grouping can be based on information in the item (e.g., key words), or on how the item has been used in the past. The grouping can be done in a number of ways; additional details will be discussed in the "Summary of the Invention" section.
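As one hedged illustration of grouping by key words, the sketch below places each result into the first group whose accumulated key words overlap sufficiently with its own. The Jaccard similarity measure, the 0.3 threshold, and the names jaccard and group_results are assumptions chosen for this example; the grouping may equally be done in other ways.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two key word sets, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_results(results: dict, threshold: float = 0.3) -> list:
    """Group result items whose key words overlap.

    results maps an item identifier to its set of key words. Items are
    processed in order; each joins the first sufficiently similar group,
    otherwise it starts a new group.
    """
    groups = []  # each entry: {"keywords": set, "items": list}
    for item, keywords in results.items():
        for group in groups:
            if jaccard(keywords, group["keywords"]) >= threshold:
                group["items"].append(item)
                group["keywords"] |= keywords  # widen the group's vocabulary
                break
        else:
            groups.append({"keywords": set(keywords), "items": [item]})
    return [group["items"] for group in groups]
```

For an ambiguous query such as "jaguar", results about automobiles and results about the animal would tend to fall into separate groups because their remaining key words differ.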
Improvements in information search engines to eliminate or reduce the aforementioned difficulties, and to satisfy the aforementioned needs, are desired. As such, the present invention contemplates a new and improved system and method for providing interactive dialogue and iterative search functions to find information among a network of servers and to display results depicting overall distribution and relationship of results.
Many of the inputs suggested in this patent depend on either explicitly requested user data or passively accumulated user-level information on searching habits. We acknowledge the possible privacy implications of this but do not directly address these issues in this patent.
A system and method for providing interactive dialogue and iterative search functions to find information among a network of servers and to display results depicting overall distribution and relationship of results are provided. The system and method provide determination, in fine granularity, of a Community of Interest (COI) and further evaluation of search results using COI and/or expert preferences to identify important knowledge, to formulate, manipulate, and display results, and to summarize search results into a document-like entity with dynamic attributes. The invention is generally applicable to an information search on a large network of servers such as the World Wide Web, where the amount of information is so vast that it is becoming increasingly important to overcome the aforementioned difficulties in order to deal effectively with the overwhelming amount of data that a search engine might return on any given search.
A primary advantage of the present invention is the pre-processing function that clarifies the query intent of the search engine user before initiating a search.
Another advantage of the present invention is the disambiguation of text by the use of lexical indexes, expert databases and known mental models for subject-specific and area-specific data whereby query results can be characterized based upon target user knowledge.
Still another advantage of the present invention is the multi-layered approach to presenting results, only showing the most likely solutions in a high level display, and showing more detail as lower levels are telescoped into.
Yet another advantage of the present invention is the use of new concepts in graphically presenting search results including such things as scatter grams, showing relationships among resulting elements, and the use of color, shape and other attributes to differentiate among resulting elements.
Another advantage of the present invention is the determining of COI categories in fine granularity, representing COI categories and representing them differently for different COIs, representing relationships among COI categories and identifying an individual and the COI or COIs to which he or she belongs.
Another advantage of the present invention is the handling of shifting or dynamic elements over time (resources and sources, access, individual's experiences and skill set, age, preferences) by creating an expert record, modifying taxonomy over time to reflect changes in individual or group usage, time or society structures within which an individual operates (company, school, group).
Another advantage of the present invention is the provision of a system of graphic and audio representations of data that assist in understanding (for different populations and skill sets).
Further scope of the applicability of the present invention will become apparent from the detailed description provided below. It should be understood, however, that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art.