The present invention relates generally to computerized research, and more particularly relates to a system and methods for conducting research on networked computer systems, particularly Internet-based data archives, and accumulating the research information to obtain useful computerized research results.
The Internet is an immense network. As of early 2000, there are more than 100 million users accessing over 5 million active sites with over 800 million pages of information, and the network grows daily. That is an astounding mountain of raw data to sift through.
The Internet's greatest strength, the immense volume of information, is also the root of one of its weaknesses. Extracting specific knowledge from this vast repository of information can be frustrating and extremely time-consuming. Additionally, sites are published by thousands of people; there is no organization to this mass of information. Web pages are constantly added, deleted, updated, and moved. Finding relevant information on the Internet can be challenging in such a chaotic environment. Conventional search engines, such as ALTAVISTA.COM and YAHOO.COM, seldom find a desired answer without numerous irrelevant distractions.
There are several reasons why Internet searches are not effective. First, conventional search engines and directory services on the Internet are designed to provide instant, cursory reviews of the enormous numbers of pre-cataloged topics on the Internet. This method produces a tremendous quantity of raw and unrelated information. Generally, conventional search engines do not identify what is new or changed since the searcher last asked about the topic. Conventional search engines often return unmanageably large numbers of answers to a single question. Further, conventional engines rely on stale information, sometimes weeks to months old. Conventional search engines do not retain search results; the searcher must restart each time a search is conducted. Conventional engines work only while the searcher is online; cover a mere 20% of the available content on the Internet; only show preestablished or "canned" summaries that are frequently unrelated to the question; cannot report information that has frequently changing content; and do little or nothing to teach a searcher how to construct an effective query.
In order to perform a search with most search engines, a user typically submits a query containing one or more query terms. A query server program of the search engine then processes the query to identify any items that match the terms of the query. The result of the query is a set of web sites or documents, which is typically presented to the user as a hypertext listing of the located items. If the scope of the search is large, the query result may contain hundreds, thousands, or even millions of items.
Due to the enormous and rapidly growing quantity and diversity of information accessible through the Internet, search engines generally maintain a tremendous amount of Internet content and pre-index the information to facilitate rapid searching. Therefore, when an Internet user enters a search, the search engine quickly looks into its index and tries to provide the user with a response within a few seconds. The accuracy of the information provided in the response, however, depends on the current state of the index, which may be incomplete and/or outdated.
Another class of search solution is the "meta-search engine," as implemented on sites such as DOGPILE.COM and METACRAWLER.COM. These meta-search services collect the search request from the user, then farm out the request to two or more pre-selected search engines or directories. The results returned are then rapidly repackaged and presented to the user. Various implementations perform differing levels of compiling the results before presentation. The simplest merely report the results from each search engine or directory separately. More advanced ones merge the results into a single report, eliminating duplicates.
However, meta-search engines are wholly dependent on traditional search engines and directories for their results. Meta-search engines use a similar model of providing the results as rapidly as possible to the user. Accordingly, such engines have the same search engine limitations cited above, except that by accessing the indexes of more than one search engine, they increase the potential coverage of the Internet beyond the typical 20% of a single search engine. Further, many implementations of the meta-search concept fail to adapt or optimize the user-entered search syntax to the various search engines used. Meta-search engines also use a preset collection of search engines (some provide user selection of the specific choices), which is used for all search requests.
Although existing search engines are generally useful, users interested in acquiring and compiling focused information are often inundated with too many results. Moreover, prior art search engines are ill equipped to handle the formidable task of indexing the vast amounts of developing Internet content. Indeed, because existing search engines are tailored to giving users immediate responses, those responses are often inaccurate, irrelevant, and/or antiquated. The user ultimately takes the brunt of any errors, inaccuracies, and outdated information. Specifically, users are often presented with duplicative search results (i.e., the same found item may appear on one or more different web sites), or dead links (which generate the dreaded "Error 404," meaning that the information, although indexed, is no longer available at the site that generated the index entry). In addition, search engines provide only one tool for actually conducting a research project.
Human beings traditionally conduct research in a manner that is not facilitated by present Internet search engines. Just as in conventional library research, people typically conduct research by (1) attempting to identify one or more authoritative sources of information, (2) locating and querying those sources, (3) inspecting manageable collections of information provided by the sources, (4) taking notes on the information (e.g., by writing on an index card), (5) "filtering" the information by categorizing the cards as a function of quality or state of currency or completeness, etc., (6) selecting and retaining those items of information that satisfy the researcher's goals, and (7) repeating the previous steps as necessary to achieve sufficient information to meet both initial research goals and to obtain informative updates over the duration that the topic continues to be of interest or importance. The final research product is the result of selection of the most relevant items of information from the various sources.
As described above, existing Internet search engines only provide the user with a list of possible sources of information (i.e., a list of static items that have been indexed a day, a week, or maybe a month ago). The list provided by an existing search engine is much like providing a library user with a listing from a card catalog. For example, like a card catalog, a list of sources only informs the user that there may be information available on a particular research topic. It does not provide the user with any additional assistance. The user is left to hunt down whether the information is still available and, if so, determine whether it is truly relevant to the researched query.
Furthermore, while existing Internet search engines provide instantaneous responses, they do not provide users with any continuity of use, or adequate means for filtering the irrelevant information, or adequate means for determining quality of the seemingly relevant search results, or adequate means for retaining relevant findings. In other words, prior art search engines do not maintain a relationship with any one user and are therefore unable to identify one user from another. Accordingly, when a user enters a follow-up search request to obtain updated information from a past search, existing search engines will likely reproduce duplicate items. The user must then sort through all the duplicate items to determine if the search results contain any new or updated information. This has proven a difficult and tedious task for serious Internet researchers.
In addition, prior art search engines generally attempt to accomplish their request processing in the background. Some of these search engines recognize common mistakes that users make, but they attempt to program their systems to work around them. They use technological tricks such as ignoring certain words and punctuation marks to "second guess" the user and form a better request. This approach, however, fails to teach users to create better requests and fails to provide them tangible feedback on what to do differently to get closer to their research objectives.
Therefore, there is a need for a searching tool that is directed to the problems of finding too much irrelevant information on the Internet as well as managing the volume of information that a user gathers on the Internet.
There is a further need for a searching tool that can contemporaneously index developing Internet content.
There is yet another need for a searching tool that provides the user with additional assistance for determining the initial relevancy of each located item to the researched query.
Additionally, there is a need for a searching tool that tracks and maintains a history of each user's searches and results.
There is still an added need for a searching tool that teaches the user to independently enter a better request.
There is a further need for an improved searching tool that can continuously provide a user with new and updated information based on a previous search request.
Briefly described, the present invention relates to systems and methods for conducting computerized research with a knowledge engine, especially suitable for the Internet environment, that operates in a manner similar to that of a human searcher in a library. A system constructed in accordance with the invention helps searchers find and accumulate a personal library of knowledge; hence the term "knowledge engine" as opposed to search engine. Because the present invention was designed for research, not cursory searches, it does a lot more than just find information. A system of the invention compiles information from multiple sources, weeds out obviously bad information, stores findings in a personal library on "bookshelves," and provides a searcher with a summarized, condensed, and highlighted report of the search results.
More particularly described, the present invention provides a system and methods for accumulating and displaying information items obtained via a computer network such as the Internet and World Wide Web (WWW). The system provides a plurality of selectable expert topics, each expert topic comprising one or more network computer accessible sources of information. A user inputs a user search request, a selection of one of the plurality of expert topics, and update schedule information as to when the user wishes to receive automatic updates to their search. The user search request, a selection of one of the plurality of expert topics, and update schedule information are stored at a web site server.
In accordance with the stored update schedule information, the user search request is provided to the information sources in the selected expert topic. Raw search results from the information sources are received and processed to eliminate dead links and duplicate items. The processed raw search results are stored as stored search results comprising a plurality of stored search items. A predetermined subset of the stored search items is selected for communication to the user. The predetermined subset of stored search items is then communicated to the user, e.g., by e-mail, pager, or cell phone.
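The processing step described above (eliminating duplicate items and dead links before storage) might be sketched as follows. This is only an illustration under stated assumptions: the names `normalize_url` and `process_raw_results`, the dictionary layout of an item, and the injected `is_alive` callback are all hypothetical and do not come from the specification, which does not say how duplicates or dead links are detected.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Reduce a URL to a comparison key (assumed rule: ignore case of
    scheme/host, trailing slashes, and fragments)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


def process_raw_results(raw_items, is_alive):
    """Return the items worth storing: first occurrence of each URL,
    skipping any item whose link the caller-supplied check reports dead."""
    seen = set()
    stored = []
    for item in raw_items:
        key = normalize_url(item["url"])
        if key in seen:
            continue  # duplicate of an item already examined
        seen.add(key)
        if not is_alive(item["url"]):
            continue  # dead link, e.g. the source answered "Error 404"
        stored.append(item)
    return stored
```

In a live system `is_alive` would presumably issue a network request and treat a 404 response as dead; it is passed in as a parameter here so the sketch runs without network access.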
The system is further operative for receiving user commands to hide selected items of the predetermined search results, such that the user is not displayed hidden selected items, but the selected items remain stored as stored search results.
Preferably, the stored search results are updated automatically with new search results from an updated search conducted in accordance with the update schedule information. Typically, users will be notified of the updated search results via e-mail, although other equivalent methods exist.
According to an aspect of the invention, new search items are determined in the updated search, and the new search items are identified in a communication to the user.
According to another aspect of the invention, changed search items in the updated search are determined, and the changed search items are identified in a communication to the user.
According to yet another aspect of the invention, unavailable search items in the updated search are identified (e.g., "Error 404"), the unavailable items having been available in a prior search. The unavailable items are preferably deleted from the subset of information provided to the user.
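The three aspects above (new, changed, and unavailable items in an updated search) amount to comparing two snapshots of the stored results. A minimal sketch follows, assuming each snapshot maps an item's URL to a fingerprint of its content; the function names and the choice of SHA-256 as the fingerprint are illustrative assumptions, not details taken from the specification.

```python
import hashlib


def fingerprint(text):
    """Hash page text so two versions of an item can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_update(previous, current):
    """Compare two snapshots (URL -> fingerprint) and report what differs.

    new:         present now, absent from the prior search
    changed:     present in both, but the fingerprint differs
    unavailable: present in the prior search, absent now
    """
    new = sorted(url for url in current if url not in previous)
    changed = sorted(url for url in current
                     if url in previous and current[url] != previous[url])
    unavailable = sorted(url for url in previous if url not in current)
    return {"new": new, "changed": changed, "unavailable": unavailable}
```

The `new` and `changed` lists would be flagged in the communication to the user, while the `unavailable` list would be dropped from the subset shown.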
According to yet another aspect of the invention, stored search items are analyzed, typically off-line, and information sources are identified as potential new sources.
According to another aspect, statistics are tabulated corresponding to the quality of information provided by an information source, in association with a selected expert topic. A ranking is assigned to the information source, and information items from a source of higher ranking are displayed to the user before information items from a source of lower ranking. Preferably, information items from an information source are displayed to the user in a collection associated with information identifying the information source.
According to yet another aspect of the invention, potential new information sources are identified from the stored search results and tested as potential new information sources. The invention provides for automatically determining the interface parameters associated with the potential new information source.
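The ranking and grouped display described above might look like the sketch below. The quality statistic used here (the fraction of a source's returned items that were judged good) is an assumption made purely for illustration; the specification does not define how quality is measured. The function names and the `.example` source names are likewise hypothetical.

```python
from collections import defaultdict


def rank_sources(stats):
    """Order sources from highest to lowest quality.

    stats maps a source name to (good_items, total_items); the quality
    score is assumed to be good/total, with ties broken by name.
    """
    def score(source):
        good, total = stats[source]
        return good / total if total else 0.0

    return sorted(stats, key=lambda source: (-score(source), source))


def group_for_display(items, ranking):
    """Collect items under their source, highest-ranked source first."""
    by_source = defaultdict(list)
    for item in items:
        by_source[item["source"]].append(item)
    return [(source, by_source[source])
            for source in ranking if source in by_source]
```

Each tuple returned by `group_for_display` pairs the identifying source name with its collection of items, matching the preference that items be shown together with information identifying their source.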
Finally, the preferred embodiment provides a user-friendly web site for interacting with the user to modify the search request to obtain better search results.
The disclosed system uses hundreds of web sites to cross-reference topics, verify Internet sites, pages, and requested information, and sort out non-pertinent links. Unlike many conventional search engines, the disclosed knowledge engine:
Eliminates dead links,
Reduces or eliminates questionable links,
Creates customized and relevant summaries,
Creates email notification and updates,
Stores research results in a "personal library,"
Is accessible anywhere the searcher can connect to the Internet,
Alerts the searcher to new and changed items, and
Accesses pages with rapidly changing content that cannot be indexed by search engines.
More particularly described, the present invention is directed to a system and methods for continuously accumulating information. More specifically, the present invention of a "knowledge engine" emulates and automates the process that a human researcher uses in gathering information. For purposes of this discussion, the term "knowledge engine" will be used in connection with discussion of the present invention in order to distinguish the present invention from prior art "search engines."
Preferred embodiments of the present invention are constructed around a computer system operated by an Internet-based research service provider, the system including an Internet World Wide Web (WWW) front end that allows users to enter a search request. To start with, the system receives a user's search request and provides the user with query feedback and recommendations on how to optimize that request. For example, the recommendations may involve syntax, spelling, and/or recommendations on the use of terms. These recommendations help the user optimize the search query and gain better results.
The knowledge engine then submits the search request to a number of sources it has previously identified on the Internet and retrieves the information available from those sources. For the purposes of this discussion, a "source" is considered to include any computer-accessible site that can be queried via a network connection. In other words, a source must provide information and the capability to search that information (whether indexed or via other means). According to one aspect of the invention, the knowledge engine can be viewed as a dynamic "index of indexes." For example, the knowledge engine will index the index provided on sites such as CNN.COM, BRITANNICA.COM, and ESPN.COM. Some sites, such as ESPN.COM, which can itself be a source, and other sites that present continuously changing content, provide contemporaneous updates; such sites are therefore considered "dynamic." Other sources do not change as rapidly, or may not change at all (e.g., archived content), and may be considered "static." The preferred knowledge engine has access to both static and dynamic Internet information.
To better understand the significance of dynamic indexing, consider the following example. Assume that a knowledge engine constructed in accordance with the invention receives a request at noon for information on an event that happened in the world earlier that day. Traditionally, if CNN.COM posts the desired information on its site at 11:00 A.M., existing search engines that index static information may have to wait a week before they are able to provide that new information. This is because existing search engines must retrieve, store, and index the information before it can be provided to a user. In contrast, a knowledge engine system constructed in accordance with the invention does not have to obtain and index the information in advance and can therefore provide the user with current information through dynamic indexing.
In accordance with the preferred embodiment of the present invention, the knowledge engine not only dynamically searches the Internet to collect the sites where information may be found, it also examines the content of those sites to determine the information's relevancy and accessibility. Thus, the user is not flooded with copious results containing duplicate sites, dead links, and inapplicable content.
According to one exemplary aspect of the present invention, the knowledge engine transmits the ensuing results to the user and displays them in a context sensitive fashion. In other words, the results are provided with highlighted portions of each site's most relevant content. This allows the user to make a ready determination as to the importance and/or relevance of each finding. Moreover, the knowledge engine provides the user with additional research management functions that allow the user to efficiently manage the received information. For example, such research management functions include the capability of hiding and unhiding items on a list of search results. Hidden items are not deleted from the list of search results (much as one might not throw away the collection of research note cards). Rather, the items are preferably selectively obscured from view so that the user can concentrate on selected visible items. By hiding items instead of deleting them, the present invention prevents those items from reappearing by being "re-found" when research is updated for the user.
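The reason hiding works better than deleting can be seen in a small sketch. The `Bookshelf` class below is a hypothetical illustration, not the disclosed implementation; it assumes items are dictionaries keyed by a `"url"` field. Because a hidden item stays in storage, a later update that finds the same URL again recognizes it as already known and does not re-add it.

```python
class Bookshelf:
    """Stored search results for one user, with hide/unhide support."""

    def __init__(self):
        self._items = {}      # url -> item, in the order items were found
        self._hidden = set()  # urls the user has chosen to hide

    def add_results(self, items):
        """Store items not already on the shelf; return only the new ones."""
        added = []
        for item in items:
            if item["url"] not in self._items:
                self._items[item["url"]] = item
                added.append(item)
        return added

    def hide(self, url):
        """Obscure an item from view without removing it from storage."""
        if url in self._items:
            self._hidden.add(url)

    def unhide(self, url):
        """Make a previously hidden item visible again."""
        self._hidden.discard(url)

    def visible(self):
        """Items the user currently sees."""
        return [item for url, item in self._items.items()
                if url not in self._hidden]
```

Had `hide` deleted the entry instead, the next call to `add_results` containing that URL would treat it as a fresh finding and show it to the user again.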
According to yet another aspect of the invention, after the search results have been provided to the user, the knowledge engine may be configured to periodically reevaluate the sources to determine whether they contain additional information that might be relevant to the user's initial search. Additional information is provided to the user through various means, for example, via electronic mail (e-mail) updates that are scheduled on a periodic basis or that notify the user to visit a master web site containing the user's collection of search items. The reader should also appreciate that the user may be contacted in alternative ways, such as via an Internet site, PDA, telephone, pager, or other equivalent communication means.
According to yet another aspect of the invention, the knowledge engine employs an intelligent automated process that searches the Internet for additional sources, in addition to transmitting periodic updates to the user. All sources are maintained in a sources database containing source address information (e.g., URL), as well as source quality information, expert topic categories, and expert topic relevance. Sources are re-evaluated from time to time, and their associated quality information and expert topic categories revised.
Further still, once a potential new source is discovered, the knowledge engine determines how to interface with the source by testing and evaluating the source. More particularly, the knowledge engine finds the source and issues a command to find the "search box," which is typically the field that is used to enter a search request to the source. Once the search box is found, the knowledge engine may iteratively enter model search requests, receive the results from the issued requests, and analyze the results. These analyses allow the knowledge engine to determine how to interface with the source (e.g., how to communicate back and forth, pass the results, and how to deal with the information that is received from the source).
According to still another aspect of the invention, a plurality of information sources are precategorized into predetermined "expert topics" so as to facilitate targeted research that pertains to a selected topical area. For example, selected sources may be arranged into expert topics on law, science, medicine, computers, communications, history, business, etc. Then, on a regular basis, the sources within an expert topic database are reevaluated to determine if the source site setup has changed; and, if it has changed, in most cases, the knowledge engine automatically detects the new structure and adapts to it. As a result of the process of finding more sources, evaluating and testing those sources, and reevaluating the existing sources, the knowledge engine generally locates additional potential sites. These sites are then considered for inclusion into the source database.
In the preferred embodiment of the present invention, once a source is located and tested, it is placed within an "expert topic." An expert topic is a group of sources that have a common theme. For example, all medical sources may be grouped together in a medical expert topic. Since there are hundreds of sources, it is not feasible, practical, or productive for every request to be submitted to every source available to the knowledge engine. Therefore, the knowledge engine submits the search to concentrated subject groups known as expert topics. The reader should appreciate that it is not necessary for the knowledge engine to mandate the expert topic because the user may choose to select the user's preferred topic. Moreover, it is also possible for the users to contribute and suggest additional expert topics.
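Locating the "search box" on a candidate source's page could be approached by scanning the page's HTML forms for a text input. The sketch below uses Python's standard `html.parser` module; the heuristic it applies (take the first named text or search input inside a form) and the names `SearchBoxFinder` and `find_search_box` are assumptions for illustration, as the specification does not describe the detection method.

```python
from html.parser import HTMLParser


class SearchBoxFinder(HTMLParser):
    """Collect (form action, method, field name) for text inputs in forms."""

    def __init__(self):
        super().__init__()
        self._form = None     # attributes of the form currently open
        self.candidates = []  # one entry per plausible search field

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._form = {"action": a.get("action") or "",
                          "method": (a.get("method") or "get").lower()}
        elif tag == "input" and self._form is not None:
            kind = (a.get("type") or "text").lower()
            if kind in ("text", "search") and a.get("name"):
                self.candidates.append({**self._form, "field": a["name"]})

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None


def find_search_box(html):
    """Return the first candidate search field, or None if none is found."""
    finder = SearchBoxFinder()
    finder.feed(html)
    return finder.candidates[0] if finder.candidates else None
```

The returned action, method, and field name are the kind of interface parameters a system would need before entering model search requests and analyzing what comes back.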
Other goals, features, and advantages of the present invention will become apparent upon reviewing the following detailed description of the preferred embodiments of the invention, when taken in conjunction with the drawings and the appended claims.