The present invention relates to systems and methods for document searching and organizing. More particularly, the present invention relates to systems and methods for retrieving relevant documents from a computer network in response to a user's search query and organizing the retrieved documents into categories.
Search engines are used to explore the World Wide Web and build indices of available web pages. Search engines typically have three major elements: the spider (crawler), the index, and the search engine software. The spider visits web pages and extracts information from them to build an indexed database for the search engine. The spider searches for new web pages, as well as for changes to web pages that have already been indexed by the search engine. Typically, a search engine runs several spiders that explore the web as a team. The index serves as a storage space for the information found by the spider. The search software component of the search engine allows users to look for web pages containing information related to one or more search terms entered in a search query. The results of the search are displayed and are typically ranked according to the location and frequency of the search terms within the web pages.
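The index and the location-and-frequency ranking described above can be illustrated with a minimal inverted index. This is only a sketch under stated assumptions: pages are tokenized on whitespace, the spider is not modeled (pages are supplied directly), and the scoring rule, including the hypothetical parameters `EARLY_WINDOW` and `EARLY_BONUS`, is invented for illustration. Actual search engines use their own, more elaborate formulas.

```python
from collections import defaultdict

# Hypothetical scoring parameters, chosen only for illustration.
EARLY_WINDOW = 10   # word positions counted as "near the top" of a page
EARLY_BONUS = 2.0   # extra weight for an occurrence near the top


def build_index(pages):
    """Build an inverted index mapping each word to {page_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        # Whitespace tokenization only; punctuation is not stripped.
        for pos, word in enumerate(text.lower().split()):
            index[word][page_id].append(pos)
    return index


def search(index, query):
    """Return page ids ranked by the frequency and location of query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        # .get() does not insert missing terms into the defaultdict.
        for page_id, positions in index.get(term, {}).items():
            for pos in positions:
                scores[page_id] += 1.0  # frequency: one point per occurrence
                if pos < EARLY_WINDOW:
                    scores[page_id] += EARLY_BONUS  # location: near the top
    # Highest score first; ties broken by page id for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))
```

For example, a page that mentions "jaguar" twice in its opening words outranks a page that mentions it once, far down the text.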
Search engines are distinct from search directories. Search directories require individuals to submit information about a web site to the search directory. Editorial staffs maintain the search directory and classify the submitted web page information. A search directory user can select only from sites listed in the directory. While this approach typically produces high-quality indices and allows web sites to be classified in a directory structure, the growth of the Internet makes it increasingly difficult for editorial staff to cover a large percentage of the Internet. As a result, searches performed using search directories often return too little useful information.
Keyword searching is the most common form of searching used by web search engines. Keywords indicating the content of a web page may be specified using meta tags. If a web page does not have meta tags, the search engine must determine the page's keywords itself. Search engines generally extract and index words that are believed to be significant. Words that appear near the top of a document and words that appear frequently throughout a document are more likely to be considered important.
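The position-and-frequency heuristic for pages without meta tags can be sketched as follows. The stop-word list `STOP_WORDS`, the function name `extract_keywords`, and the weighting (each occurrence earns 1 point for frequency plus up to 1 more the closer it sits to the top of the document) are all hypothetical choices made for illustration, not a method stated in the source.

```python
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def extract_keywords(text, k=3):
    """Return the k highest-scoring words of a page that lacks meta tags.

    Each occurrence contributes between 1 and 2 points: 1 for frequency,
    plus up to 1 more the closer it is to the top of the document.
    """
    # Lowercase and strip surrounding punctuation from each token.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    n = len(words)
    scores = Counter()
    for pos, word in enumerate(words):
        scores[word] += 1.0 + (1.0 - pos / n)
    # Sort by score (descending), then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Usage: `extract_keywords("Jaguar conservation. The jaguar is a large cat; conservation of the jaguar matters.")` ranks "jaguar" first because it is both frequent and appears first.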
There are several problems associated with keyword searches. Keyword searches typically have difficulty distinguishing between words that are spelled the same way but have different meanings. Thus, a keyword search can produce results that are irrelevant to the intent of a user's query. In addition, search engines that use keyword searching typically do not perform “stemming,” which is a process for determining the root word of a user-entered search term. Searching for relevant documents using root words can produce different results than a search conducted with the non-root words as entered. Furthermore, many search engines that employ keyword searching do not return documents containing keywords that have the same meaning as, but are not listed in, a user's query. Thus, documents potentially relevant to the user's query are not retrieved.
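To show why stemming changes search results, the sketch below uses a deliberately crude suffix-stripping stemmer. The suffix list `SUFFIXES`, the minimum stem length of 3, and the helpers `stem` and `matches` are hypothetical; this is a toy rule set, not the Porter algorithm or any production stemmer.

```python
# Hypothetical suffixes, tried in order; longer suffixes come first so that
# "ions" is stripped before "s". A toy rule set, not the Porter algorithm.
SUFFIXES = ("ions", "ion", "ing", "ed", "es", "s")
MIN_STEM = 3  # never reduce a word to fewer than 3 characters


def stem(word):
    """Reduce a word to an approximate root by stripping one suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def matches(query, document, use_stemming):
    """True if every query term appears in the document (optionally stemmed)."""
    normalize = stem if use_stemming else str.lower
    doc_terms = {normalize(w) for w in document.split()}
    return all(normalize(t) in doc_terms for t in query.split())
```

Without stemming, the query "searching" does not match a document containing only "Searched"; with stemming, both reduce to "search" and the document is retrieved.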
Unlike keyword search systems, concept-based searching attempts to determine the intended subject matter that the user is requesting in a query. A concept-based search engine generally returns a list of documents that are related to the subject of the search, even if the words in the document do not directly match the words in the query. Concept-based searching often involves “clustering,” where the meanings of words are examined in relation to the words found nearby. However, search engines that use concept-based searching have exhibited varying levels of effectiveness in retrieving documents relevant to a user's search query.
Relevancy ranking is becoming increasingly critical to users as the volume of information available on the web grows. Users typically do not have time to sift through hundreds of documents or links to determine relevance. Some search engines use search term frequency to determine whether a document is relevant. However, if a search term entered by the user in a search query is relatively common or has multiple meanings, such a search engine can produce search results that the user considers irrelevant to the intended search.
Accordingly, it may be desirable to provide systems and methods for interactively searching computer networks for documents relevant to a user's search query and for retrieving, categorizing, and summarizing those documents. Furthermore, it may be desirable to provide systems and methods that minimize the opening, closing, and reading of documents.