One method of searching is performed over the World Wide Web (WWW). This type of searching is commonly referred to as web searching and is normally performed by a search engine. The term search engine refers to an information retrieval system designed to help find information stored on a computer system. Search engines help to minimize both the time required to find information and the amount of information that must subsequently be consulted. One type of conventional search engine is the Web search engine, which searches for information on the public WWW. Other types of conventional search engines include enterprise search engines that search private intranets, personal search engines, and mobile search engines. Typically, a search engine provides an interface that enables a user to specify criteria about an item of interest and has the engine find the matching items within the stored information. The items of interest are typically documents, and the criteria are the words or concepts that the documents may contain. A document, as used herein, is a bounded physical representation of a body of information designed with the capacity to communicate information. Documents may be digital files in various formats, including web pages, word processing documents, images, or the like.
One prior art technique used by Web search engines is the web crawler. A web crawler, also known as a web spider, web robot, or web bot, is a program or automated script that browses the WWW in a methodical, automated manner. This process is called web crawling or spidering. Many search engines use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of every visited page for later processing by the search engine, which indexes the downloaded pages to provide fast searches. A web crawler typically starts with a list of Uniform Resource Locators (URLs). Upon visiting each of these URLs, the web crawler identifies all hyperlinks in the page and adds them to the list of URLs to visit. The URLs on the list can be visited recursively according to a set of policies. By indexing the collected documents, or metadata about the documents, the search engine can provide a set of matching items quickly. For example, a library search engine may automatically determine the author of each book and add the author's name to a description of that book. Users can then search for books by author name. The metadata collected about each item (e.g., document) is typically stored in the form of an index. The index provides a way for the search engine to calculate the relevance, or similarity, between the search query and each item in the set of items.
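The crawling loop described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular search engine. The names crawl, LinkExtractor, fetch, max_pages, and TOY_WEB are hypothetical and introduced only for illustration, and the "set of policies" is reduced to two assumed rules: visit pages breadth-first, and stop after a fixed number of pages. A small in-memory dictionary stands in for the WWW so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Visit pages breadth-first from seed_urls and return {url: html}.

    fetch is a caller-supplied (hypothetical) function that returns the
    HTML for a URL, or None if the page cannot be retrieved.
    """
    frontier = deque(seed_urls)  # list of URLs still to visit
    seen = set(seed_urls)        # URLs that have already been queued
    pages = {}                   # copies of the visited pages
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative hyperlinks
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


# Toy in-memory "web" (an assumption made for the example only).
TOY_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

pages = crawl(["http://example.test/"], TOY_WEB.get)
```

In this sketch the crawler stores a complete copy of each page it visits, which is the behavior the following paragraph identifies as a limitation.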
A limitation of this prior art method is that the collected information is a copy of each entire document, and the index is organized according to the collected documents, such as by the metadata that corresponds to each document. As a result, this prior art method has two disadvantages: it must process each collected document in its entirety, such as to extract or generate metadata related to that document, and it organizes the document information (e.g., metadata) according to documents rather than according to the items of interest.
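A document-organized index of the kind discussed above can be pictured with the following Python sketch. The sample records, the function names build_index and search, and the scoring rule (counting how many query words appear in a document's metadata) are assumptions made for illustration; actual search engines may compute relevance quite differently.

```python
from collections import defaultdict

# Hypothetical collected documents: each record stands for one whole
# document together with the metadata extracted from it.
documents = {
    "doc1": {"title": "Web Crawling Basics", "author": "A. Author"},
    "doc2": {"title": "Indexing Methods", "author": "B. Writer"},
    "doc3": {"title": "Crawling at Scale", "author": "A. Author"},
}


def build_index(docs):
    """Map each lowercase metadata word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, metadata in docs.items():
        for value in metadata.values():
            for word in value.lower().split():
                index[word.strip(".,")].add(doc_id)
    return index


def search(index, query):
    """Return document ids ranked by the number of query words they match."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word.strip(".,"), ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


index = build_index(documents)
by_author = search(index, "Author")
```

Every entry in this index points back to a whole document, and every result is a document identifier, so the unit of organization is the document rather than the item of interest.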
Another prior art technique, used by personal search engines, is the desktop search tool. A desktop search tool searches the contents of a user's own computer files, rather than searching other computers or the Internet. These tools are designed to find information about documents on the user's computer, including web browser histories, e-mail archives, text documents, audio files, images, video, or the like. The search index for the desktop search tool resides on the user's computer. As with the web crawler, the search index is organized according to the documents, not according to the items of interest.
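The behavior of a desktop search tool can be approximated by the following Python sketch, which indexes only plain-text files beneath a single directory. The function names build_desktop_index and find_files, and the temporary directory standing in for the user's files, are hypothetical. Real desktop search tools also handle e-mail archives, browser histories, and media formats, which this sketch does not attempt.

```python
import os
import tempfile


def build_desktop_index(root):
    """Map each file path under root to the set of lowercase words it contains."""
    index = {}
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    index[path] = set(handle.read().lower().split())
            except OSError:
                continue  # unreadable file: skip it
    return index


def find_files(index, word):
    """Return the sorted paths of the files whose text contains word."""
    return sorted(path for path, words in index.items() if word.lower() in words)


# Demonstration on a throwaway directory standing in for the user's files.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("meeting notes about the search index")
    with open(os.path.join(root, "todo.txt"), "w", encoding="utf-8") as f:
        f.write("buy milk")
    index = build_desktop_index(root)
    matches = [os.path.basename(p) for p in find_files(index, "index")]
```

The index here is keyed by file path and is built only from files on the local file system, which mirrors the two properties noted above: the index resides on the user's computer and is organized according to documents.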
A limitation of this prior art method is that the desktop search tool collects information only from the user's computer, not from other computers. A related limitation is that the desktop search tool does not discover other devices from which information could be collected. As a result, this prior art method has the disadvantage of being limited both in the types of information sources from which it can collect information and in the types of information that it can collect.