The World Wide Web (the Web) comprises the computers on the Internet that offer users access to information via interactive documents, or Web pages, which are digital content resource files. Web information resides on Web servers on the Internet or within company networks. Web client machines running Web browsers or other Internet software can access these Web pages via a communications protocol known as the Hypertext Transfer Protocol (HTTP). With the proliferation of information on the Web and in company networks, it has become increasingly difficult for users to locate and effectively use this information. The reason is that there is too much information for search engines to update their indexes to reflect every change to every digital content resource in a timely fashion.
The full text index is created by the search engine's software from digital content resources that its crawler software retrieves from a site. The index enables the retrieved digital content to be searched by keyword; each keyword points to the original site from which the digital content containing that keyword was obtained. Search engines have proprietary algorithms that order the search results for a given keyword by relevance and display the sites from most relevant to least relevant. Different algorithms can yield widely differing results, and even the best algorithms have trouble determining the context of a search term. The process of retrieving digital content from many different Web sites and creating a full text index is resource and time intensive, requiring significant computer resources and bandwidth when a large number of site indexes need to be updated. Public search engines index a very large number of sites, which makes updating their indexes a significant and expensive endeavor. The general solution adopted by the search engines is to allocate their scarce resources by limiting the update frequency of most of the sites in their index. Only the sites deemed most “important” by the search engine will have their indexes updated frequently. For the vast majority of sites, the information in the index of commercial search engines currently in the practice of the art will be one to three months old. Even though a given site may not be important to the commercial search engines currently in the practice of the art, that site could be very important to a searcher who might require the most current information from that site.
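The keyword-to-site mapping described above can be illustrated with a minimal sketch of an inverted index. The function names `build_index` and `search`, the sample URLs, and the sample page text are hypothetical and chosen only for illustration; the sketch does not attempt the proprietary relevance ranking, and it is not presented as the method of any particular search engine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase keyword to the set of site URLs whose text contains it.

    `pages` is a dict of {url: text}; in a real system the text would come
    from crawler software rather than from an in-memory dict (assumption).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, keyword):
    """Return the URLs containing the keyword, sorted alphabetically (not by relevance)."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical sample content standing in for retrieved digital content resources.
pages = {
    "http://example.com/a": "Quarterly finance report",
    "http://example.org/b": "Health news and finance tips",
}
index = build_index(pages)
```

With this sample data, `search(index, "finance")` returns both URLs, while `search(index, "health")` returns only the second. Any change to a page's text is invisible to `search` until `build_index` is run again, which is the staleness problem the paragraph describes.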
The result of these circumstances is that the index entries for sites not highly rated by the automatic algorithms used by commercial search engines currently in the practice of the art may not be updated in a timely manner, even if files on the site have been updated. Since commercial search engines use automatically executed algorithms to determine which sites are suitable for inclusion in their index, and how often to refresh the index of any given site, Web sites containing useful information may not be listed at all, or may not be listed early enough or in a sufficiently timely manner in the index or the search results to be discovered by interested searchers. As a result, searchers may not be able to find important information, either because it is not in the search engine index and therefore cannot be retrieved, or because the relevancy of the result is ranked so low by the search engine that the searcher would have to go through hundreds or even thousands of listings to find the information they want. This is highly inefficient from the searcher's perspective. The current invention solves these problems for the searcher.
In addition, commercial search engines refresh their index data at a rate that is not suitable for many applications that require timely information, such as finance, political issue tracking, business news analysis and other subjects, such as those pertaining to health issues.
There are currently two types of digital content index and search available: non-customizable search and customizable search. Both place limits and burdens on the searcher that may result in available information not being found.
Standard search engines give the searcher no control over the information resources (digital content files) that are included in the search engine, how frequently the index is updated, or the depth of the links included in the index. Searchers can suggest digital content resource data files to be included, but there is no guarantee that they will be included. Instead, the search engine's management controls which digital content resource data files will be included. Search and ranking are usually done using some proprietary algorithm. These algorithms are frequently changed without notifying search users, and, consequently, Web sites can disappear without notice from a searcher's list of results. Furthermore, because a proprietary algorithm is used, the effects of these changes cannot be accurately understood, or compensated for, by searchers. As a result, the searcher may not be aware of important information because it is not in the search engine's index of results.
There are some customizable search engines. However, all of the customizable search engines have limitations and/or create burdens for the searcher. Customizable search engines that use personal computers as the platform for the search software permit the searcher to choose the digital content information resources (e.g., magnetically or optically stored files) to include in the index. The user can also designate when to refresh the index and can set the link depth to include in the index. However, there are burdens. The user must install the “customizable search engine software” on their computer. When the software retrieves digital content resource data files for the index, it may overload the user's network connection or make the connection unavailable for other uses, potentially for long periods. As a result, such software is highly likely to prevent users from performing other tasks while the software runs. The search and index function in personal computer-based customizable search software uses the personal computer's processor, memory and hard disk, which limits search power and index size to the hardware on that personal computer. Thus, personal computer-based customizable search software may keep the user from being able to perform other functions on that personal computer due to insufficient resources. Due to the previously enumerated burdens, this type of customizable search software has serious usability drawbacks for the user.
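The user-selected starting resources and link depth mentioned above can be illustrated with a minimal sketch of a depth-limited, breadth-first traversal. The function name `pages_within_depth`, the in-memory `links` table, and the page names are hypothetical; a real customizable search engine would follow links in retrieved documents rather than look them up in a dict, and this sketch is not presented as the behavior of any particular product.

```python
from collections import deque


def pages_within_depth(links, seeds, max_depth):
    """Return {page: depth} for every page reachable from the seeds within max_depth links.

    `links` maps a page to the pages it links to (hypothetical in-memory stand-in
    for following hyperlinks); `seeds` are the user-chosen starting resources.
    """
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical link structure of a small site.
links = {
    "home": ["news", "about"],
    "news": ["story1", "story2"],
    "story1": ["archive"],
}
```

With this sample data, a link depth of 1 from `"home"` selects `home`, `news` and `about`; a depth of 2 adds the two stories but still excludes `archive`. Each additional level of depth can multiply the number of files to retrieve, which suggests why retrieval on a personal computer may tie up the network connection for long periods.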