1. Field of the Invention
The present invention relates generally to systems and methods for computer-based searching.
2. Background of the Invention
Full-text searching of unstructured and semi-structured data is becoming increasingly important in the world of computing. The amount of information available through the World Wide Web is voluminous. As of 2006, Google used 2 petabytes of disk space (en.wikipedia.org/wiki/Petabyte). To improve the accuracy of web and desktop searches, efforts have been invested in improving page ranking in terms of relevance. Despite these efforts, however, a large gap still exists between the results returned by a search and the results desired by a user.
Currently, advanced efforts involve the utilization of a network of computer processors to calculate document result sets, which calculation is based on the interconnectivity of documents (“The Anatomy of a Large-Scale Hypertextual Web Search Engine”; Sergey Brin et al.; 2000, pp. 1-29). The performance of the system is optimized by re-ranking the results (see U.S. application Ser. No. 10/351,316, filed on Jan. 27, 2003 by Krishna Bharat).
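The interconnectivity-based ranking described above can be illustrated by a minimal power-iteration sketch in the spirit of the PageRank algorithm of the cited Brin et al. paper. The link graph, damping factor, and function names below are illustrative assumptions, not part of the cited or patented implementations.

```python
# Minimal sketch of interconnectivity-based document ranking via power
# iteration (PageRank-style). Illustrative only; the graph and the
# damping factor 0.85 are assumptions for demonstration.

def rank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}          # uniform starting scores
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * scores[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share          # citing pages pass on score
            else:
                for p in pages:                   # dangling page: spread evenly
                    new[p] += damping * scores[page] / n
        scores = new
    return scores

# A frequently cited page accumulates score; a never-cited page ("d")
# keeps only the baseline, illustrating the drawback noted below.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = rank(graph)
```

Because score flows only along incoming links, the page “d”, which nothing cites, retains only the minimum baseline score regardless of its content.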
A drawback of this method is that only documents that are frequently cited are found; relevant information from new or specialized sources is therefore ignored. For example, a self-help group that provides relevant information on a private home page does not have any “PageRank” until the home page is cited.
Search routines can also be based on novel approaches in the field of artificial intelligence that utilize so-called artificial neural networks (ANNs) in order to provide search results that consider semantic concepts, correlations, and/or associations between terms and documents. One drawback of previously known neural network solutions for text mining is that the software operates only on a single computer, and thus is limited to the amount of data that one machine can store and process.
Several ANNs utilize unsupervised clustering algorithms, which fall into either hierarchical or partitional paradigms. In general, similarities between all pairs of documents must be determined; because the number of pairs grows quadratically with the number of documents, these approaches are unscalable.
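The scalability problem can be sketched as follows: an all-pairs similarity computation over n documents requires n(n-1)/2 comparisons. The toy bag-of-words documents and cosine-similarity measure below are illustrative assumptions, not drawn from any particular clustering system.

```python
# Sketch of why all-pairs similarity clustering scales poorly: the
# number of document pairs grows quadratically with corpus size.
# Documents are toy bag-of-words term-frequency vectors (assumed).

from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two sparse term-frequency dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = sqrt(sum(x * x for x in u.values())) * \
           sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

docs = {
    "d1": {"neural": 2, "network": 3},
    "d2": {"neural": 1, "search": 2},
    "d3": {"search": 4, "engine": 1},
}
pairs = list(combinations(docs, 2))
sims = {(a, b): cosine(docs[a], docs[b]) for a, b in pairs}
# n documents need n*(n-1)/2 comparisons: a million documents would
# already require roughly 5e11 similarity computations.
```

Even at web scale with a fast similarity function, the quadratic number of comparisons dominates, which is the unscalability noted above.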
In order to overcome the above-discussed drawbacks, it is desirable that the document result sets of a search engine be calculated based on the content of websites (and/or documents). Moreover, a need exists for a scalable neural network architecture that allows the generation of neural networks at a scale such that even petabytes of data can be computed.
Thus, a need remains for additional optimization techniques that use distributed neural networks and virtual indexes.