In general, public information sources, such as the Internet, present challenges for information retrieval. The volume of information available via the Internet grows daily, and search engine technologies have scaled dramatically to keep up with such growth. Conventionally, search engines, such as those provided by Yahoo, Google, and others, utilize data collection technologies, such as spiders, bots, and web crawlers, which are software applications that access web pages and trace hypertext links in order to generate an index of web page information. The data collected by such software applications is typically stored as pre-processed data on which search engines may operate to perform searches and to retrieve information.
Additionally, a vast amount of data exists that is not accessible to the public Internet (e.g., “dark web” data, internal data, internal application data, private data, subscription database data, other data sources, or any combination thereof). Such data can often be searched via private access interfaces, private search tools, other application program interfaces, or any combination thereof. Such information may be segregated from other information sources, requiring multiple interfaces, multiple protocols, multiple formats, and different database drivers to access the data. Accordingly, information retrieval can be complicated by the variety of data sources.
To improve the quality of search results and to remove “junk results,” search engines may include logic or tools to fine-tune the search results. In some instances, such fine-tuning may be based on relevance to other users, on a number of links from other web pages to a particular resource, or on a combination of information that is not specific to a user's interests (i.e. the user's search and the question related to the user's search). Additionally, with the volume of search results, even after fine-tuning, it often remains difficult to identify desired information. Hence, there is a need for an improved system and method of searching distributed data sources.