Researchers are increasingly faced with the problem of searching a large network of databases, holding a massive number of documents of highly variable content, for the very small subset of information that matches an analyst's current interest. Given the physical constraints of the available infrastructure, and the impracticality of funneling every document through a central filter, distributed search mechanisms based on efficient heuristics must be applied.
There exists a formal proof (commonly known as the "No Free Lunch" theorem for search and optimization) that, if no assumptions about the structure of the underlying search space can be made, any heuristic search mechanism performs on average no better than a random search. This is because a heuristic essentially determines only the order in which the space is searched, in the hope that the solution lies in the region the heuristic ranks highest. If the structure of the space is not known, this hope is unfounded.
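The averaging argument can be checked on a toy model. The sketch below is a simplified illustration, not the formal proof: it assumes a space of `n` items, exactly one target, and a target that is equally likely to be any item (i.e., no structural knowledge is available). It computes, with exact fractions, the expected number of items examined under several fixed visiting orders. The function `expected_probes` and the chosen orders are hypothetical names introduced only for this illustration.

```python
import random
from fractions import Fraction


def expected_probes(order, target_prob):
    """Expected number of items examined before the target is found when the
    space is visited in `order`; target_prob[i] is P(item i is the target)."""
    return sum(Fraction(pos + 1) * target_prob[item] for pos, item in enumerate(order))


n = 10
# No structural assumptions: every item is equally likely to be the target.
uniform = [Fraction(1, n)] * n

# Three different "heuristics", each of which is just a visiting order.
orders = {
    "index order": list(range(n)),
    "reverse order": list(range(n))[::-1],
    "shuffled": random.Random(0).sample(range(n), n),
}

for name, order in orders.items():
    print(name, expected_probes(order, uniform))  # each prints 11/2
```

Every order yields (n + 1)/2 = 11/2 expected probes, so changing the search order, which is all a heuristic does here, cannot improve the average cost when the target location is uniformly distributed.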
Since no assumptions can be made about the structure of the space when searching for documents that match an analyst's interest, any heuristics-based search may be very inefficient. The efficiency of the search can be improved either by learning more about the underlying document domain (e.g., how documents are assigned among branches of intelligence services) or, more generally, by structuring the search space and designing a heuristic that performs well on that structure.
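A toy model shows both sides of this point. The sketch below assumes a hypothetical, purely illustrative skew in where the target is likely to be found (`weights`), standing in for learned knowledge of the document domain; the names `expected_probes`, `matched`, and `mismatched` are likewise hypothetical. A visiting order that matches the skew beats the (n + 1)/2 expected cost of a uniformly random visiting order, while an order that contradicts it does worse than random.

```python
from fractions import Fraction


def expected_probes(order, target_prob):
    """Expected 1-based position at which the target is found when the
    space is visited in `order`; target_prob[i] is P(item i is the target)."""
    return sum(Fraction(pos + 1) * target_prob[item] for pos, item in enumerate(order))


# Hypothetical structure: the target is much more likely to sit in the first
# few partitions (e.g., one branch's holdings). Weights are illustrative only.
weights = [8, 4, 2, 1, 1, 1, 1, 1, 1, 1]
total = sum(weights)
skewed = [Fraction(w, total) for w in weights]
n = len(weights)

# Heuristic that matches the structure: visit the most likely items first.
matched = sorted(range(n), key=lambda i: skewed[i], reverse=True)
# Heuristic that contradicts the structure: least likely items first.
mismatched = matched[::-1]
# A uniformly random visiting order finds the target at position (n + 1)/2
# on average, whatever the distribution of the target.
baseline = Fraction(n + 1, 2)

print("matched   ", expected_probes(matched, skewed))
print("mismatched", expected_probes(mismatched, skewed))
print("baseline  ", baseline)
```

Here the matched order needs 71/21 (about 3.4) probes on average, the mismatched order 160/21 (about 7.6), and the random baseline 11/2. Visiting items in decreasing order of probability is the order that minimizes expected sequential-search cost, so the benefit depends entirely on how well the assumed structure reflects the real one.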
Despite ongoing improvements in this area, there remains a need for a more efficient process for searching databases and analyzing abstract documents to identify desirable information in a timely manner.