Many applications require the collection and searching of large amounts of data. A timely example (though not the only one) is the intelligence community's interest in collecting relevant information from vast amounts of streaming data, such as packet traffic on network routers, on-line news feeds, on-line chat rooms, message boards, on-line search requests, and potentially terrorist-related websites. With such vast amounts of data, it is virtually impossible to store it all. Therefore, the data from multiple streams is typically filtered in an online environment using search criteria: most of the data is discarded as irrelevant, leaving a much smaller amount of relevant data to be processed. The relevant data is retained because the search criteria have identified it as potentially relevant, and it is then transferred to a classified/secured environment for private analysis. However, because the search criteria must be applied in the non-secured online environment, this method cannot guarantee that the search criteria themselves remain private/classified.
Preferably, the search criteria are classified, because otherwise adversaries could simply avoid using terms contained in the search criteria and thereby prevent their communications from being identified and analyzed. Therefore, another current practice is to transfer all of the streaming data at issue into a secured environment and then filter out the unwanted/irrelevant data within that environment, leaving the relevant data for further analysis there. However, this approach is extremely burdensome in terms of the time and storage required. It also risks interruption of the transfer of such a vast amount of data into the secured environment, which can cause further delay and potentially even data loss or data corruption.
Therefore, it is desirable to have a method that allows searching and filtering of streaming data in a non-secured and/or distributed environment in such a manner that the search criteria, as well as the results of the searching and filtering, remain classified (i.e., hidden even from the person whose machine may be executing a program embodying the invention), even when the relevant data is transferred from the non-secured environment to the secured environment. Such a method would be particularly useful if it could be executed in a distributed environment, because the searching and filtering could then be outsourced publicly to multiple, even untrusted, computers and locations, making virtually limitless computing resources available. The method would be even more desirable if it could be implemented with a computer program whose size is independent of the data stream size.