A search is a method for finding matches to a particular pattern. In the commercial context, a well-known search engine, such as Google, parses a set of search terms and returns a result list of items (web pages, in a typical Google search) sorted in some manner. In the government context, search systems attempt to search through vast amounts of information to detect instances of suspicious activity, and such systems may use a technique known as segment matching. Thus, a search may be characterized as essentially a bottom-up matching problem in which the form of the query drives what the basic segment matching strategy needs to do. In bottom-up matching, one describes patterns (i.e., queries) that take the matches from lower levels of a query and further constrain the results (i.e., the match context) using additional information and patterns. For example, constraints or additional relationships (e.g., edges) can be defined on elements from lower levels.
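The bottom-up matching described above can be illustrated with a minimal sketch. The data, the edge types, and the helper names `match_segment` and `combine` are hypothetical and chosen only for illustration; they are not taken from any particular system. The lowest level matches individual segments (here, single edges of a given type), and a higher level constrains pairs of those matches with an additional relationship.

```python
from itertools import product

# Hypothetical toy data: (source, edge_type, target) triples.
EDGES = [
    ("alice", "calls", "bob"),
    ("bob", "wires_money_to", "carol"),
    ("dave", "calls", "erin"),
    ("erin", "emails", "frank"),
]


def match_segment(edges, edge_type):
    """Lowest level: return every edge of the given type."""
    return [edge for edge in edges if edge[1] == edge_type]


def combine(left, right, constraint):
    """Higher level: keep pairs of lower-level matches that satisfy a constraint."""
    return [(l, r) for l, r in product(left, right) if constraint(l, r)]


# Level 0: two independent segment matches.
calls = match_segment(EDGES, "calls")
wires = match_segment(EDGES, "wires_money_to")

# Level 1: constrain the lower-level matches so that the person who was
# called is the same person who sends the wire.
chained = combine(calls, wires, lambda l, r: l[2] == r[0])
print(chained)
```

In this sketch the higher level never re-scans the raw data; it only narrows the match context produced by the lower levels.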
It is desirable to provide a mechanism for searching large data sets using pattern matching. The need for pattern matching in large data sets has been steadily increasing in both the intelligence community and the commercial setting. In many cases, the size of the collected data sets presents significant challenges for any type of search technology. Additionally, most search technologies face a constant tension among three characteristics: computational efficiency, search query expressiveness, and the representational fidelity of the data set.
Research has shown that graph-based representations of information are applicable in a wide range of situations. However, typical graph-based search processes do not scale to large data sets in practice. Mechanisms also exist in the graph search area that allow a person to specify a search over graph-based data, but many of these mechanisms likewise do not scale to large data sets. For example, sub-graph isomorphism is a well-defined method for specifying a search in a graph; however, it is well known that sub-graph isomorphism is, in general, NP-complete, so no efficient (polynomial-time) general algorithm for it is known.
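The scaling problem can be made concrete with a naive sketch of sub-graph matching. The function name `subgraph_matches` and the toy graphs are assumptions made for illustration, and the brute-force enumeration is a teaching device rather than a method any practical system is claimed to use. It tries every injective assignment of pattern nodes to data nodes and keeps those that preserve every pattern edge.

```python
from itertools import permutations


def subgraph_matches(pattern_nodes, pattern_edges, data_nodes, data_edges):
    """Yield each injective node mapping that preserves every pattern edge."""
    data_edge_set = set(data_edges)
    # Every ordered selection of len(pattern_nodes) distinct data nodes is a
    # candidate, so the candidate count is n! / (n - k)!.
    for candidate in permutations(data_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all((mapping[u], mapping[v]) in data_edge_set for u, v in pattern_edges):
            yield mapping


# Hypothetical pattern: a directed triangle x -> y -> z -> x.
pattern_nodes = ["x", "y", "z"]
pattern_edges = [("x", "y"), ("y", "z"), ("z", "x")]

# Hypothetical data graph: one directed triangle plus a dangling edge.
data_nodes = [1, 2, 3, 4]
data_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]

matches = list(subgraph_matches(pattern_nodes, pattern_edges, data_nodes, data_edges))
print(len(matches))  # the three rotations of the triangle 1 -> 2 -> 3 -> 1
```

Even this four-node example examines 24 candidate assignments. For a pattern of k nodes and a data graph of n nodes, the number of candidates is n!/(n-k)!, which is why exhaustive matching of this kind becomes impractical on large data sets.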
Most research in graph algorithms has focused on creating, manipulating, and maintaining a complete graph data structure, and therefore assumes that the data is contained in the main memory of a computer system. Large data sets cannot fit into the main memory of the computer system, so this assumption fails and the large data sets have outgrown the known graph processes. It is desirable to perform a graph search on very large data sets. None of the currently available systems is able to handle larger searches, such as searches involving 100 million elements. Furthermore, it is desirable to provide a search system that can handle large searches without specialized hardware or software, so that a typical relational database may be used with the search system. Thus, it is desirable to provide a search system and method that achieves these goals, and it is to this end that the present invention is directed.
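By way of illustration only, the following sketch shows one way graph data might be held in an ordinary relational database and searched with a join rather than with an in-memory graph structure. The table name `edges`, its columns, and the sample rows are hypothetical, and the sketch is not a description of the invention. An in-memory SQLite database is used solely to keep the example self-contained; a disk-backed database would be the natural choice when the data does not fit in main memory.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a file path would be used
# instead for data that exceeds main memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, kind TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("alice", "calls", "bob"),
        ("bob", "wires_money_to", "carol"),
        ("dave", "calls", "erin"),
        ("erin", "emails", "frank"),
    ],
)
# An index lets the database locate matching segments without a full scan.
conn.execute("CREATE INDEX idx_edges_kind_src ON edges (kind, src)")

# Two-edge pattern: someone calls a person who then wires money onward.
rows = conn.execute(
    """
    SELECT a.src, a.dst, b.dst
    FROM edges AS a
    JOIN edges AS b ON a.dst = b.src
    WHERE a.kind = 'calls' AND b.kind = 'wires_money_to'
    ORDER BY a.src
    """
).fetchall()
print(rows)
conn.close()
```

The self-join plays the role of the constraint between two lower-level segment matches, and the database engine, not the application, decides how to page the data in from storage.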