Since the early 1990s, the number of people using the World Wide Web and the Internet has grown at a substantial rate. As more users take advantage of the services available on the Internet by registering on websites, posting comments and information electronically, or simply interacting with companies that post information about others (such as online newspapers), more and more information about those users becomes available. A substantial amount of information is also available in public and private databases, such as LEXISNEXIS. Sending a query to one or more of the above resources, using the name of a person or entity and other identifying information, may return high-dimensional data sets that occupy large amounts of memory. Such large data sets may consume excessive system resources to process, or may even be so large that it is not feasible to hold the data set in virtual memory.
Additionally, the returned data set can contain many “false positives” because other people or entities share the same name. False positives are search results that satisfy the query terms but do not relate to the intended person or entity. The desired search results can be buried or obscured by the abundance of false positives. The desired search results may also be shaped by the order in which the resources are searched.
To reduce the number of false positives, one may add search terms drawn from known or learned biographical, geographical, and personal information about the particular person or entity. This can reduce the number of false positives received, but it may also exclude many relevant documents, because a relevant document that does not contain the added terms no longer satisfies the query.
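By way of illustration only, the trade-off described above can be sketched with a small, hypothetical corpus. The documents, the name "John Smith", the added term "Seattle", and the helper functions `search` and `summarize` are all assumptions made for this example and do not come from any particular search system.

```python
# Hypothetical corpus: (text, is_about_intended_person).
# The intended person is assumed to be an architect named John Smith.
DOCS = [
    ("John Smith, a Seattle architect, won a design award.", True),
    ("Architect John Smith unveiled plans for a new library.", True),
    ("John Smith of Boston was named chief financial officer.", False),
    ("John Smith scored twice in Saturday's match.", False),
]


def search(docs, terms):
    """Return the documents whose text contains every term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [d for d in docs if all(t in d[0].lower() for t in lowered)]


def summarize(results):
    """Return (relevant hits, false positives) for a list of results."""
    relevant = sum(1 for _, is_relevant in results if is_relevant)
    return relevant, len(results) - relevant


# Name-only query: finds both relevant documents, plus two false positives.
broad = search(DOCS, ["John Smith"])

# Adding a geographical term removes the false positives, but also drops
# the relevant document that never mentions the city.
narrow = search(DOCS, ["John Smith", "Seattle"])

print(summarize(broad))   # (2, 2)
print(summarize(narrow))  # (1, 0)
```

In this sketch the narrower query returns no false positives, yet it also misses one of the two documents that do relate to the intended person.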
Finally, some of the queried information resources may include unstructured data. Unstructured data typically does not have a pre-defined data model and may not fit well into relational tables. It is typically text-heavy, but may also contain dates, numbers, and other facts. Data of this kind may be more difficult to search using traditional computer programs than data that is tagged and stored in databases.
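As a rough illustration of this difference, the following sketch retrieves the same fact from a tagged record and from free text. The record, the sentence, the field name `joined`, and the date pattern are hypothetical and chosen only for this example; the pattern covers a single date format and would miss others.

```python
import re

# Structured (tagged) data: the fact is stored under a known field name.
structured = {"name": "Jane Doe", "city": "Denver", "joined": "2009-04-17"}

# Unstructured data: the same fact is embedded in free text.
unstructured = "Jane Doe moved to Denver and joined the firm on April 17, 2009."

# With tagged data, retrieval is a direct lookup.
joined_structured = structured["joined"]

# With free text, the program must guess how the date is written.
# This pattern assumes the form "Month D, YYYY" only.
DATE_PATTERN = re.compile(
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}"
)
match = DATE_PATTERN.search(unstructured)
joined_unstructured = match.group(0) if match else None

print(joined_structured)    # 2009-04-17
print(joined_unstructured)  # April 17, 2009
```

The lookup works for any tagged record that has the field, whereas the pattern silently returns nothing when the text writes the date differently, for example as "17 April 2009".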