Data warehouses are computer-based databases designed to store records, generally received from multiple sources, and to respond to queries. The records correspond to entities, such as individuals, organizations and property. Each record contains identifiers of the entity, such as a name, address or account information for an individual.
Unfortunately, the effectiveness of current data warehouse systems is diminished because of certain limitations that create, perpetuate and/or increase certain data quality, integrity and performance issues. Such limitations also increase the risk, cost and time required to implement, correct and maintain such systems.
The issues and limitations include, without limitation, the following: (a) challenges associated with differing or conflicting formats emanating from the various sources of data, (b) incomplete data resulting from information missing upon receipt, (c) multiple records entered that reflect the same entity because of (often minor) discrepancies or misspellings, (d) insufficient capability to identify whether multiple records reflect the same entity and/or whether there is some relationship between multiple records, (e) lost data when two records determined to reflect the same entity are merged or one record is discarded, (f) insufficient capability to separate merged records that are later determined to reflect two separate entities, (g) insufficient capability to issue alerts in real time based upon user-defined alert rules, (h) inadequate results from queries that utilize different algorithms or conversion processes than the algorithms or conversion processes used to process received data, and (i) inability to maintain a persistent query in accordance with predetermined criteria, such as for a certain period of time.
For example, when the identifiers of an individual are received and stored in a database: (a) the records from one source may be available in a comma-delimited format while the records of another source may be received in another data format; (b) data from various records may be missing, such as a telephone number, an address or some other identifying information; or (c) two records reflecting the same individual may be unknowingly received because one record corresponds to a current name and another record corresponds to a maiden name. In the last situation, the system may determine that the two records ought to be merged or that one record (perhaps emanating from a less reliable source) ought to be discarded. However, in the merging process, current systems typically abandon data, which negates the ability to later separate the two records if the records are determined to reflect two separate entities.
Additionally, when the identifiers are received and stored in a database, the computer may perform transformation and enhancement processes prior to loading the data into the database. However, the query tools of current systems use few, if any, of the transformation and enhancement processes applied to the received data, causing the results of such queries to be inconsistent, and therefore inadequate, insufficient and potentially false.
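By way of illustration only, the inconsistency described above may be sketched as follows. The transformation shown (trimming, collapsing whitespace and upper-casing a name) and the names `normalize_name`, `query_raw` and `query_normalized` are hypothetical and are not taken from any particular system; the sketch merely assumes that received data is transformed before storage.

```python
def normalize_name(raw: str) -> str:
    """Hypothetical transformation: trim, collapse whitespace, upper-case."""
    return " ".join(raw.split()).upper()

# Received data is transformed before it is stored.
stored = {normalize_name("  mary   Smith ")}

def query_raw(name: str) -> bool:
    # Query path that skips the transformation applied to received data.
    return name in stored

def query_normalized(name: str) -> bool:
    # Query path that reuses the transformation applied to received data.
    return normalize_name(name) in stored

print(query_raw("Mary Smith"))         # False
print(query_normalized("Mary Smith"))  # True
```

In this sketch, the same query fails or succeeds depending solely on whether the query path applies the transformation that was applied to the stored record.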
Similarly, current data warehousing systems do not have the necessary tools to fully identify the relationship between entities, or to determine in real time whether or not such entities reflect the same entity. For example, one individual may have the same address as a second individual and the second individual may have the same telephone number as a third individual. In such circumstances, it would be beneficial to determine the likelihood that the first individual had some relationship with the third individual, especially in real time.
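The address and telephone number example above may be sketched, under stated assumptions, as a search over entities linked by shared identifiers. The record layout, the sample values and the function names `build_links` and `degrees_of_separation` are hypothetical. The number of hops between two entities is used here only as a crude stand-in for the likelihood of a relationship; it is not a probability.

```python
from collections import defaultdict, deque

# Hypothetical records: (entity, identifier type, identifier value).
records = [
    ("first", "address", "12 Oak St"),
    ("second", "address", "12 Oak St"),
    ("second", "phone", "555-0100"),
    ("third", "phone", "555-0100"),
]

def build_links(rows):
    """Link entities that share an identifier value of the same type."""
    by_identifier = defaultdict(set)
    for entity, kind, value in rows:
        by_identifier[(kind, value)].add(entity)
    links = defaultdict(set)
    for entities in by_identifier.values():
        for a in entities:
            links[a] |= entities - {a}
    return links

def degrees_of_separation(links, start, goal):
    """Breadth-first search; returns the hop count, or None if unconnected."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        entity, hops = queue.popleft()
        if entity == goal:
            return hops
        for neighbor in links[entity] - seen:
            seen.add(neighbor)
            queue.append((neighbor, hops + 1))
    return None

links = build_links(records)
print(degrees_of_separation(links, "first", "third"))  # 2
```

Here the first and third individuals share no identifier directly, yet are connected at two degrees of separation through the second individual.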
Furthermore, current data warehousing systems have limited ability to identify inappropriate or conflicting relationships between entities and to provide alerts in real time based upon user-defined alert rules. Such limited ability is based upon several factors, including, without limitation, the inability to efficiently identify relationships as indicated above.
Furthermore, current data warehousing systems cannot first transform and enhance a record and then maintain a persistent query over a predetermined period. A persistent query would be beneficial in various circumstances, including, without limitation, in cases where the name of a person is identified in a criminal investigation. A query to identify any matches corresponding with the person may initially return no results, and the queried data in current systems is essentially discarded. However, it would be beneficial to load the query in the same way as received data, so that the queried data may be matched against other received data or queries and provide a better basis for results.
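One possible reading of a persistent query may be sketched as follows. The `Store` class, its `query` and `load` methods, the `normalize` transformation and the integer time units are all hypothetical; the sketch assumes only that a query is transformed like received data and retained for a predetermined period.

```python
from dataclasses import dataclass, field

def normalize(name: str) -> str:
    """Hypothetical transformation shared by received data and queries."""
    return " ".join(name.split()).upper()

@dataclass
class Store:
    records: set = field(default_factory=set)
    # Persistent queries: normalized name -> expiry time (arbitrary units).
    queries: dict = field(default_factory=dict)

    def query(self, name: str, now: int, keep_for: int) -> bool:
        """Run a query and retain it until now + keep_for."""
        key = normalize(name)
        self.queries[key] = now + keep_for
        return key in self.records

    def load(self, name: str, now: int) -> bool:
        """Load a record; return True if it matches a live persistent query."""
        key = normalize(name)
        self.records.add(key)
        expiry = self.queries.get(key)
        return expiry is not None and now <= expiry

store = Store()
print(store.query("john  doe", now=0, keep_for=30))  # False
print(store.load("John Doe", now=10))                # True
print(store.load("John Doe", now=45))                # False
```

In this sketch, the initial query returns no result but is not discarded; a record received within the retention period matches the retained query, while a record received after the period has elapsed does not.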
As such, any or all of the issues and limitations (whether identified herein or not) of current data warehouse systems diminish the accuracy, reliability and timeliness of the data warehouse and dramatically impede performance. Indeed, the use of systems with such issues may cause inadequate results and incorrect decisions based upon such results.
The present invention is provided to address these and other issues.