The use of payment cards for a broad spectrum of cashless transactions has become ubiquitous in the current economy, accounting for hundreds of billions of dollars in transactions per year. For example, MasterCard International Incorporated, one example of a payment card network operator, processes millions of transactions per hour across roughly 230 countries. Aspects involved with the use of payment cards typically include the authentication of the payor/consumer using the payment card, as well as the authorization of the transaction based upon the availability of monies in the payor's/consumer's bank account.
During this cashless transaction process, a large amount of transaction data is generated and collected, often rapidly. In a traditional relational database, such transaction data is typically stored in a normalized fashion, and may be retrieved using index keys for efficiency. More recent databases have started adapting from this traditional approach, for example, with the ability to store and execute on unstructured data and without the use of index keys. One example of such a system is a Hadoop® network. Hadoop networks are typically designed to provide parallel computing for “big data” in applications such as social media and the like, in which the data grows exponentially and tends to be difficult to timely (i.e., rapidly) collect in a structured manner.
While the parallel, speedy, and scalable nature of Hadoop networks can be useful for aggregating and organizing large data sets in real-time when structure is not critical, these types of networks often lack sufficient structure to be optimized for real-time and directed analytic applications. Therefore, once data is collected in Hadoop-type networks, real-time access and retrieval of the data can become difficult and awkward. In particular, existing Hadoop-like networks do not allow for efficient searching or parsing of collected data, and sequential searches and subsequent extracting or retrieval of data stored using these networks is often slow and is typically limited to smaller, less robust datasets.