Those of skill in the art can appreciate that in the modern world of data processing, two main types of processing analytics are performed: transactional processing, which is fast but not easily accessible or useful for long-term use and predictive modeling, and batch processing, which supports long-term use and predictive modeling but is not made for real-time interaction. Innovators are often left with a trade-off between these two types of processing, and there has been a need for meaningful integration of their respective features and functions in order to derive more meaningful, accessible information from the data. Additional needs in the technology include universal and easy ingestion of input data regardless of the source, as well as easy and useful accessibility of output.
Consequently, a data processing computer architecture called Lambda processing was developed; a Lambda processing architecture combines high-speed transactional processing with batch layer processing. In a traditional Lambda architecture, data is received by both a batch layer and a speed layer (a layer, in this context, refers to a set of inter-related processing modules), and a master set of the received data is permanently stored. In the batch layer, query functions are evaluated, resulting in batch views that are stored in the serving layer. These batch views can be constantly re-computed. In the serving layer, the batch views are indexed and stored in a scalable database such that they can be retrieved relatively quickly. New batch views are swapped in as they are generated. It is generally appreciated that the throughput of the batch layer reflects a “long term” approach; consequently, the speed layer compensates for the high latency of updates to the batch views. Fast incremental algorithms were developed to read/write databases to produce substantially real time views. The real time views are then also indexed and cached such that customer or external application inquiries can result in values being obtained relatively quickly. In addition, queries from customers, or external applications, can be answered from both the batch views and the real time views (of the speed layer).
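By way of illustration only, the data flow of a traditional Lambda architecture described above can be sketched in a few lines of Python. This is a minimal, single-process sketch and not an implementation of any particular system: the class name LambdaSketch, its method names, and the choice of a per-key running sum as the query function are all hypothetical and are assumed here solely for the purpose of the example.

```python
from collections import defaultdict


class LambdaSketch:
    """Hypothetical toy model of the batch, serving, and speed layers."""

    def __init__(self):
        self.master = []                       # permanently stored master data set
        self.batch_view = {}                   # serving layer: indexed batch view
        self.realtime_view = defaultdict(int)  # speed layer: incremental view

    def ingest(self, key, value):
        # Incoming data is received by both the batch and the speed layer.
        self.master.append((key, value))       # batch layer input (append-only)
        self.realtime_view[key] += value       # fast incremental update

    def run_batch(self):
        # Re-compute the batch view from the master data set (high latency
        # in a real system), then swap the new view in as a whole.
        covered = len(self.master)
        new_view = defaultdict(int)
        for key, value in self.master[:covered]:
            new_view[key] += value
        # The real time view keeps only data not yet covered by the batch view.
        remaining = defaultdict(int)
        for key, value in self.master[covered:]:
            remaining[key] += value
        self.batch_view = dict(new_view)
        self.realtime_view = remaining

    def query(self, key):
        # A query is answered from both the batch and the real time views.
        return self.batch_view.get(key, 0) + self.realtime_view.get(key, 0)


arch = LambdaSketch()
arch.ingest("sensor-1", 3)
arch.ingest("sensor-1", 4)
before_batch = arch.query("sensor-1")  # served by the speed layer only
arch.run_batch()
arch.ingest("sensor-1", 5)
after_batch = arch.query("sensor-1")   # batch view plus newer real time data
```

In this sketch, a query issued before the first batch run is answered entirely from the real time view, while a query issued after it combines the swapped-in batch view with whatever data has arrived since. A production system would run the batch re-computation asynchronously and over far larger data sets, which is where the latency and cost concerns arise.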
As those of skill in the art can appreciate, however, problems developed when Lambda processing was used with certain applications. For example, in some cases, the eventual consistency approach of the long term batch layer wreaked havoc on real-time command and control applications. In some cases, data, or devices represented by data, disappeared while being regenerated by the batch layer. In addition, in some applications, very large data costs were incurred through continuously repeated batch operations. By way of still another example, some clients used Amazon Web Services (AWS), which is a collection of remote computing services, also called web services, that make up a cloud computing platform offered by Amazon.com. The most central and well-known of these services are Amazon EC2 and Amazon S3. AWS provides a large computing capacity (potentially many servers) much more quickly and cheaply than building a physical server farm. However, if large “batch-level” quantities of data are moved in and out of, or through, AWS, the costs can add up quickly.
Thus, there is a need to use the major components of a Lambda architecture in a manner that maintains the data analysis benefits without sacrificing the integrity of real time application processing.