As software moves toward a model of “data-available-anywhere-anytime”, the burden of storing and processing data shifts to information servers. Fast storage and retrieval of data becomes essential for these services to scale and serve large numbers of clients. A traditional file system would suffice in many cases. However, sensitive data such as financial records may need to be encrypted before it is stored. In a distribution center or a data warehouse, data may be compressed to conserve bandwidth before it is transmitted to a storage device. As new proprietary formats are developed, different types of data transformers (components that apply different transformations of data, such as encryption and compression) may be required.
In the prior art, a data system typically loads the complete stream into memory, performs the transformation, and persists the result to storage. Although this approach is appealing in its simplicity, it does not scale in a data-warehouse environment; that is, the application cannot grow continuously, and its performance may not keep up (linearly) with the load. The problem is compounded when multiple data transformations are required. To illustrate, assume that a payload is 1 MB (megabyte) in size and that two data transformations (e.g., data inflation and data encryption) are required. The complete payload is stored in memory. It is then retrieved from memory, inflated, and stored again. Assume that the inflated payload occupies 10 MB, so 10 MB of additional memory is required. The entire inflated payload is then retrieved from memory, encrypted, and stored. Assuming that encryption does not further enlarge the payload, another 10 MB of memory is required. Thus, the total memory needed to process one 1 MB payload is 21 MB (1 MB + 10 MB + 10 MB).
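The 21 MB figure can be reproduced with a small model of the load-everything approach. This is a minimal sketch, assuming (as the example does) that the original payload and every intermediate result stay resident in memory until processing finishes; the function name and the use of per-stage growth factors are illustrative choices, not part of the original description.

```python
def prior_art_peak_memory_mb(payload_mb: float, growth_factors: list[float]) -> float:
    """Peak memory when every transformation materializes its full output.

    Assumes the original payload and all intermediate buffers are retained
    until the last transformation completes (hypothetical model).
    """
    total = payload_mb          # complete payload loaded into memory
    current = payload_mb
    for factor in growth_factors:
        current *= factor       # full output of this transformation
        total += current        # kept alongside everything before it
    return total


# Inflation grows the payload 10x (1 MB -> 10 MB); encryption keeps the size unchanged.
print(prior_art_peak_memory_mb(1, [10, 1]))  # 21
```

Because each stage adds a full copy of its output, the total grows with both the payload size and the number of stages.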
The memory demands are exacerbated when payloads are larger and more data transformations are required. In a financial data system, a typical payload may be 20 MB. Under the same assumptions as the example above, a 20 MB payload requires a total of 420 MB of memory (20 MB + 200 MB + 200 MB). With 2 GB of memory, such a financial data system could therefore process only four payloads at a time. If the payloads arriving in a unit of time require more memory than the data system can provide, payload processing may need to be throttled. Moreover, the number of payloads that the data system must process may vary appreciably, particularly at the end of a financial period. Larger payloads thus make capacity planning more difficult.
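The four-payload limit follows from dividing available memory by the per-payload requirement and keeping only whole payloads. A minimal sketch, assuming 2 GB is taken as 2048 MB and the same 10x inflation and size-preserving encryption as before; the function name is hypothetical.

```python
def max_concurrent_payloads(memory_mb: int, payload_mb: int,
                            growth_factors: list[int]) -> int:
    """How many payloads fit in memory when each stage keeps a full copy."""
    per_payload = payload_mb          # original payload held in memory
    size = payload_mb
    for factor in growth_factors:
        size *= factor                # full output of each transformation
        per_payload += size           # retained alongside earlier buffers
    return memory_mb // per_payload   # only whole payloads can be processed


# 20 MB payload, 10x inflation, size-preserving encryption, 2 GB = 2048 MB.
print(max_concurrent_payloads(2048, 20, [10, 1]))  # 4
```

Any payloads beyond this count in the same window would have to wait, which is the throttling described above.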
The prior-art approach described heretofore increases the demands on a data system's memory resources as the size and number of payloads increase. When the limits of available memory are reached, the operator may need to upgrade the memory resources. Moreover, if payload traffic is highly variable, capacity planning for the data system becomes more difficult. It would therefore be an advancement in the art to make the required amount of memory less dependent on the size of the payload, the number of payloads, and the number of data transformations applied to each payload.
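One way to reduce that dependence, shown here only as an illustration, is to stream a payload through the transformations in fixed-size chunks instead of materializing each full intermediate result. The sketch below assumes Python's standard `zlib` module for the inflation stage and uses a hypothetical XOR stage as a placeholder for encryption (it provides no security); `CHUNK` and all function names are hypothetical. Each stage is a generator that holds roughly one bounded chunk at a time, so buffer sizes are governed by the chunk size and the number of stages rather than by the payload size.

```python
import io
import zlib
from typing import BinaryIO, Iterator

CHUNK = 64 * 1024  # hypothetical chunk size; caps the size of each buffer


def read_chunks(src: BinaryIO, size: int = CHUNK) -> Iterator[bytes]:
    """Read the source stream one bounded chunk at a time."""
    while chunk := src.read(size):
        yield chunk


def inflate(chunks: Iterator[bytes], max_out: int = CHUNK) -> Iterator[bytes]:
    """Decompress incrementally, emitting at most max_out bytes per step."""
    d = zlib.decompressobj()
    for data in chunks:
        while data:
            out = d.decompress(data, max_out)
            if out:
                yield out
            data = d.unconsumed_tail  # input held back by the max_out limit
    tail = d.flush()
    if tail:
        yield tail


def xor_mask(chunks: Iterator[bytes], key: bytes) -> Iterator[bytes]:
    """Placeholder for a streaming cipher. XOR with a repeating key is NOT secure."""
    offset = 0  # position in the whole stream, so chunk boundaries do not matter
    for chunk in chunks:
        yield bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(chunk))
        offset += len(chunk)


def run_pipeline(src: BinaryIO, dst: BinaryIO, key: bytes) -> int:
    """Chain the stages, persist each block as it is produced, return the largest block."""
    largest = 0
    for block in xor_mask(inflate(read_chunks(src)), key):
        largest = max(largest, len(block))
        dst.write(block)
    return largest


payload = bytes(range(256)) * 10_000        # 2,560,000 bytes once inflated
src = io.BytesIO(zlib.compress(payload))
dst = io.BytesIO()  # in-memory only so the demo can inspect the result
largest = run_pipeline(src, dst, key=b"k3y")
print(len(dst.getvalue()), largest)
```

Adding another transformation in this design means adding another generator that holds one chunk, rather than another full copy of the payload; a real system would write `dst` to persistent storage instead of a `BytesIO`.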