The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Computer systems have finite limits in terms of both storage capacity and processing capacity. When either or both of these capacities are reached, performance of the computer system suffers. To prevent or mitigate loss of performance, additional computing hardware may be added to increase the processing and/or storage capacities. This process is called scaling, and different types of workloads present different scaling challenges.
One approach to scaling is to parallelize computing processes among multiple computer systems, which then interact via a message passing interface (MPI). MPI may allow parallel computing systems to coordinate processing to avoid conflicts between changes made by one system and changes made by another system. MPI has been implemented in a number of languages, including C, C++, and Fortran. The separate computing systems may be in separate physical enclosures and/or may be multiple processors within a single computer chassis or even multiple cores within a single processor. MPI may allow for high performance on massively parallel shared-memory machines and on clusters of heterogeneous distributed-memory computers.
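By way of illustration only, the message-passing pattern described above may be sketched as follows. This sketch does not use an actual MPI library; it assumes Python standard-library threads as stand-ins for the separate computing systems and a queue as a stand-in for the message channel. The names `worker` and `parallel_sum` are hypothetical and chosen for this example.

```python
import queue
import threading


def worker(rank, chunk, outbox):
    """Process a private chunk of data and report the result as a message.

    The worker never touches another worker's chunk; the only interaction
    is the (rank, partial_sum) message placed on the shared channel.
    """
    outbox.put((rank, sum(chunk)))


def parallel_sum(data, num_workers):
    """Split data among workers and combine the partial sums they send back."""
    outbox = queue.Queue()  # stand-in for the message channel
    # Give each worker every num_workers-th element, starting at its rank.
    chunks = [data[rank::num_workers] for rank in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, chunks[rank], outbox))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The coordinator receives exactly one message per worker.
    total = 0
    for _ in range(num_workers):
        _rank, partial = outbox.get()
        total += partial
    return total
```

Because each worker operates only on its own portion of the data and communicates solely through messages, no worker can overwrite a change made by another, which is the coordination property noted above.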
Another scaling approach uses distributed storage for structured query language (SQL) databases. However, transactional operations in distributed SQL databases are generally slowed because of the need to keep separate computers synchronized. Even when fast networks, such as INFINIBAND® protocol communication networks, are used, synchronization may impose limits on performance and scalability. A further limitation of the SQL database approach is that data processing is often executed not on the SQL database server but on another server. This increases latency because data must be transported from the SQL server to the computing server and back again to the SQL server.
Parallel processing is beneficial for large data sets, portions of which can be spread across different nodes and processed independently. However, transactional processing, where some or all transactions may depend on one or more previous transactions, is not as easily parallelized. For example, computers may synchronize access to a portion of data by locking the portion of data before processing of the transaction begins and unlocking the data upon successful completion of the transaction. While the data is locked, other computers cannot change the data, and in some instances cannot even read the data. As a result of the locking/unlocking process, there may be significant latency as well as significant variations in latency.
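By way of illustration only, the lock-then-process-then-unlock sequence described above may be sketched as follows. The sketch assumes threads within a single process rather than separate computers, and the names `Account`, `withdraw`, and `run_withdrawals` are hypothetical. Each withdrawal depends on the balance left by earlier withdrawals, so the transactions cannot simply be run independently.

```python
import threading


class Account:
    """Hypothetical shared record protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def withdraw(account, amount):
    """Apply one transaction: lock the data, check it, update it, unlock it."""
    with account.lock:  # other threads wait here until the lock is released
        if account.balance >= amount:
            account.balance -= amount
            return True
        return False  # insufficient funds; the data is left unchanged


def run_withdrawals(initial, amount, workers):
    """Run concurrent withdrawals; return (final balance, number that succeeded)."""
    account = Account(initial)
    results = []
    results_lock = threading.Lock()

    def task():
        ok = withdraw(account, amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=task) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, results.count(True)
```

While one thread holds the lock, every other thread attempting a withdrawal is blocked. The time a given transaction spends waiting therefore depends on how many other transactions are contending for the same data, which is one source of the latency and latency variation noted above.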