Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of common general knowledge in the field.
In database systems, data is typically stored as rows. These rows collectively define one or more data tables.
In many parallel database systems, such as those making use of a massively parallel processing (MPP) architecture, each row is assigned to a respective access module. This access module, commonly known as an Access Module Processor (AMP) or unit of parallelism, accesses the row or rows assigned to it. In particular, it reads from, writes to, or searches within the row or rows.
Typically, the rows of a given table are hashed across a plurality of available access modules in accordance with a predefined hashing protocol. This is designed to provide efficient access to the data. That is, where a query requires searching of the rows of a data table, the plurality of access modules each search in parallel the row or rows respectively assigned to them. The net result is that a plurality of the rows are searched in parallel, which reduces processing time.
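By way of illustration only, the hashing and parallel search described above may be sketched as follows. The sketch is not taken from any particular database system. The names assign_access_module, distribute and parallel_search are hypothetical, the use of SHA-256 stands in for the unspecified predefined hashing protocol, and worker threads stand in for the access modules.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def assign_access_module(key: str, num_modules: int) -> int:
    """Map a row key to an access module index (assumed hashing protocol)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_modules


def distribute(rows, key_field, num_modules):
    """Assign each row to exactly one access module by hashing its key."""
    modules = [[] for _ in range(num_modules)]
    for row in rows:
        index = assign_access_module(str(row[key_field]), num_modules)
        modules[index].append(row)
    return modules


def parallel_search(modules, predicate):
    """Have each access module search only its own rows, then merge results."""
    def search_one(partition):
        return [row for row in partition if predicate(row)]

    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        partial_results = pool.map(search_one, modules)
    return [row for part in partial_results for row in part]
```

In this sketch, each row is held by exactly one access module, so a query over the table is answered by merging the partial results returned by all of the modules.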
Such systems are prone to scalability problems insofar as small tables or single row operations are concerned.
In the case of small tables, the distribution of rows across access modules is often skewed. For example, where a table has fewer rows than there are available access modules, some access modules necessarily have no rows assigned to them, and others have only one or a few rows. The cost of coordinating and processing across all of the access modules then far exceeds the cost of the access operation actually required for the small table. As a result, the overhead of operations on such tables does not scale.
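The skew described above may be illustrated with a minimal sketch. The figures used (a table of 5 rows and 100 access modules), the key names, and the SHA-256 based module_for function are assumptions chosen for illustration only and do not reflect any particular system.

```python
import hashlib
from collections import Counter


def module_for(key: str, num_modules: int) -> int:
    """Hypothetical hashing protocol: hash the row key to a module index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_modules


def load_per_module(keys, num_modules):
    """Return the number of rows assigned to each access module."""
    counts = Counter(module_for(key, num_modules) for key in keys)
    return [counts.get(index, 0) for index in range(num_modules)]


# A small table: far fewer rows than access modules (assumed figures).
small_table_keys = [f"branch-{i}" for i in range(5)]
load = load_per_module(small_table_keys, num_modules=100)

# With 5 rows, at most 5 modules can hold data, so at least 95 hold none.
idle_modules = sum(1 for rows_held in load if rows_held == 0)
print(f"{idle_modules} of {len(load)} access modules hold no rows")
```

Every one of the 100 modules would nonetheless take part in a full-table operation, although at most 5 of them have any rows to process.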
Single row operations include, for example, an ATM transaction that fetches the balance for an account. Such an operation incurs a coordination overhead when processed in a shared-nothing MPP system. This overhead includes processing in multiple access modules, messaging between the access modules, the lack of a batching effect on multi-row transactions, and coordinating the results across the access modules.