Software applications are now being used to analyze large amounts of data in real time or near real time. It is desirable that such applications be extremely efficient and capable of analyzing very large relational databases whose tables are stored in a distributed manner on multiple nodes. A node may be a server, virtual server, or other type of computer system, and multiple nodes may be connected via a computing bus, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), the Internet, or the like. These nodes may reside in the same location, or they may be located remotely from one another. When an application executes a transaction on the database, it may need to access multiple database tables. The tables needed to execute the transaction may reside on a single node of the network or on multiple nodes.
Performance of database-dependent software applications can be improved using various techniques. One such technique is known as “sharding.” Under the sharding technique, large database tables are partitioned according to some partitioning logic, and the partitions are stored on separate hardware. However, sharding provides significant performance gains only when all of the data required to perform a transaction is present on the same node of a multi-node distributed database system.
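The sharding behavior described above can be illustrated with a minimal sketch. It assumes hash-based partitioning on a single key column, which is only one possible partitioning logic; the names `NUM_NODES`, `node_for`, `shard_rows`, `nodes_touched`, and `is_single_node` are hypothetical and chosen for illustration only.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, assumed for illustration


def node_for(key, num_nodes=NUM_NODES):
    """Map a partition key to a node index using a stable hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def shard_rows(rows, key_column, num_nodes=NUM_NODES):
    """Partition table rows across nodes by hashing the chosen key column."""
    shards = defaultdict(list)
    for row in rows:
        shards[node_for(row[key_column], num_nodes)].append(row)
    return dict(shards)


def nodes_touched(keys, num_nodes=NUM_NODES):
    """Return the set of nodes a transaction must access for the given keys."""
    return {node_for(k, num_nodes) for k in keys}


def is_single_node(keys, num_nodes=NUM_NODES):
    """True when every key the transaction needs lives on one node."""
    return len(nodes_touched(keys, num_nodes)) == 1


if __name__ == "__main__":
    orders = [{"customer_id": c, "amount": 10 * c} for c in range(1, 9)]
    for node, rows in sorted(shard_rows(orders, "customer_id").items()):
        print(node, [r["customer_id"] for r in rows])
    # A transaction confined to one customer always stays on one node.
    print(is_single_node([3, 3]))
```

In this sketch, a transaction whose keys all hash to the same node can be served locally, whereas a transaction whose keys span several nodes must coordinate across them, which is the case in which sharding yields less benefit.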