When a data item is stored in a single database or data store that is accessible over a network, multiple servers or clients will often require access to that data item. Traditionally, this requires a hit to the database each time the data item is accessed. Each hit to the database is relatively resource intensive and inefficient.
One way of overcoming some of the efficiency and scalability problems is to store a local copy of the data item in cache memory. A server or client can then use that local copy if future access to the data item is needed. This process may be appropriate and efficient for data items that never change, but problems can arise when a data item is updated in the database.
If a data item in the database is updated, a copy of that data item stored in a local cache on the network will differ from the item in the database, because the cache does not automatically receive the update. The problem intensifies when there are local copies on multiple servers and/or clients on the network. Because these local copies may be created at different times, there can be multiple versions of the data item on the network. If a user tries to update or view the data item, the copy accessed by the user may not be current and correct.
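The divergence described above can be illustrated with a minimal sketch. The dictionary standing in for the database, the `LocalCache` class, and the server names are all hypothetical and chosen only for illustration; a real system would involve network access and a real data store.

```python
# Hypothetical in-memory stand-in for the networked database.
database = {"item": "v1"}


class LocalCache:
    """Per-server cache that copies an item on first access and never refreshes it."""

    def __init__(self, db):
        self._db = db
        self._copies = {}

    def get(self, key):
        # Only the first access hits the database; later reads use the local copy.
        if key not in self._copies:
            self._copies[key] = self._db[key]
        return self._copies[key]


server_a = LocalCache(database)
server_b = LocalCache(database)

server_a.get("item")        # server A caches "v1"
database["item"] = "v2"     # the item is updated in the database
server_b.get("item")        # server B caches "v2"
database["item"] = "v3"     # the item is updated again

versions = {
    "database": database["item"],
    "server_a": server_a.get("item"),
    "server_b": server_b.get("item"),
}
print(versions)  # {'database': 'v3', 'server_a': 'v1', 'server_b': 'v2'}
```

Three versions of the same item now exist on the network, and neither cached copy matches the database.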
Such data latency can cause serious problems for applications that require near real-time accuracy, such as web sites that offer “real time” stock prices. Such an application might utilize a database table having at least two columns: one column containing stock symbols, which can be used as primary keys for the table, and one column containing the current price of each stock. In such an application, most of the activity involves users accessing the site and reading the current stock values. There is typically also activity involving back-end applications or systems that come in periodically, such as once every minute, with updated stock prices. These back-end systems need read/write access to the database in order to update the data.
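As a rough sketch of the table just described, the following uses Python's built-in `sqlite3` module with an in-memory database. The table name `stocks`, the column names, the symbols `AAA` and `BBB`, and the helper functions are assumptions made for illustration; the text specifies only a symbol column serving as the primary key and a price column.

```python
import sqlite3

# In-memory database standing in for the networked database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks (symbol TEXT PRIMARY KEY, price REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?)", [("AAA", 10.0), ("BBB", 20.0)]
)
conn.commit()


def read_price(symbol):
    """Read-only path used by site visitors: one database hit per call."""
    row = conn.execute(
        "SELECT price FROM stocks WHERE symbol = ?", (symbol,)
    ).fetchone()
    return None if row is None else row[0]


def update_price(symbol, price):
    """Read/write path used by a back-end feed that arrives periodically."""
    conn.execute(
        "UPDATE stocks SET price = ? WHERE symbol = ?", (price, symbol)
    )
    conn.commit()


before = read_price("AAA")
update_price("AAA", 10.5)
after = read_price("AAA")
print(before, after)  # 10.0 10.5
```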
Most access to the system will be read-only. For these read-only users, the system can cache data to provide faster access. The system can update the cached information periodically, such as every fifteen minutes. In such a “read-mostly” situation, however, it may be preferable to give a user the most recent data. A fifteen-minute delay in providing accurate information may be undesirable for many applications. It is typically desirable to give users information that is as accurate as possible.
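The periodic refresh described above might look like the following sketch. The `PeriodicCache` class, its `load` callback, and the injectable `clock` parameter are hypothetical names introduced here; the fake clock exists only so the fifteen-minute window can be demonstrated without waiting.

```python
import time


class PeriodicCache:
    """Cache whose entries are reloaded only after a fixed interval has elapsed."""

    def __init__(self, load, ttl_seconds=15 * 60, clock=time.monotonic):
        self._load = load          # function that reads a value from the database
        self._ttl = ttl_seconds    # refresh interval, fifteen minutes by default
        self._clock = clock        # injectable so the example can simulate time
        self._entries = {}         # key -> (value, time the value was loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self._ttl:
            entry = (self._load(key), now)
            self._entries[key] = entry
        return entry[0]


# Hypothetical stand-in for the database and a fake clock for the demonstration.
prices = {"AAA": 10.0}
fake_now = [0.0]
cache = PeriodicCache(prices.__getitem__, clock=lambda: fake_now[0])

first = cache.get("AAA")      # loaded from the database at t = 0
prices["AAA"] = 10.5          # back-end update one minute later
fake_now[0] = 60.0
stale = cache.get("AAA")      # still the old price: the interval has not elapsed
fake_now[0] = 15 * 60.0
fresh = cache.get("AAA")      # reloaded once fifteen minutes have passed
print(first, stale, fresh)    # 10.0 10.0 10.5
```

Between the update and the next refresh, every reader sees the old price, which is the delay the paragraph above describes.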
One way to ensure that users get accurate information, or at least information that is current with data stored in the database, is to pull the information from the database for each request instead of reading a cached copy. This can be very expensive for many applications, as a hit to a database is far more time- and resource-intensive than reading a value from memory.
For users or systems updating the data in the database, it may be desirable to wrap as many updates as possible into a batch transaction in order to improve performance. Wrapping updates into a single transaction also ensures that either all of the updates occur or none of them do. Problems arise, however, in how to update the cached copies of each item updated in a transaction.
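The all-or-nothing behavior of a batch transaction, and the resulting staleness of cached copies, can be sketched with Python's built-in `sqlite3` module. The table layout, the symbols, the `apply_batch` and `snapshot` helpers, and the `CHECK (price > 0)` constraint are assumptions made for illustration; the constraint is there only to force one batch to fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks ("
    "symbol TEXT PRIMARY KEY, "
    "price REAL NOT NULL CHECK (price > 0))"  # assumed rule, used to trigger a failure
)
with conn:
    conn.executemany(
        "INSERT INTO stocks VALUES (?, ?)", [("AAA", 10.0), ("BBB", 20.0)]
    )


def snapshot():
    """Read every (symbol, price) row from the database into a dict."""
    return dict(conn.execute("SELECT symbol, price FROM stocks ORDER BY symbol"))


def apply_batch(updates):
    """Apply all (symbol, price) updates in one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "UPDATE stocks SET price = ? WHERE symbol = ?",
                [(price, symbol) for symbol, price in updates],
            )
        return True
    except sqlite3.IntegrityError:
        return False


cached = snapshot()  # local copies taken before any update

# The second update violates the constraint, so the first is rolled back too.
failed = apply_batch([("AAA", 11.0), ("BBB", -1.0)])
after_failed = snapshot()

# A valid batch commits both updates together.
committed = apply_batch([("AAA", 11.0), ("BBB", 21.0)])
after_commit = snapshot()

# Every item touched by the committed batch is now stale in the cached copy.
stale_keys = sorted(k for k in cached if cached[k] != after_commit[k])
print(failed, committed, stale_keys)  # False True ['AAA', 'BBB']
```

A single committed transaction can therefore invalidate several cached items at once, on every server or client that holds a copy.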