Memory in a data processing system is organized as an array of 8-bit bytes and is allocated as an integral number of words, where a word is the fixed-size unit of memory that the data processing system manipulates in a single operation. For example, in a 32-bit data processing system the size of a word is 32 bits. Memory allocated in a 32-bit data processing system therefore consists of an integral number of 32-bit words, each corresponding to a set of four 8-bit bytes. Allocated memory is referenced by a memory address, which is a pointer, itself stored at a specific location in the memory of the data processing system, to the first byte of the first word of the allocated memory.
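The rounding implied by allocating an integral number of words can be sketched in C as follows. This is an illustration only: `words_for_bytes` and `allocated_bytes` are hypothetical helpers, and `word_size` is assumed to be given in bytes (4 for a 32-bit system, 8 for a 64-bit one).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of whole words needed to hold nbytes bytes. */
size_t words_for_bytes(size_t nbytes, size_t word_size)
{
    assert(word_size > 0);
    return (nbytes + word_size - 1) / word_size;   /* round up to a whole word */
}

/* Hypothetical helper: bytes actually reserved when nbytes is
 * allocated as an integral number of words. */
size_t allocated_bytes(size_t nbytes, size_t word_size)
{
    return words_for_bytes(nbytes, word_size) * word_size;
}
```

With 4-byte words, for instance, a 10-byte request rounds up to three words (12 bytes); with 8-byte words the same request occupies two words (16 bytes).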
Memory update operations are preferably atomic. An atomic update operation is a sequence of operations or instructions that updates a memory location and cannot be interrupted. In other words, when a first computer process starts to update a data value, there must be no chance that a second process can read the memory location holding the data value between the point at which the first process reads the memory location and the point at which the first process writes the updated value back to it. If such a read by the second process occurred, both processes would hold the same original value and compute their updates from it, so whichever process writes last would overwrite, and thereby lose, the other's update; this may lead to an error or a system crash. Atomically updating a shared memory location prevents multiple computer entities or processes from performing the same operation and/or destroying work done by another computer entity or process.
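The lost update described above can be reproduced deterministically by writing out one bad interleaving by hand. In the hypothetical C sketch below, `interleaved_increment` stands for two processes that each add 1 to a shared value but both read it before either writes back, while `serialized_increment` shows the same two updates when each read-modify-write completes before the next begins, which is the ordering an atomic update enforces.

```c
#include <assert.h>

/* Hypothetical sketch: processes A and B each try to add 1 to a shared
 * value, but their steps are interleaved by hand in the unsafe order. */
int interleaved_increment(int shared)
{
    int a = shared;      /* A reads the value                     */
    int b = shared;      /* B reads before A has written back     */
    assert(a == b);      /* both processes now hold the same value */
    shared = a + 1;      /* A writes its update                   */
    shared = b + 1;      /* B overwrites it with the same result  */
    return shared;       /* one increment has been lost           */
}

/* The same two updates when each read-modify-write runs to completion
 * before the other starts. */
int serialized_increment(int shared)
{
    shared = shared + 1; /* A: read, modify, write */
    shared = shared + 1; /* B: read, modify, write */
    return shared;
}
```

Starting from 0, the interleaved version ends at 1 rather than 2: the work of one process has been destroyed.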
To achieve high performance on multiprocessor systems, many applications use multi-threaded or multi-process code in which the threads or processes communicate through shared storage. Memory models known as ‘weakly consistent’ have been developed for such systems. These models allow storage operations to be parallelized among the internal processing units, the cache, and main storage in a number of ‘pipelines’. One result is increased speed and performance due to optimised throughput. Another result, however, is that the processor may perform memory accesses in a different order from that of the load and store instructions in the program.
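In C11 code, the reordering permitted by a weakly consistent memory model is usually controlled through memory orderings rather than by reasoning about pipelines directly. The sketch below uses the standard `memory_order_release` and `memory_order_acquire` orderings with POSIX threads; `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are hypothetical names. If the flag were stored and loaded with `memory_order_relaxed` instead, the compiler or a weakly ordered processor could make the store to `ready` visible before the store to `payload`, and the consumer could read stale data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical message-passing example: 'payload' is ordinary data and
 * 'ready' is a flag announcing that the data has been written. */
static int payload;
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                          /* plain store */
    /* release: the payload store must be visible before ready == 1 is */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* acquire: once ready == 1 is seen, the payload store is visible too */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                                                  /* spin */
    *out = payload;
    return NULL;
}

int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    assert(seen == 42);   /* guaranteed by the release/acquire pair */
    return seen;
}
```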
One method for atomically updating a shared memory location, and for dealing with weak consistency, is to acquire a lock on the memory location so that only the lock holder can modify it. Although this method ensures atomic updating of the memory location, it reduces performance, both because of the overhead of acquiring and releasing locks and because other processes must wait until the lock is released before they can update the memory location. In a weakly consistent system with multiple pipelines running on a processor, the processor must also empty its pipelines and ensure that memory accesses are completed for all program instructions ordered before the locking instruction in the program sequence. This significantly slows down the system, and there is therefore a need for ways to reduce the number of locks that must be acquired.
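A lock of the kind described above can be built from the test-and-set instruction, which C11 exposes as `atomic_flag_test_and_set_explicit`. In this minimal sketch, the `acquire`, `release`, `worker`, and `run_locked_counter` names and the thread and iteration counts are all hypothetical. Only the lock holder increments `counter`, every other thread busy-waits, and the acquire and release orderings stop the compiler and processor from moving the protected accesses outside the lock, which is where the ordering cost arises on weakly consistent hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

/* Hypothetical spinlock guarding a shared counter. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter;

static void acquire(void)
{
    /* test-and-set returns the old value: loop until we were the one to set it */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;   /* busy-wait: other threads cannot proceed until release */
}

static void release(void)
{
    /* release ordering: accesses inside the lock complete before unlocking */
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        acquire();
        counter++;          /* critical section: only the lock holder runs this */
        release();
    }
    return NULL;
}

long run_locked_counter(void)
{
    pthread_t t[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    assert(counter == (long)NTHREADS * NITERS);   /* no update was lost */
    return counter;
}
```

Every increment pays for one lock acquisition and one release, and contending threads make no progress while they spin; this is the overhead the passage above describes.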
A method of atomically updating a memory location that achieves better performance uses low-level instructions to perform the update. Machine code instructions such as compare-and-swap, fetch-and-add, and test-and-set operate on one word at a time and are carried out atomically.
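By contrast, a shared counter can be updated without a lock using single-word atomic instructions. The hypothetical C11 sketch below increments one counter with a compare-and-swap retry loop (`atomic_compare_exchange_weak`) and another with a single fetch-and-add (`atomic_fetch_add`); the function names and the thread and iteration counts are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

static atomic_long cas_counter;
static atomic_long faa_counter;

/* Lock-free increment built from compare-and-swap: read the word, compute
 * the new value, and install it only if the word has not changed meanwhile. */
static void cas_increment(atomic_long *p)
{
    long old = atomic_load(p);
    /* On failure, 'old' is refreshed with the current value, so just retry. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1))
        ;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        cas_increment(&cas_counter);
        atomic_fetch_add(&faa_counter, 1);   /* one fetch-and-add instruction */
    }
    return NULL;
}

void run_lock_free_counters(long *cas_total, long *faa_total)
{
    pthread_t t[NTHREADS];
    atomic_store(&cas_counter, 0);
    atomic_store(&faa_counter, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    *cas_total = atomic_load(&cas_counter);
    *faa_total = atomic_load(&faa_counter);
    assert(*cas_total == *faa_total);
}
```

The weak form of compare-and-swap may fail spuriously, which is harmless inside a retry loop. Both counters reach the same total as a locked counter would, but no thread ever holds a lock.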
One way to increase the speed of data updates is to embed metadata or flags relating to a piece of data, often called a ‘data structure’, into its memory address. A common technique is to embed flags into the pointer (memory address) by exploiting the boundary alignment of the data processing system. Word alignment depends on the word size of the data processing system. For example, if a first memory word is referenced as being at address 0 (0 in binary), then in a 32-bit data processing system the start of a second memory word is displaced four bytes from the start of the first and can be referenced as being at address 4 (100 in binary). Subsequent memory words in the 32-bit system are spaced at four-byte intervals and can be referenced as having addresses 8 (1000 in binary), 12 (1100 in binary), and so on. Each of these addresses has zeros in its two least significant bits, and these bits can be used to store additional information relating to the allocated memory. However, the number of bits available for such additional information is limited and depends on the word size of the particular data processing system. In a 32-bit data processing system a word is a set of four bytes and memory addresses have zeros in the two least significant bits; in a 64-bit data processing system a word is a set of eight bytes and memory addresses have zeros in the three least significant bits. Thus, computer software written for a 64-bit data processing system that uses the three least significant bits of a memory address to store additional information may not be portable to a 32-bit data processing system, where only the two least significant bits of a memory address are available for such use.
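The pointer-tagging technique can be sketched in C as below, assuming a system on which the hypothetical `struct node` is at least 4-byte aligned, so that its two least significant address bits are always zero. `tag_pack`, `tag_pointer`, `tag_flags`, `TAG_BITS`, and `TAG_MASK` are illustrative names. Converting between pointers and `uintptr_t` and masking the low bits is implementation-defined in C, although it behaves as shown on common platforms.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical tagging scheme: 4-byte alignment leaves two low bits free. */
#define TAG_BITS 2u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))   /* binary 11 */

struct node {
    alignas(4) int32_t value;   /* guarantee at least 4-byte alignment */
};

/* Combine an aligned pointer and a 2-bit tag into a single word. */
uintptr_t tag_pack(struct node *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* the low bits must be unused */
    assert(tag <= TAG_MASK);          /* the tag must fit in those bits */
    return bits | tag;
}

/* Recover the original pointer by clearing the flag bits. */
struct node *tag_pointer(uintptr_t word)
{
    return (struct node *)(word & ~TAG_MASK);
}

/* Extract the flag bits. */
unsigned tag_flags(uintptr_t word)
{
    return (unsigned)(word & TAG_MASK);
}
```

Raising `TAG_BITS` to 3 would only be safe where objects are known to be 8-byte aligned, as on the 64-bit systems described above; on a system guaranteeing only 4-byte alignment, the first assertion in `tag_pack` would fire for any node whose address is a multiple of 4 but not of 8, which is the portability problem in miniature.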
Computer software that relies on word or boundary alignment in this way may therefore not be portable between data processing systems, and this may lead to errors. This is often overcome by using only the two least significant bits of a memory address to store additional information such as flags. However, two bits are often not enough to embed the desired additional data. Moreover, because machine instructions can atomically update a value only when that value occupies a single word, a value contained in two or more words must be updated using a locking mechanism.
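Because a tagged pointer and its flags share one word, a single compare-and-swap can change both together. The sketch below is a hypothetical illustration using C11 `atomic_compare_exchange_strong` on an `_Atomic uintptr_t`; `link_word`, `set_link`, `swap_link`, `link_ptr`, `link_flags`, and `struct item` are invented for the example, and a 2-bit flag field on 4-byte-aligned objects is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_MASK ((uintptr_t)0x3)   /* assumed 2-bit flag field */

struct item {
    alignas(4) int32_t value;        /* keeps the two low address bits zero */
};

/* Hypothetical shared link: pointer and flags packed into one word. */
static _Atomic uintptr_t link_word;

void set_link(struct item *p, unsigned flags)
{
    assert(((uintptr_t)p & FLAG_MASK) == 0);
    atomic_store(&link_word, (uintptr_t)p | (flags & FLAG_MASK));
}

/* Replace (expected_ptr, expected_flags) with (new_ptr, new_flags) in one
 * atomic step; fails, changing nothing, if either part no longer matches. */
bool swap_link(struct item *expected_ptr, unsigned expected_flags,
               struct item *new_ptr, unsigned new_flags)
{
    assert(((uintptr_t)new_ptr & FLAG_MASK) == 0);
    uintptr_t expected = (uintptr_t)expected_ptr | (expected_flags & FLAG_MASK);
    uintptr_t desired  = (uintptr_t)new_ptr | (new_flags & FLAG_MASK);
    return atomic_compare_exchange_strong(&link_word, &expected, desired);
}

struct item *link_ptr(void)
{
    return (struct item *)(atomic_load(&link_word) & ~FLAG_MASK);
}

unsigned link_flags(void)
{
    return (unsigned)(atomic_load(&link_word) & FLAG_MASK);
}
```

The swap succeeds only if both the pointer and the flags still match what the caller expected. If the flags were kept in a separate word, no single-word instruction could check and replace the pair, and a lock would be needed, as the passage above notes.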
Therefore, there exists a need for an improved method which addresses the above-mentioned problems.