1. Field of the Invention
The present invention relates to a processing system and in particular to a processing system having means for storing a plurality of items defining a queue.
2. Related Art
Queues are fundamental data structures and are well known. For example, in a multi processor system, there will be a plurality of computer processing units (CPUs), each of which may be executing a different stream of instructions simultaneously. Each stream is generally referred to as a thread. Each thread may wish to perform a queue operation on a queue data structure which resides in the memory of the system. When a plurality of threads attempt to access the system memory at the same time, an arbiter will choose the order in which the requests are forwarded to the system memory. Generally, the arbiter controls the access in a manner which cannot be predicted in advance by any of the threads.
Multi threaded processors are also known. In a multi threaded processor, a CPU will execute several threads and, in particular, switches rapidly between the threads so that each thread appears to be executed concurrently with the other threads. Again, each of the threads cannot determine or predict the order of their execution with respect to the other threads. Accordingly, a multi threaded processor has to deal with the same problems relating to concurrent access as occur in the case where there are a number of processors.
A parallel queue is one which has a plurality of readers and/or a plurality of writers. It is often provided in operating systems, where it facilitates resource sharing. Known implementations of parallel queues normally use a mutex or a spin lock to sequence access to the queue and ensure that it remains in well defined states. A mutex (mutual exclusion) is a software device which allows a processor exclusive access to a resource. Mutexes normally take the form of two procedures: get exclusive access and release exclusive access. The “get” routine returns true if it is successful in gaining access. Otherwise false is returned. False normally leads to “get” being retried at a later time. True enables the processor to use the resource until the processor opts to give up exclusive access with a release.
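By way of illustration only, the get and release procedures described above, and their use to sequence access to a queue, might be sketched as follows. The names Mutex, LockedQueue, get, release, enqueue and dequeue are hypothetical and chosen for this example; the sketch assumes Python's standard threading.Lock as the underlying exclusion mechanism and is not taken from any particular known implementation.

```python
import threading
import time
from collections import deque


class Mutex:
    """Hypothetical mutex exposing the two procedures: get and release."""

    def __init__(self):
        self._lock = threading.Lock()

    def get(self):
        # Non-blocking attempt: returns True if exclusive access was gained,
        # False if another thread currently holds it.
        return self._lock.acquire(blocking=False)

    def release(self):
        # Give up exclusive access so that a later get() can succeed.
        self._lock.release()


class LockedQueue:
    """Queue whose operations are all sequenced by a single mutex."""

    def __init__(self):
        self._items = deque()
        self._mutex = Mutex()

    def _get_with_retry(self):
        # A False result leads to "get" being retried at a later time.
        while not self._mutex.get():
            time.sleep(0)  # yield to other threads before retrying

    def enqueue(self, item):
        self._get_with_retry()
        try:
            self._items.append(item)
        finally:
            self._mutex.release()

    def dequeue(self):
        # Returns None when the queue is empty (an assumption of this sketch).
        self._get_with_retry()
        try:
            return self._items.popleft() if self._items else None
        finally:
            self._mutex.release()
```

In this sketch every reader and writer passes through the same mutex, so only one queue operation proceeds at any time, however many threads are present.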
Spin locks are related to mutexes and use a lock bit to determine if the resource is free or not. A process will “spin” (i.e. repeatedly read a lock bit) until the lock bit indicates that it can acquire the lock. The disadvantage of using these two mechanisms is that they do not allow for parallel operation very easily. Given multiple processors, these mechanisms will allow only one (or a limited number of) processors to access a resource at a time.
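A spin lock of the kind described above might be sketched as follows, for illustration only. The names SpinLock and run_workers are hypothetical. Python has no atomic test-and-set instruction, so the sketch assumes a small internal threading.Lock (named _hw here) standing in for the indivisibility that hardware would provide when the lock bit is read and set.

```python
import threading


class SpinLock:
    """Hypothetical spin lock built around a single lock bit."""

    def __init__(self):
        self._bit = 0                # lock bit: 0 = free, 1 = held
        self._hw = threading.Lock()  # assumption: models an atomic test-and-set

    def _test_and_set(self):
        # Read the lock bit and set it to 1 as one indivisible step.
        with self._hw:
            old = self._bit
            self._bit = 1
            return old

    def acquire(self):
        # Spin: keep reading the lock bit until it was observed to be free.
        while self._test_and_set() == 1:
            pass

    def release(self):
        # Clear the lock bit so that one spinning thread can proceed.
        self._bit = 0


def run_workers(num_threads, increments):
    """Each thread increments a shared total while holding the spin lock."""
    lock = SpinLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            total[0] += 1  # only one thread at a time executes this line
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

While one thread holds the lock, every other thread that calls acquire does no useful work, which is the lack of parallelism noted above.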
Current implementations of parallel queues normally sequentialise access so that multiple simultaneous accesses to the system memory are not permitted. For large multi processor machines, this can be disadvantageous in that a bottleneck is formed. This makes the use of the known queues inefficient. A further disadvantage is that the performance achieved does not scale with the size of the machine. For example, with very large machines, this inefficiency may limit the performance of the entire machine.
It has been proposed to provide CPU architectures which have been designed to support multi processing and which use an atomic read-modify-write operation to implement data structures which permit scalable concurrent use. To implement this, algorithms are provided which use only the more complex read-modify-write memory operations to implement the queue. These can include complex variations on the fetch-and-add primitives.
Fetch and add primitives are a class of atomic read-modify-write operations which are popularly implemented on multiprocessor machines. The simplest is fetch and add one, which atomically reads a memory word and increments that word by one. If, for example, the word at some address A contains the number 5, then the instruction fetch and add one would fetch the value 5 to the CPU register while leaving the value in A as 6. Furthermore, it would not be possible for any other device which had access to the memory to access A after the value 5 had been read but before it had been incremented to 6. This is the sense in which it is atomic.
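The behaviour of fetch and add one in the example above can be sketched as follows, for illustration only. The names MemoryWord, fetch_and_add_one and read are hypothetical. Since Python exposes no hardware fetch and add instruction, the sketch assumes an internal threading.Lock to model the atomicity that the memory interface would provide.

```python
import threading


class MemoryWord:
    """Hypothetical model of one memory word supporting fetch and add one."""

    def __init__(self, value=0):
        self._value = value
        # Assumption: this lock models the indivisibility provided by hardware.
        self._atomic = threading.Lock()

    def fetch_and_add_one(self):
        # Return the old value and leave the word incremented by one.
        # No other access can fall between the read and the increment.
        with self._atomic:
            old = self._value
            self._value = old + 1
            return old

    def read(self):
        with self._atomic:
            return self._value


if __name__ == "__main__":
    a = MemoryWord(5)             # the word at address A contains 5
    fetched = a.fetch_and_add_one()
    print(fetched, a.read())      # prints "5 6"
```

Because the read and the increment are indivisible, any number of threads calling fetch_and_add_one concurrently each receive a distinct value.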
Such primitives are costly to implement as they require arithmetic capability at the memory interface where there would otherwise be none.
The known arrangements use fetch and add primitives to implement queues because multiprocessors commonly provide such facilities, which lead to very simple software. Thus, software simplicity is gained at the cost of hardware complexity and hence machine cost.
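To indicate why such primitives lead to simple software, one possible queue built on fetch and add one is sketched below. This is an illustrative assumption and not a description of any particular known arrangement. The names FetchAddWord and TicketQueue are hypothetical, the fetch and add operation is again modelled with a threading.Lock, and the sketch deliberately omits wrap-around, so at most capacity items may ever be enqueued.

```python
import threading
import time

_EMPTY = object()  # marker for a slot that has not yet been written


class FetchAddWord:
    """Hypothetical memory word; the lock models hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._atomic = threading.Lock()

    def fetch_and_add_one(self):
        with self._atomic:
            old = self._value
            self._value = old + 1
            return old


class TicketQueue:
    """Sketch of a queue in which each operation claims a slot by ticket."""

    def __init__(self, capacity):
        self._slots = [_EMPTY] * capacity
        self._tail = FetchAddWord()  # next slot to be written
        self._head = FetchAddWord()  # next slot to be read

    def enqueue(self, item):
        # Each writer obtains a distinct slot index, so writers never collide.
        ticket = self._tail.fetch_and_add_one()
        self._slots[ticket] = item

    def dequeue(self):
        # Each reader obtains a distinct slot index, then waits for its writer.
        ticket = self._head.fetch_and_add_one()
        while self._slots[ticket] is _EMPTY:
            time.sleep(0)  # yield until the slot has been filled
        return self._slots[ticket]
```

No mutex surrounds the queue as a whole in this sketch; the only point of serialisation is the fetch and add on each counter, which is the operation that requires arithmetic capability at the memory interface.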