1. Technical Field of the Invention
The present invention relates in general to the computer systems field and, in particular, to a method for increasing the efficiency of a plurality of computer processors having shared memory in a telecommunication system.
2. Description of Related Art
The existing start-up procedures used to launch multiple processors simultaneously in a multiprocessing system are typically complicated and problematic. For example, during start-up of a conventional multiprocessing system, each processor initially reads the software code located in the first memory address accessed. In a multiprocessing system with distributed memory, a processor's software code is located in a physical memory location associated with that processor. In other words, each processor's code can reside in a different address space. Consequently, at start-up, each processor has no problem with accessing its own code.
However, in a multiprocessing system with shared memory, all of the processors' software code can share the same address space or a range of multiple address spaces located in a common physical memory. Consequently, an important design goal of existing multiprocessing systems has been to enable start-up of all processors so that each processor can initially access its correct software code. A problem for multiprocessing system designers is that this design goal has been difficult and costly to achieve.
Another problem with existing multiprocessing systems is that the distribution of interrupts between processors can be uneven, which reduces a system's efficiency. For example, using one method called static interrupt distribution, each processor handles a unique interrupt or set of interrupts. One approach is to hardwire different segments of an interrupt vector to specific processors for handling. Consequently, using this method, the overall distribution of interrupts always remains the same.
The static interrupt distribution method has been implemented in multiprocessing systems in a number of ways. For example, an existing approach is to use one processor to handle all of the interrupts in the system, while the other processors are used to execute just the non-interrupt software code. An advantage of this method is that new interrupts are always distributed to a known processor. Consequently, the system can be designed with less complexity, because there is no need to account for different processors handling different interrupts. Furthermore, only one interrupt controller is needed in such a system.
Nevertheless, there is a significant disadvantage of such a static interrupt distribution approach using one processor in a multiprocessing system. The statistical distribution of the interrupts can make this one processor very busy at certain times and not so busy at others. In that regard, a more level processing load is preferred. One solution to this problem is to divide the interrupts evenly between processors. For example, as illustrated by the diagram shown in FIG. 1A, one processor (P0) can be given the "highest" segment of the interrupt vector to handle, a second processor (P1) can be given the next "highest" segment of the interrupt vector to handle, and so on to the lowest segment. However, with this approach, the interrupts are not evenly distributed, because as illustrated by FIG. 1A, an interrupt can occur while its dedicated processor is not available, and there is no provision to execute that interrupt by another processor in the meantime.
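The segmented static scheme described above can be illustrated with a minimal Python sketch. It is only a model under stated assumptions: the vector width of 16 lines, the four processors, the reading of "highest" as the highest-numbered lines, and the names `owner_of` and `dispatch` are all hypothetical and are not taken from FIG. 1A.

```python
def owner_of(line: int, num_lines: int = 16, num_cpus: int = 4) -> int:
    """Return the processor hardwired to a given interrupt line.

    Assumption: the vector is split into equal segments, and the
    highest-numbered segment belongs to P0, the next to P1, and so on.
    """
    lines_per_segment = num_lines // num_cpus
    segment = line // lines_per_segment      # 0 is the lowest segment
    return (num_cpus - 1) - segment


def dispatch(line: int, busy: list, num_lines: int = 16):
    """Return the processor that takes the interrupt, or None if it must wait.

    In a static scheme nobody else may take the interrupt, so a busy
    owner means the interrupt is simply left pending.
    """
    owner = owner_of(line, num_lines, len(busy))
    return None if busy[owner] else owner


# P0 is busy: an interrupt in the highest segment stalls even though
# P1..P3 are idle, which is the unevenness the text describes.
busy = [True, False, False, False]
stalled = dispatch(15, busy)   # None
served = dispatch(3, busy)     # handled by P3
```

The sketch makes the limitation visible: the mapping from line to processor never consults the load on the other processors.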
An existing static interrupt distribution approach distributes the interrupts evenly in accordance with different interrupt priorities. As illustrated by the diagram shown in FIG. 1B, this interrupt priority distribution approach appears to produce a relatively even interrupt workload. However, as described above, a static interrupt distribution approach requires the development of very reliable distribution statistics, such as, for example, the amount of time it takes to handle individual interrupts and how often they are executed.
In any event, the existing static interrupt distribution approaches are relatively simple to implement, but their most significant disadvantage is that the designs are inflexible. For example, using an existing static interrupt distribution approach, if a processor's workload has changed over time for some reason (e.g., software evolution, etc.), then the interrupt hardware needs to be redesigned. However, each new design requires a study about the interrupt distribution statistics involved, which can be a time-consuming, inconvenient and complicated undertaking. Furthermore, the existing static interrupt distribution approaches introduce undesirable interrupt latencies into the systems involved. Such latency characteristics are very difficult to deal with, especially if the multiprocessing system is intended to run a realtime operating system. Consequently, as demonstrated above, a need exists for an improved method for distributing interrupts in a multiprocessing system.
A more flexible hardware interrupt distribution approach used for existing multiprocessing systems is called dynamic interrupt distribution. Using this dynamic approach, the interrupt distribution can be changed while the system is in operation. An obvious advantage of this approach is that there is no need to develop interrupt distribution statistics, because the hardware handles the interrupt distribution in realtime. Consequently, if the interrupt distribution statistics change over time for some reason (e.g., new software development, etc.), there is no need to change the interrupt hardware or develop new distribution statistics in order to implement such a dynamic approach.
Theoretically, the use of a dynamic hardware interrupt distribution approach for a multiprocessing system is a viable alternative to the use of a static approach. Nevertheless, the existing dynamic hardware interrupt distribution approaches have significant disadvantages. For example, the hardware design for a dynamic interrupt distribution method is relatively complex, and the method itself is difficult to implement. The algorithm that controls the interrupt distribution has to be an extremely "smart" algorithm. In other words, the control logic for such an algorithm must be capable of determining which (if any) interrupts are currently being executed by each processor, and which interrupts have been queued by each processor for execution at a later time. Based on such information, an interrupt controller has to make relatively difficult decisions about where to send each new interrupt. With existing hardware interrupt distribution designs, the control units' integrated circuits require a very large number of gates and take up a large amount of silicon space as a result. Consequently, this design solution is relatively costly in terms of power consumption.
In any event, most computer systems function with a certain amount of interrupt dependency. In other words, certain interrupts must be processed in a specific order, at a specific time or specific number of times, or associated with specific memory that can be locked by a semaphore (or other hardware or software resources of any kind). This interrupt dependency complicates matters even more for static interrupt distribution approaches in which the distribution is not changed easily due to software development.
Still another problem with existing multiprocessing systems using shared resources (e.g., memory, Input/Output (I/O) areas, synchronization blocks, etc.) is that bus arbitration is used to distribute the shared resources to the different processors. As such, with existing bus arbitration methods, only one processor can use a bus at one time. However, the processors can still execute non-interrupt software code or interrupt code with different priorities. The higher priority interrupts are typically processed before the lower priority interrupts and the non-interrupt software code.
A significant problem arises with existing bus arbitration methods if the arbitration process is performed using a conventional round robin or similar scheme. For example, using a typical round robin scheme, each process is given a predetermined amount of time for execution and then swapped out. A circular First-In-First-Out (FIFO) ready queue is typically used. Using such a method, the arbitration procedure treats the processors fairly if they are not processing interrupts. If the processors are processing interrupts, the processors' priorities are not maintained because of the different interrupt priorities.
Yet another problem that arises with existing multiprocessing systems relates to the use of atomic hardware synchronization primitives. For example, when several processes are being executed by a single processor, or a typical multiprocessor design is being used, so-called atomic primitives are used as code for hardware synchronization purposes. As such, for multiple process computers or multiprocessing systems, a method for mutually excluding the different processes or processors is required. Some existing systems have implemented this exclusion method as instructions in a typical instruction set. Other existing systems have implemented this exclusion method with hardware semaphores (e.g., Atomic Exchange, Test-and-Set and Fetch-and-Increment instructions). In any event, a problem with some existing multiple process or multiprocessing systems is that not all processors are designed for mutual exclusion, and consequently, there is no relatively simple way for them to perform atomic operations. Nevertheless, as described in detail below, the present invention successfully resolves the above-described problems and other related problems.
In accordance with a preferred embodiment of the present invention, a method for simultaneous start-up of a plurality of processors in a multiprocessing system is provided whereby a special hardware register (referred to as a "WhoAmI register") can be shared by the plurality of different processors. Alternatively, a separate WhoAmI register can be provided for one or more of the different processors. When a processor performs a read operation on a WhoAmI register, the register returns an identification number associated with that processor. Consequently, this processor can perform a set of test and jump instructions to access and execute the appropriate code for this processor.
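The WhoAmI start-up sequence can be modeled in software as follows. This is a rough Python sketch rather than the claimed hardware: the class `WhoAmIRegister`, the function `boot`, and the entry-point table are hypothetical names, and the reader's identity is passed explicitly because a software model has no bus logic to supply it.

```python
class WhoAmIRegister:
    """Software stand-in for a shared WhoAmI register."""

    def read(self, reader_id: int) -> int:
        # In hardware, the register logic would know which processor
        # issued the read. Here the caller supplies that identity.
        return reader_id


def boot(processor_id: int, whoami: WhoAmIRegister, entry_points: list):
    """Read the register, then run a chain of test-and-jump steps."""
    ident = whoami.read(processor_id)
    for candidate, entry in entry_points:
        if ident == candidate:       # test
            return entry()           # jump to this processor's code
    raise RuntimeError(f"no start-up code for processor {ident}")


# All processors begin at the same shared start address (boot) and
# branch to their own code after reading the register.
whoami = WhoAmIRegister()
entry_points = [
    (0, lambda: "P0 start-up code"),
    (1, lambda: "P1 start-up code"),
]
started = [boot(cpu, whoami, entry_points) for cpu in (0, 1)]
```

Every processor executes the same first instructions from shared memory, and the value read from the register is the only thing that differs between them.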
In accordance with a second embodiment of the present invention, a method for distributing interrupts between a plurality of processors in a multiprocessing system is provided, whereby each processor can access a complete interrupt vector. The interrupt vector is masked, and a different mask is provided for each processor (e.g., using special mask registers). Consequently, all of the interrupts used can be coupled to and handled by all of the processors.
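As an illustration of the masking idea, the following Python sketch applies a per-processor mask to a shared interrupt vector. The 8-bit vector width, the mask values, and the names `visible_interrupts` and `distribute` are assumptions chosen for the example, and a set mask bit is assumed to mean "this processor handles that interrupt".

```python
def visible_interrupts(pending: int, mask: int) -> int:
    """Return the pending interrupts that one processor is allowed to see."""
    return pending & mask


def distribute(pending: int, masks: list) -> list:
    """Apply each processor's mask to the same complete interrupt vector."""
    return [visible_interrupts(pending, mask) for mask in masks]


# Every processor is coupled to the full vector; only the masks differ.
masks = [0b11110000, 0b00001111]          # P0 takes the upper half, P1 the lower
first = distribute(0b10000001, masks)     # [0b10000000, 0b00000001]

# Rewriting the mask registers changes the distribution without
# touching the interrupt wiring.
masks = [0b11111100, 0b00000011]
second = distribute(0b00000100, masks)    # [0b00000100, 0b00000000]
```

Because the masks are ordinary register contents in this model, the same mechanism covers both a fixed assignment and one that is changed while the system runs.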
In accordance with a third embodiment of the present invention, a method for bus arbitration in a multiprocessing system is provided, whereby the arbitration procedure is based on the priority of the interrupt currently being executed by each processor. A processor that executes an interrupt having the highest priority is granted all of the bus operations that processor needs in order to run at full speed. If that processor is not using the bus at a particular time, then another processor may be allowed to use the bus. However, if two or more processors attempt to execute interrupts having the same priority, or all of the processors attempt to execute non-interrupt software code, then a round-robin scheme can be used to control the bus arbitration. In this way, the system priorities for the executed code can be preserved, and the overall performance of the system will be improved.
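The arbitration rule described above can be sketched in Python as follows. The class `BusArbiter`, the constant `NO_INTERRUPT`, and the convention that a larger number means a higher priority are assumptions made for illustration; the source does not specify an encoding.

```python
NO_INTERRUPT = -1  # hypothetical priority level for non-interrupt code


class BusArbiter:
    """Grant the bus by interrupt priority, with round robin among ties."""

    def __init__(self, num_cpus: int):
        self.num_cpus = num_cpus
        self.last_granted = num_cpus - 1   # so the first tie starts at P0

    def grant(self, requests: list, priorities: list):
        """Return the processor granted the bus this cycle, or None.

        requests[i] is True if processor i wants the bus; priorities[i]
        is the priority of the code it is currently executing.
        """
        requesting = [i for i in range(self.num_cpus) if requests[i]]
        if not requesting:
            return None
        top = max(priorities[i] for i in requesting)
        tied = [i for i in requesting if priorities[i] == top]
        # Round robin: take the first tied processor after the last grant.
        for offset in range(1, self.num_cpus + 1):
            candidate = (self.last_granted + offset) % self.num_cpus
            if candidate in tied:
                self.last_granted = candidate
                return candidate


arbiter = BusArbiter(3)
# P1 runs the highest-priority interrupt, so it wins whenever it asks.
winner = arbiter.grant([True, True, True], [0, 5, 2])            # 1
# When P1 does not need the bus, the next-highest requester gets it.
fallback = arbiter.grant([True, False, True], [0, 5, 2])         # 2
# With only non-interrupt code running, grants rotate.
rotation = [arbiter.grant([True] * 3, [NO_INTERRUPT] * 3) for _ in range(3)]
```

Only processors that are actually requesting the bus are considered, which models the statement that another processor may use the bus when the highest-priority processor is not using it.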
In accordance with a fourth embodiment of the present invention, a method for synchronizing a plurality of processors in a multiprocessing system is provided, whereby atomic hardware semaphores can be implemented for synchronization purposes using processor I/O bus (operations) or coprocessor bus (operations). For this embodiment, sequences of atomic instructions can be created for execution within a single processor clock cycle using processor I/O operations and, for example, a serialization or arbitration unit. In order for a processor to take a hardware semaphore, the processor performs an I/O-Read instruction to access the location of the desired semaphore in a semaphore register. A serialization or arbitration unit associated with the semaphore register responds to that Read operation with information about whether or not that semaphore is "locked" and has already been taken by another processor. If the semaphore has already been taken by another processor, then the response to that Read operation reports that the requesting processor has failed to take the semaphore. In this way, a processor's I/O-Read instruction is used to read and write from/to a semaphore register within one clock cycle, which creates an atomic instruction that can be used for synchronization purposes.
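The read-to-take behavior of the semaphore register can be modeled with the following Python sketch. It is a software analogy only: `threading.Lock` stands in for the serialization or arbitration unit, the names `SemaphoreRegister`, `io_read`, and `release` are hypothetical, and the release operation is an assumption, since the text above describes only how a semaphore is taken.

```python
import threading


class SemaphoreRegister:
    """Software model of a semaphore register taken by a single read."""

    def __init__(self, count: int):
        self._locked = [False] * count
        # Stand-in for the serialization/arbitration unit: it guarantees
        # that reads of the register are handled one at a time.
        self._serializer = threading.Lock()

    def io_read(self, index: int) -> bool:
        """Try to take semaphore `index` with one read.

        Returns True if the reader now holds the semaphore, or False if
        it was already locked by another processor.
        """
        with self._serializer:
            if self._locked[index]:
                return False
            self._locked[index] = True   # the read also performs the write
            return True

    def release(self, index: int) -> None:
        """Free semaphore `index` (assumed operation, not in the source)."""
        with self._serializer:
            self._locked[index] = False


reg = SemaphoreRegister(4)
took_first = reg.io_read(2)     # True: semaphore was free
took_second = reg.io_read(2)    # False: already taken
reg.release(2)
took_again = reg.io_read(2)     # True: free again after release
```

The key property is that the test and the set happen inside one serialized operation, so two processors reading the same location can never both be told that they took the semaphore.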
An important technical advantage of the present invention is that a method for simultaneous start-up of a plurality of processors in a multiprocessing system is provided that enables efficient use of shared memory in relatively small on-chip multiprocessors.
Another important technical advantage of the present invention is that a relatively simple and flexible method is provided which can be used for efficiently distributing interrupts (statically or dynamically) between a plurality of processors in a multiprocessing system.
Still another important technical advantage of the present invention is that a method for bus arbitration in a multiprocessing system is provided which preserves system priorities for executed code, minimizes arbitration contention problems, and improves overall system performance.
Yet another important technical advantage of the present invention is that a relatively simple method for creating atomic operations for a plurality of processors in a multiprocessing system is provided, which can be used for synchronization purposes.