1. Field of Invention
This invention relates to digital integrated circuit design and, more particularly, to techniques for improving the operating efficiency of multi-processor digital integrated circuits.
2. Description of Related Art
Progress in semiconductor manufacturing has both improved the performance of transistors and other semiconductor devices and brought about a tremendous reduction in their size. The small size of semiconductor devices is advantageous because it allows large numbers of transistors to be combined in highly complex integrated circuits, such as microprocessors. For example, the first microprocessor, the Intel 4004, introduced in 1971, combined approximately 2,000 transistors on a single chip. In comparison, the Pentium III®, a typical modern high-performance microprocessor, contains over 28 million transistors. With the capability for very high levels of integration, it has become possible to create systems on a chip, in which entire functional modules (“subsystems”) are combined on a single semiconductor substrate. Each of these subsystems may contain a processor with its own dedicated memory resources and peripheral devices.
The subsystems in multi-processor integrated circuits often operate synergistically, sharing buses and/or memory resources for enhanced performance. This can be especially useful in applications involving multiple tasks that can be performed in parallel. Such applications are common enough in the fields of signal processing and communications that semiconductor manufacturers have introduced special purpose signal processing integrated circuits containing four or more processors.
Because of the high level of complexity in a multi-processor integrated circuit, the coordination of its various system components can prove difficult. Among the system management functions that must be dealt with in a multi-processor integrated circuit are the initialization of program and data memory and the configuration of on-chip peripheral devices.
It is common in a multi-processor integrated circuit for each subsystem to have a processor with its own memory resources and peripheral devices, such as timers or serial ports. In one type of multi-processor integrated circuit, the internal processors are digital signal processors (DSPs), which use separate memories for data and for instructions. DSPs are special-purpose processors, optimized for the numerical calculations and array manipulation commonly encountered in signal processing applications. DSPs typically operate at very high speeds and may perform multiple operations in a single clock cycle. The data memory for a DSP may contain, for example, the coefficients for a digital filter, while the instruction memory contains the actual program executed by the DSP.
Depending on the type of memory devices used, the data or instructions may be retained permanently (non-volatile memory) or only while power is applied to the integrated circuit (volatile memory). Non-volatile memory can be used for storing data or program instructions that never change, such as the coefficients for an industrial process controller. If the program code for a processor is placed in non-volatile memory, the processor can begin execution as soon as it receives power, without waiting for the program code to be loaded into memory. There are limitations to this approach, however. Non-volatile memory is a poor choice if the program or data must regularly be modified, since write speeds are generally slower for non-volatile memory devices, such as flash or EEPROM, than for conventional memory. Furthermore, non-volatile memory is expensive and may occupy too much of the available area on the semiconductor substrate.
Therefore, volatile memory is most commonly used in multi-processor integrated circuits. Since volatile memory does not retain its contents in the absence of power, when the multi-processor integrated circuit is first powered up its memory holds no valid contents and must be loaded with instructions and/or data before the multiple internal processors are allowed to begin execution. To load the memory, it must be active; that is, power must be applied to the memory and its address and data lines must be operational. However, since an internal processor and its memory share the same substrate, they both become active when power is first applied to the integrated circuit. It is therefore necessary to prevent the processor from executing until its program has been loaded into memory. Otherwise, the internal processor will retrieve only random data from the memory, rather than meaningful instructions. This is typically accomplished by holding the internal processor in reset mode while the memory is loaded, then allowing it to begin execution (by releasing the reset) once the instructions are in place. A prolonged reset permits both the internal processor and its memory to be in an active state, but the internal processor remains idle until valid program instructions are available. Once a host processor has initialized the memory for each of the subsystems in a multi-processor integrated circuit, it removes the reset condition, allowing the internal processors to simultaneously begin operation.
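The hold-in-reset, load, then release sequence described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the circuit's actual mechanism: the `Subsystem` class, its methods, and the `boot` function are hypothetical names introduced for illustration, and `None` stands in for the undefined contents of volatile memory at power-up.

```python
class Subsystem:
    """Hypothetical model of one processor with its own volatile program memory."""

    def __init__(self, mem_words):
        # None models the undefined contents of volatile memory at power-up.
        self.memory = [None] * mem_words
        # The processor is held in reset from power-up, so it cannot fetch yet.
        self.in_reset = True

    def write(self, addr, word):
        # The memory is active during reset, so the host can write to it.
        self.memory[addr] = word

    def release_reset(self):
        # Modeling convenience (assumption): refuse to start with undefined words.
        if any(word is None for word in self.memory):
            raise RuntimeError("memory not fully loaded")
        self.in_reset = False

    def fetch(self, addr):
        # A processor held in reset does not fetch instructions.
        if self.in_reset:
            raise RuntimeError("processor is held in reset")
        return self.memory[addr]


def boot(subsystems, program):
    """Load every memory while all processors stay in reset, then release them."""
    for sub in subsystems:
        for addr, word in enumerate(program):
            sub.write(addr, word)
    # Resets are released only after every memory holds valid instructions.
    for sub in subsystems:
        sub.release_reset()


if __name__ == "__main__":
    program = [0x10, 0x20, 0x30]  # placeholder instruction words
    processors = [Subsystem(len(program)) for _ in range(4)]
    boot(processors, program)
    print([p.fetch(0) for p in processors])
```

The check inside `release_reset` exists only to make the failure mode visible in the model; in hardware, a processor released too early would simply fetch whatever random values its memory happened to contain.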
An interface may exist to allow access to memory resources of the processors. The interface operates as a port to the integrated circuit upon which the processors reside. A limitation of conventional interfaces is that a subsystem external to the integrated circuit can access the memory resources of only one processor at a time. Consequently, the loading of instructions and data for one processor must be completed before it can be performed for another processor, even when the instructions and data are the same for all the processing subsystems. Depending on the speed of the interface and the amount of data to be loaded, this can represent a substantial startup time for the multi-processor integrated circuit.
In view of this limitation, it would be desirable to simultaneously load instructions or data into the memory of several processing subsystems in a multi-processor integrated circuit. The memory loading should preferably be accomplished under the control of a host (i.e., a manager) processor via an enhanced interface. By loading all the memories at once, startup time could potentially be reduced by a factor equal to the number of subsystems, provided that the instructions and/or data are the same for all of the subsystems. In many cases this assumption is justified, since multi-processor integrated circuits are often used for signal processing applications consisting of identical tasks that can be performed in parallel.
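The potential reduction in startup time can be made concrete by counting interface transfers. The sketch below is illustrative only: `sequential_load` and `broadcast_load` are hypothetical names, each memory is modeled as a plain Python list, and one transfer is assumed to move one word across the interface. It also assumes, as the text does, that every subsystem receives the same image.

```python
def sequential_load(memories, image):
    """Load each memory in turn; return the number of interface transfers."""
    transfers = 0
    for mem in memories:
        for addr, word in enumerate(image):
            mem[addr] = word
            transfers += 1  # one transfer per word, per memory
    return transfers


def broadcast_load(memories, image):
    """Write each word to all memories at once; return the transfer count."""
    transfers = 0
    for addr, word in enumerate(image):
        # Assumption: the interface delivers this word to every memory
        # within a single transfer.
        for mem in memories:
            mem[addr] = word
        transfers += 1  # one transfer per word, regardless of memory count
    return transfers


if __name__ == "__main__":
    image = list(range(1024))  # placeholder program image
    count = 4                  # number of subsystems (illustrative)
    seq = sequential_load([[None] * len(image) for _ in range(count)], image)
    bc = broadcast_load([[None] * len(image) for _ in range(count)], image)
    print(seq, bc, seq // bc)
```

With N subsystems and an image of W words, the sequential model needs N × W transfers and the broadcast model needs W, so the ratio equals the number of subsystems. Real savings would also depend on interface overhead that this count ignores.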