Multi-streaming processors capable of processing multiple threads are known in the art, and have been the subject of considerable research and development. The present invention takes notice of the prior work in this field, and builds upon that work, bringing new and non-obvious improvements in apparatus and methods to the art. The inventors have provided with this patent application an Information Disclosure Statement listing a number of published papers in the technical field of multi-streaming processors, which together provide additional background and context for the several aspects of the present invention disclosed herein.
For purposes of definition, this specification regards a stream in reference to a processing system as a hardware capability of the processor for supporting and processing an instruction thread. A thread is the actual software running within a stream. For example, a multi-streaming processor implemented as a CPU for operating a desktop computer may simultaneously process threads from two or more applications, such as a word processing program and an object-oriented drawing program. As another example, a multi-streaming-capable processor may operate a machine without regular human direction, such as a router in a packet switched network. In a router, for example, there may be one or more threads for processing and forwarding data packets on the network, another for quality-of-service (QoS) negotiation with other routers and servers connected to the network and another for maintaining routing tables and the like. The maximum capability of any multi-streaming processor to process multiple concurrent threads remains fixed at the number of hardware streams the processor supports.
A multi-streaming processor operating a single thread runs as a single-stream processor with unused streams idle. For purposes of discussion, a stream is considered an active stream at all times the stream supports a thread, and otherwise inactive. As noted in various related cases listed under the cross-reference section, and in papers provided by IDS that were included with at least one of the cross-referenced applications, superscalar processors are also known in the art. This term refers to processors that have multiples of one or more types of functional units, and an ability to issue concurrent instructions to multiple functional units. Most central processing units (CPUs) built today have more than a single functional unit of each type, and are thus superscalar processors by this definition. Some have many such units, including, for example, multiple floating point units, integer units, logic units, load/store units and so forth. Multi-streaming superscalar processors are known in the art as well.
The inventors have determined that there is a neglected field in the architecture for all types of multi-streaming processors, including, but not limited to, the types described above. The neglected field is that of communications between concurrent streams and the types of control that one active stream may assert on another stream, whether active or not, so that the activity of multiple concurrent threads may be coordinated, and so that activities such as access to functional units may be dynamically shared to meet diverse needs in processing. A particular area of neglect is the mapping and handling of external and internal interrupts, and exception handling, in the presence of multiple streams.
A dynamic multi-streaming (DMS) processor known to the inventors has multiple streams for processing multiple threads, and an instruction scheduler including a priority record of priority codes for one or more of the streams. The priority codes determine in some embodiments relative access to resources as well as which stream has access at any point in time. In other embodiments priorities are determined dynamically and altered on-the-fly, which may be done by various criteria, such as on-chip processing statistics, by executing one or more priority algorithms, by input from off-chip, according to stream loading, or by combinations of these and other methods. In one embodiment a special code is used for disabling a stream, and streams may be enabled and disabled dynamically by various methods, such as by on-chip events, processing statistics, input from off-chip, and by processor interrupts. Some specific applications are taught, including for IP-routers and digital signal processors.
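As an illustration only, the role of a priority record with a special disabling code might be sketched as follows. The value used for the disabling code, the rule that a larger code wins, the tie-break on the lowest stream number, and the names `DISABLED` and `select_stream` are all assumptions made for this sketch and are not taken from the disclosure.

```python
# Hypothetical sketch of a priority record consulted by an instruction scheduler.
# Assumptions (not from the disclosure): code 0 disables a stream, a larger
# code means higher priority, and ties go to the lowest-numbered stream.

DISABLED = 0  # hypothetical special code that disables a stream


def select_stream(priority_record, ready):
    """Return the index of the stream granted access, or None if none qualifies.

    priority_record: one priority code per hardware stream.
    ready: one boolean per stream, True if the stream has work to issue.
    """
    best = None
    for index, code in enumerate(priority_record):
        if code == DISABLED or not ready[index]:
            continue  # disabled or idle streams never win access
        if best is None or code > priority_record[best]:
            best = index
    return best


# Four streams; stream 1 is disabled, streams 2 and 3 share the top priority.
record = [2, DISABLED, 3, 3]
all_ready = [True, True, True, True]
chosen = select_stream(record, all_ready)

# Priorities altered on-the-fly: stream 2 is disabled dynamically.
record[2] = DISABLED
chosen_after_disable = select_stream(record, all_ready)
```

Because the record is ordinary mutable state in this sketch, any of the criteria named above (statistics, off-chip input, stream loading) could rewrite a code between scheduling decisions.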
The DMS processor described above is further enhanced with a processing system that has an instruction processor (IP), register files for storing data to be processed by the IP, such as a thread context, and a register transfer unit (RTU) connected to the register files and to the IP. Register files may assume different states, readable and settable by both the RTU and the IP. The IP and the RTU assume control of register files and perform their functions partially in response to states for the register files, and in releasing register files after processing, set the states. The processing system used by the DMS processor is particularly applicable to multi-streaming processors, wherein more register files than streams may be implemented, allowing for at least one idle register file in which to accomplish background loading and unloading of data.
A further enhancement to the above-described DMS processor utilizes unique inter-stream control mechanisms whereby any stream may affect the operation of any other stream. In various embodiments the inter-stream control mechanisms include mechanisms for accomplishing one or more of enabling or disabling another stream, putting another stream into a sleep mode or awakening another stream from a sleep mode, setting priorities for another stream relative to access to functional resources, and granting or blocking access by another stream to functional resources. A Master Mode is taught in this enhancement, wherein one stream is granted master status, and thereby may exert any and all available control mechanisms relative to other streams without interference by any stream. Supervisory modes are taught as well, wherein control may be granted from minimal to full control, with compliance of controlled streams, which may alter or withdraw control privileges. Various mechanisms are disclosed, including a mechanism wherein master status and inter-stream control hierarchy are recorded and amended by at least one on-chip bitmap. In this mechanism each stream maintains and edits a bitmap granting or withdrawing control privileges for each other stream, the settings valid for any stream but a Master stream, which will ignore the settings.
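The bitmap mechanism described above can be pictured with a minimal software sketch. The class name `StreamControl`, its method names, and the choice of one integer bitmap per controlled stream are hypothetical and serve only to illustrate the grant, withdraw, and Master-override behavior; the actual on-chip layout is not specified here.

```python
# Hypothetical model of per-stream control bitmaps with a Master override.
# Bit s of grants[t] set means: stream t permits stream s to control it.

class StreamControl:
    def __init__(self, stream_count):
        self.stream_count = stream_count
        self.grants = [0] * stream_count  # one bitmap per controlled stream
        self.master = None                # index of the Master stream, if any

    def grant(self, target, controller):
        """Stream `target` grants control privileges to stream `controller`."""
        self.grants[target] |= 1 << controller

    def withdraw(self, target, controller):
        """Stream `target` withdraws privileges from stream `controller`."""
        self.grants[target] &= ~(1 << controller)

    def may_control(self, controller, target):
        """True if `controller` is allowed to control `target`."""
        if controller == self.master:
            return True  # a Master stream ignores the bitmap settings
        return bool((self.grants[target] >> controller) & 1)


control = StreamControl(4)
control.grant(target=2, controller=0)      # stream 2 lets stream 0 supervise it
supervised = control.may_control(0, 2)
control.withdraw(target=2, controller=0)   # stream 2 revokes the privilege
revoked = not control.may_control(0, 2)
control.master = 3                         # stream 3 is granted master status
master_override = control.may_control(3, 2)
```

Keeping the bitmap on the controlled stream's side reflects the supervisory modes, in which the controlled stream may alter or withdraw privileges it has granted.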
Yet another feature taught in disclosure related to the DMS processor described above relates to interrupt handling. Interrupt handler logic is provided wherein the logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design, programmable, with data stored and amendable, or conditional and dynamic, the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor, software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines. In a synchronous method no vectoring occurs until all streams to which an interrupt is mapped acknowledge the interrupt.
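The synchronous method lends itself to a short sketch: one interrupt is mapped to several streams, and no stream is vectored until every mapped stream has acknowledged. The class `InterruptMapper`, its methods, and the interrupt name `"timer"` are hypothetical names chosen for illustration, and the mapping shown is the programmable (stored table) variety only.

```python
# Hypothetical sketch of synchronous interrupt vectoring across streams.

class InterruptMapper:
    def __init__(self, mapping):
        # mapping: interrupt name -> iterable of stream indices (programmable table)
        self.mapping = {irq: frozenset(streams) for irq, streams in mapping.items()}
        self.acknowledged = {}  # pending interrupt -> streams that have acknowledged

    def raise_interrupt(self, irq):
        """Record a pending interrupt and return the streams it is mapped to."""
        self.acknowledged[irq] = set()
        return self.mapping[irq]

    def acknowledge(self, irq, stream):
        """Return the streams to vector now; empty until all have acknowledged."""
        if stream not in self.mapping[irq]:
            raise ValueError("stream is not mapped to this interrupt")
        self.acknowledged[irq].add(stream)
        if self.acknowledged[irq] == self.mapping[irq]:
            del self.acknowledged[irq]       # interrupt is no longer pending
            return self.mapping[irq]         # vector every mapped stream together
        return frozenset()                   # synchronous method: keep waiting


mapper = InterruptMapper({"timer": [0, 2]})
targets = mapper.raise_interrupt("timer")
after_first_ack = mapper.acknowledge("timer", 0)   # stream 2 has not acknowledged yet
after_second_ack = mapper.acknowledge("timer", 2)  # all mapped streams acknowledged
```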
The present invention provides apparatus and methods for implementing atomicity of memory operations in systems wherein two or more processing streams share one memory resource. The present invention relates more specifically to such systems utilizing DMS processors as known to the inventors and as described above and in other disclosure in the present document. Atomicity in this context means that each participating stream is able to perform a read-modify-write operation that has the effect of an indivisible operation with respect to all participating streams.
It is well known in the art of data processing that in many cases, a sequence of memory read and write functions may not be atomic with respect to other processors. This typically can occur when two processors are accessing the same memory location at the same time. Without mechanisms to guarantee atomicity, two separate processors programmed to increment the same location in memory may read and write their values for that location with only one increment taking effect. For example, assume that a value in a memory location is 4 and the incremental value is 1. A single read, increment, and write sequence by one single-stream processor would result in a value of 5. If two processors increment at separate times, then the value would be 6, reflecting two increment operations, which would be the correct result. If, however, both processors attempt to increment at the same time, the value may be incremented only one time, leaving a value of 5 in memory, which is not the desired result.
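The arithmetic of this example can be reproduced with a deterministic sketch that fixes the order of the reads and writes by hand rather than relying on real concurrency. The address `0x100` and the function names are arbitrary choices made for illustration.

```python
# Deterministic illustration of the lost-update problem described above.
# Memory is modeled as a dict; the interleaving is written out explicitly.

def increment_interleaved(memory, address, step=1):
    """Both processors read before either writes, so one increment is lost."""
    value_a = memory[address]          # processor A reads 4
    value_b = memory[address]          # processor B reads 4
    memory[address] = value_a + step   # A writes 5
    memory[address] = value_b + step   # B also writes 5, overwriting A's update
    return memory[address]


def increment_separately(memory, address, step=1):
    """Each processor completes its read-modify-write before the other starts."""
    for _processor in ("A", "B"):
        value = memory[address]
        memory[address] = value + step
    return memory[address]


lost_update = increment_interleaved({0x100: 4}, 0x100)   # yields 5
correct = increment_separately({0x100: 4}, 0x100)        # yields 6
```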
The well-known MIPS architecture, as well as other known architectures, provides methods to guard against the undesired result in the example described above of two or more processors attempting to increment a memory location at the same time. MIPS, for example, provides a mechanism in which a sequence of operations containing a Load Linked instruction and a Store Conditional instruction will either be atomic or will fail with an indication of failure being provided. Any modification of the memory location between the Load Linked and the Store Conditional instructions will cause the Store Conditional to fail without modifying memory. The entire sequence must then be repeated in another attempt.
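The retry behavior of a Load Linked and Store Conditional sequence can be approximated in software as shown below. This is a simplified model and not the MIPS implementation: the class `LinkedMemory`, the per-processor link table, and the `interfere` hook used to force a conflict are assumptions for illustration, and real hardware may fail a Store Conditional for additional reasons.

```python
# Simplified, hypothetical model of Load Linked / Store Conditional semantics.

class LinkedMemory:
    def __init__(self):
        self.cells = {}   # address -> value
        self.links = {}   # processor id -> address it has load-linked

    def load_linked(self, cpu, address):
        self.links[cpu] = address
        return self.cells.get(address, 0)

    def store(self, address, value):
        """Any store to an address breaks every outstanding link to it."""
        self.cells[address] = value
        for cpu, linked in list(self.links.items()):
            if linked == address:
                del self.links[cpu]

    def store_conditional(self, cpu, address, value):
        """Succeed only if the link from load_linked is still intact."""
        if self.links.get(cpu) != address:
            return False  # location was modified; memory is left unchanged
        self.store(address, value)
        return True


def atomic_increment(memory, cpu, address, interfere=None):
    """Retry the whole sequence until it succeeds; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        value = memory.load_linked(cpu, address)
        if interfere is not None and attempts == 1:
            interfere()  # another processor modifies the location in between
        if memory.store_conditional(cpu, address, value + 1):
            return attempts


memory = LinkedMemory()
memory.cells[0x100] = 4
# Processor B increments between processor A's Load Linked and Store Conditional.
attempts_by_a = atomic_increment(
    memory, "A", 0x100,
    interfere=lambda: atomic_increment(memory, "B", 0x100),
)
final_value = memory.cells[0x100]
```

In this run processor A needs two passes through the loop, which is the repeated use of processor resources that the following discussion seeks to avoid in a DMS processor.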
In a DMS processor, it is desired that when two streams are attempting to atomically read and modify a memory location, processor resources are not utilized to repeatedly loop until the entire read-modify-write sequence can be completed successfully.
What is clearly needed is a new method and apparatus that guarantees atomicity while addressing the inefficiency problems described above with regard to DMS processors sharing a single memory resource. Such a method and apparatus would allow for atomicity of memory operations and at the same time provide an opportunity to further optimize processing speed of DMS processors by preventing unnecessary and repetitive use of on-chip resources.