1. Field of the Invention
This invention relates generally to the design and synthesis of digital circuits and specifically to the scheduling of rules in a Term Rewriting System (TRS).
2. Background Information
Hardware Description Languages (HDLs) have been used for many years to design digital systems. Such languages employ text-based expressions to describe electronic circuits, enabling designers to design much larger and more complex systems than possible using previously known gate-level design methods. With HDLs, designers are able to use various constructs to fully describe hardware components and the interconnections between hardware components. Additionally, time-dependency and concurrency, important attributes of most digital circuits, can be easily described.
One popular Hardware Description Language is Verilog, first implemented by Phil Moorby of Gateway Design Automation in 1984, and later standardized under IEEE Std. 1364 in 1995. Currently, Verilog is supported by a wide variety of software tools and exists in several different incarnations and versions. One factor that has led to Verilog's popularity is its ability to describe a digital system at several levels of abstraction.
At one level of abstraction, Verilog may operate as a Register-Transfer Language (RTL) in which circuits have, or are abstracted to have, a set of registers. A designer may use an RTL to specify the values of the registers in each clock period in terms of the values of the registers in the preceding clock period. In this way, an RTL implements a finite state machine (FSM) of the circuit to be specified. While envisioning the circuit as an FSM, the designer explicitly manages concurrency of execution by scheduling the exact cycle-by-cycle interactions between multiple concurrent states. To design a more complex digital circuit, such as a pipelined central processing unit (CPU), using an RTL approach, a designer generally will define a number of modules, each as an FSM. The designer then specifies the interoperations of these modules so that they may operate concurrently.
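The register-transfer view described above can be illustrated with a minimal sketch in Python. The example circuit (a two-bit counter with an enable input) and the names `Regs`, `next_regs`, and `simulate` are hypothetical and chosen only for illustration; the point is that each clock period's register values are computed solely from the values in the preceding period.

```python
from typing import Iterable, NamedTuple


class Regs(NamedTuple):
    """Hypothetical register set: a 2-bit counter and a 'done' flag."""
    count: int
    done: bool


def next_regs(regs: Regs, enable: bool) -> Regs:
    """Compute register values for this clock period from the preceding one."""
    if not enable:
        return regs  # registers hold their values when not enabled
    new_count = (regs.count + 1) % 4  # 2-bit counter wraps at 4
    return Regs(count=new_count, done=(new_count == 3))


def simulate(initial: Regs, enables: Iterable[bool]) -> Regs:
    """Advance the FSM one clock period per entry in `enables`."""
    regs = initial
    for enable in enables:
        regs = next_regs(regs, enable)
    return regs
```

In this sketch the entire behavior of the circuit is the single function `next_regs`; a multi-module design would need one such function per module plus explicit coordination between them, which is where the difficulty described in the following paragraph arises.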
As hardware systems become more complex, for example if one were to add out-of-order speculative instruction execution to the pipelined processor mentioned above, RTL design becomes increasingly complicated. With added complexity, design mistakes become more common, especially in coordinating interactions between multiple finite state machines. The designer must manage an increasingly complicated mental model to design and interconnect FSMs. This difficulty is compounded by the large size of the RTL code, which makes debugging more difficult.
In an attempt to address these issues, designers have sought to specify digital circuits in “behavioral” terms, rather than in terms of transitions between states. In a behavioral specification, the focus is on the functions performed by the circuit, rather than on individual register values. When several behaviors are described, the designer typically employs multiple threads of computation that communicate by message passing or shared memory. At another level of abstraction, Verilog may support a behavioral specification approach. Similarly, a behavioral specification may be implemented in other languages such as SystemC, an open-source class library and simulation kernel that extends the C++ language to enable hardware design.
The behavioral specification approach allows more rapid specification of circuits than RTL design, and, due to its simpler structure, produces specifications that are more easily debugged. Yet, as with RTL specifications, designers of behavioral descriptions still must explicitly manage the interactions between concurrent operations. Also, it is rarely possible or practical to synthesize an equivalent digital circuit directly from a given behavioral specification. Often, the behavioral specification must first be translated into a lower-level specification before synthesis. Finally, formal verification of a behavioral specification is often very difficult or impossible due to the nature of the specification.
To address these and other shortcomings, a hardware design approach centered upon Term Rewriting System (TRS) technology has been developed. Term Rewriting traces its foundations back to 1930s mathematical logic theory, but only recently has been adapted to hardware design. A TRS approach to hardware design employs a list of “terms” that describe hardware states, and a list of “rules” that describe behavior. A “rule” captures both a state-change (an action) and the conditions under which the action can occur. Further, each rule has atomic semantics, that is, each rule executes fully without interactions with other rules. This implies that, even if multiple rules are executed on a given state, they can be considered in isolation for analysis and debugging purposes.
More formally, a Term Rewriting System has rules that consist of a predicate (a function that evaluates to logical true or false) and an action body (a description of a state transition). A rule may be written in the following form:
rule r: when π(s) => s := δ(s)
where s is the state of the system, π is the predicate, and δ is a function used to compute the next state of the system. The expression s := δ(s) comprises the action body of the rule. If π(s) is true, then δ(s) defines the next state of the system. In a strict implementation of a TRS, only one rule may execute on a given state. However, as explained further below, concurrent application of rules is desirable for efficient execution. Therefore, if several rules are applicable on a given state, some implementations may allow more than one rule to be selected to update the system. Afterwards, all rules are re-evaluated for applicability on the new state of the system, and the process continues until no further rules are applicable. In practice, abstract data types such as arrays and First In First Out (FIFO) queues are often used to make the descriptions more readable.
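The strict, one-rule-at-a-time evaluation just described can be sketched in Python as follows. This is an illustrative sketch only: the two-rule greatest-common-divisor system, the names `RULES` and `run_trs`, and the policy of choosing the first applicable rule in list order are assumptions made for this example and are not taken from the description above. Each rule is a (name, π, δ) triple operating on a state s = (a, b).

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]
Rule = Tuple[str, Callable[[State], bool], Callable[[State], State]]

# Hypothetical rule set: Euclid's algorithm by repeated subtraction.
RULES: List[Rule] = [
    # rule swap: when a < b and b != 0  =>  s := (b, a)
    ("swap", lambda s: s[0] < s[1] and s[1] != 0, lambda s: (s[1], s[0])),
    # rule subtract: when a >= b and b != 0  =>  s := (a - b, b)
    ("subtract", lambda s: s[0] >= s[1] and s[1] != 0, lambda s: (s[0] - s[1], s[1])),
]


def run_trs(state: State, rules: List[Rule] = RULES,
            max_steps: int = 10_000) -> Tuple[State, List[str]]:
    """Apply one applicable rule at a time until no rule is applicable."""
    trace: List[str] = []
    for _step in range(max_steps):
        # Re-evaluate every predicate on the current state.
        applicable = [rule for rule in rules if rule[1](state)]
        if not applicable:
            return state, trace  # no rule applies: evaluation is finished
        name, _pred, delta = applicable[0]  # assumed policy: first in list order
        state = delta(state)  # atomic update: s := delta(s)
        trace.append(name)
    raise RuntimeError("no terminal state reached within max_steps")
```

For example, `run_trs((15, 6))` ends in the state `(3, 0)`, where neither predicate holds and the first component is the greatest common divisor. Because the two predicates here are mutually exclusive, at most one rule is ever applicable; in systems where several rules are applicable at once, the choice among them is the scheduling problem discussed in the paragraphs that follow.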
It has been found that the quality of the hardware generated by a TRS depends on the order and concurrency of the application of the rules. While some rules may be executed concurrently, others conflict (for example, they both attempt concurrent access to a single-ported resource) and must be executed sequentially. Therefore, prior approaches to TRS hardware design have implemented a scheduler to determine which rules will execute in each clock cycle.
One type of scheduler that has been employed is a priority encoder which asserts one executable rule in each clock cycle. This type of scheduler may also include round-robin functionality that ensures that if a rule remains applicable for a sufficient number of consecutive clock cycles then it will be selected for execution. Unfortunately, the efficiency of hardware produced by the priority encoder scheduling method has been found to be inadequate. Further details relating to the priority encoder may be found below.
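A priority encoder with round-robin behavior can be sketched in Python as shown here. The rotating-priority mechanism, in which the search for the next rule starts just after the rule most recently selected, is one common way to obtain round-robin behavior and is an assumption of this sketch; the names `priority_encode` and `RoundRobinScheduler` are likewise hypothetical.

```python
from typing import List, Optional


def priority_encode(applicable: List[bool], start: int = 0) -> Optional[int]:
    """Return the index of the first applicable rule at or after `start`,
    wrapping around; return None if no rule is applicable."""
    n = len(applicable)
    for offset in range(n):
        i = (start + offset) % n
        if applicable[i]:
            return i
    return None


class RoundRobinScheduler:
    """Selects at most one rule per clock cycle with rotating priority."""

    def __init__(self, num_rules: int) -> None:
        self.num_rules = num_rules
        self.next_start = 0  # index holding highest priority in the next cycle

    def select(self, applicable: List[bool]) -> Optional[int]:
        chosen = priority_encode(applicable, self.next_start)
        if chosen is not None:
            # Rotate priority so the rule just selected becomes lowest priority.
            self.next_start = (chosen + 1) % self.num_rules
        return chosen
```

With three rules, a rule that stays applicable is selected within three cycles regardless of which other rules are applicable. The sketch also makes the limitation plain: only one rule executes per cycle even when several applicable rules do not conflict.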
Another type of scheduler that has been previously implemented is an enumerated scheduler (also termed direct table encoder). In an enumerated scheduler, applicable rules are listed in an enumerated encoder table, a lookup table constructed to contain an explicit listing of the rules that can execute given a certain combination of applicable rules. Such a table is constructed so that the maximum number of non-conflicting rules execute on a given clock cycle. A more detailed discussion of the enumerated scheduler may be found below.
While the enumerated scheduler has been found to generate relatively efficient hardware, computation of the lookup table necessary for the scheduler is computationally intensive and takes an unacceptable amount of processing time. Indeed, as explained more fully below, the best known implementation of an enumerated scheduler requires processing time exponentially related to the number of rules considered, thereby making the scheduler impractical for highly complex systems.
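The construction of an enumerated encoder table, and the reason its cost grows exponentially, can be illustrated with a brute-force sketch in Python. This is not the prior implementation referred to above; it is an assumed, deliberately naive construction in which rules are numbered from 0, conflicts are given as pairs of rule indices, and each combination of applicable rules is represented as a bitmask. The name `build_enumerated_table` is hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple


def build_enumerated_table(
    num_rules: int, conflicts: Iterable[Tuple[int, int]]
) -> Dict[int, Tuple[int, ...]]:
    """Map every combination of applicable rules (as a bitmask) to a
    largest subset of those rules containing no conflicting pair."""
    conflict_set = {frozenset(pair) for pair in conflicts}

    def compatible(subset: Tuple[int, ...]) -> bool:
        # A subset is schedulable if no two of its rules conflict.
        return all(frozenset(p) not in conflict_set
                   for p in combinations(subset, 2))

    table: Dict[int, Tuple[int, ...]] = {}
    for mask in range(1 << num_rules):  # 2**num_rules table entries
        applicable = [i for i in range(num_rules) if (mask >> i) & 1]
        best: Tuple[int, ...] = ()
        # Try the largest candidate subsets first; stop at the first that fits.
        for size in range(len(applicable), 0, -1):
            found = next((c for c in combinations(applicable, size)
                          if compatible(c)), None)
            if found is not None:
                best = found
                break
        table[mask] = best
    return table
```

For three rules in which rules 0 and 1 conflict, the entry for "all three applicable" (bitmask `0b111`) is `(0, 2)`. The table has 2^n entries for n rules, and this naive search may examine many subsets per entry, so the construction time grows at least exponentially with the number of rules.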
In order to make the TRS approach to hardware design more viable, a more capable scheduler than either the priority encoder or the enumerated scheduler is required. It would be desirable for such a scheduler to generate hardware of equivalent quality to hand-coded RTL design, while not consuming an inordinate amount of processing time, so that it would be practical for highly complex systems.