This invention relates to general purpose data processors, and in particular, to such data processors having a writable instruction set with a hardware stack.
This invention builds on the groundwork laid by our previous CPU/16 patent application, Ser. No. 031,473, filed on Mar. 24, 1987, which is assigned to the same assignee.
Since the advent of computers, attempts have been made to make computers smaller, give them more memory, and make them operate faster. Recently, minicomputers and microcomputers have been built with the memory capacity of the original mainframe computers. Most of these computers are referred to as "complex instruction set" computers. Because of their complex instruction sets, these computers tend to be relatively slow compared to computers designed for specific applications. However, they can run a wide variety of programs because they are able to process instruction sets corresponding to the source programs run on them.
More recently, "reduced instruction set" computers have been developed which can execute programs more quickly than complex instruction set computers. However, these computers tend to be limited because their instruction sets are reduced to only the most frequently used instructions. Infrequently used instructions are eliminated to reduce hardware complexity and to increase hardware speed. Such computers therefore provide limited semantic efficiency in applications for which they were not designed, and the resulting large semantic gaps cannot be filled easily. Emulating complex but frequently used instructions is always a less efficient solution and significantly reduces the initial speed advantage of such machines. Thus, such computers offer limited general applicability.
The present invention provides a computer with general purpose applicability, increasing flexibility while substantially improving speed of operation by minimizing complexity compared to conventional computers. The invention achieves this using simple, commonly available components. Further, the invention minimizes hardware and software tool costs.
More specifically, the present invention provides a computer having a main program memory, a writable micro-program memory, an arithmetic logic unit, and a stack memory, all connected to a single common data bus. In a preferred embodiment, this invention provides a computer interface for use with a host computer. More specifically still, both a data stack and a subroutine return address stack are provided, each associated with a pointer which may be set to any element in the corresponding stack without affecting the contents of the stack. Further, there is a direct communication link between the return stack and the main program memory addressing logic, and a direct link, separate from the data bus, between the main program memory and the microcode memory. This provides overlapped instruction fetching and executing, and allows subroutine calls to be processed in parallel with other operations. This parallel capability provides zero-time-cost (i.e., "free") subroutine calls not possible with other computer architectures.
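The stack arrangement described above, a stack held in memory with a pointer that may be set to any element without disturbing the contents, can be illustrated with a small software model. This is a minimal sketch under stated assumptions: the class name `HardwareStack`, the depth of 256 cells, and the wrap-around behavior of the pointer are hypothetical choices for illustration, not details taken from the preferred embodiment.

```python
class HardwareStack:
    """Hypothetical model of a stack held in fixed-size RAM plus a pointer register.

    Setting the pointer only changes which cell is treated as the top of the
    stack; no RAM cell is cleared or rewritten. Depth and wrap-around are
    illustrative assumptions.
    """

    def __init__(self, depth: int = 256) -> None:
        self.depth = depth
        self.ram = [0] * depth  # stack storage cells
        self.sp = 0             # index of the current top-of-stack cell

    def push(self, value: int) -> None:
        self.sp = (self.sp + 1) % self.depth  # assumed wrap-around
        self.ram[self.sp] = value

    def pop(self) -> int:
        value = self.ram[self.sp]
        self.sp = (self.sp - 1) % self.depth
        return value

    def set_pointer(self, index: int) -> None:
        # Only the pointer register changes; the stack contents are untouched.
        self.sp = index % self.depth

    def top(self) -> int:
        return self.ram[self.sp]
```

For example, after pushing 10, 20 and 30, setting the pointer to 1 exposes 10 as the top element, and setting it back to 3 restores access to 30, because the intermediate pointer move never overwrote any cell.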
A major innovation of the present invention over previous writable instruction set, hardware stack computers is the use of a fixed-length machine instruction format that contains an operation code, a jump or return address, and subroutine calling control bits. This innovation, when combined with the direct connection of the return address stack to memory, the use of a hardware data stack, and other design considerations, allows the machine to process subroutine calls, subroutine returns and unconditional branches in parallel with normal instruction processing. Programs which follow modern software doctrine use a large number of small subroutines with frequent subroutine calls. Processing subroutine calls in parallel with other computations encourages adherence to this doctrine by eliminating the considerable execution speed penalty that other machines impose for invoking a subroutine.
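To make the fixed-length format concrete, the sketch below packs an operation code, subroutine calling control bits, and an address into a single 32-bit word and unpacks it again. The field widths (9-bit opcode, 2 control bits, 21-bit address) and the numeric codes for jump, call and return are hypothetical assumptions chosen only so the fields fill 32 bits; the text above does not specify them.

```python
from dataclasses import dataclass

# Hypothetical field widths; not specified by the text above.
OPCODE_BITS = 9
CONTROL_BITS = 2
ADDRESS_BITS = 21

# Hypothetical encodings of the subroutine calling control bits.
JUMP, CALL, RETURN = 0, 1, 2


@dataclass(frozen=True)
class Instruction:
    opcode: int   # selects a microcoded operation
    control: int  # JUMP, CALL or RETURN
    address: int  # next-instruction or subroutine address


def encode(instr: Instruction) -> int:
    """Pack opcode, control bits and address into one 32-bit word."""
    if not (0 <= instr.opcode < (1 << OPCODE_BITS)
            and 0 <= instr.control < (1 << CONTROL_BITS)
            and 0 <= instr.address < (1 << ADDRESS_BITS)):
        raise ValueError("field out of range")
    return ((instr.opcode << (CONTROL_BITS + ADDRESS_BITS))
            | (instr.control << ADDRESS_BITS)
            | instr.address)


def decode(word: int) -> Instruction:
    """Split a 32-bit word back into its three fields."""
    address = word & ((1 << ADDRESS_BITS) - 1)
    control = (word >> ADDRESS_BITS) & ((1 << CONTROL_BITS) - 1)
    opcode = (word >> (CONTROL_BITS + ADDRESS_BITS)) & ((1 << OPCODE_BITS) - 1)
    return Instruction(opcode, control, address)
```

Because every field sits at a fixed position, hardware can route the address field to the memory addressing logic and the opcode field to the microcode memory in the same cycle, which is what lets branching and calling overlap with the operation itself.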
As a result of the combination of a next instruction address with the opcode for each instruction, the preferred embodiment does not have a program counter in the traditional sense. Except for subroutine return instructions, each instruction contains the address of the next instruction to be executed. In the case of a subroutine return, the next instruction address is obtained from the top value on the return address stack. While this technique is commonly employed at the micro-program level, it has never been used in a high-level language machine. In particular, it has never been used on any machine for the express purpose of processing subroutine calls in parallel with other high level machine operations.
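The sequencing rule above, with no program counter and each instruction naming its successor except for returns, can be expressed as a small interpreter loop. This is a minimal sketch under stated assumptions: the control codes, the `(opcode, control, address)` tuple layout, the choice to push `addr + 1` as the return address on a call, and halting when a return finds an empty return stack are all hypothetical illustrations, not the preferred embodiment's exact behavior.

```python
# Hypothetical control-bit encodings (assumptions, not from the text).
JUMP, CALL, RETURN = 0, 1, 2


def run(memory, start, primitives, max_steps=1000):
    """Execute a program whose instructions each name their own successor.

    memory     -- dict: address -> (opcode, control, address)
    primitives -- dict: opcode -> function acting on the data stack
    No program counter is kept: the next address comes from the current
    instruction, or from the return address stack for a RETURN.
    """
    data_stack, return_stack = [], []
    addr = start
    for _ in range(max_steps):
        opcode, control, target = memory[addr]
        primitives[opcode](data_stack)     # the operation itself and,
        if control == JUMP:                # conceptually in the same step,
            addr = target                  # selection of the next address
        elif control == CALL:
            return_stack.append(addr + 1)  # assumed return point
            addr = target
        elif control == RETURN:
            if not return_stack:           # assumed: top-level return halts
                return data_stack
            addr = return_stack.pop()
    raise RuntimeError("step limit exceeded")
```

In this model every instruction both performs an operation and, through its control field, either jumps, calls or returns, so a subroutine call never costs a separate instruction.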
A consequence of the availability of "free" subroutine calls combined with a writable instruction set is a shift of paradigm from the programmer's point of view, opening the as yet unexploited possibility of new methods for writing programs. Conventional computers are viewed by the programmer as executing sequential lists of instructions with occasional branches or subroutine calls. Each such list is conceived of as directly executing machine functions (although a layer of interpretation may be hidden from the programmer by the hardware). In a writable instruction set computer with hardware stacks and zero-cost subroutine calls, programs are viewed as a tree-structured database of instructions, in which the "root" of the tree consists of a group of pointers to sub-tree nodes, each sub-tree node consists of another group of pointers to further nodes, and so on out to the tree "leaves", which contain instructions instead of pointers. Flow of control is viewed not as proceeding along sequences of instructions, but as traversing a tree structure, from roots to leaves and then up and down the tree in a manner that visits the leaves in sequential order. In this preferred embodiment, the tree structure nodes consist of subroutine call pointers, and the leaves effectively consist of subroutine calls into microcoded primitives. Because an instruction opcode can be combined with a subroutine call, this design realizes greater efficiency than could a pure tree machine that could only execute operations or process subroutine calls (but not both) with each instruction.
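The tree view of a program described above can be sketched as a depth-first walk that visits the leaves in sequential order. In this illustration, which is an assumption rather than the patent's representation, an interior node is a Python list of children and a leaf is a string naming a hypothetical microcoded primitive; an explicit stack of `(node, next_child_index)` pairs plays the role of the return address stack.

```python
from typing import List, Union

# Hypothetical representation: a node is a list of children; a leaf is a
# string naming a primitive operation.
Node = Union[str, list]


def leaves_in_execution_order(root: Node) -> List[str]:
    """Traverse the program tree, visiting leaves left to right.

    Descending into a sub-tree corresponds to a subroutine "call"; finishing
    a node's children corresponds to a "return" to the parent.
    """
    if isinstance(root, str):
        return [root]
    order: List[str] = []
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        if i == len(node):            # all children done: "return"
            continue
        stack.append((node, i + 1))   # resume point, like a return address
        child = node[i]
        if isinstance(child, str):
            order.append(child)       # leaf: execute a primitive
        else:
            stack.append((child, 0))  # interior node: "call" the sub-tree
    return order
```

For the hypothetical tree `[["LIT", "DUP"], "ADD", [["SWAP"], "DROP"]]`, the walk yields `LIT, DUP, ADD, SWAP, DROP`: control flows up and down the tree but the leaves are reached in sequence.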
A preferred ALU made in accordance with the invention has a register (the data hi register) on one input for holding intermediate results. On the other input side is a transparent latch (implemented in the preferred embodiment with standard 74ALS373 integrated circuits) that can either pass data through from the data bus or retain the data present on the bus during the previous clock cycle. This retention capability, along with the capability to direct the contents of the ALU register directly to the bus, allows the data hi register to be exchanged with the data stack or other registers in two clock cycles instead of the three clock cycles which would be required without this innovation. Since exchanging the top two elements of the data stack is a common operation, this results in a substantial increase in processing speed at very little hardware cost compared with providing multiple intermediate storage registers.
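The cycle saving from the retaining latch can be illustrated with a toy register-transfer model in which the bus carries one value per clock cycle. This is a simplified sketch, not a cycle-accurate description of the preferred embodiment: the attribute names `dhi`, `latch`, `temp` and `stack_top` are hypothetical labels for the data hi register, the transparent latch, a temporary register, and the top data stack element.

```python
class Datapath:
    """Toy model: the data bus carries exactly one value per clock cycle."""

    def __init__(self, dhi: int, stack_top: int) -> None:
        self.dhi = dhi              # data hi register (one ALU input)
        self.latch = 0              # transparent latch (other ALU input)
        self.temp = 0               # temporary register (no-latch case)
        self.stack_top = stack_top  # top element of the data stack
        self.cycles = 0

    def swap_with_latch(self) -> None:
        # Cycle 1: the stack drives the bus; the latch passes and holds it.
        bus = self.stack_top
        self.latch = bus
        self.cycles += 1
        # Cycle 2: DHI drives the bus into the stack, while the latch retains
        # last cycle's value and the ALU routes it into DHI.
        bus = self.dhi
        self.stack_top, self.dhi = bus, self.latch
        self.cycles += 1

    def swap_without_latch(self) -> None:
        # Without retention, the old stack value must be parked in a
        # temporary register, costing a third bus cycle.
        self.temp = self.stack_top
        self.cycles += 1
        self.stack_top = self.dhi
        self.cycles += 1
        self.dhi = self.temp
        self.cycles += 1
```

Both methods leave the two values exchanged; the difference is only the cycle count, two with the retaining latch and three without it.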
In the preferred embodiment of the invention, a four-way decoder is used to control individual 8-bit banks of the 32-bit program memory. This, combined with data flow logic in the interface between the program memory and the data bus, allows any individual byte in program memory to be modified with a single write operation. Conventional computers require a full width memory read, an 8-bit modification of the data within a temporary holding register, and a full width memory write operation to update a byte in memory, resulting in substantially slower speeds for such operations. While the preferred embodiment employs this new technique to modify 8 bits of a 32-bit word, the technique is generally applicable to accessing any subset of bits within a memory word of any length.
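The byte-write scheme above can be modeled as four independent 8-bit banks with a one-hot bank enable produced from the low two address bits. This is a minimal sketch under stated assumptions: the class name, the byte-addressing convention, and the choice of bank 0 as the least significant byte are hypothetical and serve only to show that a byte update needs one write and no preceding read.

```python
from typing import List


class ByteBankedMemory:
    """32-bit-wide memory built from four independent 8-bit banks.

    A four-way decoder on the low two byte-address bits enables exactly one
    bank. Bank 0 holding the least significant byte is an assumption.
    """

    def __init__(self, words: int) -> None:
        self.banks = [[0] * words for _ in range(4)]
        self.write_cycles = 0

    @staticmethod
    def decode(byte_address: int) -> List[bool]:
        # One-hot bank enables from the low two address bits.
        sel = byte_address & 0b11
        return [i == sel for i in range(4)]

    def write_byte(self, byte_address: int, value: int) -> None:
        word = byte_address >> 2
        for bank, enabled in enumerate(self.decode(byte_address)):
            if enabled:  # only the selected bank accepts data
                self.banks[bank][word] = value & 0xFF
        self.write_cycles += 1  # a single write, with no prior read

    def write_word(self, word: int, value: int) -> None:
        for b in range(4):
            self.banks[b][word] = (value >> (8 * b)) & 0xFF
        self.write_cycles += 1

    def read_word(self, word: int) -> int:
        return sum(self.banks[b][word] << (8 * b) for b in range(4))
```

For example, after writing `0x11223344` to word 1, a single `write_byte(6, 0xAB)` changes word 1 to `0x11AB3344` without reading it first, whereas a read-modify-write sequence would need a read, a merge in a holding register, and a write.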
Appropriate software, shown in Appendix A, that exploits the simultaneous processing of conditional branching opcodes with subroutine calls, combined with the use of hardware stacks, forms an exceptionally efficient expert system inference engine. An expert system rule base typically consists of a nested list of "rules" which can invoke other rules via subroutine calls that are activated only under certain conditions. The capability of the preferred embodiment to process each rule-oriented subroutine call while simultaneously evaluating the conditions under which that call will either be allowed to proceed or be aborted greatly speeds up processing of expert system programs. Expert systems can run at speeds of over 600,000 inferences per second on the preferred embodiment using a 150 ns clock cycle, which is a substantial improvement over existing general purpose computers, and in fact over most special purpose computers.
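The figures quoted above imply a bound on the average work per inference. A 150 ns clock cycle corresponds to roughly 6.67 million cycles per second, so sustaining more than 600,000 inferences per second means each inference takes, on average, no more than about 11 clock cycles:

$$
f_{\text{clk}} = \frac{1}{150\ \text{ns}} \approx 6.67 \times 10^{6}\ \text{cycles/s},
\qquad
\frac{6.67 \times 10^{6}\ \text{cycles/s}}{6.0 \times 10^{5}\ \text{inferences/s}} \approx 11.1\ \text{cycles per inference}
$$

This is a derived average only; it does not indicate how the cycles are distributed among rule calls, condition tests and returns.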
It will be seen that such a computer offers substantial optimization of throughput while maintaining flexibility. It is also expected that such a machine will encourage programs and programming languages with improved structure and lower development cost, because it does not penalize the modern software principle of breaking programs up into small subroutines.
These and other advantages and features of the invention will be more clearly understood from a consideration of the drawings and the following detailed description of the preferred embodiment.