The invention relates to data processing, and more particularly, to commands that are stored in an on-chip memory.
With the progress of semiconductor technology, it is possible to integrate complex circuit blocks on a single chip, chip set, or board. A single chip can have I/O cells, datapath operators, memory elements, and control units. Different circuit blocks perform different functions (or operations). It is the control structure's job to ensure that the proper circuit block performs the proper operation at the proper time.
One example of a circuit block is an adder. Adders are used for counting, filtering, and multiplying. Registers are often used at the input and output of the adder to ensure that the inputs and outputs arrive at the same time. Another example of a circuit block is a multiplier. Multipliers are used in digital signal processing operations for correlations, convolutions, filtering, and frequency analysis.
Other examples of circuit blocks include parity generators, comparators, zero/one detectors, binary counters, Boolean operators, arithmetic logic units (ALUs), and shifters. Parity generators are used to determine whether the number of ones in an input word is odd or even. Comparators are used to compare the magnitude of two binary numbers. Zero/one detectors are used to determine whether a number has all ones or all zeros. Binary counters are used to cycle through a sequence of binary numbers. Shifters are important for arithmetic shifting, logical shifting, and rotation functions.
Memory elements are used to store, among other things, the commands for, the inputs to, and the outputs from the circuit blocks. Memory elements are usually divided into three categories: random access memory, serial access memory, and content access memory. Random access memory is usually defined as memory that has an access time independent of the physical location of the data. Within the general classification of random access memory, there are two subcategories: read only memory (ROM) and read/write memory. The term RAM is usually used to refer to read/write memory. RAMs are used to store the outputs of the circuit blocks.
The phrase "data processing" refers to moving data between the circuit blocks to achieve a particular function. This movement of data is coordinated by a control structure which issues new commands at the start of each clock cycle. Some systems use a SISD (single instruction stream, single data stream) control unit, wherein instructions are processed one at a time. Because there is only one instruction, the efficiency of a SISD system is usually improved by increasing the length of the instruction word or by increasing the clock frequency.
In a superscalar system, the control unit can operate two or more circuit blocks during the same clock cycle by executing two or more instructions at the same time (in parallel). If there are N circuit blocks, a superscalar system can theoretically operate N circuit blocks during the same clock cycle. Because there are multiple instructions, superscalar control units are usually more efficient than SISD control units.
The term "pipelining" refers to retrieving (fetching) instructions from an on-chip memory and decoding these instructions to organize them in time to operate various circuit blocks. The control unit can store decoded instructions in an instruction path as pipeline stages. The instruction path can be divided into different stages, such as instruction fetch, instruction decode, register read, execute, and/or write. The control unit can perform comparisons to permit pass-around and ensure that operations occur in the proper sequence.
Most systems use what is commonly referred to as a very long instruction word (VLIW) controller. In a VLIW controller, the control program is stored in an on-chip memory as individual words where each word corresponds to a particular clock cycle. Each VLIW has at least one bit field for each circuit block, even circuit blocks that are inactive relative to the current VLIW. One problem with VLIW controllers is that as the number of circuit blocks increases, so do the length and size of the control program. In particularly complex systems, there is not enough chip space to store the control program. It is not possible to manage each of the circuit blocks without devoting disproportionately large portions of the chip to storing the instructions.
In most systems, only a small subset of the circuit blocks is active during each clock cycle. As a result, only a small part of each VLIW is actually used. In other words, the control word is longer than necessary and valuable chip area is wasted. In some applications, it is no longer possible to fit the command memory on a single chip. There is a need for a control structure and a command memory that can manage multiple circuit blocks and issue commands in parallel without wasting valuable chip area.
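The scale of the waste described above can be illustrated with a little arithmetic. The following is only a sketch: the function names and the figures (32 circuit blocks, an 8-bit field per block, 1000 clock cycles, 4 active blocks per cycle) are hypothetical and do not come from any particular system.

```python
def vliw_storage_bits(num_blocks, field_bits, num_cycles):
    """Bits needed when every word carries one field per circuit block."""
    return num_blocks * field_bits * num_cycles


def unused_fraction(num_blocks, active_per_cycle):
    """Fraction of each VLIW occupied by fields for inactive blocks."""
    return 1 - active_per_cycle / num_blocks


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
total = vliw_storage_bits(num_blocks=32, field_bits=8, num_cycles=1000)
wasted = unused_fraction(num_blocks=32, active_per_cycle=4)
print(total, wasted)
```

Under these assumed figures, seven eighths of the stored bits describe circuit blocks that do nothing in the cycle concerned.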
These and other drawbacks, problems, and limitations of conventional control units and command memories are overcome by consolidating the commands to reduce the amount of information stored in a command memory. The control unit interprets the commands and restores the order that was removed by the consolidation.
According to one aspect of the invention, commands are stored contiguously in memory words in a command memory. Each command has a label field and an action field. A control unit receives a group of commands that are referred to collectively as a memory word. The control unit decodes the commands and arranges the action fields of the commands in a control word based on information in the label field. The control unit stores the control words in a register and distributes the control words from the register to a plurality of circuit blocks.
When the commands are compressed in the command memory, commands that are not performed in parallel can be stored in the same memory word. Commands that are performed in parallel can be stored in different memory words. The order of the commands in the control word is determined by information in the label field, such as whether the command is performed in parallel with a preceding command.
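The way a control unit might rebuild control words from labeled commands can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the label field is assumed to hold a circuit block index and a single "parallel with the preceding command" flag, and the names Command, NOP, and decode are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Command:
    block: int      # assumed label subfield: circuit block the action targets
    parallel: bool  # assumed label subfield: same cycle as the preceding command
    action: int     # action field delivered to the circuit block


NOP = 0  # assumed idle value for circuit blocks not named in a group


def decode(commands, num_blocks):
    """Expand a flat command stream into one control word per clock cycle."""
    control_words = []
    for cmd in commands:
        # A command that is not parallel with its predecessor opens a new word.
        if not cmd.parallel or not control_words:
            control_words.append([NOP] * num_blocks)
        # Place the action field in the slot selected by the label.
        control_words[-1][cmd.block] = cmd.action
    return control_words


stream = [
    Command(block=0, parallel=False, action=5),
    Command(block=2, parallel=True, action=7),
    Command(block=1, parallel=False, action=3),
]
words = decode(stream, num_blocks=4)
print(words)
```

Only three commands are stored in this example, yet the decoder produces two full-width control words; the idle slots are regenerated by the control unit rather than held in the command memory.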
According to another aspect of the invention, groups of commands that are decoded to form time-sensitive control words or control words that are the target of jump commands are aligned with a memory word boundary. If a group of commands starts in one memory word and ends in the next, the control unit has to read in both memory words. If the group of commands is the target of a jump command, two clock cycles are required before the group of commands is fully available. In time-critical applications, it may be undesirable to wait two clock cycles for the group to become available. If, however, the group of commands is aligned with a memory word boundary, the control unit has to read in only one memory word and the control word is available sooner. In a particular embodiment, a special code or an illegal command is inserted in the previous memory word so that the group of commands begins at the beginning of the next memory word.
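The alignment described above can be sketched as a packing routine. The sketch assumes fixed-size commands, a fixed number of command slots per memory word, and groups no longer than one memory word; PAD stands in for the special code or illegal command, and the names pack and words_to_fetch are hypothetical. Filling the unused tail of the last word with PAD is a choice made for this illustration only.

```python
PAD = "PAD"  # stands in for the special code or illegal command


def pack(groups, slots_per_word):
    """Pack (commands, must_align) groups into fixed-size memory words."""
    flat = []
    for commands, must_align in groups:
        if must_align:
            # Fill the rest of the current word so the group starts a new one.
            while len(flat) % slots_per_word != 0:
                flat.append(PAD)
        flat.extend(commands)
    # Assumption of this sketch: the tail of the last word is also padded.
    while len(flat) % slots_per_word != 0:
        flat.append(PAD)
    return [flat[i:i + slots_per_word]
            for i in range(0, len(flat), slots_per_word)]


def words_to_fetch(start_slot, group_len, slots_per_word):
    """Number of memory words that must be read to obtain a whole group."""
    first = start_slot // slots_per_word
    last = (start_slot + group_len - 1) // slots_per_word
    return last - first + 1


memory = pack([(["a", "b", "c"], False), (["d", "e"], True)], slots_per_word=4)
print(memory)
print(words_to_fetch(3, 2, 4), words_to_fetch(4, 2, 4))
```

Without alignment the two-command group would start in slot 3 and straddle two memory words; with one padding slot it starts in slot 4 and needs a single read.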
According to another aspect of the invention, commands are positioned within a group of commands so as to further reduce the size of the command memory. For example, a conditional command is a command that requires a particular circuit block to evaluate a condition. Other commands are executed if the condition is true. In an exemplary embodiment of the invention, if a conditional command is the first command in a group of commands, the condition applies to all the commands in the group. If the conditional command is not the first command in the group, the condition applies only to the immediately preceding command. In other words, the positioning of the commands in the group imparts additional information.
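The positional rule of the exemplary embodiment can be sketched as a small resolver that attaches a guarding condition to each action in a group. The tuple encoding, the names, and the handling of a group that contains both a leading and a later conditional command (the later one overrides the guard of the action before it) are assumptions of this sketch.

```python
def resolve_conditions(group):
    """Attach guarding conditions to actions according to command position.

    group is a list of ("cond", name) or ("act", name) tuples in stored order.
    Returns a list of (action, condition_or_None) pairs.
    """
    resolved = []
    group_cond = None
    for i, (kind, name) in enumerate(group):
        if kind == "cond":
            if i == 0:
                # Leading conditional command: guards every action in the group.
                group_cond = name
            elif resolved:
                # Later conditional command: guards only the preceding action.
                action, _ = resolved[-1]
                resolved[-1] = (action, name)
        else:
            resolved.append((name, group_cond))
    return resolved


whole_group = resolve_conditions(
    [("cond", "c0"), ("act", "add"), ("act", "shift")])
single = resolve_conditions(
    [("act", "add"), ("cond", "c0"), ("act", "shift")])
print(whole_group)
print(single)
```

The same three commands yield different guards depending only on their order, which is the extra information that positioning conveys without any additional bits in the command memory.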
An advantage of the invention is that the command memory does not waste valuable chip area to store commands for inactive circuit blocks. The size of the command memory is minimized so as to use the least amount of chip area.
Another advantage of the invention is that the same controller and the same set of commands can be used regardless of the amount of parallelism (the number of independent paths or pipelines) in a particular system.
Another important advantage of the invention is that the same control fields can be used with slightly different chips that reuse many of the same circuit blocks. The reuse of circuit blocks is important to decreasing the time and cost associated with producing new systems.
Another important advantage of the invention is that in time-critical applications, groups of commands can be aligned with memory word boundaries so that they can be read during a single clock cycle.