1. Field of the Invention
The present invention relates to a data processing device such as a processor, and more particularly, to a data processing device which eliminates the need for reading an instruction sequence, searches a memory in correspondence with input data, and executes a process for the input data according to a search result, in order to quickly process a series of data (stream data) such as time-series data generated in a sampling cycle, etc.
2. Description of the Related Art
There are a great many cases in which a series of data (stream data) is processed, such as communication packets used in network communication, video/audio data, time-series data generated by various types of sensors in a sampling cycle, data read from or written to a disk, arithmetic operation data of a data flow processor, communication data between processors in a parallel computer, and the like. The stream data process referred to here has one or more of the following characteristics.
A) Data having a fixed word length is input to a processing device at a constant speed or intermittently.
B) A plurality of data types are sometimes multiplexed into stream data.
C) A process output can be new stream data.
D) A process output can be buffered in a memory.
E) Input or output stream data can be plural.
F) A process sequence can be configured with a finite state machine.
G) A table search is included as one of process capabilities. The table search is sometimes made by using a stream data word as a key.
H) A special arithmetic operation is included as one of the process capabilities. The arithmetic operation must be performed for a stream data word.
Here, the finite state machine is also the name of an automaton the capability of which is in the lowest class in a sense defined by the theory of formal languages. In this specification, the term “finite state machine” is unavoidably used. It means a state machine that is defined by a finite number of states and state transitions in a general sense.
Stream data is transferred to a processing device such as a computer, etc., via a transmission line such as a network, a bus, etc., and is processed.
The speed of such stream data has been increasing year by year with increases in device speed. By way of example, for a communication packet, a speed of 1 Gbps (125 MB/sec) to 4 Gbps (500 MB/sec) is required even at present, and a further increase in speed is expected. For example, if stream data with a 1-Gbps transfer speed is processed in units of 1 byte, each unit must be processed within 8 ns (125 MHz). Even if this stream data is processed in units of 4 bytes, each unit must be processed within 32 ns (31.25 MHz). Processing speed therefore becomes a problem when data arrives at high speed. Furthermore, in terms of capabilities, a complex process such as an image process, a communication process, etc. is required, and at the same time, it is demanded that the contents of a process can be flexibly changed.
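The per-unit time budget quoted above follows directly from the transfer speed and the processing unit size. The following is a minimal sketch of that arithmetic; the function name `word_budget` and its parameters are illustrative assumptions and do not appear in the specification.

```python
from typing import Tuple


def word_budget(bit_rate_bps: float, word_bytes: int) -> Tuple[float, float]:
    """Return (time available per word in ns, word rate in MHz).

    bit_rate_bps: transfer speed of the stream in bits per second.
    word_bytes:   size of the processing unit in bytes.
    """
    bits_per_word = 8 * word_bytes
    word_rate_hz = bit_rate_bps / bits_per_word   # words arriving per second
    period_ns = 1e9 / word_rate_hz                # time between two words
    return period_ns, word_rate_hz / 1e6


# The two cases given in the text: 1 Gbps in 1-byte and 4-byte units.
print(word_budget(1e9, 1))
print(word_budget(1e9, 4))
```

Under these assumptions, the 1-byte case gives 8 ns per unit at 125 MHz and the 4-byte case gives 32 ns per unit at 31.25 MHz, matching the figures stated above.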
The present invention aims at general-purpose data processes such as a stream data process, etc., and particularly relates to a method of configuring a processing device (processor) that can change the contents of a process.
The conventional techniques for processing stream data are broadly classified into hardware and software methods. Theoretically, a stream data process can be implemented by either method. However, processing performance and ease of changing a capability must be considered.
The hardware method is a widely used method of implementing process capabilities with dedicated hardware. With a dedicated hardware process, dedicated hardware is configured to allow stream data to be processed at the same speed as the input/output speed. Therefore, input stream data can be sequentially processed each time one word is input, without buffering the data (sequential processing method). Although buffering is not needed as described above, a transfer rate and a processing rate are sometimes adjusted via an elastic buffer. The sequential processing method has an advantage that its process delay is normally smaller than that of a batch processing method, with which the whole of a series of stream data is processed after being stored in a memory.
The operating frequency of a current CMOS device is approximately 250 MHz. Therefore, a small delay and high performance can be achieved with the sequential processing method by suitably choosing the size of the word to be processed. In this case, however, the ability to change a capability becomes a problem. A conventional solution to this problem is, for example, a method using a reconfigurable device such as an FPGA (Field Programmable Gate Array), a PLD (Programmable Logic Device), etc. The method using a reconfigurable (programmable) device is used in some Internet routers. However, since the amount of circuitry that can be implemented with current programmable devices is limited and their performance is low, this method is used only in limited fields. Even if a large-scale and high-performance programmable device becomes available with future technological advances, the transfer speed of stream data using the same technology is also expected to increase. Therefore, the fields to which a reconfigurable device can be applied will remain limited to those requiring low performance.
The software method is a method of implementing process capabilities with software by using a general-purpose or a dedicated processor. The software method has an advantage that a capability can be changed with ease, because capabilities are implemented by software. Furthermore, since a processor already present in a computer system is used, this method has another advantage that only a minimum of hardware is required for implementation, which leads to a reduction in cost.
However, there are some problems in terms of performance. Normally, a plurality of instructions must be executed to process one word of stream data. Therefore, a processor must run at a speed several times the transfer speed of the stream data. Assuming that 10 instructions must be executed to process one word of stream data, a processor which runs at 312.5 MHz or faster must be fully occupied to process 1-Gbps stream data in units of 4 bytes. That is, the software method is effective if the transfer speed of stream data is low, but has difficulty in processing stream data whose speed is close to the operating frequency of a processor.
Furthermore, since a computer normally runs under a management system such as an operating system, etc., the computer cannot always start its processing immediately when stream data is generated. Therefore, a series of stream data is stored in a memory and batch-processed after being accumulated to some amount, so that processed data is obtained or again transferred to another location. Such a batch processing method is a representative method adopted in a normal computer system. With this method, stream data is stored in a memory via an I/O bus. Upon completion of storing a series of data, a computer processes the data with software, and transfers the result of the process to another location via an I/O bus upon terminating the process. Specifically, many computer network processes, image processes, Internet routers, etc. adopt this method. However, because data is stored in a memory, this method causes a delay. For this reason, the processing is performed intermittently even though the processing ability is sufficient, and the sequential processing method with a small delay cannot be adopted. This is widely known as the real-time problem.
In summary, the hardware method enables high-speed processing, but has a difficulty in capability change. In the meantime, the software method can flexibly change a capability, but has a problem in data processing performance. Therefore, a processing method that can flexibly change a capability, and can sequentially process data is demanded.
A conventional processor is a stored program type called a von Neumann type processor, and is composed of an arithmetic operation mechanism and a program execution mechanism, which are fundamental elements as hardware, as shown in FIG. 1. A program is intended to implement process capabilities by using these pieces of hardware, and a capability can be changed by modifying the program. With the stored program method, the following hardware operations must be performed to process data: an instruction forming part of a program which implements process capabilities is fetched, and the fetched instruction is decoded and executed. If process contents are complicated, a plurality of instructions must be executed to process one data item. Therefore, in general, the data processing performance of a stored program type processor is proportional to its instruction execution performance, and the data processing performance is lower than the instruction processing performance. In other words, data processing performance higher than instruction processing performance cannot be obtained.
Additionally, since a plurality of instructions must be executed to process one data item, the data processing performance is 1/n of the instruction processing performance. Here, n is a numerical value that depends on the architecture of a processor or the contents of a process. Normally, n is on the order of 5 to 10 even for simple code conversion, and on the order of 100 to 1000 for a complex communication packet process. Namely, to process stream data with a certain frequency, a processor having instruction processing performance that is 5 to 1000 times that frequency is required.
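The 1/n relationship can be turned into a required instruction rate for a given stream. The following is a minimal sketch under the simplifying assumption of one instruction per clock cycle; the function name `required_instruction_mhz` is illustrative and not part of the specification.

```python
def required_instruction_mhz(bit_rate_bps: float, word_bytes: int, n: int) -> float:
    """Instruction rate (MHz) needed to keep up with a stream.

    n is the number of instructions executed per stream data word,
    so the processor must issue n instructions in each word period.
    Assumes, for simplicity, one instruction per clock cycle.
    """
    word_rate_hz = bit_rate_bps / (8 * word_bytes)
    return n * word_rate_hz / 1e6


# 1-Gbps stream in 4-byte units, for several values of n from the text.
for n in (5, 10, 100, 1000):
    print(n, required_instruction_mhz(1e9, 4, n))
```

With n = 10 this reproduces the 312.5 MHz figure given earlier for a 1-Gbps stream processed in units of 4 bytes; with n = 1000 the same stream would need 31,250 MHz under these assumptions.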
With the conventional techniques, improvements are made both from the viewpoint of increasing instruction processing performance and from the viewpoint of reducing n. Caches, pipelines, etc. are improvements from the former viewpoint, whereas MMX (a registered trademark of Intel Corp.) instructions, an instruction set for multimedia processing, are an improvement from the latter viewpoint. Additionally, parallel processing is an improvement measure that contributes to both of the above described viewpoints. However, as stated earlier, a stored program type processor essentially cannot be free from the restriction that “instruction processing performance > data processing performance”. Since a stream data providing side such as a communications network is configured by dedicated hardware, the relationship “stream data performance = instruction processing performance > data processing performance” always holds when the same semiconductor technology is used, and the stored program type can never process stream data in real time.
A stored program type processor is a finite state machine that is optimized to process an instruction stream at high speed. A method of changing the form of an instruction executed by a processor has conventionally been proposed, and a processor architecture comprising such a method is referred to as a dynamic architecture. A typical method of implementing a dynamic architecture is the microprogramming method, which is chiefly used in CISC (Complex Instruction Set Computer) processors.
The outline of the microprogramming method is as follows, although its details are omitted here. First of all, as a result of decoding an instruction, the address of a microinstruction corresponding to the instruction is obtained. Microinstructions are a program stored in control storage. The capability of an original instruction is implemented by executing a microinstruction. A microinstruction is implemented in a variety of ways, and is normally composed of a bit string for controlling the resources of processor hardware. An objective capability is obtained by sequentially reading microinstructions, and by applying them to hardware. The fundamental procedure for the process of a microprogram processor is as follows.
Procedural step 1: Reading an instruction.
Procedural step 2: Selecting a process (microinstruction) defined in correspondence with the instruction.
Procedural step 3: Executing the selected process, and returning to the procedural step 1.
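The three procedural steps can be illustrated with a toy interpreter. This is only a sketch under stated assumptions: the instruction names (`INC`, `DBL`, `INC2`), the single `acc` register, and the dictionary used as control storage are all hypothetical and chosen for brevity; a real microinstruction is a bit string controlling hardware resources, not a Python function.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
MicroOp = Callable[[State], None]


def inc_acc(state: State) -> None:
    state["acc"] += 1


def dbl_acc(state: State) -> None:
    state["acc"] *= 2


# Hypothetical control storage: each instruction maps to a microinstruction
# sequence. Altering this table changes what an instruction does.
CONTROL_STORE: Dict[str, List[MicroOp]] = {
    "INC": [inc_acc],
    "DBL": [dbl_acc],
    "INC2": [inc_acc, inc_acc],
}


def run(program: List[str], state: State) -> State:
    pc = 0
    while pc < len(program):
        instruction = program[pc]                # step 1: read an instruction
        micro_ops = CONTROL_STORE[instruction]   # step 2: select the process
        for op in micro_ops:                     # step 3: execute it,
            op(state)
        pc += 1                                  # then return to step 1
    return state


print(run(["INC", "DBL", "INC2"], {"acc": 0}))
```

Note that the loop is driven by the instruction stream, not by the data: every data item still costs at least one pass through steps 1 to 3.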
A processor instruction can be changed by altering a microinstruction stored in control storage. Note, however, that the conventional form of a microinstruction is almost entirely specific to the resources possessed by the processor hardware. An instruction change must be implemented within the range of the processor architecture. Accordingly, the microprogramming method does not have the flexibility to process arbitrary data. Even if the process can be implemented with a plurality of microinstructions, the process performance is degraded because this is essentially the same as a process performed at the processor program level. RISC (Reduced Instruction Set Computer) is a method devised to overcome the restriction on the performance of a microprogramming process. This indicates that the microprogramming method has a problem in terms of performance.
As described above, the microprogramming method has the architecture restriction and performance problems. An implementation of a general stream data process with a microprogramming processor does not appear to have been proposed conventionally.
As stated earlier, for example, if a stream data process is implemented with the hardware method, high performance can be realized in terms of processing speed, but there is a problem in that a capability change cannot be made with ease.
Furthermore, the software method is not free from the restriction that data processing performance is always lower than instruction processing performance. The microprogramming method for implementing a dynamic architecture also has a problem in that a capability cannot be flexibly changed.
An object of the present invention is to provide a data processing device that can execute general-purpose data processes such as a stream data process, etc. by making a processor execute not instructions but stream data directly with the use of a dynamic architecture analogous to a microprogramming method, and can change a process capability with ease.
A data processing device according to the present invention comprises an input converting unit, a memory searching unit and an arithmetic operation unit.
The input converting unit obtains memory search data from input data.
The memory searching unit searches, based on the search data, a state transition table storing as an entry a state word which designates a preset process, and reads the state word corresponding to the process to be performed for the input data.
The arithmetic operation unit determines the process to be performed for the input data based on the contents of the state word read by the memory searching unit, and performs the process.
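The cooperation of the three units can be sketched in software as a table-driven state machine. The following is a minimal illustration only, not the claimed implementation: the framing rule (a `0x7E` flag byte delimiting frames), the state names, the operation names, and the 8-bit sum are all hypothetical choices made to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FLAG = 0x7E  # hypothetical frame delimiter


@dataclass(frozen=True)
class StateWord:
    op: str          # process to perform for the input word
    next_state: str  # state after the process


# Hypothetical state transition table: one state word per (state, class) key.
TABLE: Dict[Tuple[str, str], StateWord] = {
    ("IDLE", "flag"): StateWord("clear", "DATA"),
    ("IDLE", "other"): StateWord("nop", "IDLE"),
    ("DATA", "flag"): StateWord("emit", "IDLE"),
    ("DATA", "other"): StateWord("add", "DATA"),
}


def convert_input(state: str, word: int) -> Tuple[str, str]:
    """Input converting unit: derive memory search data from the input."""
    return (state, "flag" if word == FLAG else "other")


def search_memory(key: Tuple[str, str]) -> StateWord:
    """Memory searching unit: read the state word for the search data."""
    return TABLE[key]


def process_stream(words: List[int]) -> List[int]:
    """Arithmetic operation unit: perform the process named by each state word."""
    state, acc, out = "IDLE", 0, []
    for word in words:
        sw = search_memory(convert_input(state, word))
        if sw.op == "clear":
            acc = 0
        elif sw.op == "add":
            acc = (acc + word) & 0xFF   # 8-bit running sum of the frame
        elif sw.op == "emit":
            out.append(acc)
        state = sw.next_state
    return out


print(process_stream([0x00, 0x7E, 1, 2, 3, 0x7E, 9, 0x7E, 10, 0x7E]))
```

In this sketch each input word causes exactly one table search and one operation, with no instruction fetch per word, and the behavior is changed by editing `TABLE` rather than the loop.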
According to the present invention, a high-speed data processing device can be implemented at low cost.
With the data processing device according to a preferred embodiment of the present invention, input data in various formats can be analyzed, because a search value for the state transition table is obtained from the input data in a manner that depends on the state. That is, the way in which the search value is obtained can be changed according to the state.
Furthermore, a data processing device can be implemented that can easily change a process capability by altering the contents stored in a memory in which a state transition rule is stored.