Modern digital computer architectures typically provide a central processing unit (“CPU”), memory, and input/output (“I/O”) ports. The CPU is the “thinking” center of calculation, operating on data stored in the memory according to a series of instructions generally called an “executable program”. The memory stores the data upon which the CPU operates. The input ports transmit data into the memory from the external environment, and the output ports receive from the memory data that have been operated on according to the executable program for transmission to the external environment. Some non-volatile external memory, such as a hard disk drive or compact disc, communicates with internal memory, such as random access memory (“RAM”) and CPU-internal registers, using the I/O ports. As used herein, the term “memory” means both external and internal memory.
Modern computer architectures may be broadly grouped into three categories: the Princeton (or Von Neumann) architectures, the Harvard architectures, and the modified Harvard architectures. In Princeton architectures, as depicted schematically in FIG. 1, data and executable instructions are communicated between a CPU 11 and a volatile memory 13 (or, in some cases, a non-volatile memory such as a read-only memory, or “ROM,” not shown) over a single data bus 12. Thus, when a user runs an executable program, its instructions are transmitted from the RAM 13 to the CPU 11 over the bus 12. When an instruction operates on data in the RAM 13, the computer uses the same bus 12 to fetch the data from the RAM 13 into the CPU 11 to perform the operation, and then uses the same bus 12 again to save any new data back into the RAM 13 as necessary. Typically, these data and instructions are loaded into the volatile memory 13 from a non-volatile memory 15 over a data bus 14 before the program is executed.
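By way of illustration only, the shared-bus behavior of FIG. 1 may be sketched as a toy Python model in which one memory array holds both instructions and data, and every access (instruction fetch, data read, or data write) occupies the single bus. The class name `PrincetonMachine`, the LOAD/ADD/STORE/HALT instruction set, the memory layout, and the one-cycle-per-access cost are hypothetical simplifications, not features of any particular CPU.

```python
from dataclasses import dataclass


@dataclass
class PrincetonMachine:
    """Toy model (hypothetical): one memory and one bus shared by instructions and data."""
    memory: list          # unified memory holding both instructions and data
    acc: int = 0          # single accumulator register
    pc: int = 0           # program counter
    bus_cycles: int = 0   # every memory access occupies the one shared bus

    def _bus_read(self, addr):
        self.bus_cycles += 1
        return self.memory[addr]

    def _bus_write(self, addr, value):
        self.bus_cycles += 1
        self.memory[addr] = value

    def step(self):
        op, addr = self._bus_read(self.pc)  # instruction fetch uses the bus
        self.pc += 1
        if op == "LOAD":
            self.acc = self._bus_read(addr)  # data fetch uses the same bus
        elif op == "ADD":
            self.acc += self._bus_read(addr)
        elif op == "STORE":
            self._bus_write(addr, self.acc)  # write-back uses the same bus again
        elif op == "HALT":
            return False
        return True

    def run(self):
        while self.step():
            pass


# Hypothetical layout: instructions at addresses 0-3, data at addresses 5-7.
demo = PrincetonMachine(memory=[("LOAD", 5), ("ADD", 6), ("STORE", 7), ("HALT", 0), None, 2, 3, 0])
demo.run()
# demo.memory[7] == 5 and demo.bus_cycles == 7
```

In this model the four-instruction program costs seven bus cycles: four instruction fetches plus three data accesses, all serialized on the one bus.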
By contrast, in the Harvard architecture 20 depicted schematically in FIG. 2, the instructions and the data have separate physical memories and separate physical buses. That is, there is an instruction memory 23 that stores instructions and an instruction bus 22 that carries instructions to the CPU 21, and there is a separate data memory 27 that stores data and a data bus 26 that carries data to the CPU 21. The volatile instruction memory 23 is connected to a non-volatile instruction memory 25 using a bus 24, and the volatile data memory 27 is connected to a non-volatile data memory 29 using a bus 28.
The Harvard architecture of FIG. 2 has certain advantages over the Princeton architecture of FIG. 1. For example, it is impossible to execute data as instructions, so the security vulnerability that the Princeton architecture incurs by allowing data to be executed is entirely eliminated. The word formats and bit widths of the two memories may also differ; thus, the instruction memory may store instructions having a variable bit width, while the data memory and data bus may be optimized to transfer data in large blocks. Moreover, having separate buses for instructions and data means that instructions and data can be read from their respective memories at the same time, increasing processing speed and reducing circuit complexity, albeit at the expense of increased circuit size. However, because instructions and data are stored separately, they must also be managed separately, which adds logistical complexity to the Harvard architecture.
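By way of illustration only, the separate memories and buses of FIG. 2 may be sketched as a toy Python model with distinct instruction and data stores. The class name `HarvardMachine`, the instruction set, and the idealized assumption that each instruction's data access fully overlaps its instruction fetch (one cycle per instruction) are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class HarvardMachine:
    """Toy model (hypothetical): separate instruction and data memories, each with its own bus."""
    imem: tuple       # instruction memory; the program counter can only index this
    dmem: list        # data memory; nothing stored here can ever be fetched as an instruction
    acc: int = 0
    pc: int = 0
    cycles: int = 0

    def step(self):
        op, addr = self.imem[self.pc]  # fetched over the instruction bus
        self.pc += 1
        # Idealized assumption: the data-bus access below proceeds at the same
        # time as the instruction-bus fetch, so each instruction costs one cycle.
        self.cycles += 1
        if op == "LOAD":
            self.acc = self.dmem[addr]
        elif op == "ADD":
            self.acc += self.dmem[addr]
        elif op == "STORE":
            self.dmem[addr] = self.acc
        elif op == "HALT":
            return False
        return True

    def run(self):
        while self.step():
            pass


demo = HarvardMachine(imem=(("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)), dmem=[2, 3, 0])
demo.run()
# demo.dmem[2] == 5 and demo.cycles == 4
```

Because the program counter indexes only `imem`, no value the program writes into `dmem` can be executed, which mirrors the security property noted above.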
Therefore, many modern computers implement a modified Harvard architecture as depicted schematically in FIG. 3. In this architecture, the CPU 31 has two separate physical buses: an instruction bus 32 connecting it to an instruction cache 33 and a data bus 34 connecting it to the main memory 35 to store and retrieve data. However, executable programs may include both instructions and data, and are loaded for execution from a common non-volatile memory 37 using a single, optimized data bus 36. Instructions are loaded into the instruction cache 33 as the program execution requires. Many programs spend much of their operating time executing the same instructions over and over, so the use of a specialized cache increases program execution speed. Thus, while the CPU 31 is executing instructions from the cache 33, it has the Harvard behavior, but while it is loading instructions into the cache 33 from the common memory 35, 37, it has the Princeton behavior. Typically the instruction cache 33 is large enough to include most or all of a program's most often-used instructions, so the CPU 31 spends most of its time operating according to the Harvard behavior.
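By way of illustration only, the effect of the instruction cache 33 may be sketched in Python as a small cache in front of a common memory that counts hits (Harvard behavior) and misses (Princeton behavior). The fully associative least-recently-used replacement policy, the class name `InstructionCache`, and the capacities used are hypothetical choices; real instruction caches are typically set-associative and organized in multi-word lines.

```python
from collections import OrderedDict


class InstructionCache:
    """Toy fully associative LRU instruction cache (hypothetical) in front of a common memory."""

    def __init__(self, memory, capacity):
        self.memory = memory        # common memory holding instructions and data
        self.capacity = capacity    # number of instructions the cache can hold (assumed)
        self.lines = OrderedDict()  # address -> instruction, least recently used first
        self.hits = 0
        self.misses = 0

    def fetch(self, addr):
        if addr in self.lines:
            # Hit: served from the cache over the instruction bus (Harvard behavior).
            self.hits += 1
            self.lines.move_to_end(addr)
            return self.lines[addr]
        # Miss: loaded from the common memory (Princeton behavior).
        self.misses += 1
        instr = self.memory[addr]
        self.lines[addr] = instr
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return instr


memory = ["i0", "i1", "i2", "i3"]  # hypothetical four-instruction loop body
big = InstructionCache(memory, capacity=8)
small = InstructionCache(memory, capacity=2)
for _ in range(100):               # the loop body runs 100 times
    for addr in range(4):
        big.fetch(addr)
        small.fetch(addr)
# big: 4 misses, 396 hits; small: 400 misses, 0 hits
```

When the cache holds the whole loop, only the first pass misses; when it is too small, LRU replacement evicts each instruction just before it is needed again, so every fetch takes the slower path.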
All three categories of computer architectures share the common characteristic that the data memory is generally “flat”; that is, with some vendor-specific exceptions, there is no advantage to storing data at any one memory address over another. Because the memory space is flat, a computer operating system may store instructions and data at any physical addresses that happen to be unoccupied; executable programs are therefore “relocatable” in memory. This property is useful because it permits the creation of executable files whose instructions use a “virtual” memory space: the virtual memory addresses in the program are mapped onto the physical memory circuits according to where and when the operating system loads the program into physical memory. This facility permits a great deal of flexibility in the design of the operating system and applications. Modern computers may devote substantial hardware resources to implementing the virtual-to-physical mapping that is required to execute programs, a mapping that is recorded in so-called page tables. However, the simplicity of the memory arrangement requires that the CPU be a complex device with similarly complex operating system software.
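By way of illustration only, the virtual-to-physical mapping may be sketched in Python as a single-level page table that maps virtual page numbers to physical frame numbers. The 4096-byte page size, the dictionary representation, and the names `translate` and `PageFault` are hypothetical; real systems use multi-level tables walked by hardware.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped to it."""


def translate(page_table, vaddr):
    """Map a virtual address to a physical address using a single-level page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into virtual page number and offset
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn} is not mapped")
    return page_table[vpn] * PAGE_SIZE + offset  # same offset within the physical frame


vaddr = 0x1010  # the program always uses this virtual address
# Loaded one way:     translate({0: 7, 1: 3}, vaddr) == 0x3010
# Loaded another way: translate({0: 2, 1: 9}, vaddr) == 0x9010
```

The program's own addresses never change; only the page table built at load time differs, which is what makes the program relocatable.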
A programming language for computer systems that have a flat memory space must provide location-independent instructions. These instructions are parameterized to operate on data stored in any (virtual) memory location, because similar data may be stored in any such location. To perform a computation, these instructions are applied one after the other as “sequential logic,” perhaps taking different memory addresses as arguments, according to the design of a computer programmer to achieve an intended result.
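By way of illustration only, location-independent instructions applied as sequential logic may be sketched in Python, where each instruction takes its memory addresses as arguments rather than having them fixed. The instruction names `add` and `mul`, the helper `square_of_sum`, and the list-based memory are hypothetical.

```python
def add(mem, dst, a, b):
    """Location-independent instruction: operand addresses are arguments."""
    mem[dst] = mem[a] + mem[b]


def mul(mem, dst, a, b):
    mem[dst] = mem[a] * mem[b]


def run(program, mem):
    # Sequential logic: apply each instruction in order.
    for op, *args in program:
        op(mem, *args)
    return mem


def square_of_sum(x, y, tmp, out):
    """Build a program computing (mem[x] + mem[y]) ** 2 into mem[out] for any addresses."""
    return [(add, tmp, x, y), (mul, out, tmp, tmp)]


# The same instructions work wherever the data happen to live:
# run(square_of_sum(0, 1, 2, 3), [2, 3, 0, 0]) == [2, 3, 5, 25]
```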
The above-described computer architectures are not optimized to process generalized streams of data. In particular, to process a data stream in accordance with an existing computer architecture, streamed data typically are stored temporarily in a buffer that includes one or more memory locations, and sequential logic is applied to the buffer. Once processing of the data is complete, new data are stored in the buffer, and the entire sequential logic is repeated on the new data. While hardware and software systems have been provided to process streamed data in particular contexts, such as routing of high-bandwidth network data, such systems are necessarily application-specific and are heavily optimized for properties of the application space, such as the format of the input data. No general-purpose programmable system exists for processing arbitrary data streams with high efficiency.
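By way of illustration only, the buffer-then-process pattern described above may be sketched in Python. A fixed `bytearray` stands in for the buffer's memory locations, `readinto` refills it from the stream, and the same routine is re-run over each fill. The function names `process_stream` and `checksum` and the modulo-256 byte sum are hypothetical choices made only to give the loop something to compute.

```python
import io


def process_stream(stream, buffer_size, routine):
    """Fill a buffer from the stream, run the routine over it, and repeat until exhausted."""
    buf = bytearray(buffer_size)  # the same memory locations are reused for every chunk
    results = []
    while True:
        n = stream.readinto(buf)  # store new streamed data into the buffer
        if not n:
            break                 # end of stream
        # Re-run the entire routine on the buffer's new contents.
        results.append(routine(memoryview(buf)[:n]))
    return results


def checksum(chunk):
    """Hypothetical per-buffer routine: sum of bytes modulo 256."""
    total = 0
    for b in chunk:
        total = (total + b) % 256
    return total


# process_stream(io.BytesIO(bytes(range(10))), 4, checksum) == [6, 22, 17]
```

Each buffer is processed independently, so the routine is restarted from the beginning for every chunk, which is the repetition the passage above identifies.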