The present invention relates generally to a photon/light based data storage, distribution and simultaneous data access system for a multiprocessor computer system.
An initial response to a need for greater data processing capability is to operate the central processing unit at higher speeds. Increasing the rate of operation of a central processing unit enables greater data processing operations per unit time. This is not a complete solution because memory speed often cannot keep pace with processor speed. The mismatch of processor speed and memory speed can be minimized using memory cache, but such memory cache introduces other problems. Often high processor speeds require deep pipelining. Deep pipelining extends the processing time required to process conditional branches. Thus increased processor speed can achieve only limited improvement.
Another potential response is multi-processing. The central processing unit and at least some auxiliary circuits are duplicated. Additional data processor cores enable greater data processing operations per unit time.
Multi-processor computer systems which provide increased processing power through parallel processing operation are known. Such systems are used in a wide variety of applications wherein functions are allocated to different processors. Such systems present many scalability problems since the shared bus may be saturated by a relatively small number of processors. One means of increasing the total number of available interconnected processors is to employ multiple sub-systems, together with some means of communicating between sub-systems, typically some form of reflective memory. Such a system can be designed in such a way that the majority of inter-processor communication remains within a sub-system, and has no impact on other sub-systems, whilst the reflective memory system provides communication between sub-systems when required. This approach however is relatively expensive in terms of the additional packaging hardware, support hardware and the reflective memory system required.
When a particular application/project/job requires more processing power than a single processor is capable of providing, it becomes necessary to provide a co-processor, such as a digital signal processor (DSP) or a floating point unit (FPU). Thus, the tasks associated with the particular application are handled in unison by the main processor and the co-processor. The most common conventional solution to the problem of allocating resources among the multiple processors is to utilize a dual-ported memory subsystem wherein each processor has equal access to the common resources that may be used by both processors. Alternatively, each processor may be provided with a dedicated resource and a mechanism for transferring commands and data through a shared “Mail Box.” The shared “Mail Box” typically includes a number of first in/first out (FIFO) registers of varying length.
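By way of illustration only, the shared “Mail Box” described above may be modeled as a bounded FIFO queue. The following Python sketch is not drawn from any particular prior art implementation; the class name `Mailbox`, the parameter `depth`, and the command strings are hypothetical and chosen for the example.

```python
from collections import deque


class Mailbox:
    """Illustrative bounded FIFO shared between a main processor and a co-processor."""

    def __init__(self, depth):
        self.depth = depth      # number of FIFO slots (hypothetical parameter)
        self._slots = deque()

    def post(self, message):
        """Sender side: enqueue a message, or report that the mailbox is full."""
        if len(self._slots) >= self.depth:
            return False        # sender must wait and retry later
        self._slots.append(message)
        return True

    def fetch(self):
        """Receiver side: dequeue the oldest message, or None if the mailbox is empty."""
        return self._slots.popleft() if self._slots else None


mailbox = Mailbox(depth=2)
# The third post is refused because both slots are already occupied.
results = [mailbox.post(cmd) for cmd in ("load", "add", "store")]
first = mailbox.fetch()  # messages leave in the order they arrived
```

In this sketch every command and data word passes through the same small queue, which corresponds to the limited capacity of the FIFO registers.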
The conventional dual-ported memory solution provides processor independent design implementation, but requires a large amount of hardware for the random access arbitration for both processors. Consequently, the actual implementation of the arbitration logic and the random access for the common bus creates more delay on the common resources since the access to the common bus must be determined prior to accessing the common resources. The typically small degradation in the access speed in the dual-ported memory is magnified by a significant amount when that common resource is the main memory because the main memory is the common resource most utilized by both processors. Therefore, the interdependency of the multiple processors increases since they both rely heavily on the main memory.
The conventional dedicated resource for each processor with the shared “Mail Box” scheme prevents the multiple processors from competing with each other for the same resource, but suffers greatly in terms of access speed since the data and commands must all pass through the “Mail Box,” which has a relatively narrow bandwidth. In addition, duplicative resources are necessary since each processor requires its own dedicated and duplicated resources. Although the scheme works quite well when the tasks for the processors are well defined and the common mailbox data transfer size is relatively small, the actual performance and resource utilization suffer greatly when the tasks are not well defined and the processors are therefore more interdependent. Thus, there is a need in the art for a system and method which permits multiple processors to communicate with each other and control the access to the shared resources, and, ideally, to permit substantially simultaneous access by each of the processors to the data.
Although multiprocessors enhance the performance of a computer system, the multiple processors also create additional problems, such as when more than one of the processors attempts to access a shared hardware or software resource at the same time. A conventional solution to this problem has been through the use of semaphores located in memory. In general, semaphores are counters that are used to control access to shared resources by multiple processes. Semaphores are commonly used as a locking mechanism to prevent processes from accessing a particular resource while another process is performing operations on it.
In operation, for example, if a processor wants to access a system resource it must first check the status of the desired resource by sending a read command over the system bus to the associated semaphore in the system memory, and the semaphore returns the status information back to the processor. If the desired resource is available, the processor sends a write command to the semaphore to change the status of the semaphore from “available” to “unavailable.” To prevent another process or processor from checking the status of the semaphore concurrent with the processor, prior to sending the read command, the processor will traditionally lock the system bus until the read/write routine is completed.
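The locked read/write routine described above may be sketched as follows. This is a simplified software model offered for illustration only, in which a `threading.Lock` stands in for the bus lock; the names `SystemBus`, `acquire_resource` and `processor` are hypothetical.

```python
import threading


class SystemBus:
    """Illustrative model: a bus lock guarding a semaphore held in system memory."""

    def __init__(self):
        self._bus_lock = threading.Lock()          # stands in for locking the system bus
        self.memory = {"semaphore": "available"}

    def acquire_resource(self):
        """Locked read/write routine: read the semaphore, then claim it if available."""
        with self._bus_lock:                       # no other master may use the bus here
            if self.memory["semaphore"] == "available":    # read command
                self.memory["semaphore"] = "unavailable"   # write command
                return True
            return False


bus = SystemBus()
winners = []


def processor(pid):
    # Each simulated processor tries once to claim the shared resource.
    if bus.acquire_resource():
        winners.append(pid)


threads = [threading.Thread(target=processor, args=(pid,)) for pid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the read and the write occur while the bus is held, exactly one of the four simulated processors obtains the resource. In this model, any other transfer that requires the bus would also have to wait until the lock is released.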
Not only does locking the system bus prevent another processor or “master” from accessing the particular semaphore, but it also prevents the other processors from communicating with the other devices on the bus. This is disadvantageous in that it reduces the efficiency of the system and increases the latency of system operations, which defeats the advantages of utilizing a multiple processor architecture.
Accordingly, there is a need in the art for a system and method that permits multiple processors to communicate with each other and to control access to shared resources without “locking” the system bus, thus maintaining the increased efficiency and additional advantages offered by a multiprocessor system. There is also a need in the art for a system and method which permits simultaneous access to the shared data by each of the multiple processors and elimination of the use of electrical currents to transmit the data, thus increasing the overall speed of the operation of the system.
Moving from a uni-processor system to a multi-processor system involves numerous problems on both the hardware and software side. In theory providing additional data processor cores permits additional data processing operations. However, proper programming of a multi-processor system to advantageously exploit additional data processor cores is difficult. One technique attempting to solve this problem is called symmetrical multi-processing (SMP). In symmetrical multi-processing each of the plural data processor cores is identical and operates on the same operating system and application programs. It is up to the operating system programmer to divide the data processing operations among the plural data processor cores for advantageous operation. This is not the only possible difficulty with SMP. Data processor cores in SMP may operate on data at the same memory addresses such as operating system file structures and application program data structures.
Any write to memory by one data processor core may alter the data used by another data processor core. The typical response to this problem is to allow only one data processor core to access a portion of memory at one time using a technique such as spin locks and repeated polling by a data processor not currently granted access. This is liable to cause the second data processor core to stall waiting for the first data processor core to complete its access to memory. The problems with sharing memory are compounded when the identical data processor cores include caches. With caches each data processor core must snoop a memory write by any other data processor core to assure cache coherence. This process requires a lot of hardware and takes time. Adding additional data processor cores requires such additional resources that eventually no additional data processing capability is achieved by such addition.
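The stall caused by spin locks and repeated polling may be illustrated with a small deterministic model. The sketch below is an assumption-laden simplification, not a description of any actual processor: one core holds the lock for a fixed number of cycles while a second core polls once per cycle, and the function name `simulate_spin_lock` is hypothetical.

```python
def simulate_spin_lock(hold_cycles):
    """Return (cycle at which core 1 acquires the lock, number of wasted polls).

    Core 0 holds the lock for hold_cycles cycles; core 1 polls once per cycle.
    """
    lock_owner = 0
    wasted_polls = 0
    cycle = 0
    while True:
        if cycle == hold_cycles:
            lock_owner = None    # core 0 completes its memory access and releases
        if lock_owner is None:
            lock_owner = 1       # core 1's poll finally succeeds
            return cycle, wasted_polls
        wasted_polls += 1        # core 1 polled, found the lock held, and did no work
        cycle += 1


acquired_at, wasted = simulate_spin_lock(hold_cycles=5)
```

In this model the number of unproductive polls by the waiting core equals the number of cycles for which the first core holds the lock.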
Each multi-processing model which is currently employed has one or more problems associated with it which limit either its speed or access to data. For example, another multi-processing model is called the factory model. The factory model multi-processing requires the software developer to manually divide the data processing operation into plural sequential tasks. Data processing then flows from data processor core to data processor core in the task sequence. This division of the task is static and not altered during operation of the multi-processor system. This is called the factory model in analogy to a factory assembly line. This factory model tends to avoid the data collisions of the SMP model because the data processor cores are working on different aspects of the data processing operation. This model tends to work best for data flow operations such as audio or video data streaming.
This factory model is often used in digital signal processing (DSP) operations which typically have many of these data flow operations. There are problems with this factory model as well. The task of dividing the data processing operation into sequential tasks is generally not simple. Even loading of the data processor cores is required to best utilize this factory model. Any uneven loading is reflected in one or more data processor cores being unproductive while waiting for data from a prior data processor core or waiting for a next data processor core to take its data output. The nature of the data processing operation may preclude even loading of the plural data processor cores. Processes programmed using the factory model do not scale well. Even small changes in the underlying data processing operation to be performed by the system may require complete re-engineering of the task division.
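The effect of uneven loading may be illustrated numerically. The sketch below assumes a steady-state pipeline in which each data processor core handles one item at a time, so that the slowest stage sets the rate for the whole line; the function name `pipeline_utilization` and the stage times are hypothetical.

```python
def pipeline_utilization(stage_times):
    """Fraction of time each stage is busy when the slowest stage sets the pace."""
    bottleneck = max(stage_times)   # time per item of the slowest stage
    return [t / bottleneck for t in stage_times]


balanced = pipeline_utilization([4, 4, 4])   # every core fully busy
uneven = pipeline_utilization([2, 8, 4])     # first and third cores wait on the second
```

Under these assumptions the evenly loaded pipeline keeps every core fully occupied, whereas in the uneven case the first core is busy one quarter of the time and the third core one half of the time.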
Thus, it is evident that those computer systems which use electricity driven data buses to exchange data between processors limit both the access time and the response time, thereby slowing down the entire processing endeavor. These problems are inherent in the use of a data bus and are common for all computer components which rely on access to the same data source. Simply put, if a data bus is serving a particular computer component, such as a processor which is sending data to a hard drive, all other devices must wait until this operation is completed in order to get access to that data bus and thereby gain access to the underlying data.
The distributive problem is often dealt with at the processor level by the use of a combination of hardware and software implementations which attempt to maximize speed and access. There have also been attempts to deal with the issues at the memory cell level.
Integrated circuit designers have always sought the ideal semiconductor memory: a device that is randomly accessible; can be written to or read from very quickly; is non-volatile, but indefinitely alterable; and consumes little power.
One common volatile memory is the DRAM in which information can be written to and read from as bits of data, e.g., a “1” or a “0,” where a “1” generally corresponds to one voltage state stored on a capacitor, and a “0” generally corresponds to another voltage state stored in the capacitor. The capacitor of the DRAM cell typically has an associated transistor that acts as a switch to allow the control circuitry on the memory chip to read from and write to the capacitor.
DRAM cells suffer from a number of shortcomings. First, the capacitor of a DRAM cell is extremely energy inefficient because capacitors of DRAM cells quickly lose their stored voltage, and need to be refreshed to prevent the cell from being discharged, resulting in high levels of energy consumption. Second, because DRAM cells are based on electrical signals, the speed of integrated chips is not only limited by the speed by which electrons travel through matter, but is also limited by the number of interconnections within the chip necessary to effect proper transfer and storage of the signals; these additional interconnections contribute to the problem of short circuits. Finally, the electrical signals used in conventional memory cells can interfere with each other, resulting in increased cross-talk and decreased performance, which is undesirable.
A typical DRAM consists of an array of transistors or switches coupled to capacitors, where the transistors are used to switch a capacitor into or out of a circuit for reading or writing a value stored in the capacitive element. These storage bits are typically arranged in an array of rows and columns, and are accessed by specifying a memory address that contains or is decoded to find the row and column of the memory bit to be accessed.
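Decoding a memory address into a row and a column may be sketched as follows. The split shown, with the row in the upper bits and the column in the lower bits, is an assumption made for illustration; actual devices may map addresses differently, and the name `decode_address` is hypothetical.

```python
def decode_address(address, column_bits):
    """Split a flat address into (row, column), assuming the column is in the low bits."""
    row = address >> column_bits                  # upper bits select the row
    column = address & ((1 << column_bits) - 1)   # lower bits select the column
    return row, column


# An 8-bit address for a hypothetical 16 x 16 array of storage bits.
row, column = decode_address(0b10110110, column_bits=4)
```

Here the address 0b10110110 resolves to row 11 (0b1011) and column 6 (0b0110).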
DRAM devices such as DDR (Double Data Rate) memory incur a timing penalty when a write request follows a read request. Because the data bus is shared between the read and write references, the memory manager must delay sending the write request to memory until the read data from the previous read request is done with the data bus. In some forms of DDR memory, this delay time is on the order of 6 ns. Typical computer systems ignore this timing penalty and, therefore, face a performance penalty on accesses to memory.
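The accumulated cost of this timing penalty may be estimated with simple arithmetic. The sketch below assumes a fixed 6 ns delay for every write request that immediately follows a read request and ignores all other timing parameters; the function name `turnaround_penalty_ns` and the request sequences are hypothetical.

```python
def turnaround_penalty_ns(requests, penalty_ns=6):
    """Total delay from read-to-write transitions in a string of 'R'/'W' requests."""
    transitions = sum(
        1
        for prev, cur in zip(requests, requests[1:])
        if prev == "R" and cur == "W"     # a write directly following a read
    )
    return transitions * penalty_ns


alternating = turnaround_penalty_ns("RWRWRW")   # three read-to-write transitions
grouped = turnaround_penalty_ns("RRRWWW")       # one read-to-write transition
```

Under these assumptions the alternating sequence incurs 18 ns of delay and the grouped sequence incurs 6 ns, although both contain the same three reads and three writes.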
Important characteristics for a memory cell in an electronic device are low cost, nonvolatility, high density, low power, and high speed. Conventional memory solutions include Read Only Memory (ROM), Programmable Read Only Memory (PROM), Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Dynamic Random Access Memory (DRAM) and Static Random Access Memory (SRAM).
ROM is relatively low cost but cannot be rewritten. PROM can be electrically programmed but with only a single write cycle. EPROM has read cycles that are fast relative to ROM and PROM read cycles, but has relatively long erase times and reliability only over a few iterative read/write cycles. EEPROM (or “Flash”) is inexpensive, and has low power consumption but has long write cycles (ms) and low relative speed in comparison to DRAM or SRAM. Flash also has a finite number of read/write cycles leading to low long-term reliability. ROM, PROM, EPROM and EEPROM are all non-volatile, meaning that if power to the memory is interrupted the memory will retain the information stored in the memory cells.
DRAM stores charge on transistor gates that act as capacitors but must be electrically refreshed every few milliseconds complicating system design by requiring separate circuitry to “refresh” the memory contents before the capacitors discharge. SRAM does not need to be refreshed and is fast relative to DRAM, but has lower density and is more expensive relative to DRAM. Both SRAM and DRAM are volatile, meaning that if power to the memory is interrupted the memory will lose the information stored in the memory cells.
Consequently, existing technologies are either non-volatile but not randomly accessible, with low density, high cost, and limited ability to allow multiple writes with high reliability of the circuit's function, or they are volatile and complicate system design or have low density. Some emerging technologies have attempted to address these shortcomings.
For example, magnetic RAM (MRAM) or ferromagnetic RAM (FRAM) utilizes the orientation of magnetization of a ferromagnetic region to generate a nonvolatile memory cell. MRAM utilizes a magnetoresistive memory element involving the anisotropic magnetoresistance or giant magnetoresistance of ferromagnetic materials yielding nonvolatility. Both of these types of memory cells have relatively high resistance and low density. A different memory cell based upon magnetic tunnel junctions has also been examined but has not led to large-scale commercialized MRAM devices. FRAM uses a circuit architecture similar to DRAM but which uses a thin film ferroelectric capacitor. This capacitor is purported to retain its electrical polarization after an externally applied electric field is removed, yielding a nonvolatile memory. FRAM suffers from a large memory cell size, and it is difficult to manufacture as a large-scale integrated component. More details are discussed in U.S. Pat. Nos. 4,853,893; 4,888,630; and 5,198,994, the contents of which are incorporated by reference.
Another technology having non-volatile memory is phase change memory. This technology stores information via a structural phase change in thin-film alloys incorporating elements such as selenium or tellurium. These alloys are purported to remain stable in both crystalline and amorphous states, allowing the formation of a bistable switch. While the nonvolatility condition is met, this technology appears to suffer from slow operations, difficulty of manufacture and poor reliability, and has not reached a state of commercialization. More details are discussed in U.S. Pat. Nos. 3,448,302; 4,845,533; 4,876,667; and 6,044,008, the contents of which are incorporated by reference.
Molecular wire crossbar memory (MWCM) has also been disclosed in U.S. Pat. Nos. 6,128,214; 6,159,620; and 6,198,655, the contents of which are incorporated by reference. These memory proposals envision molecules as bistable switches. Two wires (either a metal or semiconducting type) have a layer of molecules or molecule compounds sandwiched in between. Chemical assembly and electrochemical oxidation or reduction are used to generate an “on” or “off” state. This form of memory requires highly specialized wire junctions and may not retain nonvolatility owing to the inherent instability found in redox processes.
Memory devices have been proposed which use nanoscopic wires, such as single-walled carbon nanotubes, to form crossbar junctions to serve as memory cells. Typically, individual single-walled nanotube wires suspended over other wires define memory cells. Electrical signals are written to one or both wires to cause them to physically attract or repel relative to one another. Each physical state (i.e., attracted or repelled wires) corresponds to an electrical state. Repelled wires are an open circuit junction. Attracted wires are a closed state forming a rectified junction. When electrical power is removed from the junction, the wires retain their physical (and thus electrical) state, thereby forming a non-volatile memory cell.
In a parallel trend, as discussed in United States Patent Application 20030121764, nanowires are often thin strands of conductive or semiconductive materials with diameters in the nanometer range to a few hundred nanometers. The nanowires have been operated in a room-temperature, ultraviolet lasing mode. These devices can convert electrical energy into light energy. United States Patent Application 20050009224 mentions that the high cost of manufacturing conventional solar cells limits their widespread use as a source of power generation. The construction of conventional silicon solar cells involves four main processes: the growth of the semiconductor material, separation into wafers, formation of a device and its junctions, and encapsulation. For cell fabrication alone, numerous steps are required to make the solar cell and many of these steps require high temperatures (300° C.-1000° C.), high vacuum or both. In addition, the growth of the semiconductor from a melt is at temperatures above 1400° C. under an inert argon atmosphere. To obtain high efficiency devices (>10%), structures involving concentrator systems to focus sunlight onto the device, multiple semiconductors and quantum wells to absorb more light, or higher performance semiconductors such as GaAs and InP, are needed. These options all result in increased costs.
In summary, typical memory devices are composed of an array of bit cells, with each bit cell having a storage component to store or retain an electrical charge representative of a bit value (e.g., a logic “0” or a logic “1”). However, due to the electrical properties of the bit cells, memory devices typically can operate with relatively low power consumption or at relatively high speed, but not both. Further, memory architectures that operate with relatively low power consumption or operate at relatively high speeds typically are difficult to scale. Flash memories, for example, exhibit relatively low power consumption and are relatively easy to scale but are relatively slow in comparison to other memory architectures, such as static random access memories (SRAMs), which are relatively fast but often are difficult to scale and typically do not operate reliably in low power implementations. Accordingly, an improved technique for storing, retaining and permitting simultaneous access to data by multiple processors would be advantageous.