With the ever-increasing demand for faster and more effective computer systems naturally comes the need for faster and more sophisticated electronic components. The computer industry has been extremely successful in developing new and faster processors. The processing speed of state-of-the-art processors has increased at a spectacular rate over the past decades. The access time of memory circuits, however, has not been able to improve at the same rate. In fact, the ratio between memory access time and clock cycle time for execution has increased rapidly during the past 10 to 15 years, and is expected to increase even further in the future. This means that memory will continue to be the limiting factor for overall system performance. The relatively long access time for retrieval of information from main memory generally means that the processor has to spend time merely waiting for information that is required during execution. This waiting time is often referred to as memory latency or read latency.
A particular procedure that is strongly affected by the memory latency in conventional computer systems is the job start preparation of jobs to be executed by the processor. Typically, a number of job records are buffered in a memory-allocated queue, awaiting processing by the computer system. As soon as the processor is ready to process a job record from the queue, the relevant record information, such as instruction address information and input arguments for the job, is read from the queue into the register file of the processor so that job execution can be initiated. This procedure of updating the register file, as part of the job start preparation, is usually performed by an operating system routine as a separate activity in between jobs.
However, due to the memory latency, the processor will enter a “stall state” while waiting for the required information to enter the register file from the memory, thus wasting valuable execution time and resources during the job start preparation. In many cases the jobs are relatively short, fewer than 100 to 200 clock cycles, and with memory access times of up to 100 clock cycles it is evident that the register file update will occupy a significant part of the overall execution time. This is a general problem in all types of processors, ranging from the simplest processors to modern instruction-parallel and pipelined processors.
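The proportion of time lost to the stall can be illustrated with a short calculation. The following is a minimal sketch only: the function name `stall_fraction` is hypothetical, and the model assumes that the register file update costs a whole number of non-overlapped memory accesses, which is a simplification not taken from the text above.

```python
def stall_fraction(job_cycles: int, access_cycles: int, accesses: int = 1) -> float:
    """Fraction of total time spent stalled at job start.

    Assumption: the register file update needs `accesses` memory reads
    that cannot overlap with each other or with job execution.
    """
    stall = access_cycles * accesses
    return stall / (stall + job_cycles)


# Figures quoted in the text: jobs of 100 to 200 cycles, access time up to 100 cycles.
for job_cycles in (100, 200):
    print(job_cycles, round(stall_fraction(job_cycles, access_cycles=100), 2))
```

Under these assumptions, a single 100-cycle access ahead of a 100-cycle job means half of the total time is spent waiting, and one third for a 200-cycle job.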
One way of reducing the job start preparation time is to use multiple register files that can be switched in and out quickly. While the processor is actively operating on one of the register files, another, currently switched-out register file is filled with information. Once the processor has completed execution of the current job, the previously prepared register file is switched into operation with the processor. Although this approach deals quite satisfactorily with the memory latency problem at job start, the hardware overhead is significant, and the extra register files generally require increased on-chip area. The increase in chip area gives rise to a corresponding increase in the distance for data transport over the chip, which in turn affects the access time.
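The effect of switching between two register files can be sketched as a simple cycle count. This is an illustrative model under stated assumptions, not a description of any particular hardware: the function names are hypothetical, every job is assumed to need the same load latency, and filling the idle register file is assumed to overlap fully with execution of the current job.

```python
def total_cycles_single(job_cycles: list[int], load_latency: int) -> int:
    """One register file: every job waits for its own load before executing."""
    return sum(load_latency + cycles for cycles in job_cycles)


def total_cycles_double(job_cycles: list[int], load_latency: int) -> int:
    """Two register files: the next job's load overlaps the current job."""
    if not job_cycles:
        return 0
    total = load_latency  # the first load has nothing to hide behind
    for index, cycles in enumerate(job_cycles):
        is_last = index == len(job_cycles) - 1
        # While this job runs, the switched-out file is filled for the next
        # job; the processor stalls only if the load outlasts the job.
        total += cycles if is_last else max(cycles, load_latency)
    return total


jobs = [150, 150, 150]
print(total_cycles_single(jobs, load_latency=100))
print(total_cycles_double(jobs, load_latency=100))
```

In this model the saving comes only from overlap; it does not account for the extra chip area and the longer data transport distances that the additional register files entail.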
In addition, depending on the system configuration, the job start preparation may also involve a number of table look-ups for determining the next job and for retrieving some of the job information to be transferred to the register file. Since each table look-up is associated with its own memory latency, it is evident that a whole series of table look-ups will contribute strongly to the overall memory or read latency experienced by a job.
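How a series of table look-ups accumulates latency can be sketched as follows. The tables, keys, and the constant `MEMORY_LATENCY` are hypothetical values chosen for illustration; the sketch assumes that each look-up depends on the result of the previous one, so that the accesses cannot be overlapped.

```python
MEMORY_LATENCY = 100  # assumed cost of one memory access, in clock cycles


def prepare_job(tables: list[dict], start_key):
    """Follow a chain of dependent look-ups.

    Returns the final value and the stall cycles accumulated on the way.
    """
    key = start_key
    stall_cycles = 0
    for table in tables:
        key = table[key]                # result becomes the key for the next table
        stall_cycles += MEMORY_LATENCY  # each look-up pays its own latency
    return key, stall_cycles


# Hypothetical chain: scheduler entry -> queue -> job -> instruction address.
tables = [{"sched": "q3"}, {"q3": "job17"}, {"job17": 0x4000}]
address, stall = prepare_job(tables, "sched")
print(hex(address), stall)
```

With three dependent look-ups the stall in this model is three times the single-access latency, before the register file update itself has begun.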