In a vector processor, in which there are separate control and data pipelines, the writing of results from a vector operate instruction into a vector register file typically occurs long after the reading of the operands. An entire vector can be written by a single command. Once the command is received, the vector register file will autonomously generate the addresses and write enables required to write the results into the vector register file. The addresses and write enables need to be synchronized with the results so that they arrive at the vector registers at the same time. Therefore, the addresses and write enables, or the commands that produced them, need to be delayed or "siloed" for a large number of cycles.
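The siloing described above can be pictured as a fixed-depth delay line running alongside the data pipeline. The following is a minimal behavioral sketch, not a description of any particular hardware: the class name `Silo`, the 20-cycle latency, the four-element vector, and the `(address, write enable)` tuple layout are all assumptions made for illustration.

```python
from collections import deque


class Silo:
    """Fixed-depth delay line: an item pushed in emerges `depth` cycles later."""

    def __init__(self, depth):
        # Pre-fill with None so the silo is "empty" for the first `depth` cycles.
        self.stages = deque([None] * depth, maxlen=depth)

    def clock(self, item):
        """Advance one cycle: insert `item`, return the item leaving the silo."""
        leaving = self.stages[0]
        self.stages.append(item)  # maxlen discards the oldest entry
        return leaving


LATENCY = 20                    # assumed depth of the data pipeline, in cycles
control_silo = Silo(LATENCY)    # delays (address, write enable) pairs
data_pipe = Silo(LATENCY)       # stand-in for the arithmetic data pipeline

writes = []                     # (cycle, register address, result) actually written
for cycle in range(LATENCY + 4):
    if cycle < 4:               # a hypothetical four-element vector
        ctrl = (cycle, True)    # (element address, write enable)
        data = cycle * 10       # placeholder result value
    else:
        ctrl, data = None, None
    out_ctrl = control_silo.clock(ctrl)
    out_data = data_pipe.clock(data)
    if out_ctrl is not None and out_ctrl[1]:
        writes.append((cycle, out_ctrl[0], out_data))

print(writes)
```

Because both paths have the same depth, each address and write enable reaches the register file in the same cycle as the result it belongs to.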
In a dual-phase clock system, two state devices, such as latches, are typically used in a silo buffer for each cycle of delay, since the information is passed from one latch to the next on both the A phase and the B phase of each cycle. Thus, siloing information for twenty cycles would normally require forty latches. Such an arrangement becomes expensive in terms of the gates needed to implement the silo when the number of bits to be siloed is relatively large. For example, the addresses and write enables can be eleven bits wide, so that a typical silo would need forty latches, each eleven bits wide, to delay the addresses and write enables for twenty cycles. That amounts to 440 single-bit storage elements, each of which takes gates to implement.
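The cost of such a conventional silo follows from simple multiplication. The sketch below works through the figures given above; the function name `silo_latch_bits` and its parameters are hypothetical, introduced only for this illustration.

```python
def silo_latch_bits(delay_cycles, width_bits, latches_per_cycle=2):
    """Count the single-bit latches in a conventional silo.

    A dual-phase clock needs one latch per phase, hence two latches
    per cycle of delay, and each latch is `width_bits` wide.
    """
    return delay_cycles * latches_per_cycle * width_bits


# Figures from the text: a twenty-cycle delay of eleven-bit values.
latch_stages = 20 * 2
total_bits = silo_latch_bits(delay_cycles=20, width_bits=11)
print(latch_stages, total_bits)  # 40 440
```

The count grows with the product of delay and width, which is why wide values siloed for many cycles become costly.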
Another area in which a silo finds use in a processor is the replaying of virtual addresses by a control module. In systems that use cache memories, when a control module sends an address to the cache memory, the cache memory will either return the information at that address or return a miss signal if there is no valid information at that address. This process takes a finite amount of time, so a miss signal will not be received by the control module until some time after the address that caused the trap was generated. Some processors, especially vector processors, are heavily pipelined, so the control module will have generated a number of successive addresses after the trap-causing address was generated and before the miss signal is received. The time between the generation of the trap-causing address and the receipt of the miss signal by the control module is known as the trap shadow.
Since no valid information could be returned for the trap-causing address, it must be generated again. Further, the addresses that were generated during the trap shadow also need to be generated again. One method of providing the trap-causing address to the control module after a trap is to silo each address for a period of time equal to the trap shadow. With this arrangement, the address that caused the trap will be at the exit of the silo, and thus available to the control module, when the miss signal is received.
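The replay scheme can be sketched as a small cycle-by-cycle simulation. This is a simplified model under stated assumptions, not the control module's actual logic: the function `issue_with_replay`, the sequential address stream, the three-cycle trap shadow, and the rule that a missed address becomes valid once it has trapped are all hypothetical choices made for illustration.

```python
from collections import deque


def issue_with_replay(start, count, stride, missing, shadow):
    """Issue `count` strided addresses, replaying after each cache miss.

    Returns (issued, completed): every address sent to the cache in order,
    and the addresses for which valid data eventually came back.
    """
    silo = deque([None] * shadow, maxlen=shadow)
    missing = set(missing)          # addresses with no valid cache data (assumed)
    issued, completed = [], []
    next_addr = start
    end = start + count * stride

    while len(completed) < count:   # one loop iteration models one cycle
        exiting = silo[0]           # address issued `shadow` cycles ago
        if exiting is not None and exiting in missing:
            # The miss signal arrives now, and the trap-causing address
            # is sitting at the silo exit, ready to be generated again.
            missing.discard(exiting)                       # assume the fill succeeds
            silo = deque([None] * shadow, maxlen=shadow)   # drop shadow addresses
            next_addr = exiting                            # restart from the trap
        elif exiting is not None:
            completed.append(exiting)

        if next_addr < end:
            issued.append(next_addr)
            silo.append(next_addr)
            next_addr += stride
        else:
            silo.append(None)       # nothing left to issue; keep draining
    return issued, completed


issued, completed = issue_with_replay(
    start=100, count=6, stride=1, missing={102}, shadow=3
)
print(issued)     # [100, 101, 102, 103, 104, 102, 103, 104, 105]
print(completed)  # [100, 101, 102, 103, 104, 105]
```

In this run, addresses 103 and 104 were generated during the trap shadow of address 102, so they are issued a second time after 102 is replayed.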
Again, this solution turns out to be very expensive in terms of hardware. The trap shadow can typically be fourteen (14) cycles long, which requires twenty-eight (28) latches to silo the addresses. Further, the addresses used (especially with virtually addressed caches) can be thirty-two (32) bits in length. Each of the twenty-eight latches would then need to be thirty-two bits wide, which again would require a large number of gates to implement.