Fault detection is important for ensuring the reliability of digital hardware systems. Various types of faults can manifest themselves in the finished product; they may be due to design errors, manufacturing defects, wear and tear, or one-time events that do not permanently affect the behavior of the system (the so-called “soft errors”). Aware of the possibility of faults, system designers devise schemes that monitor the system for correct operation and signal the presence of possible faults.
Because of the size and complexity of today's digital hardware systems, it is often impossible or impractical to monitor a system as a whole. Instead, system designers create schemes that monitor individual modules of the overall system. Because these modules vary greatly in function, the mechanisms that monitor them for faults naturally vary as well. For example, communication channels and storage modules (memories, disks, etc.) can be monitored for faults using redundancy-based schemes, and computational units such as ALUs (arithmetic-logic units) can be monitored using modulo-3 (residue) arithmetic. When very high reliability is desired, it is of course possible to duplicate the entire module and compare the outputs of the duplicates, but this solution is usually too expensive. While no mechanism can detect all possible faults, various schemes exist that detect large classes of faults, and as a general rule the monitoring system grows in complexity as its fault-detection capability grows (duplication being an extreme example). Another example of fault coverage increasing at the expense of growing complexity is the family of so-called Hamming (n,m) codes, in which a binary m-vector is multiplied by an n×m matrix; as is well known to those skilled in the art of error-correcting codes, the ability to detect (and correct) errors grows with the size of the matrix. Error-detecting and error-correcting codes are described in The Theory of Error-Correcting Codes, by F. J. MacWilliams and N. J. A. Sloane, North-Holland Mathematical Library, 1977.
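The modulo-3 (residue) check mentioned above can be illustrated with a short software model. This is a minimal sketch under stated assumptions, not a description of any particular ALU: it assumes an idealized adder whose full sum (including the carry-out) is observable, and the names `alu_add`, `residue_check_add`, and the `fault_mask` parameter are hypothetical, introduced only for illustration. The checker predicts the residue of the sum from the residues of the operands and compares it with the residue of the actual result.

```python
def alu_add(a: int, b: int, fault_mask: int = 0) -> int:
    """Hypothetical adder model. The full sum (carry-out included) is returned;
    fault_mask XORs selected result bits to emulate a hardware fault."""
    return (a + b) ^ fault_mask


def residue_check_add(a: int, b: int, result: int) -> bool:
    """Residue-3 checker: the sum's residue must equal the sum of the operand residues, mod 3."""
    predicted = (a % 3 + b % 3) % 3
    return result % 3 == predicted


# Fault-free operation passes the check.
print(residue_check_add(13, 29, alu_add(13, 29)))           # True
# Flipping any single result bit changes the value by +/-2**k, and 2**k mod 3 is
# never 0, so every single-bit fault is flagged.
print(residue_check_add(13, 29, alu_add(13, 29, 1 << 3)))   # False
```

The check is far cheaper than duplicating the adder, but its coverage is correspondingly limited: any error that changes the result by a multiple of 3 (for example, certain multi-bit flips) escapes detection, which is an instance of the complexity-versus-coverage trade-off described above.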
The current invention pertains to the detection of faults in queues, or FIFOs. FIFOs are ubiquitous in modern digital hardware systems and fulfill a variety of functions. They are typically used as synchronizing interfaces between modules that do not operate at the same rate. As such, they are particularly prevalent in SoC (“system on a chip”) designs, in which large collections of heterogeneous hardware modules are “glued” together by system integrators to produce the final integrated circuits. These modules, which typically originate from different suppliers, often work asynchronously: a module producing data may do so at a rate higher than another module's ability to consume it, so a holding buffer must be interposed between the two modules, and if the consuming module needs the data in the order in which the producer generated it, the buffer must be of a FIFO type. Specialized versions of FIFOs, such as pipelines, are a mainstay of modern microprocessor central processing units. FIFO implementations also vary widely, ranging from fully static (in which data, once inserted into the FIFO, never moves) to fully dynamic (in which the data circulates through the FIFO upon every insertion and deletion), with many variants in between.
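The difference between static and dynamic FIFO implementations can be sketched in software. This is an illustrative model only, assuming a fixed-depth buffer; the class names `StaticFifo` and `ShiftFifo` are hypothetical. `StaticFifo` is a circular buffer in which each entry stays in the slot where it was written and only the head pointer and occupancy count change. `ShiftFifo` is one simple dynamic variant, a shift-register style FIFO in which the remaining entries move one slot toward the output on every deletion (other dynamic schemes also move data on insertion).

```python
class StaticFifo:
    """Circular-buffer FIFO: data never moves once written; only pointers advance."""

    def __init__(self, depth: int):
        self.slots = [None] * depth
        self.head = 0   # index of the oldest entry (next to be read)
        self.count = 0  # number of valid entries

    def push(self, item):
        if self.count == len(self.slots):
            raise OverflowError("FIFO full")
        tail = (self.head + self.count) % len(self.slots)  # next free slot
        self.slots[tail] = item
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)  # wrap around
        self.count -= 1
        return item


class ShiftFifo:
    """Shift-register FIFO: the output is always slot 0, so every pop shifts the data."""

    def __init__(self, depth: int):
        self.slots = [None] * depth
        self.count = 0

    def push(self, item):
        if self.count == len(self.slots):
            raise OverflowError("FIFO full")
        self.slots[self.count] = item
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        item = self.slots[0]
        for i in range(1, self.count):  # move each remaining entry one slot forward
            self.slots[i - 1] = self.slots[i]
        self.slots[self.count - 1] = None
        self.count -= 1
        return item


for cls in (StaticFifo, ShiftFifo):
    f = cls(3)
    for x in (1, 2, 3):
        f.push(x)
    print(cls.__name__, f.pop(), f.pop())  # both print 1 2: insertion order preserved
```

Both models present the same first-in, first-out behavior to the producer and consumer; they differ only in whether stored data stays put or moves, which matters for hardware cost and for where faults can corrupt the stored data.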