The present disclosure relates to digital circuits, and more particularly, to a tininess prediction and handler engine for smooth handling of numeric underflow in floating-point units.
Floating-point units (FPUs) are designed to perform various mathematical operations on floating-point numbers. For example, FPUs can include a floating-point adder, a multiplier, a multiply-accumulate function, and so forth. FPUs must deal with the problem of numeric underflow. Numeric underflow occurs when a result is too small in magnitude to fit within the normal number range defined by, for example, a standard such as IEEE 754. When numeric underflow is detected, additional steps are usually needed to compute the final result. The detection of underflow also happens relatively late in a typical computational pipeline, which makes it a challenge to handle such cases efficiently.
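By way of illustration only, the underflow condition described above can be observed in software. The following minimal sketch assumes that Python's float type is an IEEE 754 binary64 value, and the names SMALLEST_NORMAL and is_tiny are hypothetical helpers that do not appear in this disclosure. The check is applied to the delivered result, which is only one of the ways tininess may be detected.

```python
import sys

# Smallest positive normal binary64 value (2**-1022), assuming Python floats
# are IEEE 754 binary64.
SMALLEST_NORMAL = sys.float_info.min


def is_tiny(value: float) -> bool:
    """Return True when value is nonzero and below the normal number range."""
    return value != 0.0 and abs(value) < SMALLEST_NORMAL


# Halving the smallest normal number yields a result that no longer fits in
# the normal range, so it must be represented as a subnormal number.
product = SMALLEST_NORMAL * 0.5

print(is_tiny(SMALLEST_NORMAL))  # the operand itself is still normal
print(is_tiny(product))          # the product has underflowed
```

In hardware, the equivalent check is made on the intermediate exponent and significand rather than on a finished value, which is part of why it resolves late in the pipeline.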
The additional steps needed to handle underflow involve denormalizing the intermediate result, rounding that value, and, in some micro-architectures, further re-normalization. Cumulatively, these steps require logic of fairly significant depth, including one or more stages in a high-frequency pipeline. When tininess (i.e., a nonzero result whose magnitude lies below the smallest normal number) is detected, this further processing, for example the denormalization and rounding, must be performed. Typically, tininess can only be determined accurately very late in the pipeline.
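The denormalization and rounding steps can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit of any particular FPU: the intermediate result is held as an integer significand of p bits with an unbounded exponent, rounding is round-to-nearest-even, and the function name denormalize_and_round and its parameters are hypothetical.

```python
def denormalize_and_round(sig: int, exp: int, emin: int = -1022, p: int = 53):
    """Denormalize and round a tiny intermediate result (illustrative model).

    The input represents sig * 2**(exp - (p - 1)), where sig is a normalized
    p-bit integer significand. The return value is (new_sig, new_exp) with
    the same interpretation and new_exp >= emin.
    """
    shift = emin - exp
    if shift <= 0:
        return sig, exp  # exponent is in the normal range; nothing to do

    # Denormalize: shift the significand right until the exponent equals emin.
    kept = sig >> shift
    remainder = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even, using the bits that were shifted out.
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1

    # If rounding carried into bit p - 1, kept now equals 2**(p - 1), which is
    # the smallest normal number. This is the re-normalization case.
    return kept, emin


# Small 4-bit significand examples with emin = -2, chosen for readability.
print(denormalize_and_round(0b1011, -3, emin=-2, p=4))
print(denormalize_and_round(0b1111, -3, emin=-2, p=4))
```

The second call shows why re-normalization may be needed: rounding the subnormal significand upward produces a carry that turns the result back into the smallest normal number.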
Dependent operations usually have already been scheduled, or are likely already in flight, when the tininess problem is detected. Conventionally, such dependent operations cannot begin execution in the normal fashion because they must wait for the producer to finish its extra computation steps. Further exacerbating this problem, dependents of dependents (i.e., grandchild dependent operations) may also be in the process of being scheduled.
One approach to addressing the tininess problem is to flush the machine of all operations younger than the tiny-generating operation. But this approach is expensive in terms of performance. Another approach is to identify and kill only the child and any grandchild operations, although such an approach is non-trivial in terms of scheduling complexity. Yet another approach is to have the data path handle subnormal (i.e., problem) operations in-pipe. The problem with this approach is that extra gate depth is needed, which affects all numeric cases, not just the tiny-generating cases.
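The difference between the first two approaches can be illustrated with a toy software model. The sketch below is an assumption-laden illustration rather than a description of any real scheduler: the Op record, the function names flush_younger and kill_dependents, and the example operations are all hypothetical, and the model assumes that every producer is older than its consumers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    age: int                 # program order; a larger value means younger
    sources: tuple = ()      # names of the producer operations


def flush_younger(ops, tiny_op):
    """Approach 1: cancel every operation younger than the tiny-generating op."""
    return {op.name for op in ops if op.age > tiny_op.age}


def kill_dependents(ops, tiny_op):
    """Approach 2: cancel only direct and indirect dependents of the tiny op."""
    tainted = {tiny_op.name}
    killed = set()
    # Visiting oldest first means a producer is classified before its consumers.
    for op in sorted(ops, key=lambda o: o.age):
        if any(src in tainted for src in op.sources):
            tainted.add(op.name)
            killed.add(op.name)
    return killed


mul = Op("mul", 0)                      # the tiny-generating operation
ops = [
    mul,
    Op("add", 1, ("mul",)),             # child of mul
    Op("ld", 2),                        # independent
    Op("sub", 3, ("add",)),             # grandchild of mul
    Op("xor", 4, ("ld",)),              # independent
]

print(sorted(flush_younger(ops, mul)))
print(sorted(kill_dependents(ops, mul)))
```

In this model the flush cancels the two independent operations along with the dependents, which reflects its performance cost, while the selective kill must track the dependency chain, which reflects its scheduling complexity.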