Floating point multiplication produces an intermediate mantissa with roughly twice as many bits of precision as the result can hold: multiplying two n-bit mantissas yields a product of 2n-1 or 2n bits, while the result must be stored in the original n-bit mantissa width. The IEEE 754 floating point standard defines several precisely specified rounding modes that determine how this intermediate mantissa is rounded and then truncated to the mantissa width required for the result.
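To make the width growth and the rounding step concrete, here is a minimal Python sketch. It assumes normalized p-bit significands with the leading 1 stored explicitly, and it covers only two IEEE 754 modes: round to nearest, ties to even (`"rne"`), and round toward zero (`"rtz"`, plain truncation). Signs, exponents, subnormals, and the directed modes toward positive or negative infinity are omitted. The function name `multiply_significands` and its parameters are hypothetical and illustrative only. This is not a description of any particular hardware design.

```python
def multiply_significands(a: int, b: int, p: int = 24, mode: str = "rne"):
    """Multiply two normalized p-bit significands and round back to p bits.

    Returns (significand, shift) such that a * b is approximately
    significand * 2**shift, with significand exactly p bits wide.
    """
    if a >> (p - 1) != 1 or b >> (p - 1) != 1:
        raise ValueError("inputs must be normalized p-bit significands")

    product = a * b                # exact product: 2p-1 or 2p bits wide
    shift = product.bit_length() - p   # number of low bits that do not fit
    kept = product >> shift        # the p most significant bits
    dropped = product & ((1 << shift) - 1)  # the bits that must be rounded away

    if mode == "rtz":
        pass                       # round toward zero: discard dropped bits
    elif mode == "rne":
        half = 1 << (shift - 1)    # value of the first dropped bit position
        # Round up if above halfway, or exactly halfway with an odd kept value.
        if dropped > half or (dropped == half and kept & 1):
            kept += 1
            if kept >> p:          # increment carried out to p+1 bits
                kept >>= 1         # renormalize (only trailing zeros are lost)
                shift += 1
    else:
        raise ValueError(f"unsupported rounding mode: {mode}")

    return kept, shift


# 4-bit example: 11 * 13 = 143 = 0b1000_1111 needs 8 bits.
print(multiply_significands(11, 13, p=4))              # (9, 4): 9 * 16 = 144
print(multiply_significands(11, 13, p=4, mode="rtz"))  # (8, 4): 8 * 16 = 128
```

In this naive formulation the increment, and the possible renormalization after it, can only begin once the full double-width product is known, so rounding is a serial step that follows the multiplication.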
In prior-art designs, the rounding calculation always adds delay to the critical path in some fashion.