Since computer memory is limited, it is not possible to store numbers with infinite precision, no matter whether the numbers use binary fractions or decimal fractions. At some point a number has to be cut off or rounded off to be represented in a computer memory.
How a number is represented in memory is dependent upon how much accuracy is desired from the representation. Generally, a single fixed way of representing numbers with binary bits is unsuitable for the varied applications where those numbers are used. A physicist needs to use numbers that represent the speed of light (about 300000000) as well as numbers that represent the Newton's gravitational constant (about 0.0000000000667), possibly together in some application.
To satisfy different types of applications and their respective needs for accuracy, a general-purpose number format has to be designed so that the format can provide accuracy for numbers at very different magnitudes. However, only relative accuracy is needed. For this reason, a fixed format of bits for representing numbers is not very useful. Floating point representation solves this problem.
A floating point representation resolves a given number into three main parts—(i) A significand that contains the number's digits, (ii) An exponent that sets the location where the decimal (or binary) point is placed relative to the beginning of the significand. Negative exponents represent numbers that are very small (i.e. close to zero), and (iii) a sign (positive or negative) associated with the number.
A floating point unit (FPU) is a processor or part of a processor, implemented as a hardware circuit, that performs floating point calculations. While early FPUs were standalone processors, most are now integrated inside a computer's CPU. Integrated FPUs in modern CPUs are very complex, since they perform high-precision floating point computations while ensuring compliance with the rules governing these computations, as set forth in IEEE floating point standards (IEEE 754).
Deep learning neural networks, also referred to as Deep Neural Networks (DNN) are a type of neural networks. The configuring and training of DNNs is computation intensive. Over the course of the training of a DNN, many floating point computations have to be performed at each iteration, or cycle, of training. A DNN can include thousands if not millions of nodes. The number of floating point computations required in the training of a DNN scales exponentially with the number of nodes in the DNN. Furthermore, different floating point computations in the DNN training may potentially have to be precise to different numbers of decimal places.
Machine learning workloads tend to be computationally demanding. Training algorithms for popular deep learning benchmarks take weeks to converge on systems comprised of multiple processors. Specialized accelerators that can provide large throughput density for floating point computations, both in terms of area (computation throughput per square millimeter of processor space) and power (computation throughput per watt of electrical power consumed), are critical metrics for future deep learning systems.