Deep learning is a class of machine learning algorithms. Deep learning architectures, such as deep neural networks, have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, and drug design.
Inference and training, two tools used for deep learning, are trending toward low-precision arithmetic. Maximizing the throughput of deep learning algorithms and computations may help meet the needs of deep learning processors, such as those performing deep learning in a data center.
Quad virtual neural network instructions (QVNNI) are a type of fused multiply-add (FMA) operation that is useful in a deep learning context. Low-precision QVNNI operations, such as those using 8-bit activations with weights as narrow as 2 bits or 4 bits, are expected to lead to sufficient training performance. However, traditional CPU and GPU instruction set architectures keep to a 32-bit lane for all operations and require symmetric operands, that is, both inputs having the same precision. This limits the ability to gain a performance advantage by going to 2-bit and 4-bit weights.
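As a purely illustrative software sketch of the asymmetric-operand idea described above, the following Python models a multiply-accumulate in which eight signed 4-bit weights are packed into one 32-bit word and multiplied against eight 8-bit activations. The function names (`pack_int4`, `unpack_int4`, `asymmetric_fma`), the lowest-nibble-first packing order, and the choice of unsigned activations are assumptions made for this example; they are not the semantics of any actual QVNNI instruction.

```python
def pack_int4(weights):
    """Pack eight signed 4-bit weights (-8..7) into one 32-bit word.

    Hypothetical layout: weight i occupies bits [4*i, 4*i + 3].
    """
    assert len(weights) == 8
    word = 0
    for i, w in enumerate(weights):
        assert -8 <= w <= 7
        word |= (w & 0xF) << (4 * i)  # two's-complement nibble
    return word


def unpack_int4(word):
    """Recover the eight signed 4-bit weights from a packed 32-bit word."""
    weights = []
    for i in range(8):
        nibble = (word >> (4 * i)) & 0xF
        # Sign-extend: nibbles 8..15 represent -8..-1.
        weights.append(nibble - 16 if nibble >= 8 else nibble)
    return weights


def asymmetric_fma(acc, activations, packed_weights):
    """Return acc + sum(a_i * w_i) for 8-bit activations and 4-bit weights.

    Activations are assumed to be unsigned 8-bit values (0..255).
    """
    assert len(activations) == 8
    for a, w in zip(activations, unpack_int4(packed_weights)):
        assert 0 <= a <= 255
        acc += a * w
    return acc


weights = [1, -2, 3, -4, 5, -6, 7, -8]
activations = [10, 20, 30, 40, 50, 60, 70, 80]
packed = pack_int4(weights)
result = asymmetric_fma(100, activations, packed)
```

In this sketch a single 32-bit word carries eight 4-bit weights, whereas a symmetric scheme with 8-bit weights would fit only four values in the same 32 bits. With 2-bit weights the same word could hold sixteen.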