Neural network technology is used to perform complex tasks such as reading comprehension, language translation, and speech recognition. Although neural networks can perform such tasks, they are expensive to deploy on general-purpose CPUs or general-purpose GPUs. In addition, while GPUs provide higher throughput than CPUs, they suffer from high latency.