1. Field of the Invention
The present invention relates generally to Single Instruction Multiple Data (“SIMD”) and Single Instruction Multiple Thread (“SIMT”) architectures and, in particular, to the processing of neural networks using a SIMD/SIMT architecture.
2. Description of the Background Art
Improving the resource utilization of algorithms on SIMD architectures requires careful consideration of memory access patterns. To maintain a high level of parallelism, the processing resources must perform their reads and writes in a uniform pattern at any given instant. Additionally, conditional operations that depend on the content of memory need to be structured to reduce divergence among processes or threads, which would otherwise result in serialization of operations.
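By way of illustration only, the effect of a uniform versus a non-uniform access pattern can be sketched by counting how many memory segments one group of lanes touches in a single read. The group width of 32 lanes, the 4-byte element size, the 128-byte segment size, and the helper name `count_transactions` are assumptions chosen for the example and do not describe any particular hardware.

```python
def count_transactions(addresses, segment_bytes=128):
    """Count the distinct memory segments touched by one group of lanes.

    Each distinct segment is treated as one memory transaction
    (an assumed, simplified cost model).
    """
    return len({addr // segment_bytes for addr in addresses})


LANES = 32       # assumed number of lanes executing in lockstep
ELEM_BYTES = 4   # assumed size of one stored value, in bytes

# Uniform access: lane i reads element i, so addresses are contiguous.
contiguous = [lane * ELEM_BYTES for lane in range(LANES)]

# Non-uniform access: lane i reads element 32 * i, so addresses are far apart.
strided = [lane * 32 * ELEM_BYTES for lane in range(LANES)]

contiguous_tx = count_transactions(contiguous)
strided_tx = count_transactions(strided)

print("contiguous:", contiguous_tx)
print("strided:", strided_tx)
```

Under these assumptions the contiguous pattern falls within a single segment, while the strided pattern touches a separate segment for every lane.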
Divergence in a SIMD/SIMT architecture occurs when a set of processors or threads executing simultaneously encounters an instruction and/or data that causes a subset of the threads to branch into a different execution path than the other threads. As a result, the threads are no longer synchronized (i.e., executing the same instructions) and therefore cannot be run simultaneously on a SIMD/SIMT architecture. These considerations should be kept in mind when parallelizing operations associated with neural network processing.
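By way of illustration only, the serialization caused by divergence can be sketched with a small simulation of one group of lanes executing a data-dependent branch. The function `execute_group`, the pass-counting model, and the sample values are hypothetical and are offered as a simplified sketch, not as a description of any particular hardware.

```python
def execute_group(data, predicate, then_op, else_op):
    """Simulate one SIMD/SIMT group executing an if/else over its lanes.

    Returns (results, passes), where passes is the number of serialized
    execution passes the group needed for the branch.
    """
    mask = [predicate(x) for x in data]
    results = list(data)
    passes = 0

    if any(mask):
        # Lanes whose predicate is true execute; the other lanes sit idle.
        for i, x in enumerate(data):
            if mask[i]:
                results[i] = then_op(x)
        passes += 1

    if not all(mask):
        # Remaining lanes execute the other path in a separate pass.
        for i, x in enumerate(data):
            if not mask[i]:
                results[i] = else_op(x)
        passes += 1

    return results, passes


def is_positive(x):
    return x > 0


def double(x):
    return 2 * x


def zero(x):
    return 0.0


# All lanes take the same path: no divergence, one pass.
uniform_results, uniform_passes = execute_group(
    [1.0, 2.0, 3.0, 4.0], is_positive, double, zero)

# Lanes disagree on the branch: the two paths are serialized.
mixed_results, mixed_passes = execute_group(
    [1.0, -2.0, 3.0, -4.0], is_positive, double, zero)

print(uniform_passes, mixed_passes)
```

In this sketch the group whose lanes all satisfy the condition completes the branch in one pass, whereas the group with mixed values requires two passes, during each of which some lanes are idle.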
Accordingly, what is desired is an efficient memory structure for maintaining high parallelization of neural network operations and improving utilization of compute resources in a SIMD architecture.