Convolutional Neural Networks (CNNs) have emerged as powerful tools when employed on large-scale learning problems. In particular, CNNs have recently been applied to important domains including image recognition, speech recognition and facial recognition.
Contributing to the effective application of CNNs are large and powerful model(s) constructed from large-scale data set(s) and high performance computing platforms including general purpose graphics processing units (GPGPUs) providing teraflop computational capabilities. Notwithstanding contemporary implementation success(es), bottlenecks remain with respect to implementing CNNs on GPUs.
In particular, one such bottleneck encountered when implementing CNNs on GPUs is memory bandwidth, which is stressed by massive data fetching. Given the importance of CNNs and their frequent implementation on GPUs, techniques, methods and structures that enhance their performance on such GPUs would represent a welcome addition to the art.