This invention relates to a high-performance, portable convolutional neural network library for GP-GPUs.
GPU-based clusters are increasingly being deployed in workstations and in HPC environments to accelerate a variety of software applications. GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a general-purpose processor, an arrangement referred to herein as GP-GPU, to accelerate scientific, analytics, engineering, consumer, and enterprise applications. GPU accelerators now power energy-efficient datacenters in government labs, universities, enterprises, and small and medium-sized businesses around the world. GPUs are accelerating applications on platforms ranging from cars to mobile phones and tablets to drones and robots.
GP-GPU-accelerated computing offers unprecedented application performance by offloading the compute-intensive portions of an application to the GPU, while the remainder of the code still runs on the general-purpose CPU. From a user's perspective, applications simply run significantly faster. A simple way to understand the difference between a CPU and a GPU is to compare how they process tasks. A CPU consists of a few cores optimized for sequential processing, while a GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously.
Coprocessor-based clusters are those whose nodes have many-core coprocessors such as the NVIDIA Graphics Processing Unit (GPU) or the Intel Many Integrated Core (MIC). The term coprocessor is used generically: it need not be a “multicore”/“manycore” processor, but can be any processing element that can execute portions of the computation. Such a “coprocessor” can be an FPGA (a specialized/customizable computation unit), a standalone processor like the IBM Cell, a GPU, an Intel MIC, or any other many-core processor. The coprocessor need not be connected by a PCI bus; it can be attached through many different types of interconnect. For example, the coprocessor can be on the same chip as the main CPU (such as the AMD Fusion or IBM Cell), or it can be connected by a bus (such as a PCI/PCIe bus).