High Performance Computing (HPC) refers to large scale processing that is often performed by computer clusters. HPC is used to solve complicated problems that involve large amounts of data, often in the petabyte range. As the multiple nodes within the cluster process the data, the output is transferred over Input/Output (I/O) channels to be placed onto a large scale persistent data storage system such as a data warehouse. This data is often stored across several nodes.
Data staging refers to the process in which the movement of data is partly replaced with or supplemented by computations that perform post-processing of the data before moving it across I/O channels to persistent storage. Such post-processing is used to reformat, sort, or compress the data so that it is in a form suitable for more efficient access or data analytics. Due to the high volume of data undergoing post-processing, bottlenecks can occur in data staging process and I/O traffic between the processor and the storage systems.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.