Storing information in a high performance computing environment presents certain challenges and requires a data storage architecture and data migration procedures that permit a high level of efficiency and fault tolerance for data migrating between the high performance computers and long-term (or permanent) data storage.
Data storage architectures for handling high performance computations have been developed, including those described in U.S. Patent Application Publication No. 2014/0108723, filed as Ser. No. 14/056,265, directed to “Reducing Metadata in a Write-Anywhere Storage Sub-System”; U.S. Patent Application Publication No. 2014/0108473, filed as Ser. No. 14/050,156, directed to “Maintaining Order and Fault-Tolerance in a Distributed Hash Table System”; U.S. Patent Application Publication No. 2014/0108863, filed as Ser. No. 14/035,913, directed to “Handling Failed Transaction Peers in a Distributed Hash Table”; U.S. Patent Application Publication No. 2014/0108707, filed as Ser. No. 14/028,292, directed to “Data Storage Architecture and System for High Performance Computing”; and U.S. patent application Ser. No. 14/045,170, directed to “Method and System for Data Transfer between Compute Clusters and File System”.
All of these architectures use distributed data storage and a specific addressing scheme capable of directing a request (when access to a file is requested) to a particular location (or locations) within a group of distributed memories.
For example, the data storage architecture and system for high performance computing described in U.S. patent application Ser. No. 14/028,292, filed on 16 Sep. 2013, includes an intermediate storage tier interconnected between a supercomputer and the primary storage, which temporarily stores data received from the compute nodes of the supercomputer.
The intermediate storage is built with Non-Volatile Memory (NVM) units, which store data items generated by the compute nodes. The intermediate storage employs Input/Output (I/O) nodes that maintain information on the residency of the data items in the NVM units by means of a hash table distributed among the I/O nodes. The use of a Distributed Hash Table (DHT) allows quick access to data items stored in the NVM units.
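For illustration only, the residency lookup described above can be sketched as follows: each I/O node holds one shard of the hash table, and hashing a data item's key identifies the single I/O node to consult. The class and method names (IONode, DistributedHashTable, record_residency, lookup), the key format, and the hash-modulo placement of entries are assumptions made for this sketch and are not taken from the referenced applications.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IONode:
    """Hypothetical I/O node holding its shard of the residency table."""
    name: str
    # Maps a data-item key to the NVM unit (and offset) where the item resides.
    shard: dict = field(default_factory=dict)


class DistributedHashTable:
    """Minimal sketch: residency records spread across I/O nodes by key hash.

    Assumption: owner node = SHA-1(key) mod number of nodes. Production
    systems typically use consistent hashing plus replication instead.
    """

    def __init__(self, node_names):
        self.nodes = [IONode(n) for n in node_names]

    def _owner(self, key: str) -> IONode:
        # Stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def record_residency(self, key: str, nvm_unit: str, offset: int) -> None:
        # Called after a compute node's data item is written into an NVM unit.
        self._owner(key).shard[key] = (nvm_unit, offset)

    def lookup(self, key: str):
        # A single hash computation identifies the one I/O node to query.
        return self._owner(key).shard.get(key)


if __name__ == "__main__":
    dht = DistributedHashTable(["io0", "io1", "io2", "io3"])
    dht.record_residency("job42/rank7/block0", nvm_unit="nvm-2", offset=4096)
    print(dht.lookup("job42/rank7/block0"))  # ('nvm-2', 4096)
    print(dht.lookup("job42/rank7/block1"))  # None (never recorded)
```

Because the owning I/O node is computed from the key rather than found by searching, a lookup touches only one shard; this is the property that makes access to data items in the NVM units quick.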
Although the prior data storage architectures mention the possibility of storing Parity Group Information (PGI) in the DHT, they do not address the migration of Parity Groups between the high performance compute nodes and the permanent storage, the creation and distribution of PGI descriptors for non-deterministic data addressing, or a reclamation process, supported by the PGI descriptors, for migrating data from the intermediate storage to the backing file system.