File systems in data migration systems are used to store and organize computer data as electronic files. File systems store ("write") files on storage media, and are configured to provide access to the stored files, i.e., finding, reading, and deleting them, as needed.
The process of file creation in data storage systems has remained essentially unchanged since the beginning of computer file system technology. In a conventional file system, each file has an internal address held in a File Allocation Table (FAT). The internal address of a file is referenced by an external address structure of readable file names, called a Vnode table.
File creation in typical file systems usually starts with the assignment of a file name (or "handle") that is locked when it is created, so that another user cannot access that name at the same time. The "handle" (also called a Vnode) is assigned a structure (called an Inode), and the Inode is placed in the FAT while block addresses are gathered to place the data on the storage medium.
The filing process usually involves grouping unused storage blocks, i.e., the unused storage blocks are assembled into extent lists indicating where the data starts and how many blocks it occupies. Multiple extent lists are gathered behind an Inode until an "end of file" designator is reached, which indicates that the file is complete and terminates the transaction.
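The extent-list structure described above can be sketched as follows. This is a minimal illustration only; the names (`Extent`, `Inode`) and fields are hypothetical and not drawn from any particular file system implementation.

```python
# Sketch of an extent list: each extent records where a run of data
# blocks starts and how many contiguous blocks it occupies. Extents
# are gathered behind an Inode until the end-of-file designator is set.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Extent:
    start_block: int   # first storage block of this run
    block_count: int   # number of contiguous blocks occupied

@dataclass
class Inode:
    extents: List[Extent] = field(default_factory=list)
    eof: bool = False  # set when the "end of file" designator is reached

    def add_extent(self, start_block: int, block_count: int) -> None:
        self.extents.append(Extent(start_block, block_count))

    def total_blocks(self) -> int:
        return sum(e.block_count for e in self.extents)

inode = Inode()
inode.add_extent(100, 8)   # data begins at block 100, spans 8 blocks
inode.add_extent(512, 4)   # a second, non-contiguous run of 4 blocks
inode.eof = True           # file complete; the transaction terminates
print(inode.total_blocks())  # 12
```

Note that the two extents need not be contiguous on the medium, which is why a list of (start, count) pairs is kept rather than a single range.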
The conclusion of the filing process usually requires acknowledgement of the operation to the host computer and release of the "locks" on both the Vnode and the Inode. The entire filing process is largely serial, with each step following the preceding step until the file is written and the filing process is completed.
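The serial sequence described in the preceding paragraphs can be sketched end to end as below. All names (`SerialFileSystem`, `create_file`, etc.) are illustrative assumptions; a real file system implements each step with far more machinery.

```python
# Illustrative sketch of the serial filing sequence: lock the name
# (Vnode), assign an Inode, place it in the FAT, gather extents, mark
# end-of-file, release the lock, and acknowledge to the host.

import threading

class SerialFileSystem:
    def __init__(self):
        self.fat = {}                       # Inode structures keyed by file name
        self.name_locks = {}                # per-name "Vnode" locks
        self._registry_lock = threading.Lock()

    def create_file(self, name: str, data_blocks: list) -> str:
        # Step 1: lock the handle (Vnode) so no other user can take the name.
        with self._registry_lock:
            lock = self.name_locks.setdefault(name, threading.Lock())
        with lock:                          # held until the transaction ends
            inode = {"extents": [], "eof": False}    # Step 2: assign an Inode
            self.fat[name] = inode          # Step 3: place the Inode in the FAT
            for start, count in data_blocks:         # Step 4: gather extent lists
                inode["extents"].append((start, count))
            inode["eof"] = True             # Step 5: end-of-file designator
        # Step 6: locks released on exit; operation acknowledged to the host.
        return f"ack: {name} written"

fs = SerialFileSystem()
ack = fs.create_file("report.dat", [(100, 8), (512, 4)])
print(ack)
```

Each step strictly follows the preceding one, which is the serial behavior the parallel techniques discussed below seek to avoid.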
A technique for parallel file creation has been developed, which, for example, is described in U.S. Pat. No. 9,116,819, teaching a method of writing data from a compute cluster client in a "write anywhere" technique, where the "WRITE" operation goes to any available storage node, or to multiple storage nodes, based upon node availability. Other parallel file systems are described in U.S. Pat. No. 9,152,649 (which teaches a method of ordering the data for easy reading), and in U.S. Pat. No. 9,213,489 (which teaches a method of finding storage locations through a metadata hash process).
U.S. Pat. Nos. 9,116,819, 9,152,649, and 9,213,489 generally teach that file creation does not have to be the fully serial process prevalent in existing file systems, but can instead be a parallel process, since the data can be written to multiple nodes effectively at the same time.
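The "write anywhere" idea can be sketched as follows: chunks of a file are dispatched concurrently to whichever storage nodes are available, rather than serially to one location. The node and chunk structures here are hypothetical assumptions for illustration and are not taken from the cited patents.

```python
# Minimal sketch of parallel "write anywhere": file chunks are written
# concurrently to available storage nodes instead of one serial stream.

from concurrent.futures import ThreadPoolExecutor

nodes = {f"node{i}": [] for i in range(3)}   # available storage nodes

def write_chunk(chunk_id: int, payload: bytes) -> str:
    # Pick an available node (here, simple round-robin on the chunk id;
    # a real system would use node availability or a hash).
    node = f"node{chunk_id % len(nodes)}"
    nodes[node].append((chunk_id, payload))  # list.append is thread-safe in CPython
    return node

chunks = [(i, f"chunk-{i}".encode()) for i in range(6)]
with ThreadPoolExecutor(max_workers=3) as pool:
    placements = list(pool.map(lambda c: write_chunk(*c), chunks))

print(placements)   # each chunk landed on some available node
```

Because no single Inode lock serializes the writes, all chunks can be in flight at once; reassembly order is recovered from the chunk identifiers.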
A "Distributed Hash Table" (DHT) is used in these parallel filing systems for data retention. The DHT constitutes a temporary storage mechanism that greatly decreases the latency required to migrate data from a compute cluster to a storage device. However, persistent storage of the DHT data is generally enabled by a mechanism which migrates the data to a traditional file system through a gateway node, using the serial process of Inode creation described supra.
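A simplified sketch of DHT placement follows: a key is hashed, and the hash selects the node responsible for holding that entry. This illustrates only the general DHT principle, not the specific mechanism of the cited patents; all names (`responsible_node`, `dht_put`, etc.) are assumptions for illustration.

```python
# Sketch of Distributed Hash Table (DHT) placement for temporary data
# retention: hashing the key deterministically selects the node that
# holds the entry, so any client can locate data without a central index.

import hashlib

NODES = ["nodeA", "nodeB", "nodeC", "nodeD"]

def responsible_node(key: str) -> str:
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(NODES)
    return NODES[index]

dht = {node: {} for node in NODES}       # per-node temporary stores

def dht_put(key: str, value: bytes) -> None:
    dht[responsible_node(key)][key] = value   # low-latency temporary retention

def dht_get(key: str) -> bytes:
    return dht[responsible_node(key)][key]

dht_put("file-42", b"payload")
print(dht_get("file-42"))   # b'payload'
```

Because the hash is deterministic, reads and writes for the same key always resolve to the same node with no table lookups or locks on a shared index.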
Traditional file systems require at least three layers of software constructs to execute any file operation, and data is stored in available blocks which are gathered and apportioned based on availability at the time the data is written. Since traditional file systems allow file amendment by multiple users, they must maintain complex lock structures with open and close semantics. These lock structures must be distributed coherently to all of the servers used for data access. Since data is placed based on random block availability, traditional file systems are fragmented. This is especially true in environments where the data is unstructured, and it is not uncommon to write widely varied file sizes. Using a traditional file system designed for amendable data to store immutable data constitutes an inappropriate and wasteful use of bandwidth and computer resources. This wasteful practice results in the requirement for a great deal of additional hardware and network resources to achieve data distribution goals.
It would be highly desirable to provide a data storage system and method utilizing a fully parallel data migration process, including parallel migration between the DHT and persistent storage.