The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for intelligent selection of a replication node for file data blocks in a general parallel file system share nothing cluster.
A share nothing (SN) architecture is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage. Share nothing architecture is popular for Web development because of its scalability. A pure SN system can scale indefinitely simply by adding nodes in the form of inexpensive computers, because there is no single bottleneck to slow the system down. Share nothing architectures have become prevalent in the data warehousing space.
The general parallel file system (GPFS) share nothing cluster (SNC) is a cluster file system for analytics and clouds designed to address latency issues with traditional storage area networks (SANs). GPFS is extended to provide high availability through advanced clustering technologies, dynamic file system management, and advanced data replication techniques. GPFS-SNC boasts twice the performance of competing architectures, supports the portable operating system interface (POSIX) for backward compatibility, and includes advanced storage features such as caching, replication, backup and recovery, and wide area replication for disaster recovery.
GPFS has the following key advantages:
Clustering: thousands of nodes; fast, reliable communication; a common administrative domain.
Shared disks: all data and metadata on disk are accessible from any node, coordinated by a distributed lock service.
Parallelization: data and metadata flow to/from all nodes from/to all disks in parallel; files striped across all disks.
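The striping of files across all disks noted in the list above can be illustrated with a minimal sketch. The round-robin placement policy, the function name `stripe_blocks`, and its parameters are assumptions introduced only for illustration; the text above does not specify how GPFS actually assigns blocks to disks.

```python
def stripe_blocks(file_size: int, block_size: int, num_disks: int) -> list[tuple[int, int]]:
    """Map each block of a file to a disk using round-robin striping.

    Hypothetical illustration only: the placement policy is assumed,
    not taken from GPFS.
    Returns a list of (block_index, disk_index) pairs.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if block_size <= 0 or num_disks <= 0:
        raise ValueError("block_size and num_disks must be positive")

    # Number of blocks needed to hold the file, rounding up.
    num_blocks = (file_size + block_size - 1) // block_size

    # Consecutive blocks land on consecutive disks, wrapping around,
    # so that a sequential read can draw on every disk in parallel.
    return [(block, block % num_disks) for block in range(num_blocks)]
```

For example, under these assumptions, `stripe_blocks(file_size=10 * 2**20, block_size=2**20, num_disks=4)` assigns the ten 1 MiB blocks to disks 0, 1, 2, 3, 0, 1, 2, 3, 0, 1 in that order.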