The reliability of multinode storage systems using clustered data storage nodes has been a subject matter of both academic interests and practical significance. Among various multinode storage systems, brick storage solutions using “smart bricks” connected with a network such as Local Area Network (LAN) face a particular reliability challenge. One reason for this is because inexpensive commodity disks used in brick storage systems are typically more prone to permanent failures. Additionally, disk failures are far more frequent in large systems.
A “smart brick” or simply “brick” is essentially a stripped down computing device such as a personal computer (PC) with a processor, memory, network card, and a large disk for data storage. The smart-brick solution is cost-effective and can be scaled up to thousands of bricks. Large scale brick storage fits the requirement for storing reference data (data that are rarely changed but need to be stored for a long period of time) particularly well. As more and more information being digitized, the storage demand for documents, images, audios, videos and other reference data will soon become the dominant storage requirement for enterprises and large internet services, making brick storage systems an increasingly attractive alternative to generally more expensive Storage Area Network (SAN) solutions.
To guard against permanent loss of data, replication is often employed. The theory is that if one or more, but not all, replicas are lost due to disk failures, the remaining replicas will still be available for use to regenerate new replicas and maintain the same level of reliability. New bricks may also be added to replace failed bricks and data may be migrated from old bricks to new bricks to keep global balance among bricks. The process of regenerating lost replicas after brick failures is referred to as data repair, and the process of migrating data to the new replacement bricks is referred to as data rebalance. These two processes are the primary maintenance operations involved in a multinode storage system such as a brick storage system.
The reliability of brick storage system is influenced by many parameters and policies embedded in the above two processes. What complicates the analysis is the fact that those factors can have mutual dependencies. For instance, cheaper disks (e.g. SATA vs. SCSI) are less reliable but give more headroom of using more replicas. Larger replication degree in turn demands more switch bandwidth. Yet, a carefully designed replication strategy could avoid the burst traffic by proactively creating replicas in the background. Efficient bandwidth utilization depends on both the given (i.e. switch hierarchy) and the design (i.e. placement strategy). Object size also turns out to be a non-trivial parameter. Moreover, faster failure detection and faster replacement of failed bricks can provide better data reliability, but they incur increased system cost and operation cost.
There is therefore a need for an optimized storage system configuration having system parameters and design stratgies to balance the tradeoffs between cost, performance, and reliability.