Many computer-related applications require large amounts of storage capacity. These applications often require more storage space than is available on a single computer. Applications such as mail servers, local mail replicas, workgroup data, and large databases require huge investments in storage capacity. In addition, file duplication further increases demands for storage capacity.
Almost all computers in a network have some unused storage space, but a typical user cannot readily access it. Much of the untapped storage space available on a network is on computers primarily intended for stand-alone use by an individual, and these computers may not be online when needed. Even if one user in a network could access the untapped storage space on a second user's computer, such space would be available only as individual storage units, rather than as one large unit. The space available on an individual unit may not be adequate for the storage of large files or data structures.
A need exists to identify untapped storage space in a network and to make the untapped storage space available as one unit. Applications that could take advantage of such a system include file level backup and recovery, archival of electronic mail replicas, and archival of static data.
Data can be stored in any device capable of retaining the data and from which the data can be retrieved. A storage device whose contents are lost when power is cut off provides volatile storage. A storage device whose contents are not lost when power is cut off provides non-volatile storage.
The terms storage and memory are sometimes used loosely as synonyms. In a more precise and useful sense, the term memory pertains to the part of storage in which instructions are executed and excludes auxiliary storage devices such as disks, diskettes, mass storage devices and magnetic tape. The term memory is used primarily for volatile storage in electronic solid state components whereas the term storage is used primarily for storage in magnetic and optical media.
A hard disk means a rigid magnetic disk such as the internal disks used in the system units of personal computers and in external hard disk drives. The term hard disk is also used loosely in the industry for boards and cartridges containing microchips or bubble memory that simulate the operations of a hard disk drive. A hard disk drive means a stand-alone disk drive that reads and writes data on rigid disks and can be attached to a port on the system unit.
Engineers build storage systems by taking a storage device, such as a hard disk drive, and adding layers of hardware and software in order to create a highly reliable system. Storage systems include Direct Attached Storage (DAS) and Network Attached Storage (NAS). In “The Evolution of Storage Systems,” IBM Systems Journal, Vol. 42, No. 2, 2003, R. J. T. Morris and B. J. Truskowski describe how the emergence of low-cost local area data networking has allowed the development of NAS and storage area network (SAN) technologies. The authors further describe how block virtualization and SAN file systems are necessary to fully reap the benefits of these technologies.
Client-server networks allow distributed data processing in which a program on one computer sends a request to a program on another computer and awaits a response. The requesting program is called a client, and the answering program is called a server. Client-server networks can share physical storage space; however, the use of the shared space is limited by the availability of the server.
An architecture that avoids dependency on a single server is peer-to-peer, commonly known as P2P. A peer-to-peer network has two or more computers that communicate and share data where each computer uses the same program or type of program. Peer-to-peer networks allow the sharing of resources, including storage, among the members of the network without dependency on a single server. A peer-to-peer network is not dependent on a single server because each computer has the same capabilities as the other computers. Therefore, unlike a client-server network, the computers in a peer-to-peer network can each assume the role of a server computer or a client computer to any of the other computers.
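The symmetry of roles described above can be illustrated with a minimal in-process sketch. The `Peer` class, its `serve` and `fetch` methods, and the file names are hypothetical names introduced only for this illustration; no actual network transport or peer-to-peer product is modeled.

```python
class Peer:
    """A network member that can act as both client and server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.shared = {}          # data this peer offers to the other peers

    def serve(self, key):
        """Server role: answer a request received from another peer."""
        return self.shared.get(key)

    def fetch(self, other, key):
        """Client role: send a request to another peer and return its response."""
        return other.serve(key)


alice, bob = Peer("alice"), Peer("bob")
alice.shared["notes.txt"] = b"held by alice"
bob.shared["mail.nsf"] = b"held by bob"

print(bob.fetch(alice, "notes.txt"))    # bob acts as client, alice as server
print(alice.fetch(bob, "mail.nsf"))     # the roles are reversed
```

Because both objects run the same code, neither is designated as the server in advance; the role is determined by which peer issues the request.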
International Business Machines Corporation's Advanced Peer-to-Peer Networking (APPN) is an example of a product that supports peer-to-peer communication and resource sharing. APPN is a group of protocols enabling program-to-program communication within an IBM Systems Network Architecture (SNA) network. APPN is an extension to SNA that includes greater distributed network control that isolates the effects of single points of failure, dynamic topology information, dynamic definition of network resources, and automated resource registration and directory lookup.
While a peer-to-peer network avoids dependency on a single server and allows sharing of physical storage space among the computers in the network, a problem arises when one of the computers having shared storage space goes offline. Therefore, shared distributed physical storage space requires a method for dealing with the loss of a portion of the shared space should one or more of the contributing computers go offline.
Methods that use redundancy to deal with a loss of storage space are known. Redundancy can be built into a computer storage system through specialized algorithms that store data in an array of independent disks. For example, Redundant Array of Independent Disks (RAID), a widely used scheme for storing data across disk drives, maps multiple disk drives into a single, large logical drive. A RAID drive generally appears as a single disk drive to a user, but files stored in a RAID drive may actually span multiple disks. RAID systems protect data from disk failure by storing data redundantly on disks within the array.
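The idea of presenting several drives as one logical drive can be sketched as a simple address mapping. The sketch below assumes plain concatenation of drives of different sizes, which is a simplification and not any particular RAID level; the `LogicalDrive` class and its `locate` method are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


class LogicalDrive:
    """Presents several physical drives as one block address space (hypothetical)."""

    def __init__(self, drive_sizes):
        if not drive_sizes:
            raise ValueError("at least one drive is required")
        # Cumulative end address (exclusive) of each drive in the logical space.
        self.ends = list(accumulate(drive_sizes))

    @property
    def capacity(self):
        return self.ends[-1]

    def locate(self, logical_block):
        """Map a logical block number to (drive index, block within that drive)."""
        if not 0 <= logical_block < self.capacity:
            raise IndexError("logical block out of range")
        drive = bisect_right(self.ends, logical_block)
        start = self.ends[drive - 1] if drive else 0
        return drive, logical_block - start


volume = LogicalDrive([100, 50, 200])   # three drives, sizes in blocks
print(volume.capacity)                  # 350
print(volume.locate(99))                # (0, 99)
print(volume.locate(100))               # (1, 0)
print(volume.locate(349))               # (2, 199)
```

A file occupying logical blocks 90 through 160 would span all three drives in this example, although the user sees only one address space.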
RAID distributes data, along with information used for error correction, among two or more hard disks in order to improve performance and reliability. Parity is an error checking procedure in which the number of 1s must always be the same, either even or odd, for each group of bits transmitted without error. A parity bit is an extra bit used in checking for errors in groups of data bits transferred within or between computer systems. With personal computers, the term is frequently encountered in modem-to-modem communications, in which a parity bit is often used to check the accuracy with which each character is transmitted, and in RAM, where a parity bit is often used to check the accuracy with which each byte is stored. The hard disk array is governed by array management software and a disk controller, which handles the error correction.
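The parity check described above can be sketched in a few lines. The function names `parity_bit` and `check` and the sample bit pattern are assumptions made for this illustration only.

```python
def parity_bit(bits, even=True):
    """Return the extra bit that makes the total number of 1s even (or odd)."""
    bit = sum(bits) % 2                 # 1 if the data currently holds an odd number of 1s
    return bit if even else 1 - bit


def check(bits_with_parity, even=True):
    """Return True if the group, including its parity bit, has the expected parity."""
    return sum(bits_with_parity) % 2 == (0 if even else 1)


data = [1, 0, 1, 1, 0, 0, 1]            # seven data bits containing four 1s
sent = data + [parity_bit(data)]        # even parity, so the parity bit is 0
received = sent.copy()
received[2] ^= 1                        # flip one bit to simulate a transmission error

print(check(sent), check(received))     # True False
```

A single parity bit detects any odd number of flipped bits in a group but cannot identify which bit changed, and it misses an even number of flips.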
Different RAID algorithms provide various degrees of data redundancy and fault tolerance. For example, RAID-1 maintains a “mirror image” of a disk, but requires a second disk on which to store the mirror image. In theory, the data on the mirror image is always available if the original disk fails or is otherwise unavailable. Additionally, RAID-1 allows a computer to read from both disks simultaneously, which can as much as double the read transfer rate. Thus, RAID-1 is a simple system that provides substantial benefits, but at twice the cost. A RAID-3 configuration stores data on several drives by combining a set of same-size disk partitions on separate disks into a single logical volume that an operating system can recognize as a single drive, a process referred to as “striping.” In addition to storing data on several drives, RAID-3 stores parity on one dedicated drive. A RAID-5 configuration uses striping to place data at block level across several drives and also distributes the parity data across those drives. A RAID-6 configuration extends RAID-5 by storing two independent sets of parity data, distributed across the drives, so that the array can survive the failure of any two drives. A RAID 10 configuration, which is sometimes loosely referred to as RAID 0+1, combines striping and mirroring: data is striped across several drives, and the drives are mirrored for redundancy. The mirroring of the disks in RAID 10 eliminates the need for parity.
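The way striping with parity tolerates the loss of one drive can be illustrated with a simplified sketch of a single stripe. It assumes XOR parity, as used in parity-based levels such as RAID-3 and RAID-5, and it ignores parity rotation, controllers, and on-disk layout. The function names `xor_blocks`, `make_stripe`, and `rebuild` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, n_data_drives, block_size):
    """Split data into one block per data drive and append an XOR parity block."""
    capacity = n_data_drives * block_size
    if len(data) > capacity:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(capacity, b"\0")        # pad the last block with zero bytes
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(n_data_drives)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe(b"unused storage", n_data_drives=3, block_size=5)
print(stripe[1])                # b'd sto'
print(rebuild(stripe, 1))       # b'd sto', recovered without reading drive 1
```

Because XOR is its own inverse, the same `rebuild` call recovers either a lost data block or the lost parity block, but only one missing block per stripe can be recovered this way.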
International Publication WO 02/089488 entitled “P2P Network Architecture for Distributed Storage” (the '488 publication) discloses the use of distributed mass storage devices, such as hard disk drives, that are partitioned to prevent direct manipulation of the data by the user. A given video program may be stored in segments on various set top boxes, and data is transferred through a router under the control of a head-end control system. Therefore, the '488 publication discloses a system in which a program on a computer may determine where that computer's data resides, and in which a computer may also hold data that is stored on it but is not managed by it. Specifically, the '488 publication discloses a peer-to-peer environment where multiple peers may affect the availability and access of content in the peer-to-peer network.
What is needed beyond the prior art is a method for using the unused storage capacity within an enterprise that capitalizes on existing peer-to-peer architecture capabilities and existing RAID technology. A further need exists for a system and method to take advantage of the unused space on network-attached personal computers, notebook computers, and servers by allowing any computer in the system to request access to the unused space, and to control other computers in allocating the space. Additionally, a need exists for a redundant system to use such unused space to account for periodic non-availability of a contributing computer.