The current invention relates to storage systems, and in particular to automated data storage libraries for remote access and archiving and otherwise efficient storage of data.
In the field of data storage systems, innovation is driven by the need for access to information where the storage of the underlying data is both inexpensive and fast to access. Units of memory or storage elements have generally addressed the cost concern at the expense of access time, or have addressed access time at the expense of cost. They can be divided along several general lines: serial versus random access; online versus nearline versus offline; and fixed versus removable. Serial storage elements (such as tape media) have historically stored significantly more data at lower cost than random access media. Online storage has generally been identified with fixed storage elements. Removable storage elements have been divided into automated sets (i.e., those in robotically controlled libraries) and manual sets.
Fixed units of memory address access time concerns and are integrated with a read/write device such that they are directly connected to an associated data processing system at all times. Removable units of memory or storage elements address storage cost and are generally not integrated with read/write capabilities, or supplied with power at all times; instead they reside, for example, in a storage library. The storage library itself contains a number of read/write drives capable of mounting individual storage elements. The individual storage elements are typically moved from a storage slot to the read/write device by a robotic access device or other means.
Current systems also include metadata that allow the storage subsystem to interpret virtual volume requirements and configure physical storage resources, for example, groups of storage elements organized as a striping group for performance or as a redundancy group for reliability. Current systems also make provision for instructing storage systems to carry out operations on the basis of policy. Policy can be stated in terms of events, changes of status, or an algorithm.
An example of an event is to do a backup at midnight every night. An example of an algorithm is to make a copy when either more than 1 megabyte of data has been written or twenty-four hours have elapsed since the oldest write data has been copied.
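The algorithmic policy described above can be sketched roughly as follows. This is only an illustrative sketch, not the mechanism of any particular system: the class and method names are hypothetical, a megabyte is assumed to be 1,000,000 bytes, and the twenty-four hour limit is assumed to be measured from the oldest write that has not yet been copied.

```python
from dataclasses import dataclass
from typing import Optional

MEGABYTE = 1_000_000            # assumption: decimal megabyte
MAX_AGE_SECONDS = 24 * 60 * 60  # twenty-four hours


@dataclass
class CopyPolicyState:
    """Tracks writes that have not yet been copied (hypothetical structure)."""
    uncopied_bytes: int = 0
    oldest_uncopied_write: Optional[float] = None  # timestamp in seconds

    def record_write(self, nbytes: int, now: float) -> None:
        """Account for a new write of nbytes at time now."""
        self.uncopied_bytes += nbytes
        if self.oldest_uncopied_write is None:
            self.oldest_uncopied_write = now

    def copy_due(self, now: float) -> bool:
        """True when either threshold of the policy has been reached."""
        if self.oldest_uncopied_write is None:
            return False  # nothing has been written since the last copy
        too_much = self.uncopied_bytes > MEGABYTE
        too_old = now - self.oldest_uncopied_write >= MAX_AGE_SECONDS
        return too_much or too_old

    def mark_copied(self) -> None:
        """Reset the counters after a copy has been made."""
        self.uncopied_bytes = 0
        self.oldest_uncopied_write = None
```

A scheduler would call `record_write` on each write, poll `copy_due`, and call `mark_copied` once the copy completes. An event-based policy such as the midnight backup would instead be triggered by a clock and needs no such counters.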
Examples of adapting a magnetic tape cartridge style storage library to contain magnetic disks (e.g., hard disk drives) are known. For example, previous solutions teach a universal data storage element, where a uniform form factor is presented that can accept different storage elements (also known as media). This allows multiple types of media to be stored in an automated storage library without needing to alter the storage library drastically.
As technological advances are made in storage media and devices, the costs of such media and devices generally decrease. This has been the case particularly with respect to devices such as IDE/ATA disk drives. The decreased cost in these kinds of drives has made random access to data possible at a cost much nearer to that of serial access storage media.
In the current application, the different storage slots are referred to in terms of their relationship to a read/write port (such as, for example a device that can accept the storage element, which device is connected to send data to and receive data from a network). The slots are also referred to in terms of whether they are currently supplied with power (i.e., whether the data on the individual storage element in that slot can be accessed without the additional action of connecting the individual storage element to a power supply). Slots that are supplied with power are referred to herein as “hot” slots, while unpowered slots are herein referred to as “cold” slots.
Additionally, some slots can be used to store data storage elements, while others may be empty. Also, some slots may contain storage elements that do not themselves contain data, or that are not in use by or even known to the management systems that maintain the metadata controls mapping the using system requests to physical locations. Slots that are used for storage elements that contain data that has been supplied by and will be accessed by the using systems are called inventory slots. Slots that contain storage elements that are known to the management systems, but that have never been used to store data, are called standby slots. Slots that contain storage elements that are not known to the management systems are called spare slots. Slots that contain no storage elements are called empty slots.
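The slot terminology above can be summarized in a short sketch. The type and field names here are hypothetical and chosen only for illustration. The sketch also assumes that an element unknown to the management systems is treated as a spare regardless of what it holds, which the text does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SlotContent(Enum):
    INVENTORY = "inventory"  # element holds data used by the using systems
    STANDBY = "standby"      # element known to management, never used for data
    SPARE = "spare"          # element not known to management
    EMPTY = "empty"          # no element in the slot


@dataclass
class StorageElement:
    known_to_management: bool
    holds_user_data: bool


@dataclass
class Slot:
    element: Optional[StorageElement] = None
    powered: bool = False  # True means "hot", False means "cold"


def classify(slot: Slot) -> SlotContent:
    """Map a slot to one of the four content categories."""
    if slot.element is None:
        return SlotContent.EMPTY
    if not slot.element.known_to_management:
        return SlotContent.SPARE  # assumption: unknown elements are spares
    if slot.element.holds_user_data:
        return SlotContent.INVENTORY
    return SlotContent.STANDBY


def temperature(slot: Slot) -> str:
    """Return "hot" for a powered slot and "cold" for an unpowered one."""
    return "hot" if slot.powered else "cold"
```

Power state and content category are kept as independent properties, since the text defines them separately: a standby slot, for instance, may be either hot or cold.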
Active slots are those connected to the using system(s) in such a fashion as to allow a storage element to be read from or written to through the connection, preferably to a network. The tape devices in a robotically controlled tape library would be examples of active slots. Typically, because of space and cost restrictions, a majority of storage slots in a library are not active slots, but instead are passive slots. Typically, only 0.5% to 5% of a library's slots are active slots. When a storage element in a passive slot must be accessed, it is typically moved by a robotic manipulator of some kind to an active slot (e.g., a read/write drive).
Each added active slot increases the expense of a library, because the cost of adding read/write mechanisms and communication mechanisms such as cables is prohibitive, both in terms of money and limited space. Therefore, most slots are typically not active, and data storage elements must be moved from their passive slot to an active one to be accessed.
Some storage libraries that require fast access to archived data replace the serial access storage elements (such as magnetic tape drives) with random access storage elements (such as a magnetic disk drive, for example, an IDE/ATA drive). In such libraries, if all slots are active slots, every storage slot must be connected to a read/write device with a connection cable.
Current large computing systems make extensive use of disk devices (i.e., random access storage elements) for online storage of data. When an installation has thousands of disk drives in use, there are always a significant number of drives requiring some maintenance operation at any one point in time. For example, the large scientific computing centers have 10 to 100 TB of disk configured in racks of 4 GB to 40 GB drives, which use thousands of square feet of floor space. At any given moment, an observation of the computer floor area will show a number of drive doors open awaiting the swapping out of individual disk drives. For these installations, a quantity of spare disk drives is kept in a spare parts inventory room on site.
It would therefore be advantageous to have a storage system that allows remote users random (rather than serial) access to stored data, but that requires little or no human operator intervention for operation and maintenance.
The current application discloses a system and method of maintaining an automated storage library that allows a mixture of serial access and random access storage elements (e.g., tape cartridges, new technology like holographic cubes or MEMS-Probe, and disk drives) to be stored therein. It takes advantage of the emergence of new technologies and the decreasing costs of IDE/ATA disk drives to provide random access to data in a fully automated storage library.
In a preferred embodiment, an automated storage library is built to contain random access storage elements (for example, ATA disk drives), or a library design for serial access storage elements is modified so as to contain random access storage elements in addition to the serial access storage elements. Maintenance software and hardware is presented that automatically takes care of maintenance tasks. Some of these are currently performed by human operators, such as ordering copies of data for backup or archive. The storage system can also respond to changes in the requirements that a “using system” communicates to the “storage subsystem” defining a specific virtual volume that is currently employed. For example, the using system could change the reliability index of a set of data and thereby drive the subsystem to change the redundancy mechanism (for example, to require multiple backups or mirrors, or to require a local and a remote mirror). Another example is to increase or decrease the level of RAID 5 type redundancy, such as going from a 4+1 parity scheme to an 8+3 scheme. The subsystem is also capable of replacing malfunctioning storage elements.
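To make the parity scheme example above concrete, the following sketch compares the storage overhead of a 4+1 and an 8+3 scheme. The function names are hypothetical, and the sketch assumes that a scheme written as "D+P" uses D data elements and P parity elements of equal capacity. The claim that P parity elements allow up to P simultaneous element failures is a further assumption about the underlying code and is not stated in the text.

```python
from fractions import Fraction
from typing import Tuple


def parse_scheme(scheme: str) -> Tuple[int, int]:
    """Split a scheme such as "4+1" into (data elements, parity elements)."""
    data, parity = scheme.split("+")
    return int(data), int(parity)


def parity_overhead(scheme: str) -> Fraction:
    """Fraction of the group's raw capacity that is consumed by parity."""
    data, parity = parse_scheme(scheme)
    return Fraction(parity, data + parity)


def usable_capacity_gb(scheme: str, element_capacity_gb: int) -> int:
    """Capacity left for data when every element has the same size."""
    data, _ = parse_scheme(scheme)
    return data * element_capacity_gb


for scheme in ("4+1", "8+3"):
    data, parity = parse_scheme(scheme)
    print(scheme, "elements:", data + parity,
          "parity overhead:", parity_overhead(scheme))
```

Under these assumptions, moving from 4+1 to 8+3 raises the parity overhead from 1/5 to 3/11 of raw capacity in exchange for more parity elements per group.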
In one example embodiment, the innovative storage system is designed to automatically remove any malfunctioning drives or storage elements, package them, and convey them outside the storage library itself to a location for easy disposal (if necessary) or other treatment (such as mailing them back to a manufacturer). The defective storage element can be replaced by the storage system's robot, which fetches a new drive from a storage location, preferably located within the library itself.
In another example embodiment, the storage system's internal robot (normally used to relocate storage elements within the library, for example, from an inactive slot to an active slot or a read/write drive) is used to reconnect access cables from storage slot to storage slot. The mass of a cable connection is generally significantly less than the mass of a tape cartridge or a disk drive, thus decreasing access times to data. This alleviates the need to move the data storage elements themselves in order to access the data thereon, and reduces the total number of necessary active slots in the library.
In another example embodiment, the innovative storage system is used to reduce the cycle time between a storage element malfunctioning and its replacement. This is accomplished, for example, by maintaining a redundancy mechanism related to the storage elements in a library. This can be done with complex functions like those used to implement redundant arrays of inexpensive disks (RAID) in what is known as RAID 5. The simplest approach is to use the RAID 1 mechanism, which is a simple copy function. RAID 1 renders a copy of each identified storage element within the library, specifically in an active slot. However, when the storage system is maintaining the mirror, it is preferably created in a slot that is spare or standby rather than part of the main inventory. This is referred to hereinafter as ghost mirroring the storage element. If a storage element fails or its data becomes otherwise inaccessible, the defective storage element is removed and its mirror (or ghost mirror) is connected (i.e., made the principal location in the main inventory, for example, by using the robot to connect a specific access cable to that storage element's slot, by moving the storage element itself to the known slot, or by modifying the metadata to point at the new location). Its contents are then copied to a spare drive kept within the library itself (thus reestablishing the ghost mirror). This spare drive can then be associated with or relocated to the slot previously occupied by the storage element that was serving as the ghost mirror, thereby replacing it.
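The ghost mirror replacement sequence described above can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, promotion of the mirror is modeled as a metadata update rather than a robotic cable or element move, and copying is modeled as assigning the data to a spare element.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    name: str
    data: Optional[bytes] = None
    failed: bool = False


@dataclass
class Library:
    primary: Dict[str, Element] = field(default_factory=dict)  # volume -> element
    ghost: Dict[str, Element] = field(default_factory=dict)    # volume -> mirror
    spares: List[Element] = field(default_factory=list)
    removed: List[Element] = field(default_factory=list)       # awaiting disposal

    def add_volume(self, volume: str, element: Element) -> None:
        """Register a primary element and create its ghost mirror."""
        self.primary[volume] = element
        self._make_ghost(volume)

    def _make_ghost(self, volume: str) -> None:
        """Copy the primary's contents to a spare and record it as the mirror."""
        if not self.spares:
            raise RuntimeError("no spare element available for ghost mirror")
        spare = self.spares.pop(0)
        spare.data = self.primary[volume].data
        self.ghost[volume] = spare

    def handle_failure(self, volume: str) -> None:
        """Remove the failed element, promote the mirror, rebuild the mirror."""
        failed = self.primary[volume]
        failed.failed = True
        self.removed.append(failed)
        # Promote the ghost mirror by pointing the metadata at it.
        self.primary[volume] = self.ghost.pop(volume)
        # Reestablish the ghost mirror on another spare.
        self._make_ghost(volume)
```

For example, with two spares `s1` and `s2`, adding a volume on element `d0` makes `s1` its ghost mirror. After `handle_failure`, `s1` becomes the primary, `s2` becomes the new ghost mirror, and `d0` is queued for removal.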