The present invention relates to methods and systems for storing data, and more particularly, to cost-effective methods and systems for storage and retrieval of a large amount of data, e.g., in a range of tens to hundreds of Terabytes.
The volume of data generated by business processes in a variety of organizations is increasing exponentially with time. Most industrial and business processes are far more efficient in generating digital data than in utilizing it. As a result, the demand for long-term data storage and back-up is growing rapidly. Currently, large-scale data warehousing is typically implemented by employing tape media, which suffer from long access latency, namely, the time required for loading the tape and other associated access times. In addition, robotic tape systems are bulky and expensive to maintain.
Since the latency period for access to database items located in a tape archive is typically on the scale of tens to hundreds of seconds, a system overload frequently arises when a database search requires access to data located on many or all of the tapes in a library. Improving robotic tape storage access presents a challenging problem. Even with multiple arms and tape drives, access within each tape remains serial, with few opportunities for speeding up access to data. Software approaches that streamline tape access by clustering and de-clustering multiple accesses are known. These approaches can improve performance of Petabyte tape libraries that include several hundred Terabytes of disk cache. These approaches, however, cannot eliminate the fundamental limitations arising from tape access latency.
Currently available magnetic disk storage presents an alternative to tape. Current commodity disk drive units are only marginally more costly than tape media and will be less costly within a few years, if current trends continue.
However, disk-based systems having very large storage capacities, for example, hundreds of Terabytes, are very costly, and offer short retention life in comparison to tape. Redundant Arrays of Inexpensive Disks (RAID) include a small number of disk drives, and an interface that presents these drives as a single large disk to a user while protecting against data loss in case of failure of any of the disks. Current RAID systems have a maximum storage capacity of approximately a Terabyte, and are optimized for random access speed.
A storage area network (SAN) provides a practical approach for combining many RAID modules to obtain high storage capacity, for example, tens of Terabytes, albeit at high cost. Network Attached Storage (NAS) devices provide another alternative for high capacity disk storage. A NAS cluster relies on the scalability of networks in a file server topology to provide high storage capacity. However, like a SAN, NAS devices can also be costly.
Temperature management can be important in high density data storage systems. For example, poor thermal characteristics in a disk array system can reduce its reliability and lifetime. Operating a drive above its nominal temperature, e.g., over the life of the drive, can result in a higher failure rate. Further, excessive temperature can create or exacerbate problems with many components in a disk drive, from the actuation arm and spindle motor to the bearings.
Accordingly, there is a need for cost-effective methods and systems for high-speed, high-capacity storage of data. There is also a need for enhanced thermal management of high capacity data storage systems.