1. Field of the Description
The present description relates to magnetic tape data storage and, in particular, to methods and systems for monitoring operation of a tape-based data storage system, including gathering data from tape libraries and determining and predicting the health of media, tape drives, and other storage system components so as to provide alerts that indicate media or tape drives that are likely to fail during future operations and that indicate, as between a tape/media and a drive, which is the more likely to be at fault (i.e., which has the “higher suspicion of blame”) in monitored problematic operations in a tape library or the like.
2. Relevant Background
For decades, magnetic tape data storage has offered cost and storage density advantages over many other data storage technologies including disk storage. A typical small to large-sized data center will deploy both tape and disk storage to complement each other, with the tape storage often being used for backup and archival data storage. Due to the increased need for securely storing data for long periods of time and due to the low cost of tape, it is likely that tape-based data storage will continue to be utilized and its use will only expand for the foreseeable future.
Briefly, magnetic tape data storage uses digital recording onto magnetic tape to store digital information, and the tape is packaged in cartridges and cassettes (i.e., the storage media or simply “media”). The device that performs writing and reading of data is a tape drive, and mainframe-class tape drives are often installed within robotic tape libraries that may be quite large and hold thousands of cartridges to provide a tremendous amount of data storage (e.g., each tape may hold several terabytes of uncompressed data).
An ongoing challenge, though, for the data storage industry is how to manage and monitor data centers and, particularly, how to better monitor tape storage media and devices. For example, customers demand that data be safely stored with lower tape administration costs. In this regard, customers desire solutions that efficiently and proactively manage data center tape operations, including solutions that provide failure analysis for problematic or suspect media and drives. Further, customers demand that the collection of data regarding operations be non-invasive, and the management solution should provide recommended corrective actions. Data storage customers also want their investment in tape technologies to be preserved and data integrity to be maintained. This may involve monitoring tape capacities in volumes and/or libraries, flagging media to be migrated, and advising on resource rebalancing. Customers also desire a management solution that provides an effective and useful user interface to the collected tape operations data and to reporting of detected problems or issues.
Unfortunately, existing tape data storage management solutions and systems have not met all of these needs or even fully addressed customer dissatisfiers. For example, existing management tools typically only collect and report historical data, and it can be very difficult, after a problem with tape operations has occurred, to determine whether a particular drive or piece of media was the cause of a failure. This can lead to cartridges or other media being needlessly replaced or to a tape drive being removed for repair or even replaced without verification of which component caused a fault. Some systems manage media lifecycles, but this typically only involves tracking the age or overall use of media to provide warnings when a tape or other media is potentially nearing the end of its useful life so that a customer can remove the media. Existing systems also often only provide alerts after a failure or problem has occurred, e.g., an alert is issued when the data center is already in a crisis mode of operation. Further, reporting is limited to predefined reports that make assumptions regarding what information will likely be important to a customer and that give the customer little or no ability to design a report or to select the data provided by the tape operations management system.
The data storage industry's current tape monitoring approaches generally fall within one of three categories, each having issues or problems that limit its widespread use or adoption. First, tape monitoring may involve a datapath breach approach. Such an approach only works in a storage area network (SAN) environment, and it also introduces drive availability risk and exposes data to vendors and/or others. Second, tape monitoring may involve a media vendor lock-in approach, which undesirably results in reporting only being available if the media in a data center or tape library was sourced from a particular vendor. Third, tape monitoring may be limited to a single library within a data center, which may be undesirable because each library has to launch its own monitoring application and the data is not aggregated for analysis or for reporting to the customer or operator of the data center.
Hence, there remains a need for improved systems and methods (e.g., software products) for providing customers with information to manage data center tape operations efficiently and in a timely manner. Preferably, the information would include tape analytics that would allow proactive management of the tape operations rather than merely reactive management based on vendor-selected sets of historical data.