Data types and applications may be classified along two vectors: unstructured (file) versus structured data, and fixed (static) versus dynamic data. Structured dynamic data is typically the data created by relational databases and online transaction processing applications. These applications often run on large servers with either direct attached storage (DAS) disk arrays or storage area network (SAN) disk arrays to store data. Requirements for this type of data may include high throughput, transaction performance and availability, which may be adequately provided by DAS or SAN solutions.
Unstructured dynamic data is usually created by departmental file sharing applications and includes items such as office documents and computer-aided design (CAD) files. This data has been supported by a variety of storage architectures. Some IT departments, however, are moving to network-attached storage (NAS) systems because they may be easy to deploy, support heterogeneous clients and include advanced data protection features such as snapshot capabilities.
Structured static data created by append-only applications is a relatively new type of data previously served by mainframe and storage archive management solutions. The increasing amount of data associated with digital supply chain, enterprise resource planning (ERP), and radio frequency ID (RFID) applications, however, is creating an opportunity for new approaches to storage.
Unstructured static data (or fixed content), such as digital repositories, medical imaging, broadcast, compliance, media, Internet archives and dark archives (e.g., data that is retained for legal reasons and may be retrieved in the event of a legal dispute), represents a category with fast growing storage requirements. Fixed content may share the long-term storage requirement of structured static data, but it generally does not change and may require the ability to access the data quickly and frequently, e.g., random reads. Near-line tape may be widely deployed to reduce costs, but it is limited in performance and reliability. Large-scale commodity RAID arrays may be economically attractive, but drawbacks such as added complexity, lack of reliability and limited scalability may become evident.
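The two-vector classification described above can be summarized as a small lookup table. The following Python sketch is illustrative only: the type names (`Structure`, `Mutability`, `Category`) and the function `typical_storage` are hypothetical, while the entries themselves are taken from the four categories discussed in the text.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


class Mutability(Enum):
    DYNAMIC = "dynamic"
    STATIC = "static"


class Category(NamedTuple):
    example_sources: Tuple[str, ...]   # applications or content named in the text
    typical_storage: Tuple[str, ...]   # storage approaches named in the text


# One entry per quadrant of the structured/unstructured x dynamic/static grid.
TAXONOMY = {
    (Structure.STRUCTURED, Mutability.DYNAMIC): Category(
        ("relational databases", "online transaction processing"),
        ("DAS", "SAN"),
    ),
    (Structure.UNSTRUCTURED, Mutability.DYNAMIC): Category(
        ("office documents", "CAD"),
        ("NAS",),
    ),
    (Structure.STRUCTURED, Mutability.STATIC): Category(
        ("digital supply chain", "ERP", "RFID"),
        ("mainframe", "storage archive management"),
    ),
    (Structure.UNSTRUCTURED, Mutability.STATIC): Category(
        ("medical imaging", "broadcast", "compliance", "Internet archives"),
        ("near-line tape", "commodity RAID arrays"),
    ),
}


def typical_storage(structure: Structure, mutability: Mutability) -> Tuple[str, ...]:
    """Return the storage approaches the text associates with a quadrant."""
    return TAXONOMY[(structure, mutability)].typical_storage
```

For example, `typical_storage(Structure.UNSTRUCTURED, Mutability.STATIC)` returns the two conventional options for fixed content whose limitations are discussed in this section.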
Some characteristics of fixed content, e.g., non-changing data that may be stored long term, yet be quickly and readily accessible, may yield a number of requirements for storage systems:

Data and metadata: to optimally locate and retrieve stored data, the storage system may have the capability to store the files as well as system and user-defined metadata. Metadata describes the content and the attributes for locating the data within the storage system. The system may be optimized to manage different kinds of metadata and queries against the metadata, as well as store data, requiring a separate database. In addition, metadata may be managed so that it may be leveraged to support new data services such as complex queries and data validation.

Data integrity: once written, the data may be protected from accidental damage or intentional tampering, and the storage system may provide assurances that the data is not corrupted or suffering from bit rot over its stored life.

Reliability: the storage system may manage data reliably over the entire lifetime of the data. This may be applicable for large fixed content, as these files may be so large that backup and recovery from tape may be too time consuming and thus impractical.

Scalability: the storage system may have the capability to scale from entry-level systems of a few terabytes (TB) to large multi-petabyte (PB) collections, without having to remove and rebuild the data on larger systems. Scalability may be non-disruptive and seamless.

Open architecture: a requirement to archive data for years to decades may mean the data will outlive a number of generations of hardware and software. Open standards may be used to ensure data can migrate across technology generations.
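The data, metadata and data integrity requirements above can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class `FixedContentStore`, its methods, and the choice of a SHA-256 digest as both object identifier and integrity check are hypothetical and are not described in the text. A real system would also need durable storage, replication and tamper protection.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    digest: str                       # SHA-256 hex digest recorded at write time
    metadata: dict = field(default_factory=dict)


class FixedContentStore:
    """Hypothetical write-once store that keeps data alongside its metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data once and return its identifier (the content digest)."""
        digest = hashlib.sha256(data).hexdigest()
        # Write-once: an existing object is never overwritten.
        self._objects.setdefault(digest, StoredObject(data, digest, dict(metadata)))
        return digest

    def get(self, object_id: str) -> bytes:
        """Return the data after re-checking it against the recorded digest."""
        obj = self._objects[object_id]
        if hashlib.sha256(obj.data).hexdigest() != obj.digest:
            raise IOError(f"integrity check failed for {object_id}")
        return obj.data

    def query(self, **criteria) -> list:
        """Return identifiers of objects whose metadata matches all criteria."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(key) == value for key, value in criteria.items())
        ]
```

A caller might store an image with `store.put(image_bytes, modality="MRI")` and later locate it with `store.query(modality="MRI")`. Recomputing the digest on every read is the simplest way to surface silent corruption such as bit rot; a production system would more likely verify in the background as well.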
Several conventional technologies may be used in an attempt to meet the above requirements: (i) a DAS or SAN with an application server and metadata database; (ii) a NAS with an application server and metadata database; or (iii) a tape or optical disk with an application server and metadata database. The application servers and metadata databases are used to manage the content-specific functions such as indexing and searching the content.
Conventional solutions may not meet the requirements for unstructured static data. Static unstructured data applications may consume vast amounts of storage, and scale steadily over time. Scaling may create complexity and manageability issues because the process often requires new file systems to be created, clients to be redirected and data to be manually redistributed. NAS systems may be unsuitable because they may be difficult to scale due to the complexity of managing multiple file system mounts.
Another issue with current disk-based solutions may be reliability. For example, large-scale disk-based systems with RAID 5 may not be adequate because of the increasing risk of dual drive failures in a RAID 5 array. Furthermore, in large fixed content systems, the data may not be frequently backed up simply because it is too large. The alternatives are to tolerate permanent data loss or to implement mirroring schemes that decrease the density and increase the overall cost of a solution.
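The dual drive failure risk mentioned above can be made concrete with a simplified calculation. The sketch below assumes independent drives with a constant failure rate (an exponential lifetime model) and estimates the chance that a second drive fails while a RAID 5 array is rebuilding after a first failure. The function name and all parameter values are hypothetical, and the model ignores unrecoverable read errors and correlated failures, both of which would make the real risk higher.

```python
import math


def second_failure_probability(drives: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """Probability that at least one surviving drive fails during a rebuild.

    Assumes independent drives with exponentially distributed lifetimes.
    """
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    surviving = drives - 1
    # 1 - exp(-surviving * t / MTBF), computed with expm1 for small values.
    return -math.expm1(-surviving * rebuild_hours / mtbf_hours)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the text.
    for n in (5, 10, 20, 40):
        p = second_failure_probability(n, rebuild_hours=24.0, mtbf_hours=500_000.0)
        print(f"{n:>3} drives: {p:.6f}")
```

Under this model the risk grows with both the number of drives and the rebuild time, and rebuild time itself tends to grow with drive capacity, which is why the concern becomes more pressing as arrays scale.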
Tape and optical disk systems are intended for long-term storage, but the access time may be too slow for fixed content application requirements. In addition, fixed content applications may require the data to be always available and on-line, which may reduce the durability of tape or optical disks.
The above solutions may require custom integration of many discrete hardware and software components, as well as application development. Custom solutions may also contribute to complexity and increase management and service costs to levels that surpass the expense of acquiring the technology. The added complexity of databases, application servers, NAS, SAN, volume managers, high availability and hierarchical storage management (HSM) may combine to make these solutions inefficient.