1. Field of Invention
The present invention relates in general to the digital data processing field and, in particular, to block data storage (i.e., data storage organized and accessed via blocks of fixed size). More particularly, the present invention relates to a mechanism for enhancing data storage performance (e.g., data access speed, power consumption, and/or cost) through the utilization of a write activity level metric recorded in high performance block storage metadata.
2. Background Art
In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises at least one central processing unit (CPU) and supporting hardware, such as communications buses and memory, necessary to store, retrieve and transfer information. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.
The overall speed of a computer system is typically improved by increasing parallelism, and specifically, by employing multiple CPUs (also referred to as processors). The modest cost of individual processors packaged on integrated circuit chips has made multiprocessor systems practical, although such multiple processors add more layers of complexity to a system.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, using software having enhanced function, along with faster hardware.
Computer systems are designed to read and store large amounts of data. A computer system will typically employ several types of storage devices, each used to store particular kinds of data for particular computational purposes. Electronic devices in general may use programmable read-only memory (PROM), random access memory (RAM), flash memory, magnetic tape or optical disks as storage medium components, but many electronic devices, especially computer systems, store data in a direct access storage device (DASD) such as a hard disk drive (HDD).
Although such data storage is not limited to a particular direct access storage device, one will be described by way of example. Computer systems typically store data on the disks of a hard disk drive, which is also commonly referred to as a hard drive or disk drive. A hard disk drive is a non-volatile storage device that stores digitally encoded data on one or more rapidly rotating disks (also referred to as platters) with magnetic surfaces. A hard disk drive typically includes one or more circular magnetic disks as the storage media, which are mounted on a spindle. The disks are spaced apart so that they do not touch one another. The spindle is attached to a motor which rotates the spindle and the disks, normally at a relatively high revolution rate, e.g., 4200, 5400 or 7200 rpm. A disk controller activates the motor and controls the read and write processes.
One or more hard disk drives may be enclosed in the computer system itself, or may be enclosed in a storage subsystem that is operatively connected with the computer system. A modern mainframe computer typically utilizes one or more storage subsystems with large disk arrays that provide efficient and reliable access to large volumes of data. Examples of such storage subsystems include network attached storage (NAS) systems and storage area network (SAN) systems. Disk arrays are typically provided with cache memory and advanced functionality such as RAID (redundant array of independent disks) schemes and virtualization.
Various schemes have been proposed to optimize data storage performance (e.g., data access speed, power consumption, and/or cost) of hard disk drives based on data-related factors such as the type of data being stored or retrieved, and whether or not the data is accessed on a relatively frequent basis.
U.S. Pat. No. 6,400,892, issued Jun. 4, 2002 to Gordon J. Smith, entitled “Adaptive Disk Drive Operation”, discloses a scheme for adaptively controlling the operating speed of a disk drive when storing or retrieving data and choosing a disk location for storing the data. The choices of speed and disk location are based on the type of data being stored or retrieved. In storing data on a storage device (e.g., a disk drive), it is determined what type of data is to be stored, distinguishing between normal data and slow data, such as audio data or text messages. Slow data is data which can be used effectively when retrieved at a relatively low storage medium speed. Slow data is further assigned to be stored at a predetermined location on the storage medium selected to avoid reliability problems due to the slower medium speed. Storing and retrieving such data at a slower medium speed from the assigned location increases drive efficiency by conserving power without compromising storage device reliability. An electrical device, such as a host computer and/or a disk drive controller, receives/collects data and determines the type of data which has been received/collected. While this scheme purports to increase drive efficiency through the determination of the type of data which is to be received/collected, it does not utilize a write activity level metric.
U.S. Pat. No. 5,490,248, issued Feb. 6, 1996 to Asit Dan et al., entitled “Disk Array System Having Special Parity Groups for Data Blocks With High Update Activity”, discloses a digital storage disk array system in which parity blocks are created and stored in order to be able to recover lost data blocks in the event of a failure of a disk. High-activity parity groups are created for data blocks having high write activity and low-activity parity groups are created for data blocks not having high write activity. High-activity parity blocks formed from the high-activity data blocks are then stored in a buffer memory of a controller rather than on the disks in order to reduce the number of disk accesses during updating. An LRU stack is used to keep track of the most recently updated data blocks, including both high-activity data blocks that are kept in buffer memory and warm-activity data blocks that have the potential of becoming hot in the future. A hash table is used to keep the various information associated with each data block that is required either for the identification of hot data blocks or for the maintenance of special parity groups. This scheme has several disadvantages. First, the information in the LRU stack and hash table may be lost when power is removed unless this information is stored in nonvolatile memory. Secondly, while the number of special parity groups is small and can be managed by a table-lookup, no write activity information is available with respect to the vast majority of the data blocks. Finally, although the disk array subsystem manages the special parity groups through table-lookups, the information in the LRU stack and the hash table is not available to the host computer.
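By way of illustration only, the following minimal sketch shows how a bounded LRU stack combined with a hash table might track recently updated blocks, and why no write activity information survives for blocks that fall out of the stack. It is not taken from the cited patent. The class name HotBlockTracker, its methods, the capacities, and the rule that the most recently updated entries count as "hot" are all hypothetical assumptions made for the example.

```python
from collections import OrderedDict


class HotBlockTracker:
    """Toy tracker of recently updated blocks (hypothetical, for illustration).

    An OrderedDict serves as both the hash table (lookup by block address)
    and the LRU stack (ordering; the most recently updated block is last).
    """

    def __init__(self, hot_capacity, warm_capacity):
        self.hot_capacity = hot_capacity
        self.capacity = hot_capacity + warm_capacity
        self.stack = OrderedDict()  # block address -> write count

    def record_write(self, block_addr):
        # Move the block to the top of the stack and bump its write count.
        count = self.stack.pop(block_addr, 0) + 1
        self.stack[block_addr] = count
        if len(self.stack) > self.capacity:
            # Evict the least recently updated block; its history is lost.
            self.stack.popitem(last=False)

    def hot_blocks(self):
        # Assumption: the top hot_capacity entries are treated as hot.
        return list(reversed(self.stack))[: self.hot_capacity]

    def knows(self, block_addr):
        # True only while the block is still held in the stack.
        return block_addr in self.stack


tracker = HotBlockTracker(hot_capacity=2, warm_capacity=2)
for addr in (1, 2, 3, 4, 5, 1):
    tracker.record_write(addr)
```

In this sketch the whole structure lives in volatile memory and holds at most four entries, so a block evicted from the stack, or any block after a power loss, has no recorded write activity.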
U.S. Patent Application Publication No. 2008/0005475, published Jan. 3, 2008 to Clark E. Lubbers et al., entitled “Hot Data Zones”, discloses a method and apparatus directed to the adaptive arrangement of frequently accessed data sets in hot data zones in a storage array. A virtual hot space is formed to store frequently accessed data. The virtual hot space comprises at least one hot data zone which extends across storage media of a plurality of arrayed storage devices over a selected seek range less than an overall radial width of the media. The frequently accessed data are stored to the hot data zone(s) in response to a host level request, such as from a host level operating system (OS) or by a user which identifies the data as frequently accessed data. Alternatively, or additionally, access statistics are accumulated and frequently accessed data are migrated to the hot data zone(s) in relation thereto. Lower accessed data sets are further preferably migrated from the hot data zone(s) to another location of the media. For example, the system can be configured to provide indications to the host that data identified at the host level as hot data are being infrequently accessed, along with a request for permission from the host to migrate said data out of the hot data zone. Cached data are managed by a cache manager using a data structure referred to as a stripe data descriptor (SDD). Each SDD holds data concerning recent and current accesses to the data with which it is associated. SDD variables include access history, last offset, last block, timestamp (time of day, TOD), RAID level employed, stream parameters and speculative data status. A storage manager operates in conjunction with the cache manager to assess access history trends. This scheme has several disadvantages. First, the access statistics would be lost when power is removed from the storage manager unless the access statistics are stored in nonvolatile memory. Secondly, access history statistics accumulated on an on-going basis for all of the data would occupy an inordinate amount of memory space. On the other hand, if the access statistics are accumulated for only a selected period of time, access statistics would not be available with respect to any data not accessed during the selected period of time.
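The memory cost noted above can be made concrete with a rough calculation. The figures used here are hypothetical assumptions chosen only for illustration and do not come from the cited publication: a 1 TiB storage array, one statistics entry per 512-byte block, and 8 bytes per entry.

```python
def stats_memory_bytes(capacity_bytes, block_size=512, bytes_per_entry=8):
    """Memory needed to keep one statistics entry per fixed-size block.

    block_size and bytes_per_entry are illustrative assumptions.
    """
    num_blocks = capacity_bytes // block_size
    return num_blocks * bytes_per_entry


TIB = 2**40
GIB = 2**30

# 2**40 / 2**9 = 2**31 blocks; at 8 bytes each that is 2**34 bytes.
print(stats_memory_bytes(TIB) / GIB)  # 16.0
```

Under these assumptions, statistics for every block of a 1 TiB array would occupy 16 GiB held separately from the data itself.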
Therefore, a need exists for an enhanced mechanism for improving data storage performance (e.g., data access speed, power consumption, and/or cost) through the utilization of a write activity level metric recorded in high performance block storage metadata.
A brief discussion of data structures for a conventional sequence or “page” of fixed-size blocks is now presented to provide background information helpful in understanding the present invention. FIG. 1 is a schematic diagram illustrating an example data structure for a conventional sequence 100 of fixed-size blocks 102 (e.g., 512 bytes) that together define a page. Typically, for performance reasons no metadata is associated with any particular one of the blocks 102 or the page 100 unless such metadata is written within the blocks 102 by an application. Metadata is information describing, or instructions regarding, the associated data blocks. Although there has been recognition in the digital data processing field of the need for high performance block storage metadata to enable new applications, such as data integrity protection, attempts to address this need have achieved mixed success. One notable attempt to address this need for high performance block storage metadata is the T10 End-to-End Data Protection architecture.
The T10 End-to-End (ETE) Data Protection architecture is described in various documents of the T10 technical committee of the InterNational Committee for Information Technology Standards (INCITS), such as T10/03-110r0, T10/03-111r0 and T10/03-176r0. As discussed in more detail below, two important drawbacks of the current T10 ETE Data Protection architecture are: 1) no protection is provided against “stale data”; and 2) very limited space is provided for metadata.
FIG. 2 is a schematic diagram illustrating an example data structure for a conventional sequence 200 (referred to as a “page”) of fixed-size blocks 202 in accordance with the current T10 ETE Data Protection architecture. Each fixed-size block 202 includes a data block 210 (e.g., 512 bytes) and a T10 footer 212 (8 bytes). Each T10 footer 212 consists of three fields, i.e., a Ref Tag field 220 (4 bytes), a Meta Tag field 222 (2 bytes), and a Guard field 224 (2 bytes). The Ref Tag field 220 is a four byte value that holds information identifying within some context the particular data block 210 with which that particular Ref Tag field 220 is associated. Typically, the first transmitted Ref Tag field 220 contains the least significant four bytes of the logical block address (LBA) field of the command associated with the data being transmitted. During a multi-block operation, each subsequent Ref Tag field 220 is incremented by one. The Meta Tag field 222 is a two byte value that is typically held fixed within the context of a single command. The Meta Tag field 222 is generally only meaningful to an application. For example, the Meta Tag field 222 may be a value indicating a logical unit number in a Redundant Array of Inexpensive/Independent Disks (RAID) system. The Guard field 224 is a two byte value computed using the data block 210 with which that particular Guard field 224 is associated. Typically, the Guard field 224 contains the cyclic redundancy check (CRC) of the contents of the data block 210 or, alternatively, may be checksum-based.
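The block layout described above may be sketched as follows, purely for illustration. The field order within the footer follows the order in which the fields are listed above, and big-endian byte order is assumed; neither is asserted here to match the on-the-wire T10 format. The generator polynomial 0x8BB7 is the one commonly cited for the T10 CRC and is likewise treated as an assumption. The function names build_page and verify_block are hypothetical.

```python
import struct

DATA_BLOCK_SIZE = 512
# Assumed layout: Ref Tag (4 bytes), Meta Tag (2 bytes), Guard (2 bytes).
FOOTER_FORMAT = ">IHH"
FOOTER_SIZE = struct.calcsize(FOOTER_FORMAT)  # 8 bytes


def crc16(data, poly=0x8BB7):
    """Bitwise CRC-16, most significant bit first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_page(payload, first_lba, meta_tag):
    """Split payload into 512-byte data blocks and append a footer to each."""
    if len(payload) % DATA_BLOCK_SIZE:
        raise ValueError("payload must be a multiple of 512 bytes")
    blocks = []
    for offset in range(0, len(payload), DATA_BLOCK_SIZE):
        data = payload[offset:offset + DATA_BLOCK_SIZE]
        # Least significant four bytes of the LBA, incremented per block.
        ref_tag = (first_lba + offset // DATA_BLOCK_SIZE) & 0xFFFFFFFF
        footer = struct.pack(FOOTER_FORMAT, ref_tag, meta_tag, crc16(data))
        blocks.append(data + footer)
    return blocks


def verify_block(block):
    """Recompute the Guard over the data block and compare with the footer."""
    data, footer = block[:DATA_BLOCK_SIZE], block[DATA_BLOCK_SIZE:]
    _ref_tag, _meta_tag, guard = struct.unpack(FOOTER_FORMAT, footer)
    return guard == crc16(data)


page = build_page(bytes(range(256)) * 4, first_lba=0x100000005, meta_tag=7)
```

Each element of page is 520 bytes long: 512 bytes of data followed by the 8-byte footer, with the Meta Tag held fixed and the Ref Tag incremented by one from block to block.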
It is important to note that under the current T10 ETE Data Protection architecture, metadata is associated with a particular fixed-size block 202 but not the page 200. The T10 metadata that is provided under this approach has limited usefulness. The important drawbacks of the current T10 ETE Data Protection architecture mentioned above [i.e., 1) no protection against “stale data”; and 2) very limited space for metadata] find their origin in the limited usefulness of the metadata that is provided under this scheme. First, the current T10 approach allows only 2 bytes (i.e., counting only the Meta Tag field 222) or, at best, a maximum of 6 bytes (i.e., counting both the Ref Tag field 220 and the Meta Tag field 222) for general purpose metadata space, which is not sufficient for general purposes. Second, the current T10 approach does not protect against a form of data corruption known as “stale data”, which is the previous data in a block after data written over that block was lost, e.g., in transit, from write cache, etc. Since the T10 metadata is within the footer 212, stale data appears valid and is therefore undetectable as corrupted.
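The stale data problem can be illustrated with a minimal sketch. The in-memory dictionary standing in for a disk, the helper names, the footer byte order, and the use of a simple 16-bit additive checksum as the Guard (the checksum-based alternative mentioned above) are all hypothetical assumptions made for the example.

```python
import struct

FOOTER_FORMAT = ">IHH"  # assumed order: Ref Tag, Meta Tag, Guard


def checksum16(data):
    """Simple 16-bit additive checksum standing in for the Guard."""
    return sum(data) & 0xFFFF


def make_block(data, ref_tag, meta_tag=0):
    """Append an 8-byte footer computed from the data block itself."""
    return data + struct.pack(FOOTER_FORMAT, ref_tag, meta_tag, checksum16(data))


def footer_is_consistent(block, expected_ref_tag):
    """Check the block against its own footer, as a reader would."""
    data, footer = block[:-8], block[-8:]
    ref_tag, _meta_tag, guard = struct.unpack(FOOTER_FORMAT, footer)
    return ref_tag == expected_ref_tag and guard == checksum16(data)


LBA = 42
disk = {LBA: make_block(b"old" + bytes(509), LBA)}  # previously written data

# A later write is lost (e.g., in transit or from write cache),
# so the new block never reaches the disk.
new_block = make_block(b"new" + bytes(509), LBA)

read_back = disk[LBA]
print(footer_is_consistent(read_back, LBA))  # True
print(read_back == new_block)                # False
```

Because the footer travels with the block it describes, the old block and its old footer remain mutually consistent, and nothing in the block reveals that a newer write was lost.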