The present invention relates generally to a method for the allocation of data on physical media by a file system which optimizes power consumption.
The operation of computers is very well known in the art. Such a file system exists on a computer or across multiple computers, where each computer typically includes data storage, such as one or more hard disks, random access memory (RAM) and an operating system for executing software code. Software code is typically executed to carry out the purpose of the computer. As part of the execution of the computer code, storage space on the hard disk or disks and in RAM is commonly used. Also, data can be stored, either permanently or temporarily, on the hard disk or disks and in RAM. The structure and operation of computers are so well known in the art that they need not be discussed in further detail herein.
In the field of computers and computing, file systems are also very well known in the art to enable the storage of such data as part of the use of the computer. A computer file system is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use data storage devices such as hard disks or CD-ROMs and involve maintaining the physical location of the files, and they might provide access to data by the computer operating system or on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients). Also, they may be virtual and exist only as an access method for virtual data.
More formally, a file system is a special-purpose database for the storage, organization, manipulation, and retrieval of data. This database, or table, centralizes the information about which areas belong to files, which are free or possibly unusable, and where each file is stored on the disk. To limit the size of the table, disk space is allocated to files in contiguous groups of hardware sectors called clusters. As disk drives have evolved, the maximum number of clusters has dramatically increased, and so the number of bits used to identify each cluster has grown. For example, the successive major versions of the FAT file system are named after the number of table element bits: 12, 16, and 32. The FAT standard has also been expanded in other ways while preserving backward compatibility with existing software.
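By way of illustration only, the relationship between cluster size and the number of bits needed to identify each cluster can be sketched as follows. The function name, the 2 GiB disk and the 512-byte sector size are assumptions chosen for the example. The calculation is simplified and ignores the table values that an actual FAT implementation reserves.

```python
def clusters_and_entry_bits(disk_bytes, sector_bytes, sectors_per_cluster):
    """Return (cluster_count, bits needed to number every cluster).

    Hypothetical helper for illustration; not taken from any FAT specification.
    """
    cluster_bytes = sector_bytes * sectors_per_cluster
    cluster_count = disk_bytes // cluster_bytes
    # Smallest bit width able to represent cluster numbers 0 .. cluster_count - 1.
    entry_bits = max(1, (cluster_count - 1).bit_length())
    return cluster_count, entry_bits


# Assumed example: a 2 GiB disk with 512-byte sectors.
disk = 2 * 1024**3
one_sector_per_cluster = clusters_and_entry_bits(disk, 512, 1)
sixty_four_sectors_per_cluster = clusters_and_entry_bits(disk, 512, 64)
```

In this assumed example, grouping 64 sectors into each cluster reduces the number of table entries from 4,194,304 to 65,536, and the identifier width from 22 bits to 16 bits.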
File systems are specialized databases which manage information on digital storage media such as magnetic hard drives. Data is organized using an abstraction called a file, which consists of related data and information about that data (hereafter referred to as metadata). Metadata commonly consists of information such as date of creation, file type, owner, etc.
The file system provides a name space (or a system) for the unique naming of files. File systems also frequently provide a directory or folder abstraction so that files can be organized in a hierarchical fashion. The abstract notion of files and folders does not represent the actual physical organization of data on the hard disk, only its logical relationships.
Hard disks consist of a contiguous linear array of units of storage referred to as blocks. Blocks are all typically the same size and each has a unique address used by the disk controller to access the contents of the block for reading or writing. File systems translate their logical organization into the physical layer by designating certain addresses as special or reserved. These blocks, often referred to as super-blocks, contain important information about the file system such as file system version, amount of free space, etc. They also contain or point to other blocks that contain structures which describe directory and file objects.
One of the most important activities performed by the file system is the allocation of these physical blocks to file and directory objects. The algorithm employed to make these decisions is commonly called the allocator, which is implemented in computer code that runs on a computer. The present invention relates specifically to the method used by the allocator of a computer to determine how, where and when to write new data to free blocks on the physical media within a computer environment.
In the prior art, various types of algorithms are employed. For example, it is well known to use a global allocator algorithm when only one disk is available, such as a single hard disk. In this case, the allocator selects the next free block from the list of free blocks maintained by the file system. This global allocation system and algorithm works well for storage systems that include only a single physical disk.
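By way of illustration only, a global allocator of the kind described above can be sketched as follows. The class name, the method names and the choice to reserve block 0 (standing in for a super-block) are assumptions made for the example and do not describe any particular file system.

```python
from collections import deque


class GlobalAllocator:
    """Hands out the next free block from a single free-block list."""

    def __init__(self, total_blocks, reserved=(0,)):
        # Assumption: block 0 is reserved, standing in for a super-block.
        reserved = set(reserved)
        self.free_blocks = deque(
            b for b in range(total_blocks) if b not in reserved
        )

    def allocate(self, count):
        """Remove and return the next `count` block addresses from the free list."""
        if count > len(self.free_blocks):
            raise RuntimeError("not enough free blocks")
        return [self.free_blocks.popleft() for _ in range(count)]

    def release(self, blocks):
        """Return blocks to the end of the free list."""
        self.free_blocks.extend(blocks)


allocator = GlobalAllocator(total_blocks=8)
first_file = allocator.allocate(3)  # takes the first three free blocks
```

Because the sketch keeps only one free list, it has no notion of which physical disk a block resides on, which is sufficient only where a single disk is present.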
However, today's computer systems, such as servers and data centers, commonly have hundreds or even thousands of physical hard disks for storage that are written to. Because such large storage systems contain more than one physical disk, the decision as to which block to allocate becomes more complex. In these environments, it is common for file systems to attempt to spread file data out uniformly across all available disks. This is done to optimize performance and to balance the input/output (I/O) load across all devices. While this simple strategy provides a globally optimized system from the standpoint of I/O load, it can cause significant difficulty in power managed storage systems which attempt to reduce power consumption via the deactivation of idle disks. Therefore, known algorithms and allocators are not particularly well suited for these large arrays of disks.
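By way of illustration only, the effect of spreading data uniformly across disks can be sketched as follows. A simple round-robin placement is assumed here as one possible way of spreading data; the function names are hypothetical and actual file systems may use different placement policies.

```python
def spread_allocate(num_blocks, num_disks, start_disk=0):
    """Assign each block of a write to a disk index in round-robin order."""
    return [(start_disk + i) % num_disks for i in range(num_blocks)]


def disks_touched(placement):
    """Return the set of disks that must be active to complete the write."""
    return set(placement)


# Assumed example: a small write of 8 blocks on an array of 8 disks.
placement = spread_allocate(num_blocks=8, num_disks=8)
active = disks_touched(placement)
```

In this assumed example, a write of only 8 blocks lands on all 8 disks, so none of the disks can remain deactivated while the write completes.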
In the prior art, there have been various attempts to address the aforementioned shortcomings in known algorithms and allocators by providing a system that better handles the competing interests of optimizing performance and load with the reduction of power consumption to provide a “greener” overall system.
One such attempt in the prior art is the important trend of providing a “power managed system” that improves the power efficiency of computing devices, thereby reducing their indirect emission of greenhouse gases. One way to do this is to power down the devices when they are not in use. While this appears feasible in theory, it is very difficult if not impossible to carry out in practice, particularly with storage systems, because all mainstream file systems employ some type of global allocator which forces all disk drives to become active whenever data is written. Since data is commonly written across many drives, as above, those drives must all be active to enable data to be written to them. Even when utilization of the storage devices is low, I/O can still force all drives in the system to become active, thereby defeating the power management of the prior art.
For example, such prior art systems can include a massive array of idle disks, more commonly known as a MAID. A MAID is a system using hundreds to thousands of hard drives for near-line data storage. A MAID is typically designed for Write Once, Read Occasionally (WORO) applications. In a MAID, each drive is only spun up on demand as needed to access the data stored on that drive. This is not unlike a very large JBOD (just a bunch of disks) but with power management.
Compared to a Redundant Array of Independent Disks (RAID) technology, a MAID has increased storage density, and decreased cost, electrical power, and cooling requirements. However, these advantages are at the cost of much increased latency, significantly lower throughput, and decreased redundancy. Most large hard drives are designed for near-continuous spinning; their reliability will suffer if spun up repeatedly to save power.
With the advent of SATA disk drives that are designed to be powered on and off, MAID architecture has evolved into a new storage platform for long term, online storage of persistent data. Large scale disk storage systems based on MAID architectures allow dense packaging of drives and are designed to have only 25% of disks spinning at any one time.
There are many advantages to MAID. These include the ability to leave idle the drives holding the 80% of the stored data that is not accessed for long periods of time, which is conducive to large arrays, such as data centers with 2000 drives or more. Another advantage is reduced power consumption: in this example, a total drive power draw of 30 kW corresponds to a total annual drive power consumption of about 263,000 kWh, of which about 210,000 kWh per year can potentially be saved. Also, MAID is easily scalable up and down.
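By way of illustration only, the power figures given above can be checked with the following arithmetic. The sketch assumes that the 30 kW figure is the continuous draw with all drives spinning, and that the potential saving corresponds to powering down, for the whole year, the drives holding the 80% of data that is not accessed; both are readings of the figures above rather than statements from them. Spin-up energy is ignored.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_kwh(power_kw):
    """Energy used in one year by a load drawing `power_kw` continuously."""
    return power_kw * HOURS_PER_YEAR


def annual_savings_kwh(power_kw, idle_fraction):
    """Energy saved if `idle_fraction` of the load is powered down all year."""
    return annual_kwh(power_kw) * idle_fraction


total = annual_kwh(30)                # 262,800 kWh, about 263,000 kWh
saved = annual_savings_kwh(30, 0.80)  # about 210,000 kWh
```

Under these assumptions, the result agrees with the approximate figures of 263,000 kWh consumed and 210,000 kWh potentially saved per year.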
However, there are a number of shortcomings of a MAID. Conventional file systems generally expect all drives under management to be spinning. This is exacerbated in modern file systems that spread their data across drives, forcing the drives into a high power state even under light I/O loads. This results in pathological power thrashing, premature drive failure, poor performance and dissatisfied users.
In view of the foregoing, there is a need to provide a method of allocating data on physical media of a computer, such as one with a MAID, that is optimized for a MAID.
There is a need for a method of allocating that optimizes power consumption, particularly on a MAID.
There is also a need for a file system that dynamically changes according to the I/O of the computer.
There is a further need to provide a method of allocating data on physical media of a computer that enables devices that are not in use to successfully power down to improve power efficiency.
Yet another need is to provide a method of allocating data on physical media of a computer that results in a “greener” device than prior art devices.
There is another need to provide a method of allocating data on physical media that optimizes MAID, even under normal file sharing loads.