1. Field of Invention
The present invention relates to managing the storage of data, and particularly to the use of a data placement methodology and configuration that results in high throughput for data delivery and access.
2. Description of Related Art
Electronic mail (i.e. email) messages are typically stored on a disk-based storage system. In one implementation, to ensure redundancy of information, multiple disks can be combined using Redundant Array of Inexpensive Disks (RAID) technology to provide a logical storage device. The logical storage device is managed by a file system that gives the application (e.g. the email application) the ability to organize directories and files in a hierarchical namespace.
In one embodiment, email storage can be arranged as a hierarchy of directories that represent mailboxes. FIG. 1 illustrates an email storage hierarchy 100 that includes multiple directories. Specifically, directory 101 represents Sue's mailbox, directory 102 represents Bob's mailbox, and directory 103 represents Dan's mailbox. In hierarchy 100, the messages are stored in one or more files within these directories. For example, in directory 101, a message from Bob to Sue can be stored in file 111 and a message from Dan to Sue can be stored in file 112.
Additional information regarding the directory and its files, called meta data, can be stored as files or databases within these directories. Thus, in hierarchy 100, directory 101 includes meta data 110. Meta data can include headers (to facilitate searches of messages), states (e.g. whether a message has been read, etc.), protection information (e.g. who is designated to share the mailbox, who can read the mailbox, etc.), flags (e.g. deletion flags), and other bookkeeping information for the mailbox and its messages. The other mailbox directories can store messages and include meta data in a similar manner.
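For illustration only, the mailbox hierarchy and its meta data can be sketched as a small data structure. The following is a minimal sketch, not part of any described implementation; the class names, field names, and file names (`MailboxMetaData`, `MailboxDirectory`, `file_111`, etc.) are hypothetical and are chosen only to mirror directory 101, meta data 110, and files 111 and 112 of hierarchy 100.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxMetaData:
    # Hypothetical fields mirroring the kinds of meta data listed above.
    headers: dict = field(default_factory=dict)     # file name -> header summary
    read_state: dict = field(default_factory=dict)  # file name -> message has been read
    readers: set = field(default_factory=set)       # who may read the mailbox
    deleted: set = field(default_factory=set)       # file names flagged for deletion


@dataclass
class MailboxDirectory:
    owner: str
    meta: MailboxMetaData = field(default_factory=MailboxMetaData)
    files: dict = field(default_factory=dict)       # file name -> message body

    def deliver(self, file_name, sender, body):
        # A delivery touches both the message file and the meta data.
        self.files[file_name] = body
        self.meta.headers[file_name] = {"from": sender, "to": self.owner}
        self.meta.read_state[file_name] = False


# Directory 101 (Sue's mailbox) holding files 111 and 112.
sue = MailboxDirectory(owner="Sue")
sue.meta.readers.add("Sue")
sue.deliver("file_111", sender="Bob", body="Hello from Bob")
sue.deliver("file_112", sender="Dan", body="Hello from Dan")
```

Note that, in this sketch, each delivery updates both a message file and the meta data of the mailbox, so a single delivered message implies more than one write to the storage system.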
In one embodiment, the throughput of such an email storage can be measured by the number of messages per second that can be delivered and accessed by clients (such as Sue, Bob, and Dan in hierarchy 100) across a network. More specifically, the throughput depends heavily on the functioning of the file system, i.e. the number of file system operations per second that can be performed. This in turn depends on how many input/output operations (I/Os) per second the disk subsystem can perform, wherein the subsystem could include one or more drives in a RAID system. Despite the presence of a write cache in the disk subsystem, the disk throughput for a write-intensive application is ultimately dependent on the amount of seeking done on the drive.
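The dependency chain described above (seeking, then I/Os per second, then messages per second) can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch under stated assumptions: the seek time, rotational speed, and number of I/Os per message are hypothetical figures chosen only for illustration, and data transfer time and caching are ignored.

```python
def ios_per_second(avg_seek_ms, rpm):
    """Estimate random I/Os per second for a single drive.

    Each random I/O is assumed to cost one average seek plus the average
    rotational latency (half a revolution). Transfer time is ignored.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def messages_per_second(iops, ios_per_message):
    """Messages per second if every message costs a fixed number of I/Os."""
    return iops / ios_per_message


# Assumed figures: 8 ms average seek, 7200 rpm drive,
# 4 I/Os per message (e.g. file and meta data updates).
iops = ios_per_second(avg_seek_ms=8.0, rpm=7200)
rate = messages_per_second(iops, ios_per_message=4)
print(f"{iops:.1f} I/Os per second, {rate:.1f} messages per second")
```

Under these assumed figures a single drive sustains on the order of 80 random I/Os per second, so halving the time spent seeking would raise the achievable message rate far more than any change to the transfer rate.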
File systems written for general-purpose systems generally have a disk space subdivided into fixed-size allocation regions. The placement policy is typically to distribute directories over the file system and place the files close to their parent directories. For example, FIG. 2A illustrates a disk space 200 in which the meta data MD1, MD2, and MD3 for directories 101, 102, and 103, respectively, are distributed in the file system. Note that the files for directory 101, i.e. files 111 and 112, are placed immediately following directory 101. In a similar manner, files 202 (shown collectively) for directory 102 can be placed immediately following directory 102 and files 203 (shown collectively) for directory 103 can also be placed immediately following directory 103.
Of importance, the arrangement of directories in this manner fails to provide good locality. Specifically, as random mailboxes are addressed for delivery of or access to messages, the resulting disk access pattern is random. In other words, a disk arm in the disk subsystem would need to be moved in a random manner to write/read the meta data and/or files in the directories, thereby resulting in considerable time inefficiencies.
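The cost of such a random access pattern can be illustrated by counting how far the disk arm travels. The following is a minimal sketch, assuming a hypothetical disk of 10,000 allocation regions, 1,000 mailboxes scattered over those regions, and 5,000 accesses to randomly chosen mailboxes. Servicing the same accesses in address order is shown only to give a sense of scale; it does not represent any particular file system.

```python
import random


def arm_travel(positions, start=0):
    """Total distance (in allocation regions) the arm moves to visit positions in order."""
    travel, current = 0, start
    for position in positions:
        travel += abs(position - current)
        current = position
    return travel


rng = random.Random(0)  # fixed seed so the sketch is repeatable
num_regions = 10_000    # assumed size of the disk space

# Assumed layout: each mailbox directory sits in an arbitrary region.
mailbox_regions = [rng.randrange(num_regions) for _ in range(1000)]

# Random mailboxes are addressed for delivery or access.
accesses = [rng.choice(mailbox_regions) for _ in range(5000)]

random_travel = arm_travel(accesses)              # arrival order
sequential_travel = arm_travel(sorted(accesses))  # address order, for scale only
print(random_travel, sequential_travel)
```

In arrival order the arm crosses a substantial fraction of the disk on almost every access, whereas in address order it sweeps the disk once; the difference is the time inefficiency referred to above.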
Some systems have attempted to address this inefficiency by spreading the load over multiple file systems. In other words, referring back to FIG. 1, each mailbox could be placed in its own file system. Thus, Sue's mailbox could be placed in one file system, Bob's mailbox could be placed in another file system, and Dan's mailbox could be placed in yet another file system. In this manner, each mailbox (i.e. directory) could be accessed in parallel. However, this configuration requires additional hardware, thereby increasing system cost. Moreover, such a configuration creates artificial constraints on the storage area, thereby causing potential management problems. For example, if Sue's mailbox is full, then additional messages for Sue cannot be written to the file systems that contain either Bob's mailbox or Dan's mailbox even if space is available in one or both of those file systems. In this case, Sue's mail delivery is “down”, i.e. non-functional, until a system administrator can redistribute the disk space between the users.
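The artificial constraint described above can be illustrated with a minimal sketch in which each mailbox has its own fixed-capacity file system. The `FileSystem` class, the capacities, and the usage figures are hypothetical and serve only to show that a delivery can fail while free space remains elsewhere.

```python
class FileSystem:
    """A fixed-capacity file system holding exactly one mailbox (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def free(self):
        return self.capacity - self.used

    def write(self, size):
        # A write succeeds only if this particular file system has room.
        if size > self.free():
            return False
        self.used += size
        return True


# One file system per mailbox, each with an assumed capacity of 100 units.
file_systems = {name: FileSystem(capacity=100) for name in ("Sue", "Bob", "Dan")}
file_systems["Sue"].used = 100  # Sue's mailbox is full
file_systems["Bob"].used = 20   # Bob's file system is mostly empty

delivered = file_systems["Sue"].write(10)
total_free = sum(fs.free() for fs in file_systems.values())
print(delivered, total_free)
```

Here the delivery to Sue fails even though 180 units remain free across the other two file systems, which is the situation that requires intervention by a system administrator.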
Another solution to accelerate data access uses solid-state disk technology. In solid-state disk technology, memory that retains its state when power is off, e.g. flash memory, can be used to create virtual disk drives. However, this type of storage can be prohibitively expensive. Moreover, it is limited in capacity.
Yet other file systems have attempted to reduce the access time by having an “append-only” (also called a log-structured) layout. FIG. 2B illustrates a disk space 210 organized with such a file system. Specifically, both meta data and files are written in the space currently indicated by a disk arm 213, which moves in sequential order through the allocation regions. In this configuration, the meta data and files are interspersed, thereby resulting in undesirable read and write latencies in accessing the meta data. Moreover, the deletion of files leaves “holes” or empty allocation regions, which causes time inefficiencies in reading both meta data and files. These holes are typically filled in over time when disk arm 213 returns to that region. In one embodiment, a separate process is needed to perform this fill operation.
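The behavior of such an append-only layout can be illustrated with a minimal sketch. The `AppendOnlyLog` class below is hypothetical and greatly simplified: the disk space is a short list of allocation regions, the arm position is an index that advances sequentially and wraps around, and a hole is refilled only when the arm returns to it. The entry names loosely follow FIG. 1 and FIG. 2A.

```python
class AppendOnlyLog:
    """Simplified append-only (log-structured) disk space."""

    def __init__(self, num_regions):
        self.regions = [None] * num_regions  # None marks an empty region
        self.head = 0                        # region currently under the disk arm

    def append(self, kind, name):
        # Advance sequentially, wrapping around, until an empty region is found.
        for _ in range(len(self.regions)):
            slot = self.head
            self.head = (self.head + 1) % len(self.regions)
            if self.regions[slot] is None:
                self.regions[slot] = (kind, name)
                return slot
        raise RuntimeError("log is full")

    def delete(self, name):
        # Deleting an entry leaves a hole where it was written.
        for i, entry in enumerate(self.regions):
            if entry is not None and entry[1] == name:
                self.regions[i] = None


log = AppendOnlyLog(num_regions=6)
log.append("meta", "MD1")
log.append("file", "111")
log.append("meta", "MD2")
log.append("file", "202")
log.append("file", "112")
log.delete("111")          # leaves a hole between MD1 and MD2
log.append("meta", "MD3")  # written at the arm position, not in the hole

holes = [i for i, entry in enumerate(log.regions) if entry is None]
meta_slots = [i for i, entry in enumerate(log.regions) if entry and entry[0] == "meta"]

refilled = log.append("file", "203")  # arm wraps around and fills the hole
print(holes, meta_slots, refilled)
```

In this sketch the meta data ends up scattered among the files, and the hole left by the deletion stays empty until the arm wraps around to it, which corresponds to the two inefficiencies noted above.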
Therefore, a need arises for a method to improve the throughput of data storage within the context of a single file system without incurring additional hardware or management costs.