Technical Field
This application relates to managing caches in storage systems.
Description of Related Art
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as file servers and those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more servers or host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units. The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access the single storage device unit allows the host systems to share data in the device. In order to facilitate sharing of the data on the device, additional software on the data storage systems may also be used.
In data storage systems where high-availability is a necessity, system administrators are constantly faced with the challenges of preserving data integrity and ensuring availability of critical system components. One critical system component in any computer processing system is its file system. File systems include software programs and data structures that define the use of underlying data storage devices. File systems are responsible for organizing disk storage into files and directories and keeping track of which part of disk storage belong to which file and which are not being used.
An operating system, executing on a data storage system such as a file server, controls the allocation of a memory of the data storage system to host systems or clients connected to the data storage system. Allocation is generally performed at a page granularity, where a page is a selected number of contiguous blocks. The particular size of a page is typically a function of an operating system, the page size may be 8 kilobytes (KB).
To the operating system of a data storage system, a file system is a collection of file system blocks of a specific size. For example, the size of a file system block may be 8 kilobytes (KB). As the data storage system is initialized, some of the pages are reserved for use by the operating system, some pages are designated as ‘free’ for allocation to other applications, and a large chunk of pages are reserved to provide a buffer cache (also referred to as “buffer cache pool”). The buffer cache temporarily stores pages in a volatile memory of a data storage system that are also stored in an attached disk device to increase application performance. File system accesses may be serviced from the buffer cache rather than read from the disk, thereby saving the delay associated with disk I/O access and increasing performance of the data storage system.
One of the functions of the operating system of a data storage system is to allocate pages to applications. The operating system maintains a ‘free list’, which is a list of pages that are available for allocation to applications. When an application requires one or more pages, the operating system may allocate a page from either the free list or preempt a page from the buffer cache. When client applications no longer need pages, they are returned to the free list.
The performance of applications is heavily influenced by the speed with which an application can retrieve data. As such, it is important to cache as much data as possible to improve performance of the data storage system.
File systems typically include metadata describing attributes of a file system and data from a user of the file system. A file system contains a range of file system blocks that store metadata and data. A user of a file system access the file system using a logical address (a relative offset in a file) and the file system converts the logical address to a physical address of a disk storage that stores the file system. Further, a user of a data storage system creates one or more files in a file system. Every file includes an index node (also referred to simply as “inode”) that contains the metadata (such as permissions, ownerships, timestamps) about that file. The contents of a file are stored in a collection of data blocks. An inode of a file defines an address map that converts a logical address of the file to a physical address of the file. Further, in order to create the address map, the inode includes direct data block pointers and indirect block pointers. A data block pointer points to a data block of a file system that contains user data. An indirect block pointer points to an indirect block that contains an array of block pointers (to either other indirect blocks or to data blocks). There may be as many as five levels of indirect blocks arranged in a hierarchy depending upon the size of a file where each level of indirect blocks includes pointers to indirect blocks at the next lower level.
Although existing various methods provide reasonable means of providing access to data and metadata, with the explosion in the amount of data being generated, the resources needed for backup, archive, and restore are rising dramatically. It may be difficult or impossible to manage efficient access to data and metadata and caches in data storage systems.