A storage system is a computer that provides storage service relating to the organization of information on writable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual user data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored. As used herein, a file is defined to be any logical storage container that contains a fixed or variable amount of data storage space, and that may be allocated storage out of a larger pool of available data storage space. As such, the term file, as used herein and unless the context otherwise dictates, can also mean a container, object or any other storage entity that does not correspond directly to a set of fixed data storage devices. A file system is, generally, a computer system for managing such files, including the allocation of fixed storage space to store files on a temporary basis.
The file server, or storage system, may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the storage system. Sharing of files is a hallmark of a NAS system, which is enabled because of its semantic level of access to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the storage system. The clients typically communicate with the storage system by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the storage system by issuing file system protocol messages (in the form of packets) to the file system over the network identifying one or more files to be accessed without regard to specific locations, e.g., blocks, in which the data are stored on disk. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the storage system may be enhanced for networking clients.
A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. The SAN may thus be viewed as an extension to a storage bus and, as such, an operating system of the storage system enables access to stored information using block-based access protocols over the “extended bus”. In this context, the extended bus is typically embodied as Fibre Channel (FC) or Ethernet media adapted to operate with block access protocols, such as Small Computer Systems Interface (SCSI) protocol encapsulation over FC or TCP/IP/Ethernet.
A SAN arrangement or deployment allows decoupling of storage from the storage system, such as an application server, and some level of information storage sharing at the application server level. There are, however, environments wherein a SAN is dedicated to a single server. In some SAN deployments, the information is organized in the form of databases, while in others a file-based organization is employed. Where the information is organized as files, the client requesting the information maintains file mappings and manages file semantics, while its requests (and server responses) address the information in terms of block addressing on disk using, e.g., a logical unit number (lun).
Certain storage systems may support multi-protocol access and, to that end, enable clients to access data via both block and file-level requests. One example of such a storage system is described in U.S. patent application Ser. No. 10/215,917, entitled MULTI-PROTOCOL STORAGE APPLIANCE THAT PROVIDES INTEGRATED SUPPORT FOR FILE AND BLOCK ACCESS PROTOCOLS, by Brian Pawlowski, et al.
One common use for a storage system that supports block-based protocols is to export one or more data containers, such as luns, for use by a client of the storage system. The client typically includes an operating system and/or a volume manager that forms the data containers into one or more volume (or disk) groups. A volume group is a set of luns aggregated to provide a storage space that may be utilized by the client to overlay one or more file systems or other structured storage thereon. As used herein, the term storage space means storage managed by a client that utilizes one or more data containers hosted by one or more storage systems, an example of which is a file system overlaid onto a volume group that comprises one or more luns stored within a plurality of volumes of a single storage system or within a plurality of volumes of a plurality of storage systems. Another example of a storage space is a volume group managed by a client to enable an application, such as a database application, to store structured data thereon.
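By way of illustration only, the aggregation of luns into a volume group may be sketched as follows. The sketch assumes a simple concatenated layout, in which the luns are placed end to end in a single block address space; an actual volume manager may instead stripe or mirror across luns. The class name VolumeGroup, the method locate and the lun sizes are hypothetical and are not drawn from any particular product.

```python
from bisect import bisect_right
from itertools import accumulate


class VolumeGroup:
    """Hypothetical volume group that concatenates luns into one block address space."""

    def __init__(self, lun_sizes):
        # lun_sizes: mapping of lun identifier -> size in blocks (illustrative values).
        self.lun_ids = list(lun_sizes)
        # ends[i] is the first volume-group block number that lies past lun i.
        self.ends = list(accumulate(lun_sizes[lun] for lun in self.lun_ids))

    @property
    def total_blocks(self):
        return self.ends[-1] if self.ends else 0

    def locate(self, block):
        """Translate a volume-group block number into (lun id, block within that lun)."""
        if not 0 <= block < self.total_blocks:
            raise IndexError(f"block {block} is outside the volume group")
        i = bisect_right(self.ends, block)      # index of the lun holding this block
        start = self.ends[i - 1] if i else 0    # first volume-group block of that lun
        return self.lun_ids[i], block - start


# Two luns, possibly exported by different storage systems, form one storage space.
vg = VolumeGroup({"lun0": 100, "lun1": 50})
print(vg.locate(99))   # last block of lun0
print(vg.locate(100))  # first block of lun1
```

A file system overlaid onto such a volume group addresses the combined block range, while the volume manager resolves each block to the lun that actually stores it.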
Users of a storage system often wish to search the data containers stored on the storage system to identify those containers that contain user data matching one or more search criteria, such as phrases and terms. As noted, a data container may include a file, a directory, a virtual disk (vdisk), or other data construct that is addressable via a storage system. For example, a user may wish to search and locate all data containers serviced by the storage system that contain user data matching the phrase “Accounts Receivable.” By enabling searching of data containers on storage systems, users may improve utilization of their data, especially in large enterprises where the number of data containers may be substantial, e.g., in the tens or hundreds of millions.
To identify data containers that contain user data that match the search criteria, a search process may need to examine all of the data containers within the storage system every time a search is requested. In a typical storage system having a substantial number of data containers, this is not a practical solution due to the substantial amount of time required to access and process every data container to determine whether it matches the search criteria. To enable faster searching, a search index of information associated with the data containers may be generated for the storage system. The storage system search index may be constructed by performing a file system “crawl” through the entire file system (or other data container organizational structure) serviced by the storage system. Typically, a file system crawl involves accessing every data container within the file system to obtain the necessary index information. However, such a file system crawl is expensive both in terms of disk input/output operations and processing time, and suffers from the same practical problems as directly accessing each data container. That is, the file system crawl may substantially impede access to the file system, e.g., for tens of minutes at a time, which results in an unacceptable loss of performance.
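The difference between examining every data container on each search and consulting a previously built search index may be illustrated with the following minimal sketch. It assumes that data containers are represented as an in-memory mapping of names to text and that the index is a simple positional inverted index; the function names, container names and contents are hypothetical. In a real storage system the crawl that builds the index reads every container from disk, which is the expense noted above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def full_scan(containers, phrase):
    """Examine every data container each time a search is requested."""
    terms = tokenize(phrase)
    if not terms:
        return []
    needle = " " + " ".join(terms) + " "
    return sorted(
        name for name, text in containers.items()
        if needle in " " + " ".join(tokenize(text)) + " "
    )


def build_index(containers):
    """One 'crawl' over all containers: term -> container name -> term positions."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in containers.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][name].append(pos)
    return index


def search_index(index, phrase):
    """Answer a phrase search from the index without touching the containers."""
    terms = tokenize(phrase)
    if not terms:
        return []
    # A matching container must contain every term of the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for name in candidates:
        # The phrase matches if term k occurs at position p + k for some start p.
        starts = index[terms[0]][name]
        if any(all(p + k in index[terms[k]][name] for k in range(1, len(terms)))
               for p in starts):
            hits.append(name)
    return sorted(hits)


containers = {
    "/vol/a/q1.txt": "Accounts Receivable summary",
    "/vol/a/q2.txt": "receivable accounts listed",
    "/vol/b/notes": "nothing relevant here",
}
index = build_index(containers)                     # cost paid once per crawl
print(search_index(index, "Accounts Receivable"))   # served from the index
```

Both approaches return the same containers for a given phrase; the index merely moves the cost of reading every container from search time to crawl time.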
Furthermore, the file system crawl is typically performed at regular intervals (periodically) to maintain up-to-date index information. As a result of the substantial processing time required, a further disadvantage of the file system crawl is that the periodic search index information may be inconsistent with the current state of the file system, i.e., the index information only represents the file system as of the completion of the last file system crawl.
A further noted disadvantage arises in a storage system environment where a client overlays a file system or other structured storage onto storage space provided by a storage system. In such an environment, indexing functionality within the storage system may not operate, as the overlaid file system may utilize a format different from that of the storage system's native file system. This prevents a storage system administrator, who may support clients from a plurality of differing vendors, from being able to quickly and efficiently search through user data to enable users to identify data containers containing desired search terms.