1. Field of the Invention
The present invention generally relates to data storage systems and methods, and, more particularly, to a method for checking and recovering large file systems in a distributed object-based data storage system.
2. Description of Related Art
As part of checking a file system in a data storage environment, the information identifying all the files and directories in the file system is generally stored in a memory. A file system recovery (FSRC) process is then performed using the information stored in the memory to verify that the existing data structure is error-free and that all the files in the system are in the directories where they are supposed to be. Thus, the FSRC process performs a file system integrity check and also executes the steps necessary to recover faulty or missing directory entries to maintain a healthy and correct file system.
In an object-based data storage system, the memory consumption for an efficient file system recovery process is often a linear function of the number of objects (file objects as well as directory objects) in the file system. In other words, in the case of a large number of objects, a large memory is needed to store all file and directory object-related information for the FSRC process. With millions or billions of objects in modern distributed data storage systems, the memory space requirement for FSRC-related information storage may be prohibitive. Further, because of the presence of a very large number of objects in modern data storage systems, a traditional file system recovery process may become very time-consuming because of the need to check all objects.
When the FSRC process is executed, it is typically the only process that accesses data on storage. There may be orphan objects or missing directory entries that require the FSRC process to walk through and check all the objects in the system to rectify the missing information. An FSRC process may typically perform two types of file system checks: static and dynamic. A static check can be done without looking at the parents or children of an object and can be performed transparently anytime an object is cached or stored in the system. In a static check on an object, the object needs to be “touched” only once. The verb “touch” as used in various forms hereinbelow refers to the act of accessing an object that is stored in a main system memory or cached in the system or stored on a storage disk to get (or set) the object's attributes (e.g., in case of a file object) or to read or write object entries (e.g., in case of a directory object).
In the discussion below, the term “parent-child relationship” between two or more objects refers to a forward pointer (e.g., a directory entry) from a parent object to its child object and a backward pointer (e.g., an object attribute) from a child object to its parent object. It is noted that the term “parent-child relationship” also includes a child-parent relationship. Thus, the term “parent-child relationship” does not strictly imply that the first object is always a parent and the second one is always a child. A dynamic check makes sure that for every forward pointer there exists a backward pointer. Thus, dynamic checks, as opposed to static checks, make sure that all parent-child relationships among the system objects are correct. Hence, dynamic checks may require touching multiple objects in the system to verify the existence of proper parent-child relationships among related objects. For example, during a dynamic check, the FSRC process may go through a list of objects in order and may first come to a parent P that has children A, B, and C. Later, during the file system checking, the FSRC process may come to the child C in the list and, at that time, the FSRC process must touch the parent P again to make sure that P has a forward pointer to C. This second touch is needed because, instead of C, there may be another child object D that has a backward pointer to P while P has no forward pointer to D. The roles of parents and children are reversed when a child is encountered first in the list. In that case, the FSRC process may need to touch the parent object as many times as there are children of that parent that point to the parent but are in front of the parent in the list of objects.
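By way of illustration only, the traditional dynamic check described above may be sketched as follows. The sketch is not taken from any particular implementation; the names Obj, dynamic_check, and touch are hypothetical, and an in-memory dictionary is assumed to stand in for the object store. It shows how walking the object list in order verifies each forward and backward pointer, and how the parent P ends up being touched repeatedly.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Obj:
    """Hypothetical object record (an assumption made for illustration)."""
    name: str
    parent: Optional[str] = None               # backward pointer (object attribute)
    children: set = field(default_factory=set)  # forward pointers (directory entries)


def dynamic_check(objects):
    """Walk the object list in order and verify every parent-child relationship.

    Returns (errors, touches); touches counts how often each object was accessed.
    """
    store = {o.name: o for o in objects}  # stands in for the storage system
    touches = Counter()
    errors = []

    def touch(name):
        # Access an object to read its attributes or entries, counting the access.
        obj = store.get(name)
        if obj is not None:
            touches[name] += 1
        return obj

    for entry in objects:
        obj = touch(entry.name)
        # A backward pointer must be matched by a forward pointer in the parent.
        if obj.parent is not None:
            parent = touch(obj.parent)
            if parent is None or obj.name not in parent.children:
                errors.append(f"{obj.parent} has no forward pointer to {obj.name}")
        # Each forward pointer must be matched by a backward pointer in the child.
        for child_name in sorted(obj.children):
            child = touch(child_name)
            if child is None or child.parent != obj.name:
                errors.append(f"{child_name} has no backward pointer to {obj.name}")
    return errors, touches


# Parent P lists A, B, and C; D points back to P, but P has no entry for D.
objects = [
    Obj("P", children={"A", "B", "C"}),
    Obj("A", parent="P"),
    Obj("B", parent="P"),
    Obj("C", parent="P"),
    Obj("D", parent="P"),
]
errors, touches = dynamic_check(objects)
print(errors)
print(dict(touches))
```

In this sketch the parent P is touched once when it is first reached in the list and once more for each of the four objects that point back to it, which is the repeated-access pattern that makes such a check I/O intensive when the objects reside on disk rather than in memory.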
As noted before, the presence of an extremely large number of objects in advanced data storage systems mandates a very large amount of memory (on the order of many gigabytes) if object-related information for all objects is to be stored at the time of dynamic checking. This may not be possible when system storage space is limited and has to be apportioned among a number of applications. Further, a dynamic check may require touching multiple objects from random places in the list of objects. Thus, when an FSRC process is performed with a limited amount of memory space, the necessity to touch multiple objects from random places may require time-consuming I/O operations, thereby degrading the system performance during file system checking and recovery.
Therefore, it is desirable to devise a method to efficiently check a file system that has a very large number of objects. Because traditional dynamic file system checks that rely on multiple passes or multiple touches are time-consuming and heavily I/O dependent, it is desirable to have an FSRC process that performs dynamic checks on system objects by touching each object only once. It is further desirable that the FSRC process perform efficient file system checking and recovery without requiring a memory space that is a linear function of the number of objects in the system.