A conventional information processing system stores information in storage directly connected to a computer system. Access to the stored information is permitted only through the directly connected computer, so other computer systems must access the data via that computer. In recent years, advances in network technologies and a dramatic increase in the amount of information to be stored have promoted the separation of the computer system that processes information from the storage system that stores it. Storage is now connected to the network and can be shared by a plurality of computer systems. Such networked storage is referred to as network storage.
Examples of such network storage include storage area network (SAN) storage, which is connected by a SAN and provides block access; network attached storage (NAS), which is connected by an IP network, InfiniBand, or the like and provides file access; and Web storage, which provides access through HTTP or an extension of that protocol for Web access.
As network storages become widely used, a system administrator needs to manage a plurality of network storages connected to the network. For example, when the amount of data to be stored exceeds the capacity of the existing network storage, it is necessary to add a new network storage and reconfigure the system, for instance by redistributing data. A drastic increase in the demand for storage capacity forces repeated system reconfigurations, thus increasing system management costs.
In order to reduce system management costs, a technology for virtualizing storage is essential so that the computer system can view a plurality of network storages as a single storage and so that the addition of new devices does not affect the entire system. Various systems have been developed and proposed for such storage virtualization.
For example, “The Zebra Striped Network File System” (Hartman et al., ACM Transactions on Computer Systems, vol. 13, No. 3, 1995, pp. 274-310) describes a system for distributively storing a single file in a plurality of network storages. A file is divided into fragments of a specified length, and the fragments are stored sequentially in the network storages on a round-robin basis. A server acting as a centralized resource manager manages the order in which the file is distributively stored. During file access processing, the system first queries the resource manager for the storage locations of the file and then accesses the data. Namely, by distributively storing a file in a plurality of servers and using a single server for central file management, the system makes a plurality of network storages appear virtually as a single large network storage. In addition, U.S. patent application Ser. No. 6,029,168 describes a method for file striping based on a non-centralized resource manager system, in contrast to the above-mentioned centralized resource manager system. This system embeds starter node information in a file identifier. The starter node stores the striping information about the file. In this way, a plurality of network storages manages the striping information in a distributed manner. Since the file identifier specifies the starter node, the system first accesses the starter node during a file access and determines the storage location of the file. If necessary, the system transfers the access request to a server storing the fragments (blocks) of the file to be accessed and processes the file. For this reason, a client only needs to issue a request to the starter node in order to access the relevant file. This eliminates the need for the client to consider the storage locations that result from file striping.
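The centralized scheme described above can be illustrated with a minimal sketch. It follows only the description in this section (fixed-length fragments, round-robin placement, one server that records the placement) and is not the implementation from the cited paper. The names `ResourceManager`, `write_file`, and `read_file` are hypothetical, and each network storage is modeled as an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical central server that records where each fragment lives."""
    num_storages: int
    # file name -> list of storage indexes, one entry per fragment
    layout: dict = field(default_factory=dict)

    def place(self, name, num_fragments):
        # Round robin: fragment i goes to storage (i mod num_storages).
        order = [i % self.num_storages for i in range(num_fragments)]
        self.layout[name] = order
        return order

    def locate(self, name):
        # Every file access has to ask the manager first.
        return self.layout[name]


def write_file(manager, storages, name, data, fragment_size):
    """Split data into fixed-length fragments and store them round robin."""
    fragments = [data[i:i + fragment_size]
                 for i in range(0, len(data), fragment_size)]
    order = manager.place(name, len(fragments))
    for index, (storage_id, fragment) in enumerate(zip(order, fragments)):
        storages[storage_id][(name, index)] = fragment


def read_file(manager, storages, name):
    """Query the manager for fragment locations, then reassemble the file."""
    order = manager.locate(name)
    return b"".join(storages[storage_id][(name, index)]
                    for index, storage_id in enumerate(order))
```

Because `read_file` cannot proceed without calling `manager.locate`, every access from every client passes through the single manager, which is the dependency the following paragraphs discuss.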
The system described in the above-mentioned Zebra Striped Network File System paper must query the centralized resource manager to determine which network storage holds the file fragment to be accessed. Accordingly, as the number of network storages increases, the centralized resource manager becomes a bottleneck, which hinders system scalability.
The non-centralized resource manager system described in U.S. patent application Ser. No. 6,029,168 solves the problem of the centralized resource manager in that the resource manager is distributed across a plurality of servers. However, this system presupposes that, during a file access, a server such as a distributed directory converts the file name or directory name that uniquely specifies the file into a file identifier in which the starter node is embedded. The system therefore requires additional information for managing the correspondence between a file name and the server (starter node) that manages the file's storage locations. U.S. patent application Ser. No. 6,029,168 also describes using a “well-known” distributed directory file as a means for storing such information. When a new network storage is added and the starter node for an existing file is moved to the new network storage, the correspondence between the file name and the file identifier stored in the distributed directory must be rewritten. This is because the starter node location is directly embedded in the file identifier.
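The rewriting cost described above can be sketched as follows. This is an illustration under stated assumptions, not the data layout of the cited system: `FileId`, its `local_id` field, and `migrate_starter_node` are hypothetical, and the distributed directory is modeled as a single dictionary from file name to file identifier.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    """Hypothetical file identifier with the starter node embedded in it."""
    starter_node: int  # server that holds the file's striping information
    local_id: int      # assumed per-node file number, for illustration only


def migrate_starter_node(directory, old_node, new_node):
    """Move starter-node duty from old_node to new_node.

    Because the starter node is part of the identifier, every directory
    entry that names old_node has to be rewritten. Returns the names of
    the entries that were changed.
    """
    rewritten = []
    for name, file_id in list(directory.items()):
        if file_id.starter_node == old_node:
            directory[name] = file_id._replace(starter_node=new_node)
            rewritten.append(name)
    return rewritten
```

In this model, the number of directory entries rewritten grows with the number of files whose starter node moves, which is why adding a network storage is costly when the location is embedded in the identifier.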