To locate and use a data item on a computer, it is generally necessary to have or to be able to obtain “metadata.” Metadata is information about the data item. For example, metadata may tell us where to find the data item. Obtaining metadata may be particularly difficult in a data storage system that stores a large number of data items distributed across many computers.
A namespace defines a set of valid names for data items or other objects, and a hierarchical structure for the namespace helps eliminate ambiguity. For example, in the namespace of United States telephone numbers, a valid name generally comprises ten digits, in which a three-digit area code disambiguates a seven-digit local number.
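The telephone-number example can be illustrated with a minimal sketch. The function name parse_us_number is hypothetical, and the sketch assumes that a name is valid whenever it contains exactly ten digits once punctuation is ignored; real numbering plans impose further rules that are not modeled here.

```python
import re


def parse_us_number(name: str) -> tuple[str, str]:
    """Split a ten-digit name into (area_code, local_number).

    Hypothetical helper: validity is reduced to "exactly ten digits".
    """
    digits = re.sub(r"\D", "", name)  # drop punctuation such as "-" or "()"
    if len(digits) != 10:
        raise ValueError(f"not a valid name in this namespace: {name!r}")
    # The three-digit area code is the higher level of the hierarchy;
    # it disambiguates the seven-digit local number.
    return digits[:3], digits[3:]


area_code, local_number = parse_us_number("(212) 555-0123")
print(area_code, local_number)  # prints: 212 5550123
```

Two different area codes may contain the same seven-digit local number; only the combined ten-digit name is unambiguous.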
In a stand-alone computer using a conventional disk-based operating system, data items (such as documents) may be stored in files. A file system is provided to associate each file with selected metadata describing the file. The operating system is able to obtain selected metadata (such as directory information) about each file in the file system. The metadata may include, for example, a name, a file type, a file size, and a physical or logical storage location where the file is stored, such as on a disk drive.
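As a rough illustration of the kind of metadata a file system can report, the following sketch uses the standard os.stat call. The function name describe_file and the dictionary keys are assumptions made for this example; the "directory" entry is only a logical location, since a physical storage location is not generally exposed through this interface.

```python
import os


def describe_file(path: str) -> dict:
    """Collect a few metadata fields for one file (hypothetical helper)."""
    info = os.stat(path)  # ask the operating system for the file's metadata
    return {
        "name": os.path.basename(path),
        "type": os.path.splitext(path)[1],  # file extension, e.g. ".txt"
        "size": info.st_size,               # size in bytes
        "directory": os.path.dirname(os.path.abspath(path)),  # logical location
    }
```

Calling describe_file on an existing file returns a dictionary containing its name, extension, size in bytes, and containing directory.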
To organize the storage of files, a hierarchical file name is typically provided in such a file system. For example, a hierarchical file name may be used to express a file name by describing its location in nested directories on a disk drive. In this typical directory structure, an exemplary file name may be expressed as C:\docs\english\sample.txt, where “C:\” represents a highest-level (root) directory of a disk drive identified as Drive C, “docs” represents a second-level directory under the root, “english” represents a third-level directory under docs, and “sample.txt” represents a file stored in the english directory.
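The decomposition of the exemplary file name into its directory levels can be sketched with Python's standard pathlib module. The function name split_hierarchical_name is hypothetical; PureWindowsPath is used so that the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def split_hierarchical_name(name: str) -> list[str]:
    """Return the levels of a Windows-style hierarchical file name."""
    return list(PureWindowsPath(name).parts)


levels = split_hierarchical_name(r"C:\docs\english\sample.txt")
print(levels)  # prints: ['C:\\', 'docs', 'english', 'sample.txt']
```

The first element is the root directory of Drive C, the intermediate elements are the nested directories, and the last element is the file itself.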
Computer networks, such as local-area networks (LANs), wide-area networks (WANs), and the Internet, are often configured to permit distributed data storage. Distributed data storage allows a user of a networked computer to access data items that are stored on another computer accessible through the network.
A typical example of a hierarchical name for distributed data storage is a conventional Uniform Resource Locator (URL), as widely used on the Internet. A user may enter a URL, such as http://example.com/docs/index.html, into a web browser. The web browser will generally use the domain name system (DNS), such as by querying a nameserver, in order to translate, map, or resolve the domain name example.com to a corresponding numeric Internet Protocol (IP) address, such as 123.45.67.123. The IP address identifies a particular remote computer. The web browser may then establish a connection with the remote computer identified by that IP address and request the document using hypertext transfer protocol (HTTP). The string “docs” represents a directory on the remote computer, in which the remote computer will attempt to find a hypertext markup language (HTML) document called “index.html”.
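The resolution steps described above can be sketched as follows. The function name resolve_url and the resolver parameter are hypothetical; a small dictionary stands in for DNS so that the example runs without network access, and the IP address is simply the illustrative one given in the text. In practice, a call such as socket.gethostbyname could be passed as the resolver instead.

```python
from urllib.parse import urlsplit


def resolve_url(url: str, resolver) -> dict:
    """Break a URL into the parts used to locate a document.

    `resolver` is any callable mapping a domain name to an IP address.
    """
    parts = urlsplit(url)
    host = parts.hostname            # e.g. "example.com"
    ip_address = resolver(host)      # name-to-address step (DNS in practice)
    # Everything before the last "/" is the directory; the rest is the document.
    directory, _, document = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": host,
        "ip": ip_address,
        "directory": directory,
        "document": document,
    }


# Stand-in for DNS: a fixed table with the illustrative address from the text.
fake_dns = {"example.com": "123.45.67.123"}

result = resolve_url("http://example.com/docs/index.html", fake_dns.get)
print(result["ip"], result["directory"], result["document"])
# prints: 123.45.67.123 /docs index.html
```

The sketch makes the binding visible: the host portion of the name determines which computer is contacted, and the remainder of the name is meaningful only on that computer.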
Existing hierarchical models for data naming in distributed data storage systems tend to bind data to a particular host computer on which the data resides, as illustrated by the foregoing examples of file names and URLs. Such data naming models generally lack flexibility for an environment that can be dynamically mapped onto a changing set of computers.