Fundamentally, computers are tools for helping people with their everyday activities. Processors may be considered extensions of our reasoning capabilities, and storage devices may be considered extensions of our memories. File systems, including distributed file systems, typically provide access to data organized in a hierarchical namespace, such as a directory tree, on storage devices. However, the gap between human memory and the simple hierarchical namespace of existing file systems makes these file systems hard to use.
The human brain typically remembers objects based on their contents or features. For example, when you run into an acquaintance, you may not remember the person's name, but you may recognize the person by features, such as a round face and a shiny smile. These identifying features are known as semantics or semantic information.
To bridge the gap between human memory and the hierarchical namespace of existing file systems, people have used either separate tools or file systems that integrate rudimentary search capabilities. Tools such as GREP and other local search engines must exhaustively scan every document to identify those that match a pattern.
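The exhaustive scan described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular tool: the function name `exhaustive_search` and the in-memory `documents` mapping are hypothetical stand-ins for files on a storage device.

```python
import re

def exhaustive_search(documents, pattern):
    """Return the names of documents whose text matches the pattern.

    Every document is scanned on every query; no index is kept,
    so the cost grows with the total amount of stored text.
    """
    regex = re.compile(pattern)
    return [name for name, text in documents.items() if regex.search(text)]

# Hypothetical in-memory corpus standing in for files on disk.
documents = {
    "notes.txt": "meeting with an acquaintance from the conference",
    "todo.txt": "buy milk",
    "draft.txt": "acquaintance list for the reunion",
}

matches = exhaustive_search(documents, r"acquaintance")
print(matches)
```

Because each query reads every document in full, retrieval time in this sketch is proportional to the size of the whole collection rather than to the number of matches.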
Some known semantic file systems, such as Semantic File System (SFS) and Hierarchy and Content (HAC), organize a namespace by executing queries based on semantic information and constructing the namespace from the results of the queries. For example, a directory in HAC may be populated with all files that match a query. These file systems, however, provide only simple keyword-based searches and do not maintain any indices for minimizing retrieval times.
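As a rough sketch of a query-constructed namespace, the example below builds virtual directories whose contents are the results of keyword queries. It is not the actual interface of SFS or HAC: the function `build_query_directory`, the `/query/...` paths, and the keyword sets attached to each file are all assumptions made for illustration.

```python
def build_query_directory(files, keyword):
    """Return the sorted names of files tagged with the keyword.

    The whole file table is rescanned for each query, mirroring the
    lack of an index noted above.
    """
    return sorted(name for name, keywords in files.items() if keyword in keywords)

# Hypothetical files, each described by a set of keywords.
files = {
    "clip_a.mov": {"sunset", "beach"},
    "clip_b.mov": {"city", "night"},
    "clip_c.mov": {"beach", "night"},
}

# Each virtual directory holds the results of one keyword query.
namespace = {
    "/query/beach": build_query_directory(files, "beach"),
    "/query/night": build_query_directory(files, "night"),
}
print(namespace)
```

In this sketch a file can appear in several virtual directories at once, since membership is determined by query results rather than by a single position in a directory tree.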
Also, known semantic file systems do not typically support archival functions, such as versioning. Generally, the most arduous task in restoring a backed-up version is finding the desired file and the desired version of that file. Currently, the only way to locate a version is to remember the date on which it was produced. In many cases, however, people are interested in files produced by other people and in versions with certain features. For example, in a digital movie studio an artist may make many variations of video clips. To produce a video clip, the artist may perform several editing iterations until the clip has the look and feel the artist desires. In the process, the artist may go back to one or more previous versions, which may not be the latest version. The artist may also need to incorporate scenes produced by other artists without knowing the name or the correct version of the file that contains those scenes. Instead, the only thing the artist may know is that these files have certain semantics. This situation arises in a variety of applications and environments, including universities, research laboratories, and medical institutions.