1. Field of the Invention
The invention relates to a file system, as used in computers, for storing and using different types of data such as text, audio, video, spreadsheets, presentations, and images, wherein the data has identifiable parts.
2. Description of the Background
In the past, file systems implemented on computers and computer networks have been part of the operating system running on the computer. Thus, operating systems such as Windows NT (available from Microsoft Corporation), AIX, and UNIX each come with a file system as part of the operating system.
Such file systems, implemented through the operating system, were built on the assumption that everything is connected. When a user created a file, the request was made to the file system and the file was passed directly to storage. The operating system owns every phase of the request, and as long as the request remains within the computer running the operating system, everything functions smoothly.
However, although such file systems function smoothly in a stand-alone system, managing the file system can become complicated in more modern arrangements typically known as network attached storage (NAS) systems. In an NAS environment, the file system component has been removed from the operating system and placed on storage units that include "intelligence" in the form of a storage processor, which may be a CPU, and one or more non-volatile memory components such as a disk drive.
In such an NAS environment, a user creates a file or performs some other file system operation and sends the file or the operation over the network to an external facility. How that operation is controlled becomes a complex issue. A request has been sent, and a reply to the request will be forthcoming, but how the request is processed is no longer under the user's control. In particular, the file handling utility in the operating system no longer comes into play, because the file has been released to another environment, namely the file system component on the storage unit.
In the past, managing such a file system was not particularly complex because most of the data was either text or business operational data such as payroll. Increasingly, however, users store other kinds of data, such as multimedia data. As a result, tools have been developed to identify parts of such data that have meaning and usage in themselves.
For example, users of the extensible markup language (XML) mark up their data into useful segments such as "address of the sender," "price quotes," and so on. Using XML tools, the data can now be placed into files in which the size and boundaries of the segments are known to the file system. Applications that create and distribute content commonly read files, search them for identifiable segments of data, or content, and mix those segments in different ways for distribution to end users. It is also common practice to alter the data according to a user profile. A still more recent practice is to deploy a data source connected directly to a network in an NAS environment.
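The conventional practice just described, in which an application reads a marked-up file, picks out named segments, and mixes them into a new view, can be sketched with Python's standard `xml.etree.ElementTree` module. The document and its tag names (`sender_address`, `price_quotes`, `terms`) are hypothetical, and `find_segments` and `mix` are illustrative helpers, not part of any product. Note that all knowledge of which tags mark segments lives in the application code.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical marked-up document; tag names are illustrative only.
DOCUMENT = """<quote>
  <sender_address>12 Main Street, Springfield</sender_address>
  <price_quotes>
    <item sku="A1">19.99</item>
    <item sku="B2">5.25</item>
  </price_quotes>
  <terms>Net 30</terms>
</quote>"""


def find_segments(xml_text: str) -> dict[str, ET.Element]:
    """Return each top-level segment of the document, keyed by its tag name."""
    root = ET.fromstring(xml_text)
    return {child.tag: child for child in root}


def mix(segments: dict[str, ET.Element], wanted: list[str],
        root_tag: str = "view") -> ET.Element:
    """Build a new view from a chosen subset of segments, in the given order."""
    view = ET.Element(root_tag)
    for name in wanted:
        seg = copy.deepcopy(segments[name])  # leave the source tree untouched
        seg.tail = None                      # drop the original trailing whitespace
        view.append(seg)
    return view


if __name__ == "__main__":
    segments = find_segments(DOCUMENT)
    view = mix(segments, ["terms", "sender_address"])
    print(ET.tostring(view, encoding="unicode"))
```

Every new combination of segments requires another call like `mix` written into some application, which is the burden the following paragraph describes.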
There are a number of problems with current file systems in the content management environment. First, knowledge of where the content boundaries lie in a file, and how large each piece of content is, must reside in an application. Any mixing of the content must be accomplished by writing an application or by duplicating the data in a different form. As will be appreciated by those of ordinary skill in the art, duplicating data causes problems because every update must then be made in several places, introducing the possibility of user error. Changes to the content must likewise be made through an application, and new constraints, security definitions, or attribute definitions can be applied to the content only through an application. Finally, there is no mechanism to transmit how users want the content to be modified. This creates complications because each user in a chain must transmit the entire data back to the prior user until it reaches the initial user, who then makes the modifications and sends the data out to the recipient. As a result, the amount of data flowing through the network increases significantly.
One attempted solution uses XML to capture the semantics of business data operations in an effort to reduce the cost of developing content applications. Numerous products add attribute-based search capabilities and limited forms of modification, and some provide a framework for defining certain forms of content modification. Each, however, offers only a limited solution to the problem.
These related tools do not eliminate the need to write a custom application for every problem. Nor do they provide any capability for creating new content from existing data with its own property and security definitions, and they do not address the specific steps needed to transmit different types of modification definitions. In accordance with the invention, the problems of the current solutions are avoided by virtualizing the file system interface so that desired content can be created without writing an application.
In accordance with one aspect of the invention, there is provided a file system for creating and managing content. The file system has directories holding base files, which contain data about which information is not known. A repository of metadata contains identifying information about the data in the base files. Phantom files may exist based on at least one base file. The phantom files are designated by names and associated attributes. The phantom files contain information about data in base files but do not specify a path name to the base file containing the data. The phantom files may point to segments of data in base files, and to properties of that data, without requiring the data to be duplicated into additional files.
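The relationship among base files, the metadata repository, and phantom files can be illustrated with a minimal in-memory sketch. It assumes that a segment is identified by an opaque base-file id plus a byte offset and length; `SegmentRef`, `MetadataRepository`, and `PhantomFile` are hypothetical names, and the layout is an assumption rather than the structure the specification requires. The phantom file stores only a name, attributes, and segment names, never a path or a copy of the data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SegmentRef:
    """Hypothetical segment location: an opaque base-file id, never a path name."""
    base_id: int
    offset: int
    length: int


@dataclass
class MetadataRepository:
    """Holds base-file contents by id and the identifying metadata for segments."""
    base_files: dict[int, bytes] = field(default_factory=dict)
    segments: dict[str, SegmentRef] = field(default_factory=dict)

    def read_segment(self, name: str) -> bytes:
        ref = self.segments[name]
        data = self.base_files[ref.base_id]
        # Slicing reads the bytes in memory; no additional file is written.
        return data[ref.offset:ref.offset + ref.length]


@dataclass
class PhantomFile:
    """A named view: attributes plus segment names resolved through the repository."""
    name: str
    attributes: dict[str, str]
    segment_names: list[str]

    def read(self, repo: MetadataRepository) -> bytes:
        return b"".join(repo.read_segment(s) for s in self.segment_names)


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.base_files[1] = b"HEADER|Chapter one text|Chapter two text"
    repo.segments["chapter1"] = SegmentRef(base_id=1, offset=7, length=16)
    repo.segments["chapter2"] = SegmentRef(base_id=1, offset=24, length=16)
    reordered = PhantomFile("reordered", {"owner": "editor"}, ["chapter2", "chapter1"])
    print(reordered.read(repo))  # b'Chapter two textChapter one text'
```

Because the phantom file resolves segments through the repository, an update to the base file is seen by every phantom file that references it, avoiding the duplicate-update problem described in the background.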
More specifically, the phantom files identify the properties of the identified data segments, including attributes of the segments and user restrictions that limit access to the segments to specified users.
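Per-segment attributes and user restrictions might be modeled as follows. This is a sketch under stated assumptions: `RestrictedSegment` and `read_phantom` are hypothetical names, the segment bytes are held directly rather than resolved from a base file, an empty `allowed_users` set is assumed to mean "unrestricted", and a reader who lacks access is assumed to receive a view with the restricted segments omitted rather than an error.

```python
from dataclasses import dataclass, field


@dataclass
class RestrictedSegment:
    """One segment referenced by a phantom file, with attributes and a user restriction."""
    name: str
    data: bytes  # stand-in for bytes that would be resolved from a base file
    attributes: dict[str, str] = field(default_factory=dict)
    # Users allowed to see this segment; an empty set means unrestricted (assumption).
    allowed_users: frozenset[str] = frozenset()

    def readable_by(self, user: str) -> bool:
        return not self.allowed_users or user in self.allowed_users


def read_phantom(segments: list[RestrictedSegment], user: str) -> bytes:
    """Return only the segments this user may see, concatenated in order."""
    return b"".join(s.data for s in segments if s.readable_by(user))


if __name__ == "__main__":
    report = [
        RestrictedSegment("summary", b"Q3 summary. ", {"type": "text"}),
        RestrictedSegment("salaries", b"Salary table.", {"type": "table"},
                          frozenset({"hr"})),
    ]
    print(read_phantom(report, "hr"))     # b'Q3 summary. Salary table.'
    print(read_phantom(report, "guest"))  # b'Q3 summary. '
```

Filtering rather than refusing lets one phantom file present different views to different users; raising `PermissionError` instead would be an equally reasonable design.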
In another aspect, there is described a method of managing content in an existing file system. The file system has a plurality of directories having base files containing data about which information is not known. A repository of metadata containing identifying information about the data in the base files is created. Phantom files are created and designated with names and associated attributes. The phantom files contain information about data in base files but do not specify a path name to the base file containing the data. The phantom file thus identifies segments of data that are named as content in the base file, and those segments are defined for use by users who access the phantom file.
The phantom files may be created by splitting an existing file along content boundaries, combining contents of an existing file into a phantom file, splitting or combining contents of phantom files, defining a filter to work on content, mirroring an existing file, or versioning an existing file.
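Several of the creation operations listed above can be sketched by representing a phantom file as a tuple of extents into base files plus an ordered list of filters. The extent layout, the names `Phantom`, `split`, `combine`, `add_filter`, `mirror`, and `read`, and the choice to model a filter as a `bytes -> bytes` function are all assumptions made for illustration. Versioning is omitted because preserving old content when a base file changes would need copy-on-write storage that this sketch does not model.

```python
from dataclasses import dataclass, replace
from typing import Callable

Extent = tuple[str, int, int]          # (base-file id, offset, length); hypothetical layout
Filter = Callable[[bytes], bytes]


@dataclass(frozen=True)
class Phantom:
    name: str
    extents: tuple[Extent, ...]        # references into base files; no copied data
    filters: tuple[Filter, ...] = ()   # applied in order when the phantom is read


def split(base_id: str, size: int, boundaries: list[int], names: list[str]) -> list[Phantom]:
    """Split a base file of `size` bytes along the given content boundaries."""
    cuts = [0, *boundaries, size]
    return [Phantom(n, ((base_id, start, end - start),))
            for n, start, end in zip(names, cuts, cuts[1:])]


def combine(name: str, *parts: Phantom) -> Phantom:
    """Concatenate the extents of existing phantoms into one new phantom."""
    if any(p.filters for p in parts):
        raise ValueError("this sketch only combines unfiltered phantoms")
    return Phantom(name, tuple(e for p in parts for e in p.extents))


def add_filter(p: Phantom, name: str, fn: Filter) -> Phantom:
    """Define a filter that works on the content of an existing phantom."""
    return replace(p, name=name, filters=p.filters + (fn,))


def mirror(p: Phantom, name: str) -> Phantom:
    """A second name for the same content; it tracks any change to the base file."""
    return replace(p, name=name)


def read(p: Phantom, store: dict[str, bytes]) -> bytes:
    data = b"".join(store[b][off:off + n] for b, off, n in p.extents)
    for fn in p.filters:
        data = fn(data)
    return data


if __name__ == "__main__":
    store = {"doc": b"Intro.Body text.Appendix."}
    intro, body, appendix = split("doc", len(store["doc"]), [6, 16],
                                  ["intro", "body", "appendix"])
    short = combine("short", intro, appendix)
    loud = add_filter(body, "loud", bytes.upper)
    print(read(short, store))  # b'Intro.Appendix.'
    print(read(loud, store))   # b'BODY TEXT.'
```

Each operation returns a new phantom without touching the base file, so any number of views can coexist over a single copy of the data.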
Thus, in accordance with the system and method described herein, the file system interface is virtualized so that desired content can be created without writing an application, simply by creating a phantom file. This allows users to create myriad views of data without resorting to writing applications. It also provides additional adaptation definitions that abstract the notion of data modification, and a generalized model for modifying content and transmitting requests for modification.