This invention relates generally to the field of processing computer information formats and more particularly to a method and system for dynamically accessing information in a format different than the format used by the computer system to internally represent the information.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.
Computer applications such as document processors, data base programs, simulators, games, editors, and compilers all need to persist information even while the application is not running. Computer systems store persistent information in a variety of ways, including disk-based file systems, data bases, hierarchical storage systems, internet servers, and distributed memory. Persistent application data is stored in different formats depending on the type of application, and even depending on the version of a single application. The format of the information is what gives meaning to the binary bits which are used to represent the information. The format includes both the explicit details of how to interpret the bits, as well as the rules that are to be observed when accessing the information such as how to concurrently access the data from multiple users, how to sequence modifications to the information so that it can be recovered after a system crash, or how to maintain auxiliary data used to manage the information for purposes such as workflow management, auditing, or security. Multiple formats can be applied to the same information. The persistent storage that holds the information produced by an application is sometimes referred to as a file. The computers on which such applications run have file systems and other persistent stores which store the files out onto memory devices in yet further formats. These multiple different formats, both at the application level and at the file server level lead to difficult interoperability problems. For example, a document produced by a later version of a document processor is often not readable by a previous version of the document processor. When a user buys a new computer loaded with the latest software, produces a document, and gives a copy of the document to someone else only having a previous version of the software, the copy can be useless and indecipherable by the previous version.
Further difficulties arise when a user desires to share documents and other files over a network with a person using a different operating system, or application, or even a different version of the same operating system or application. If the different systems use different formats for the information, due to changes in the applications or internal operating system components, they may have difficulty sharing information. In particular, the newer system or application may use an information format that was invented after the earlier system was developed. These difficulties also arise with different applications that use a common type of information, but expect different formats, such as image processing applications that use JPEG instead of GIF, or document processors which use HTML instead of Word7 format. Incompatibilities can also be due to the file systems or other persistent stores used by different operating systems. One type of operating system has file servers that store data files formatted as a single stream. Applications interface with the file server via an interface, such as OLE32, and expect the data to be returned to them in a certain format. OLE32 was specifically designed to retrieve and transfer data in the single stream format of docfiles. A newer or different type of file format may use the same set of interfaces, but store the information in a different format, perhaps relying on a file system format that supports multiple streams in a single storage container, and this results in a compatibility problem.
Prior attempts to solve the problem of using different versions of applications and different applications storing data in different formats involved the use of conversion programs which performed explicit conversions on information between formats. Thus, when opening a document, a user would be presented with a choice of converting a document to a new format prior to opening it. Also, on storing out a document, a user may select many different application level formats in which to store it. These solutions worked well for new versions of software, where the support for such conversions was built into the programs, but did not work well when an older version of software was confronted with a data format produced by a newer version. If a user of the new version failed to explicitly save the information in a format that was understood by earlier systems, the information would be unavailable to users on earlier systems. Either the earlier system must be upgraded with a new program to convert the data, or the newer program must be started again and the file converted prior to trying to use the older version to work with it. This was an unsatisfactory solution because the older application or system would not understand that the information was in a newer format, and give the user confusing error messages. Even where the format problem could be detected, there were generally no tools available on the older system to effect the conversion. The problem is also common on computers coupled by network, where a file server, remote database or other distributed persistent storage mechanism may store data in a newer file system format, or there may be multiple versions of the same software on different machines, and one user does not have access to newer versions in order to appropriately transform application information formats.
Some image processing applications keep an image file in an internal compressed format, and then use an operating system driver to transform the file to appear to be in one of a fixed set of well-known image formats (JPEG, GIF, etc.). This approach does not allow modifications to be made through the well-known formats, and is only involved in data format conversion.
Such solutions also fail to provide more than data format conversion. The 'how to' rules associated with the format are not implemented, so users cannot share or manage the information. This type of format conversion produces a copy of the information in the old format, which can be accessed or modified independently of the original, producing inconsistencies between the separately stored versions of the information.
There is a need for an easier and more convenient way to provide interoperability between different versions of applications and operating system persistent storage systems. There is a need for such a way which does not require modifications to the applications, and that is backward compatible with existing applications. The provision of such interoperability should be transparent to a user and should also be provided in an efficient manner. Further, it should allow persistent application information to be dynamically shared and managed according to the rules of the newer format, rather than requiring users of older software to only make a copy of the information in an older format.
An operating system layer resides between software components or application programs that expect information to be in one format and a persistent store manager of the operating system which maintains the information in its persistent form. The operating system layer, which is referred to as a filter driver, provides on-the-fly conversion between the file format expected by the application layer and the format used by the persistent store manager. The filter driver determines which format a program expects, and dynamically converts the information to such a format, including both the static layout of the binary data as well as the dynamic rules for how to access the data.
Computer programs access persistent information by invoking Application Programming Interfaces (APIs) which make copies of the information in the persistent store available in the program's memory, and also update the persistent store with any desired changes. In addition to the static binary data portion of the information, there is auxiliary information regarding aspects such as dates, security, amount of information available, and other properties. This auxiliary information is sometimes called 'meta-data.' The filter driver dynamically converts between formats by copying information between the persistent store and the application's memory according to a conversion algorithm, providing the application with a 'view' of the file that is different from the view offered by the underlying storage system. The 'converted view' provided by the filter driver does not necessarily mean that all the data and meta-data of the file has been converted. The requirement is only that the data that is copied into the application's memory appears to have been converted.
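By way of illustration only, the notion of a converted view may be sketched as follows. The sketch is not the disclosed implementation. It assumes a hypothetical stored format in which text is held as UTF-16-LE (restricted to characters that occupy exactly two bytes) and a hypothetical expected format of single-byte ASCII text; the class name ConvertedView and its methods are likewise assumptions made for the example. Only the byte range actually requested is converted, and the size reported to the caller is the size of the converted view rather than of the stored data.

```python
class ConvertedView:
    """Presents UTF-16-LE stored text as if it were single-byte ASCII text.

    Assumption for this sketch: every stored character occupies exactly
    two bytes (no surrogate pairs), so offsets map one-to-two.
    """

    def __init__(self, stored: bytearray):
        self._stored = stored  # the persistent form is never converted wholesale

    def size(self) -> int:
        # Meta-data is translated too: report the size the caller expects to see.
        return len(self._stored) // 2

    def read(self, offset: int, length: int) -> bytes:
        # Convert only the requested range, on the fly.
        raw = bytes(self._stored[2 * offset: 2 * (offset + length)])
        return raw.decode("utf-16-le").encode("ascii", errors="replace")

    def write(self, offset: int, data: bytes) -> None:
        # Convert in the other direction and update the stored form in place.
        encoded = data.decode("ascii").encode("utf-16-le")
        self._stored[2 * offset: 2 * offset + len(encoded)] = encoded


stored = bytearray("hello world".encode("utf-16-le"))
view = ConvertedView(stored)
tail = view.read(6, 5)    # the caller sees single-byte text
view.write(0, b"HELLO")   # the change lands in the stored format
```

Because reads and writes both pass through the same stored bytes, no independent copy of the information is created in the older format.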
Both file system formats and application program specific formats are convertible by the filter driver. This allows applications and other programs to operate transparently with different file systems and older versions of applications without modification. In one instance of the invention, separate loadable conversion modules are provided for converting application specific formats due to the potential large number of such formats which can be encountered.
Loadable conversion modules are provided as either parts of the operating system or as parts of distinct applications. For example two versions of a word processor application might run on the same system, with the newer one storing documents in a different format. The newer version of the application could provide a conversion module for use by the filter driver to allow files created by the new application to be accessed by the old application.
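The following sketch suggests one way such loadable conversion modules might be registered and looked up. It is a simplified illustration under stated assumptions, not the disclosed mechanism: the names ConversionRegistry, "doc-v1" and "doc-v2" are hypothetical, and the newer format is assumed, purely for the example, to differ from the older one only by a "V2:" header.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]


class ConversionRegistry:
    """Maps (stored format, expected format) pairs to conversion modules."""

    def __init__(self) -> None:
        self._modules: Dict[Tuple[str, str], Converter] = {}

    def register(self, stored: str, expected: str, convert: Converter) -> None:
        self._modules[(stored, expected)] = convert

    def convert(self, stored: str, expected: str, data: bytes) -> bytes:
        if stored == expected:
            return data  # nothing to convert
        try:
            module = self._modules[(stored, expected)]
        except KeyError:
            raise LookupError(f"no conversion module for {stored} -> {expected}")
        return module(data)


def v2_to_v1(data: bytes) -> bytes:
    # Hypothetical module shipped with the newer application:
    # drop the assumed "V2:" header so the older application can read the body.
    header = b"V2:"
    return data[len(header):] if data.startswith(header) else data


registry = ConversionRegistry()
registry.register("doc-v2", "doc-v1", v2_to_v1)
old_view = registry.convert("doc-v2", "doc-v1", b"V2:hello")
```

Keeping the modules outside the filter driver itself means a new format can be supported by installing a module, without changing the driver or the older application.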
The filter driver may reside in the kernel of an operating system of a computer system. Applications may be operating directly on the computer system, or may be networked to the computer system. In either event, the filter driver sits above a persistent store, such as a file system, and intercepts requests for stored information coming from either the local Application Programming Interfaces (APIs) or across the network. An indication that an application requires a data format transformation is either provided to the filter driver by the application specifying the desired format, or deduced from information such as the version of the system or application opening the file. If no indication of the desired format is provided, an older version of the application is assumed which requires the information to be in an older, well-known format. The stored form of the information may be converted to an intermediate format which is maintained by the filter driver to handle semantic differences. The intermediate format may include cached information in order to improve performance and avoid having to convert files with each access. The filter driver may also keep a file in different formats depending on access history or other system requirements.
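The order in which the expected format is determined may be illustrated by the following sketch. The format labels and the version threshold are hypothetical values chosen for the example; the disclosure does not specify them.

```python
from typing import Optional

OLD_FORMAT = "old"   # hypothetical label for the older, well-known format
NEW_FORMAT = "new"   # hypothetical label for the newer format
FIRST_VERSION_WITH_NEW_FORMAT = 5  # assumed threshold, for illustration only


def expected_format(requested: Optional[str] = None,
                    app_version: Optional[int] = None) -> str:
    """Decide which format the opener expects to see."""
    if requested is not None:
        # 1. The application specified the desired format explicitly.
        return requested
    if app_version is not None:
        # 2. Deduce the format from the version of the opener.
        if app_version >= FIRST_VERSION_WITH_NEW_FORMAT:
            return NEW_FORMAT
        return OLD_FORMAT
    # 3. No indication at all: assume an older application.
    return OLD_FORMAT
```

Defaulting to the older format is the conservative choice, since an application that predates the newer format has no means of asking for anything else.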
Statistics are kept on the manner in which the information containers are accessed. The statistics are used to estimate the overall cost of dynamic conversion from the various alternatives for the actual storage format. If it is estimated that the overall costs, measured in cpu cycles, memory requirements, on-disk storage size, and similar resource metrics, will be less if a new stored format is used, the stored form of the information can be translated to a new format using otherwise idle system resources, such as during night or weekend hours.
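A minimal sketch of such a cost estimate follows. It assumes, for simplicity, that all resource metrics have been folded into a single abstract cost per open for each (stored format, expected format) pair; the class AccessStats, the function names, and the cost figures are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


class AccessStats:
    """Counts how often a container is opened in each expected format."""

    def __init__(self) -> None:
        self.opens: Counter = Counter()

    def record_open(self, expected: str) -> None:
        self.opens[expected] += 1


def estimated_cost(stored: str, stats: AccessStats,
                   cost_per_open: Dict[Tuple[str, str], int]) -> int:
    # Opens that already match the stored format need no conversion.
    return sum(count * cost_per_open[(stored, expected)]
               for expected, count in stats.opens.items()
               if expected != stored)


def cheapest_format(candidates: Iterable[str], stats: AccessStats,
                    cost_per_open: Dict[Tuple[str, str], int]) -> str:
    # Pick the storage format with the lowest estimated conversion cost.
    return min(candidates, key=lambda f: estimated_cost(f, stats, cost_per_open))


stats = AccessStats()
for _ in range(9):
    stats.record_open("old")
stats.record_open("new")

costs = {("new", "old"): 5, ("old", "new"): 3}  # assumed per-open costs
best = cheapest_format(["new", "old"], stats, costs)
```

With these assumed figures, keeping the container in the new format costs 9 x 5 = 45 units while keeping it in the old format costs 1 x 3 = 3 units, so the old format is selected and the translation could be scheduled for an idle period.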
The filter driver allows applications to open files in the formats that they expect even though the underlying file system and data format may be different. If the file's true format and expected format are compatible, the filter driver allows the open to succeed directly, bypassing the filter driver. If the formats are incompatible, then when the application reads and writes the file, the filter driver causes the file to appear in the expected format. Semantic information regarding concurrent access between applications is also translated. Auxiliary information having implied semantics such as access control lists, management information, property sets, alternate representations, cached information, annotations, audit trails and other similar information is also maintained and may be cached for faster access.
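The open-time decision may be sketched as follows. For the purposes of the example, compatibility is reduced to the two format labels being equal, which is a simplifying assumption; the handle classes and the open_file function are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class DirectHandle:
    """Returned when the formats are compatible: no conversion on the data path."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self) -> bytes:
        return self._data


class ConvertingHandle:
    """Returned when the formats differ: every read is converted on the fly."""

    def __init__(self, data: bytes, convert: Callable[[bytes], bytes]) -> None:
        self._data = data
        self._convert = convert

    def read(self) -> bytes:
        return self._convert(self._data)


def open_file(data: bytes, true_format: str, expected: str,
              converters: Dict[Tuple[str, str], Callable[[bytes], bytes]]):
    if true_format == expected:
        # Compatible: let the open succeed directly, bypassing conversion.
        return DirectHandle(data)
    return ConvertingHandle(data, converters[(true_format, expected)])


converters = {("upper", "lower"): bytes.lower}  # assumed toy conversion
same = open_file(b"ABC", "upper", "upper", converters)
diff = open_file(b"ABC", "upper", "lower", converters)
```

Making the decision once, at open time, keeps the common case of matching formats free of any per-read overhead.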
One benefit of the current invention is that parts of a system may be updated to work with a new file system or new versions of software without having to ensure that the entire system is converted at the same time. This makes upgrades easier to perform, and also allows upgrades to take place in stages, which can be very important for organizations with large numbers of systems. Applications can also embed files in a new context, such as in emails or when copying to offline media, where specific formats are required. Since the filter driver resides in or near the kernel, the overhead of the conversions is low, and conversion is transparent to the applications. Further, when converting back to an older format the filter driver can choose a more efficient representation of the information in the older format based on information in the newer format, such as in Windows NT 5.0 where NSS to docfile conversion results in contiguous file allocation tables.