1. Field of the Invention
This invention generally relates to managing data objects in a distributed, heterogeneous network environment, and, more specifically, to managing aggregate forms of such data objects across distributed heterogeneous resources such that the aggregate forms of the data objects are transparent to the user.
2. Background
Many applications require access to data objects distributed across heterogeneous network resources. Examples of such data objects include office automation products, drawings, images, and electronic mail. Other examples include scientific data related to digital images of cross-sections of the human brain, digital sky survey image files, issued patents, protein structures, and genetic sequences. In a typical scenario, data objects are generated at multiple sites distributed around the country. Data objects related to a common topic or project are organized into a collection for access. If the data sets are located at different sites, efficient access usually requires gathering the data sets at a common location. The resulting collection must then be archived to guarantee accessibility in the future. The management of data objects is typically complicated by the fact that the data objects may be housed in diverse and heterogeneous computer-based systems, including database management systems, archival storage systems, file systems, etc. To efficiently make use of these data objects, a unified framework is needed for accessing the data objects from the numerous and diverse sources.
Conventional systems for managing data include those depicted in U.S. Pat. Nos. 6,016,495; 5,345,586; 5,495,607; 5,940,827; 5,485,606; 5,884,310; 5,596,744; 6,014,667; 5,727,203; 5,721,916; 5,819,296; and 6,003,044.
U.S. Pat. No. 6,016,495 describes an object-oriented framework for defining storage of persistent objects (objects having a longer life than the process that created them). The framework provides some core functionalities, defined in terms of several classes (e.g., Access Mode, CachedEntity Instance, TransactionManager, DistributedThreadContext, and ConnectionManager) and user extensible functionalities that can be modified to provide access according to the persistent storage being used. The concept of a "container" as discussed in the patent simply refers to a logical grouping of class structures in a persistent storage environment, and is different from the concept of "container" of the subject invention as can be seen from the embodiment, later described.
U.S. Pat. No. 5,345,586 describes a data processing system consisting of multiple distributed heterogeneous databases. The system uses a global data directory to provide a logical data model of attributes and domains (type, length, scale, precision of data) and a mapping (cross-reference) to physical attributes (and tables) residing in multiple (possibly remote) databases. The global data directory stores route (or location) information about how to access the (remote) databases. The cross-reference information is used to convert the values from the physical databases into a consistent and uniform format.
U.S. Pat. No. 5,495,607 describes a network administrator system that uses a virtual catalog to present an overview of all the files in the distributed system. It also uses a rule-based monitoring system to monitor and react to contingencies and emergencies in the system.
U.S. Pat. No. 5,940,827 describes a method by which database systems manage transactions among competing clients who seek to concurrently modify a database. The method is used for maintaining cache coherency and for copying the cache into the persistent state.
U.S. Pat. No. 5,485,606 describes a method and system for backing up files into an archival storage system and for retrieving them back into the same or different operating system. To facilitate this function, the system writes a directory file, for each data file, containing information that is specific to the operating system creating the file as well as information common to other operating systems that can be utilized when restoring the file later.
U.S. Pat. No. 5,884,310 describes a method for integrating data sources using a common database server. The data sources are organized using disparate formats and file structures. The method extracts and transforms data from the disparate data sources into a common format (that of the common database server) and stores it in the common database for further access by the user.
U.S. Pat. No. 5,596,744 describes a method for sharing of information dispersed over many physical locations and also provides a common interface for adapting to incompatible database systems. The patent describes a Federated Information Management (FIM) architecture that provides a unified view of the databases to the end user and shields the end user from knowing the exact location or distribution of the underlying databases.
The FIM uses a Smart Data Dictionary (SDD) to perform this integration. The SDD contains meta-data such as the distribution information of the underlying databases, their schema and the FIM configuration. The SDD is used to provide information for parsing, translating, optimizing and coordinating global and local queries issued to the FIM.
The SDD uses a Cache Memory Management (CMM) to cache meta-data from SDD into local sites for speeding up processing. The patent describes several services that use the FIM architecture. The patent also describes methods for SQL query processing (or DBMS query processing).
U.S. Pat. No. 6,014,667 describes a system and method for caching directory information that may include identification information, location network addresses and replica information for objects stored in a distributed system. These directory caches are located locally and used for speeding up access since directory requests need not be referred to a remote site. The patent deals with caching of directory information in order to reduce traffic. The patent also allows for replicated data addresses to be stored in the cache.
U.S. Pat. No. 5,727,203 is similar to U.S. Pat. No. 5,940,827 but is restricted to object-oriented databases.
U.S. Pat. No. 5,721,916 describes a method and system for making available a shadow file system for use when a computer gets disconnected from a network which allowed it to access the original file system. The system transparently copies the file from the original file system to a local system whose structure is recorded in a local file database. When no longer connected to the network, the access to the file is redirected to the shadow file.
U.S. Pat. No. 5,819,296 describes a method and apparatus for moving (migrating) a large number of files (volumes) from one computer system to another. Included are methods for moving files from primary storage to secondary storage and from one system to another system. In this latter case, the system copies the directory information, and the files that need to be migrated are manually copied. Then, the directory structure is merged with the new storage system. The patent discusses moving files residing in volumes which are physical storage partitions created by system administrators.
U.S. Pat. No. 6,003,044 describes a system and method to back up computer files to backup drives connected to multiple computer systems. A controller system allocates each file in a backup set system to one or more of the multiple computer systems. Each of the multiple computer systems is then directed to back up files in one or more subsets, which may be allocated to that computer system. The allocation may be made to optimize or load balance across the multiple computer systems.
A problem which plagues such systems is the overhead involved in accessing archived individual data objects from a remote site. Remote accesses such as this are typically fraught with delay caused primarily by the high latency of archival resources such as tape and, to a lesser degree, the network latency and system overhead. This delay limits the effectiveness of such systems. To overcome the delay, the user might manually aggregate data objects using tools provided by the operating systems or third parties, and copy the data to a nearby facility. However, this requires the user to be familiar with the physical location of the data objects and manner in which they are aggregated and stored, a factor which further limits the effectiveness of the system.
Consequently, there is a need for a system of and method for managing data objects distributed across heterogeneous resources which reduces or eliminates the delay or latency characteristic of conventional systems.
There is also a need for a system of and method for managing data objects distributed across heterogeneous resources in which the physical location of and manner in which the data objects are stored is transparent to the user.
There is also a need for a system of and method for providing a data aggregation mechanism which transparently reduces overhead and delay caused by the high latency of archival resources.
There is further a need for a system of and method for managing data objects distributed across heterogeneous resources which overcomes one or more of the disadvantages of the prior art.
The objects of the subject invention include fulfillment of any of the foregoing needs, singly or in combination. Further objects and advantages will be set forth in the description which follows or will be apparent to one of ordinary skill in the art.
In accordance with the purpose of the invention as broadly described herein, there is provided a system for transparent management of data objects in containers across distributed heterogeneous resources comprising: a client configured to issue requests relating to data objects in containers in response to user commands; at least one server accessible by the client over a network; a broker process, executable on a server, for responding to a request issued by a client; a meta-data catalog maintained on a server, and accessible by the broker, for defining data objects and containers, and associating data objects with containers; and at least one data resource maintained on one or more servers for storing data objects in containers; wherein the broker, responsive to a request, is configured to access the meta-data catalog, process the request using the meta-data catalog, and then update the meta-data catalog to reflect changes incidental to the request, whereby data objects, once aggregated into containers, are maintained therein transparent to users.
Also provided is a method of creating a logical resource comprising the steps of: associating one or more physical resources with the logical resource; for each physical resource, specifying a type thereof from the group comprising an archive, a cache, a primary archive, and a primary cache; and for each physical resource, also specifying size and access control information.
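By way of illustration only, the steps of creating a logical resource might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names ResourceType, PhysicalResource, LogicalResource and create_logical_resource are hypothetical, and access control information is simplified to a set of permitted user names.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResourceType(Enum):
    # The four types named in the summary.
    ARCHIVE = "archive"
    CACHE = "cache"
    PRIMARY_ARCHIVE = "primary archive"
    PRIMARY_CACHE = "primary cache"


@dataclass
class PhysicalResource:
    name: str
    rtype: ResourceType
    size_bytes: int
    allowed_users: frozenset  # assumption: access control as a set of user names


@dataclass
class LogicalResource:
    name: str
    members: list = field(default_factory=list)

    def of_type(self, *types):
        """Return the physical resources whose type is one of `types`."""
        return [r for r in self.members if r.rtype in types]


def create_logical_resource(name, specs):
    """Associate physical resources with a logical resource.

    specs: iterable of (physical_name, ResourceType, size_bytes, users).
    """
    logical = LogicalResource(name)
    for pname, rtype, size_bytes, users in specs:
        logical.members.append(
            PhysicalResource(pname, rtype, size_bytes, frozenset(users))
        )
    return logical
```

In such a sketch, a logical resource could combine, for example, one tape archive marked as the primary archive with one disk cache, and a caller could select the cache resources with of_type(ResourceType.CACHE, ResourceType.PRIMARY_CACHE).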
A method of creating a container is also provided comprising the steps of: specifying, in response to a user request, a name of a container and a logical resource to be allocated to the container, the logical resource being associated with one or more physical resources, including at least one archive and at least one cache; creating meta-data for the container, including meta-data specifying the container name, the logical resource to be allocated to the container, and the one or more physical resources associated with the logical resource; storing the meta-data for the container in a meta-data catalog; and reserving one or more of the archives allocated to the container.
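The container creation steps above might be sketched, purely for illustration, as follows. The meta-data catalog is represented by an in-memory dictionary, and the function name create_container, the dictionary keys, and the policy of reserving every archive are all assumptions made for this sketch rather than features taken from the description.

```python
def create_container(catalog, name, logical_resource, physical_resources):
    """Create container meta-data and store it in the catalog.

    catalog: dict mapping container name -> meta-data dict (a stand-in
        for the meta-data catalog).
    physical_resources: list of (resource_name, kind) pairs associated
        with the logical resource, where kind is "archive" or "cache".
    """
    if name in catalog:
        raise ValueError(f"container {name!r} already exists")

    # The logical resource must include at least one archive and one cache.
    kinds = {kind for _, kind in physical_resources}
    if not {"archive", "cache"} <= kinds:
        raise ValueError("need at least one archive and one cache")

    archives = [r for r, kind in physical_resources if kind == "archive"]
    meta = {
        "name": name,
        "logical_resource": logical_resource,
        "physical_resources": list(physical_resources),
        # Assumption: reserve all archives ("one or more" in the text).
        "reserved_archives": archives,
        "size": 0,       # next free offset within the container
        "objects": {},   # object name -> location within the container
    }
    catalog[name] = meta
    return meta
```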
The invention further includes a method of importing a data object into a container comprising the steps of: specifying a container; querying meta-data for the container, including an offset within the container; finding or staging to a selected resource a current cache copy of the container; writing the data object into the cache copy at the specified offset; updating the meta-data for the container to reflect introduction of the data object into the container; and marking the cache copy as dirty or synchronizing it with any other copies.
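A minimal sketch of the import steps is given below, under several assumptions: the container meta-data is a dictionary whose "size" entry doubles as the next free offset, container copies are held in memory as byte strings, staging is a plain copy from an archival copy, and the cache copy is marked dirty rather than synchronized at once. The names import_object and find_or_stage_cache are hypothetical.

```python
def find_or_stage_cache(copies):
    """Return the cache copy, staging it from the archive copy if absent."""
    if copies.get("cache") is None:
        copies["cache"] = bytearray(copies["archive"])
    return copies["cache"]


def import_object(meta, copies, object_name, data):
    """Append a data object to a container and update its meta-data.

    meta: {"size": int, "objects": {name: (offset, length)}, "dirty": bool}
    copies: {"archive": bytes, "cache": bytearray or None}
    Returns the offset at which the object was written.
    """
    offset = meta["size"]                      # query meta-data for the offset
    cache = find_or_stage_cache(copies)        # find or stage a cache copy
    if len(cache) != offset:
        raise ValueError("cache copy is not current with the meta-data")
    cache[offset:offset + len(data)] = data    # write at the offset
    meta["objects"][object_name] = (offset, len(data))
    meta["size"] = offset + len(data)          # reflect the new object
    meta["dirty"] = True                       # cache now differs from archive
    return offset
```

Recording a length next to each offset is a choice made for the sketch so that an object can later be read back without scanning the container.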
A method of synchronizing a plurality of copies of a container is also included comprising the steps of: if no copies of the container are marked as dirty, ending the method; if a cache copy of the container is marked as dirty, synchronizing such to one or more archival copies that are not marked as dirty; if all archival copies are thereby written over, resetting the dirty flags of all such archival copies; and if one or more but not all archival copies are thereby written over, setting the dirty flags of the one or more archives that are written over.
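The synchronization steps might be sketched as follows. The sketch follows the wording of the steps literally; the record layout, the function name synchronize, and the optional targets argument (used to model the case in which only some archival copies are written) are assumptions. The steps do not say what becomes of the cache copy's own dirty flag, so the sketch leaves it unchanged.

```python
def synchronize(copies, targets=None):
    """Synchronize a dirty cache copy to archival copies.

    copies: list of dicts with keys "name", "kind" ("cache" or
        "archive"), "data" (bytes) and "dirty" (bool).
    targets: optional set of archive names to write; by default every
        archival copy not marked dirty is written.
    Returns the names of the archival copies written over.
    """
    # If no copies are marked dirty, end the method.
    if not any(c["dirty"] for c in copies):
        return []

    source = next(
        (c for c in copies if c["kind"] == "cache" and c["dirty"]), None
    )
    if source is None:
        return []  # no dirty cache copy; not covered by this sketch

    archives = [c for c in copies if c["kind"] == "archive"]
    written = [a for a in archives if not a["dirty"]]
    if targets is not None:
        written = [a for a in written if a["name"] in targets]

    for a in written:
        a["data"] = bytes(source["data"])

    if len(written) == len(archives):
        # All archival copies written over: reset their dirty flags.
        for a in archives:
            a["dirty"] = False
    else:
        # Some but not all written over: set the flags of those written.
        for a in written:
            a["dirty"] = True
    return [a["name"] for a in written]
```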
The invention also includes a method of reading a data object from a container comprising the steps of: querying meta-data for the container, including an offset where the data object is stored within the container; finding or staging to a selected resource a current cached copy of the container; and using the offset to retrieve the data object from the cached copy of the container.
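The read steps might be sketched as shown below, again only as an illustration. The sketch assumes that the meta-data records a length alongside each offset (the steps above mention only the offset), that container copies are held in memory as byte strings, and that staging is a plain copy from an archival copy. The name read_object is hypothetical.

```python
def read_object(meta, copies, object_name):
    """Read one data object out of a container.

    meta: {"objects": {name: (offset, length)}}
    copies: {"archive": bytes, "cache": bytearray or None}
    """
    # Query the meta-data for where the object sits in the container.
    offset, length = meta["objects"][object_name]

    # Find a current cached copy, or stage one from the archive.
    if copies.get("cache") is None:
        copies["cache"] = bytearray(copies["archive"])
    cache = copies["cache"]

    # Use the offset to retrieve the object from the cached copy.
    return bytes(cache[offset:offset + length])
```

Because the whole container is staged once, later reads of other objects in the same container are served from the cached copy without a further access to the archive, which is the latency benefit the aggregation is meant to provide.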