1. Field of the Invention
This invention generally relates to managing data objects in a distributed, heterogeneous network environment and, more specifically, to managing aggregate forms of such data objects across distributed heterogeneous resources such that the aggregate forms of the data objects are transparent to the user.
2. Background
Many applications require access to data objects distributed across heterogeneous network resources. Examples of such data objects include office automation products, drawings, images, and electronic mail (e-mail). Other examples include scientific data such as digital images of cross-sections of the human brain, digital sky survey image files, issued patents, protein structures, and genetic sequences. In a typical scenario, data objects are generated at multiple sites distributed around the country. Data objects related to a common topic or project are organized into a collection for access. If the data objects are located at different sites, efficient access usually requires gathering them at a common location. The resulting collection must then be archived to guarantee accessibility in the future. The management of data objects is typically complicated by the fact that the data objects may be housed in diverse and heterogeneous computer-based systems, including database management systems, archival storage systems, file systems, etc. To make efficient use of these data objects, a unified framework is needed for accessing them from these numerous and diverse sources.
Conventional systems for managing data include those depicted in U.S. Pat. Nos. 6,016,495; 5,345,586; 5,495,607; 5,940,827; 5,485,606; 5,884,310; 5,596,744; 6,014,667; 5,727,203; 5,721,916; 5,819,296; and 6,003,044.
U.S. Pat. No. 6,016,495 describes an object-oriented framework for defining storage of persistent objects (objects having a longer life than the process that created them). The framework provides some core functionalities, defined in terms of several classes (e.g., Access Mode, CachedEntity Instance, TransactionManager, DistributedThreadContext, and ConnectionManager), and user-extensible functionalities that can be modified to provide access according to the persistent storage being used. The concept of a “container” as discussed in that patent simply refers to a logical grouping of class structures in a persistent storage environment, and differs from the concept of a “container” in the subject invention, as will be apparent from the embodiment described later.
U.S. Pat. No. 5,345,586 describes a data processing system consisting of multiple distributed heterogeneous databases. The system uses a global data directory to provide a logical data model of attributes and domains (type, length, scale, precision of data) and a mapping (cross-reference) to physical attributes (and tables) residing in multiple (possibly remote) databases. The global data directory stores route (or location) information about how to access the (remote) databases. The cross-reference information is used to convert the values from the physical databases into a consistent and uniform format.
U.S. Pat. No. 5,495,607 describes a network administrator system that uses a virtual catalog to present an overview of all the files in the distributed system. It also uses a rule-based monitoring system to monitor and react to contingencies and emergencies in the system.
U.S. Pat. No. 5,940,827 describes a method by which database systems manage transactions among competing clients who seek to concurrently modify a database. The method is used for maintaining cache coherency and for copying the cache into the persistent state.
U.S. Pat. No. 5,485,606 describes a method and system for backing up files into an archival storage system and for restoring them to the same or a different operating system. To facilitate this function, the system writes a directory file for each data file, containing information specific to the operating system that created the file as well as information common to other operating systems, which can be used when restoring the file later.
U.S. Pat. No. 5,884,310 describes a method for integrating data sources using a common database server. The data sources are organized using disparate formats and file structures. The method extracts and transforms data from the disparate data sources into a common format (that of the common database server) and stores it in the common database for further access by the user.
U.S. Pat. No. 5,596,744 describes a method for sharing information dispersed over many physical locations that also provides a common interface for adapting to incompatible database systems. The patent describes a Federated Information Management (FIM) architecture that provides a unified view of the databases to the end user and shields the end user from knowing the exact location or distribution of the underlying databases.
The FIM uses a Smart Data Dictionary (SDD) to perform this integration. The SDD contains meta-data such as the distribution information of the underlying databases, their schema and the FIM configuration. The SDD is used to provide information for parsing, translating, optimizing and coordinating global and local queries issued to the FIM.
The SDD uses a Cache Memory Management (CMM) facility to cache meta-data from the SDD at local sites to speed up processing. The patent describes several services that use the FIM architecture. The patent also describes methods for SQL query processing (or DBMS query processing).
U.S. Pat. No. 6,014,667 describes a system and method for caching directory information that may include identification information, location network addresses, and replica information for objects stored in a distributed system. These directory caches are located locally and speed up access because directory requests need not be referred to a remote site. The patent thus uses caching of directory information to reduce network traffic. The patent also allows addresses of replicated data to be stored in the cache.
U.S. Pat. No. 5,727,203 is similar to U.S. Pat. No. 5,940,827 but is restricted to object-oriented databases.
U.S. Pat. No. 5,721,916 describes a method and system for making available a shadow file system for use when a computer becomes disconnected from the network through which it accessed the original file system. The system transparently copies the file from the original file system to a local system whose structure is recorded in a local file database. When the computer is no longer connected to the network, access to the file is redirected to the shadow file.
U.S. Pat. No. 5,819,296 describes a method and apparatus for moving (migrating) a large number of files (volumes) from one computer system to another. Included are methods for moving files from primary storage to secondary storage and from one system to another system. In the latter case, the system copies the directory information, and the files that need to be migrated are copied manually. The directory structure is then merged with the new storage system. The patent discusses moving files residing in volumes, which are physical storage partitions created by system administrators.
U.S. Pat. No. 6,003,044 describes a system and method for backing up computer files to backup drives connected to multiple computer systems. A controller system allocates each file in a backup set to one or more of the multiple computer systems. Each of the multiple computer systems is then directed to back up the files in the one or more subsets allocated to it. The allocation may be made to optimize performance or to balance the load across the multiple computer systems.
A problem that plagues such systems is the overhead involved in accessing individual archived data objects from a remote site. Such remote accesses are typically fraught with delay, caused primarily by the high latency of archival resources such as tape and, to a lesser degree, by network latency and system overhead. This delay limits the effectiveness of such systems. To overcome the delay, the user might manually aggregate data objects using tools provided by the operating system or by third parties, and copy the data to a nearby facility. However, this requires the user to be familiar with the physical location of the data objects and the manner in which they are aggregated and stored, a factor which further limits the effectiveness of the system.
Consequently, there is a need for a system of and method for managing data objects distributed across heterogeneous resources which reduces or eliminates the delay or latency characteristic of conventional systems.
There is also a need for a system of and method for managing data objects distributed across heterogeneous resources in which the physical location of and manner in which the data objects are stored is transparent to the user.
There is also a need for a system of and method for providing a data aggregation mechanism which transparently reduces overhead and delay caused by the high latency of archival resources.
There is further a need for a system of and method for managing data objects distributed across heterogeneous resources which overcomes one or more of the disadvantages of the prior art.
The objects of the subject invention include fulfillment of any of the foregoing needs, singly or in combination. Further objects and advantages will be set forth in the description which follows or will be apparent to one of ordinary skill in the art.