1. Field of the Invention
The present invention relates to a method for storing and retrieving complex objects. In particular, this invention relates to a system and method for integrating object oriented objects with a digital library.
2. Description of the Related Art
The lowering of data storage costs has made it feasible to store various forms of data, including digitized data. Digitized data consists primarily of digitized visual images, audio and video, although it is not limited to those types of data. Practitioners in the entertainment industry, for example, rely on digitized audio and video data. Advanced information management system structures are required to store and manage digitized data.
As in many other branches of computer science, object oriented data modelling techniques are used in the storage and retrieval of digitized data. Objects are complex forms of representing data elements and their interrelationships. Functions, which were represented as procedures in conventional programming methodologies, are encapsulated together with data within objects. The advantages of using an object oriented methodology can be further enhanced if the objects can be distributed across a network. Widespread use of networks, including local-area and wide-area networks, and appropriate software techniques have made it possible to use distributed objects.
However, there are several problems in using distributed objects, one of which is storing objects. A substantial amount of program time as well as programmer time is required if a complex network of objects and their relationships has to be created each time an object oriented application is launched. Therefore, maintaining persistence of objects is critical to using computing resources efficiently.
Objects must maintain their states even after the programs that created them have terminated. In order to maintain persistence, the objects must be stored in data stores such as databases or files. Object database management systems provide transparent persistence of objects. However, the users of volumes of preexisting data, as in digitized libraries, need a way of accessing the data as well as of representing the data in an object oriented format. There also has to be a means of committing the data in memory to a non-volatile store. In essence, a distributed object oriented system requires a storage layer that stores persistent objects.
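As an illustration of committing an in-memory object to a non-volatile store, the following minimal sketch uses standard Java object serialization to write an object to a file and read it back. Java serialization is used here only as a familiar stand-in; it is not the storage layer discussed in this description, and the names PersistenceSketch, Note, save and load are hypothetical.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: persists an object's state so that it outlives the
// program that created it. Standard Java serialization is an assumption
// made for illustration, not the mechanism of the digital library.
class PersistenceSketch {

    // A small serializable object whose state is to be preserved.
    static class Note implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Note(String text) {
            this.text = text;
        }
    }

    // Commits the in-memory object to a file (the non-volatile store).
    static void save(Serializable obj, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Reconstructs the object from the file, possibly in a later program run.
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }
}
```

A caller would invoke save before the program terminates and load when the application is next launched, which avoids rebuilding the object from scratch.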
An example of a distributed system with a storage layer that achieves persistence of objects is a digital library. A conceptual view of a digital library is shown in FIG. 1 and described in detail in U.S. Pat. No. 5,649,185 to Antognini et al., which is incorporated herein by reference. It includes a library server 110, one or more object servers 120, 121, and one or more library clients 130, 131. The library server, object servers and library clients each have permanent storage media associated with them. That is, the library server 110 has a library catalog 140, the object servers include object stores 150, 151, and the library client includes a client cache 160. In addition, a communication isolator allows the library server, object servers and library clients to communicate with one another without the application programs in the clients having to be concerned with complex communication protocols. The library server, object servers and clients are arranged in a distributed manner and are connected by a communication network such as a wide area network or local area network. The library clients are typically implemented on a workstation, and the library and object servers are implemented on a host processor, which can be a workstation or a mainframe computer. Library clients send requests to the library servers to store, retrieve and update objects stored in the object servers. They also send requests to query the object indices and descriptive information stored in the library catalog.
The Java programming language (hereinafter "Java") is one of the languages commonly used for representing and storing digitized data, but its features cannot be fully exploited without a mechanism for persistent storage of objects. Java provides a few basic data types, apart from which everything else in Java is an object. One of the advantages of Java pertinent to the present invention is that Java has automatic garbage collection, so that programmers do not have to free memory explicitly, which reduces the risk of memory leaks. This is particularly advantageous to creators of digitized data, where large volumes of data are encountered.
An advantage of Java is that Java compilers generate an architecture-neutral compiled code that runs on several common computer architectures. Since there are few implementation-dependent aspects in Java, it is very easily portable. It is generally advantageous to provide the ability to access digitized data in a platform independent fashion. The Java programming language is becoming an increasingly popular means of providing digitized data in a platform independent manner. As in other object oriented systems, Java requires a storage layer that allows persistent storage of objects.
A dictionary is a data type commonly used in object oriented languages, including Java. A dictionary is an ordered collection of elements, each being identified with a key. Each key is associated with a value. A value can be any one of a variety of data types including a content independent binary large object (hereinafter a "blob") which can range in size up to several gigabytes of uninterpreted binary data. Keys can be strings or values. The keys in a dictionary are unique and may not be duplicated within the same dictionary. Given a key, the dictionary returns the value associated with the key. Several operations can be performed on the dictionaries, including adding, retrieving and removing items; counting occurrences of an item; and closing and deleting of dictionaries. A key value dictionary (KVD) is a subclass of a dictionary that inherits all the attributes, represented by keys and their corresponding values, from a dictionary. In addition to the attributes that a KVD inherits from a dictionary, it comprises an additional attribute, represented by a key, called a "structural type". The "structural type" attribute stores information corresponding to the structural type of the object. A hash table could be used instead of a dictionary in a preferred embodiment.
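The dictionary and KVD described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the class name KeyValueDictionary, the key name "structuralType" and the demo values are hypothetical, and java.util.Hashtable (a standard subclass of java.util.Dictionary) is used as the base class.

```java
import java.util.Hashtable;

// Hypothetical sketch of a key value dictionary (KVD): a dictionary that
// carries one additional attribute recording the object's structural type.
class KeyValueDictionary extends Hashtable<String, Object> {
    private static final long serialVersionUID = 1L;

    // Assumed key name for the "structural type" attribute.
    static final String STRUCTURAL_TYPE_KEY = "structuralType";

    KeyValueDictionary(String structuralType) {
        put(STRUCTURAL_TYPE_KEY, structuralType);
    }

    String structuralType() {
        return (String) get(STRUCTURAL_TYPE_KEY);
    }
}

class KvdDemo {
    public static void main(String[] args) {
        KeyValueDictionary kvd = new KeyValueDictionary("Photo");
        kvd.put("title", "Harbor at dusk");
        kvd.put("image", new byte[] {1, 2, 3}); // a blob value: uninterpreted bytes
        kvd.put("title", "Harbor at dawn");     // keys are unique: the value is replaced

        System.out.println(kvd.structuralType()); // Photo
        System.out.println(kvd.get("title"));     // Harbor at dawn
        System.out.println(kvd.size());           // 3
        kvd.remove("image");
        System.out.println(kvd.containsKey("image")); // false
    }
}
```

Note that Hashtable does not preserve insertion order and rejects null keys and values; an embodiment that needs either property would use a different base class.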
Generic Java objects can be stored as KVDs. In this case, the keys represent the object attributes and the values represent the values of the object attributes. Storage layers, including digital libraries, have been used to store persistent objects in a form according to the C++ programming language. However, they have not been used to store objects in many other object oriented languages, including Java. A storage layer, including a digital library, is useful since it can store any kind of object in the form of a blob, without concern for the content of the blob.
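One way to picture the mapping from a generic Java object to a KVD is with the standard reflection API, as in the sketch below. It is only an assumption about how such a mapping could be written: the names ObjectToKvd, toKvd, Track and the key "structuralType" are hypothetical, and the sketch handles only the fields declared directly on the object's class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch: turns an object's attributes into key/value pairs.
class ObjectToKvd {
    // Assumed key name for the "structural type" attribute.
    static final String STRUCTURAL_TYPE_KEY = "structuralType";

    static Map<String, Object> toKvd(Object obj) throws IllegalAccessException {
        Map<String, Object> kvd = new Hashtable<>();
        // Record the class name so the object can be reconstructed later.
        kvd.put(STRUCTURAL_TYPE_KEY, obj.getClass().getName());
        // Only fields declared on the class itself; inherited fields are skipped.
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // not part of the object's own state
            }
            field.setAccessible(true);
            Object value = field.get(obj);
            if (value != null) {
                kvd.put(field.getName(), value); // Hashtable rejects null values
            }
        }
        return kvd;
    }

    // Example object with two attributes.
    static class Track {
        String title = "Intro";
        int seconds = 42;
    }
}
```

Calling ObjectToKvd.toKvd(new ObjectToKvd.Track()) yields a map with the keys "title", "seconds" and "structuralType"; primitive attribute values such as the int are boxed by reflection.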
The semantic information about a blob is contained in metadata, that is, data about other, lower-level forms of data. Determining the content of the blob is left to application programs. Typically, application tools do not have the necessary information as to what each blob contains and how it is to be processed.
Therefore, if objects, including Java objects, are to be stored in a storage layer like a digital library, there has to be a way of storing information (e.g., metadata) about the structure of the objects as well as a way to retrieve and reconstruct the objects. That is, information about the blob (i.e., metadata) should be stored as index information in order to later search for and retrieve the blob from the digital library. Today, such metadata is manually determined and entered by a user as blob index information. However, the manual process is cumbersome and inefficient.
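The role of index information can be illustrated with a small in-memory stand-in for a storage layer, in which each blob is stored together with metadata that is later used to find it. This is a sketch only: the class InMemoryLibrary and its methods store, query and retrieve are hypothetical and do not represent the interface of any actual digital library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a storage layer: blobs are kept in an
// object store, and their metadata is kept separately as index information.
class InMemoryLibrary {
    private final Map<Integer, byte[]> objectStore = new HashMap<>();
    private final Map<Integer, Map<String, String>> catalog = new HashMap<>();
    private int nextId = 1;

    // Stores a blob with its metadata and returns the assigned identifier.
    int store(byte[] blob, Map<String, String> metadata) {
        int id = nextId++;
        objectStore.put(id, blob.clone());
        catalog.put(id, new HashMap<>(metadata));
        return id;
    }

    // Searches the index only; blob contents are never inspected.
    List<Integer> query(String key, String value) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, String>> entry : catalog.entrySet()) {
            if (value.equals(entry.getValue().get(key))) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Returns a copy of the blob, or null if the identifier is unknown.
    byte[] retrieve(int id) {
        byte[] blob = objectStore.get(id);
        return blob == null ? null : blob.clone();
    }
}
```

In this sketch the caller still supplies the metadata by hand, which is exactly the manual step the paragraph above describes as cumbersome.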