I. BACKGROUND OF THE INVENTION
II. SUMMARY OF THE INVENTION
III. BRIEF DESCRIPTION OF THE DRAWINGS
IV. DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
1 Architecture
2 Packaging
3 Class Overview
4 Class Dynamics
5 Object Collection classes
5.1 Collection
5.2 SequentialCollection
5.3 Folder
5.4 Parts
5.5 QueryEvaluator
5.6 QueryableCollection
5.7 FederatedCollection
5.7.1 Methods
5.8 Results
5.9 Iterator
5.10 SequentialIterator
5.11 FederatedIterator
5.11.1 Methods
6 Object Query classes
6.1 QueryManager
6.2 QueryBase
6.3 Query
6.4 ParametricQuery
6.5 TextQuery
6.6 ImageQuery
6.7 CombinedQuery
6.8 OnDemandQuery
6.9 FederatedQuery
7 Data Object classes
7.1 DataObjectBase
7.2 DataObject
7.3 DDO Base - Dynamic Data Object Base
7.4 DDO - Dynamic Data Object
7.5 Pid
8 XDO Classes
9 Data Access classes
9.1 Datastore
9.2 DatastoreDL
9.3 DatastoreTS
9.4 DatastoreQBIC
9.5 DatastoreQD
9.6 DatastoreFederated
9.6.1 Methods
9.6.2 Federated query string
9.6.3 Federated query processing
9.7 ResultSetCursor
10 Schema Mapping
11 Federated datastore mapping components
12 Schema Mapping Classes
13 Persistency support
14 Supporting classes
12 Sample Programs
12.1 Queryable Collection in DL
12.2 Combined Query in DL
12.3 Folder processing in DL
12.4 Example of Add, Retrieve, Update, Delete and Open from XDO object
The present invention relates to a system and method for representing and searching multiple heterogeneous datastores (datastore is a term used to refer to a generic data storage facility, such as a relational data base, flat-file, hierarchical data base, etc.) and managing the results of such searches.
For nearly half a century computers have been used by businesses to manage information such as numbers and text, mainly in the form of coded data. However, business data represents only a small part of the world's information. As storage, communication and information processing technologies advance, and as their costs come down, it becomes more feasible to digitize various other types of data, store large volumes of it, and be able to distribute it on demand to users at their place of business or home.
New digitization technologies have emerged in the last decade to digitize images, audio, and video, giving birth to a new type of digital multimedia information. These multimedia objects are quite different from the business data that computers managed in the past, and often require more advanced information management system infrastructures with new capabilities. Such systems are often called "digital libraries."
Introducing new digital technologies can do much more than just replace physical objects with their electronic representation. It enables instant access to information; supports fast, accurate, and powerful search mechanisms; provides new "experiential" (i.e. virtual reality) user interfaces; and implements new ways of protecting the rights of information owners. These properties make digital library solutions even more attractive and acceptable not only to corporate IS organizations, but also to information owners, publishers and service providers.
Creating and Capturing Data
Generally, business data is created by a business process (an airline ticket reservation, a deposit at the bank, and a claim processing at an insurance company are examples). Most of these processes have been automated by computers and produce business data in digital form (text and numbers). Therefore it is usually structured coded data. Multimedia data, on the contrary, cannot be fully pre-structured (its use is not fully predictable) because it is the result of the creation of a human being or the digitization of an object of the real world (x-rays, geophysical mapping, etc.) rather than a computer algorithm.
The average size of business data in digital form is relatively small. A banking record (including a customer's name, address, phone number, account number, balance, etc.) represents at most a few hundred characters, i.e. a few hundred to a few thousand bits. The digitization of multimedia information (image, audio, video) produces a large set of bits called an "object" or "blob" (Binary Large Object). For example, a digitized image of the parchments from the Vatican Library takes as much as the equivalent of 30 million characters (30 MB) to be stored. The digitization of a movie, even after compression, may take as much as the equivalent of several billion characters (3-4 GB) to be stored.
Multimedia information is typically stored as much larger objects, ever increasing in quantity and therefore requiring special storage mechanisms. Classical business computer systems have not been designed to directly store such large objects. Specialized storage technologies may be required for certain types of information, e.g. media streamers for video or music. Because certain multimedia information needs to be preserved "forever," it also requires special storage management functions providing automated back-up and migration to new storage technologies as they become available and as old technologies become obsolete.
Finally, for performance reasons, the multimedia data is often placed in the proximity of the users with the system supporting multiple distributed object servers. This often requires a logical separation between applications, indices, and data to ensure independence from any changes in the location of the data.
Searching and Accessing Data
The indexing of business data is often embedded into the data itself. When the automated business process stores a person's name in the column "NAME," it actually indexes that information. Multimedia information objects usually do not contain indexing information. This "meta data" must be created separately by developers or librarians. The indexing information for multimedia information is often kept in "business-like" databases separated from the physical object.
In a Digital Library (DL), the multimedia object can be linked with the associated indexing information, since both are available in digital form. Integration of this legacy catalog information with the digitized object is crucial and is one of the great advantages of DL technology. Different types of objects can be categorized differently as appropriate for each object type. Existing standards like MARC records for libraries, Finding Aids for archiving of special collections, etc., can be used when appropriate.
The indexing information used for catalog searches in physical libraries is mostly what one can read on the covers of the books (author's name, title, publisher, ISBN, etc.), enriched by other information created by librarians based on the content of the books (abstracts, subjects, keywords, etc.). In digital libraries, the entire content of books, images, music, films, etc. is available and "new content" technologies are needed: technologies for full text searching, image content searching (searching based on color, texture, shape, etc.), video content searching, and audio content searching. The integrated combination of catalog searches (e.g. SQL) with content searches will provide more powerful search and access functions. These technologies can also be used to partially automate further indexing, classification, and abstracting of objects based on content.
To harness the massive amounts of information spread throughout these networks, it has become necessary for a user to search numerous storage facilities at the same time without having to consider the particular implementation of each storage facility. Many approaches have been made to provide effective tools for performing "federated" searches of multiple heterogeneous storage facilities, each having diverse data types, and for managing the results of these searches. A comprehensive survey on the federation of heterogeneous database systems can be found in Sheth, A. P. and Larson, J. A., "Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases," ACM Computing Surveys, vol. 22, No. 3, September 1990, pp. 183-236.
Some particular approaches include, for example, U.S. Pat. Nos. 5,596,744 (Dao et al.), and 5,634,053 (Noble et al.) which disclose Federated Information Management (FIM) architectures for providing users with transparent access to heterogeneous database systems with multimedia information. However, these architectures rely on complex application software for translation and interaction between various entities in the system.
Object-oriented approaches are generally better suited for such complex data management. The term "object-oriented" refers to a software design method which uses "classes" and "objects" to model abstract or real objects. An "object" is the main building block of object-oriented programming, and is a programming unit which has both data and functionality (i.e., "methods"). A "class" defines the implementation of a particular kind of object, the variables and methods it uses, and the parent class it belongs to.
An example of a known object-oriented approach to managing heterogeneous data from heterogeneous data bases is found in U.S. Pat. No. 5,557,785 (Lacquit et al.). Lacquit provides for the searching of multiple heterogeneous data managers, such as Global Information Service (GIS), Relational DataBase Management Service (RDBMS), and Visual Data (VD). This approach utilizes a first object-oriented class which describes properties common to all objects manipulated by the information system. A second class defines the properties relative to the use of functions of the various data managers. Lacquit also models particular databases as specific instantiations of a generic data manager class, to enhance their accessibility. However, the Lacquit approach does not provide a federated datastore object which can represent multiple heterogeneous datastores at any given time, or which is directly manipulatable by a user/application to provide a user/application with the ability to 'see' or directly access different datastores and features of them through the federated datastore object.
Other known programming tools that can be used for developing search and result-management frameworks include IBM VisualAge C++, Microsoft Visual C++, Microsoft Visual J++, and Java. These systems provide tools such as collection objects and iterators. However, they only employ flat collections, which do not provide users with useful access to sub-units within the collections.
It is therefore an object of the present invention to provide multi-searching and updating capabilities across a combination of heterogeneous datastores.
It is an object of the present invention to provide a flexible mechanism which can employ a combination of different types of search engines selectable by users, e.g., SearchManager/TextMiner, QBIC, etc., and allow users to formulate and submit parametric, text, and/or image queries against heterogeneous datastores and get back the results in a consistent, uniform format.
It is an object of the present invention to allow an application program to manipulate data objects as a group or collection and at the same time preserve the sub-grouping relationship that exists between the objects. Such a collection can be used to represent the results of a query against heterogeneous datastores so that the combined results constitute a collection of collections of results from each datastore. The client application/user then has a choice to iterate over the whole combination of results as a flat collection or to iterate over each subcollection individually, while preserving the sub-grouping information and relationships.
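The collection-of-collections idea in the preceding paragraph can be illustrated with a minimal Java sketch. It is not the FederatedCollection class of this specification; the class name GroupedResults, its methods, and the sample datastore labels and item names are all hypothetical and chosen only for illustration. The sketch keeps one sub-collection per source datastore and offers both a flat iteration and a per-datastore view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not the classes of this specification.
final class GroupedResults<T> implements Iterable<T> {
    // One sub-collection per source datastore, kept in first-seen order.
    private final Map<String, List<T>> bySource = new LinkedHashMap<>();

    // Adds a result and records which datastore it came from.
    void add(String source, T item) {
        bySource.computeIfAbsent(source, k -> new ArrayList<>()).add(item);
    }

    // Sub-collection view: a read-only copy of the results from one datastore
    // (empty if that datastore contributed nothing).
    List<T> subCollection(String source) {
        List<T> group = bySource.get(source);
        if (group == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(new ArrayList<>(group));
    }

    // Flat view: walks every result, one datastore group after another.
    // The iterator runs over a snapshot taken when it is created.
    @Override
    public Iterator<T> iterator() {
        List<T> flat = new ArrayList<>();
        for (List<T> group : bySource.values()) {
            flat.addAll(group);
        }
        return flat.iterator();
    }

    public static void main(String[] args) {
        GroupedResults<String> results = new GroupedResults<>();
        results.add("DL", "doc1");
        results.add("TS", "doc2");
        results.add("DL", "doc3");
        for (String item : results) {
            System.out.println("flat: " + item);
        }
        System.out.println("DL only: " + results.subCollection("DL"));
    }
}
```

Grouping by source at insertion time is one possible design; it keeps the sub-grouping information available without requiring each result to carry a reference back to its datastore.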
It is an object of the present invention to allow a user/application to combine several datastores in a digital library domain to form a unified conceptual view in which multi-search queries can be formulated, executed, and coordinated to produce results in a consistent format.
It is an object of the present invention to provide these and other capabilities in a common object model for Digital Library types of data access.
Accordingly, the present invention provides a common object model in an object-oriented environment which includes a federated query object, a federated collection object, and a federated datastore object. These three objects separately and together provide client applications/users with capabilities to efficiently and powerfully search, and manage the search results from heterogeneous datastores. The present invention thereby relieves the user/application of the burden of having to directly manipulate each of the heterogeneous datastores, without removing the user's/application's ability to directly manipulate features of particular datastores if desired.
For example, the federated query object can coordinate query processing functions such as translation, filtering, merging, and data conversion for multiple heterogeneous datastores in a single query. Subqueries managed by a federated query object include parametric, text, image, SQL, and combined queries, even if the various subqueries are for different datastores (e.g., DB2, Visual Info, On Demand, Digital Library, etc.). The federated query object can even have another federated query object as a subquery.
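The nesting of subqueries described above resembles a composite structure, which the following Java sketch illustrates. It is only an assumption about how such nesting could be modelled, not the FederatedQuery class of this specification: the names SubQuery, StoreQuery, and FederatedQuerySketch and the query strings are hypothetical, and the datastore call is replaced by a stand-in that returns a labelled string. Only the merging step is shown; translation, filtering, and data conversion are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface for anything that can be run as a subquery.
interface SubQuery {
    List<String> execute();
}

// Stand-in for a query against a single datastore. A real implementation
// would call the datastore; here the "hit" is just a labelled string.
final class StoreQuery implements SubQuery {
    private final String store;
    private final String queryString;

    StoreQuery(String store, String queryString) {
        this.store = store;
        this.queryString = queryString;
    }

    @Override
    public List<String> execute() {
        List<String> hits = new ArrayList<>();
        hits.add(store + ":" + queryString);
        return hits;
    }
}

// A federated query holds subqueries and merges their results in order.
// Because it is itself a SubQuery, it can be nested inside another one.
final class FederatedQuerySketch implements SubQuery {
    private final List<SubQuery> parts = new ArrayList<>();

    FederatedQuerySketch add(SubQuery part) {
        parts.add(part);
        return this;
    }

    @Override
    public List<String> execute() {
        List<String> merged = new ArrayList<>();
        for (SubQuery part : parts) {
            merged.addAll(part.execute());
        }
        return merged;
    }

    public static void main(String[] args) {
        FederatedQuerySketch inner = new FederatedQuerySketch()
                .add(new StoreQuery("TS", "text:library"));
        FederatedQuerySketch outer = new FederatedQuerySketch()
                .add(new StoreQuery("DL", "author=Smith"))
                .add(inner); // a federated query used as a subquery
        System.out.println(outer.execute());
    }
}
```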
The federated collection object returns the query results in a uniform and datastore-neutral format, which can be processed as a flat collection or as sub-collections according to the source datastores.
The federated datastore object can provide a unified conceptual view of all of the included datastores. The federated datastore can combine the participating datastores in two levels: without mapping to reflect the results as a single union; and with mapping of concepts in each datastore to relate/equate data in one datastore to another. The concept mapping enables a user to follow links and join tables as part of a query where the result of a first datastore links to data in another, for example.
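The second level of combination, in which concepts in one datastore are related to those in another, can be pictured as a lookup table from a federated attribute to the native attribute of each participating datastore. The Java sketch below is a simplified illustration under that assumption and does not reproduce the schema mapping classes of this specification. The class name ConceptMappingSketch, the attribute names, and the condition syntax are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of concept mapping: one federated attribute is related
// to a differently named native attribute in each participating datastore.
final class ConceptMappingSketch {
    // federated attribute -> (datastore name -> native attribute name)
    private final Map<String, Map<String, String>> mapping = new LinkedHashMap<>();

    // Registers that federatedAttr corresponds to nativeAttr in the given datastore.
    void map(String federatedAttr, String store, String nativeAttr) {
        mapping.computeIfAbsent(federatedAttr, k -> new LinkedHashMap<>())
               .put(store, nativeAttr);
    }

    // Rewrites the federated condition "attr = value" into one native
    // condition per mapped datastore. Unmapped attributes yield an empty map.
    Map<String, String> translate(String federatedAttr, String value) {
        Map<String, String> perStore = new LinkedHashMap<>();
        Map<String, String> natives = mapping.get(federatedAttr);
        if (natives != null) {
            for (Map.Entry<String, String> entry : natives.entrySet()) {
                perStore.put(entry.getKey(), entry.getValue() + " = '" + value + "'");
            }
        }
        return perStore;
    }

    public static void main(String[] args) {
        ConceptMappingSketch federated = new ConceptMappingSketch();
        federated.map("author", "DL", "AUTHOR_NAME"); // attribute names are made up
        federated.map("author", "DB2", "WRITER");
        System.out.println(federated.translate("author", "Smith"));
    }
}
```

Without such a mapping, the first level described above would simply send the same condition to every datastore and present the results as a union.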