For many years, businesses have used computers to manage information such as numbers and text, primarily in the form of coded data. However, business data represents only a small portion of the world's information. As storage, communication, and information processing technologies advance and the cost of these technologies decreases, it becomes more feasible to digitize and store large volumes of various other types of data. Once digitized and stored, the data should be available for distribution on demand to users at their place of business, home, or other locations.
New digitization techniques have emerged in the last decade to digitize images, audio, and video, giving rise to a new type of digital multimedia information. These multimedia objects are significantly different from the business data that computers managed in the past, often requiring more advanced information system infrastructures with new capabilities, such as “digital libraries” or content management systems.
New digital technologies can do much more than replace physical objects with their electronic representations. These technologies enable instant access to information; support fast, accurate, and powerful search mechanisms; provide new “experiential” (i.e., virtual reality) user interfaces; and implement new ways of protecting the rights of information owners. These properties make digital library solutions attractive and acceptable to corporate information service organizations as well as to the information owners, publishers, and service providers.
Generally, business data is created by a business process, such as an airline ticket reservation, a deposit at a bank, or the processing of a claim at an insurance company. Most of these processes have been automated by computers and produce business data in digital form such as text and numbers, i.e., structured coded data. In contrast, the use of multimedia data is not fully predictable. Consequently, multimedia data cannot be fully pre-structured, because it is not produced by a computer algorithm: it is either the creative work of a human being or the digitization of a real-world object, as with x-rays or geophysical mapping.
The average size of business data in digital form is relatively small. A banking record that comprises a customer's name, address, phone number, account number, balance, etc., may represent only a few hundred characters, and thus only a few thousand bits. The digitization of multimedia information such as image, audio, or video produces a large set of bits called an “object” or a binary large object (“blob”). For example, a digitized image could take as much as 30 MB of storage. The digitization of a movie, even after compression, may take as much as 3 GB to 4 GB of storage.
Multimedia information is typically stored as much larger objects whose quantity keeps growing, and it therefore requires special storage mechanisms. Conventional business computer systems have not been designed to directly store such large objects. Certain types of information may require specialized storage technologies, such as media streamers for video or music. Because certain multimedia information needs to be preserved or archived, special storage management functions are required for providing automated backup and migration to new storage technologies as they become available and as old technologies become obsolete.
For performance reasons, multimedia data is often placed in the proximity of the users with the system supporting multiple distributed object servers. Consequently, a logical separation between applications, indices, and data is required to ensure independence from any changes in the location of the data.
The indexing of business data is often embedded into the data itself. When the automated business process stores a person's name in the column “NAME”, it actually indexes that information. Multimedia information objects usually do not contain indexing information. Developers or librarians typically create this “meta data” or “metadata”. The indexing information for multimedia information is typically kept in standard business-like databases separated from the physical object.
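The separation between indexing information and the physical object can be sketched as follows. This is only an illustrative sketch: the table layout, the object identifiers, and the function name find_objects are assumptions made for the example, and an in-memory dictionary stands in for an object server.

```python
import sqlite3

# Hypothetical object store: object ID -> raw bytes (stands in for an object server).
object_store = {
    "obj-001": b"\x89PNG...image bytes...",
    "obj-002": b"RIFF...audio bytes...",
}

# The metadata lives in an ordinary relational table, separate from the objects.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE catalog ("
    "object_id TEXT PRIMARY KEY, title TEXT, media_type TEXT, keywords TEXT)"
)
catalog.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?)",
    [
        ("obj-001", "Chest x-ray", "image", "medical radiology"),
        ("obj-002", "Board meeting recording", "audio", "meeting minutes"),
    ],
)


def find_objects(keyword):
    """Search the metadata only, then fetch the matching objects by ID."""
    rows = catalog.execute(
        "SELECT object_id FROM catalog WHERE keywords LIKE ? ORDER BY object_id",
        (f"%{keyword}%",),
    ).fetchall()
    return [(object_id, object_store[object_id]) for (object_id,) in rows]


matches = find_objects("radiology")
```

Because the search touches only the catalog table, the objects themselves could be moved to a different server without changing the query, provided the identifiers remain stable.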
In a digital library or a content management system, the multimedia object can be linked with the associated indexing information since both are available in digital form. Integration of this legacy catalog information with the digitized object is one of the advantages of content management or digital library technology. Different types of objects can be categorized differently as appropriate for each object type. Existing standards, such as MARC records for libraries or Finding Aids for archiving of special collections, can be used when appropriate.
The indexing information used for catalog searches in physical libraries is typically the title of the book, author, publisher, ISBN, etc., enriched by other information created by librarians. This other information may comprise abstracts, subjects, keywords, etc. In contrast, digital libraries may contain the entire content of books, images, music, films, etc.
Technologies are desired for full text searching, image content searching (searching based on color, texture, shape, etc.), video content searching, and audio content searching. Each type of search is usually conducted by a specialized search engine. The integrated combination of catalog searches (for example, using SQL) with content searches provides powerful search and access functions. These technologies can also be used to partially automate further indexing, classification, and abstracting of objects based on content. The term multi-search refers to searches employing more than one search engine, for example, a text search and an image search.
To harness the massive amounts of information spread across many networks and varying types of content, a user should be able to simultaneously search numerous storage facilities without considering the particular implementation of each storage facility. Object-oriented approaches are generally better suited for such complex data management. The term “object-oriented” refers to a software design method that uses “classes” and “objects” to model abstract or real objects. An “object” refers to the main building block of object-oriented programming, and is a programming unit that has both data and functionality (i.e., “methods”). A “class” defines the implementation of a particular kind of object, the variables and methods the object uses, and the parent class from which the class inherits. In this context, the term datastore is used to refer to a generic data storage facility, whereas heterogeneous is used to indicate that the datastores need not be of the same type. A federated datastore is composed as an aggregation of several heterogeneous datastores configured dynamically by the application user.
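A multi-search that combines a catalog search with a content search can be sketched as follows. The sketch rests on several assumptions: the sample records are invented, simple substring matching stands in for a full-text search engine, and the two result sets are combined by intersection, which is only one of several possible ways to integrate them.

```python
# Hypothetical catalog (structured attributes) and full-text content, keyed by item ID.
catalog = {
    "bk-1": {"author": "Rivera", "year": 1998},
    "bk-2": {"author": "Rivera", "year": 2001},
    "bk-3": {"author": "Chen", "year": 2001},
}
full_text = {
    "bk-1": "An introduction to digital libraries",
    "bk-2": "Storage management for video archives",
    "bk-3": "Digital libraries and federated search",
}


def catalog_search(author):
    """Catalog search: match on a structured attribute."""
    return {item_id for item_id, record in catalog.items() if record["author"] == author}


def text_search(term):
    """Content search: substring matching stands in for a text search engine."""
    return {item_id for item_id, text in full_text.items() if term.lower() in text.lower()}


def multi_search(author, term):
    """Keep only the items returned by both search engines."""
    return sorted(catalog_search(author) & text_search(term))


hits = multi_search("Rivera", "digital libraries")
```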
Currently, the ability to search across many different types of datastores in many different geographical locations is achieved by the use of a federated datastore system, which provides mechanisms for conducting a federated multi-search and update across heterogeneous datastores. For example, each datastore may represent a company or division of a company. A division manager requires access to his or her local datastore but not to the datastores of other division managers. Conversely, a corporate officer may require access to the datastores of all the divisions, located, for example, in New York, San Francisco, London, and Hong Kong. A federated system would search all the datastores, combine and aggregate the data into one report, and present the report to the corporate officer.
One example of a federated datastore system is the IBM Enterprise Information Portal (EIP), which allows federated searches across heterogeneous datastores and processes the results and updates according to the application logic. This functionality is provided by a consistent set of object-oriented classes within the EIP framework implemented in, for example, Java and C++/ActiveX. These classes provide the query processing capabilities and aggregate the results from various datastores participating in the federation.
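The division example can be sketched in a few classes. This is a minimal illustration rather than the interface of any particular product: the class names Datastore, DictDatastore, RowDatastore, and FederatedDatastore, the search method, and the sample records are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Datastore(ABC):
    """Common interface shared by every datastore (hypothetical)."""

    @abstractmethod
    def search(self, keyword):
        """Return a list of result dicts whose text contains the keyword."""


class DictDatastore(Datastore):
    """A datastore that keeps its records in a dictionary."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self, keyword):
        return [
            {"source": self.name, "id": key, "text": text}
            for key, text in self.records.items()
            if keyword.lower() in text.lower()
        ]


class RowDatastore(Datastore):
    """A datastore of a different type: a list of (id, text) rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def search(self, keyword):
        return [
            {"source": self.name, "id": row_id, "text": text}
            for row_id, text in self.rows
            if keyword.lower() in text.lower()
        ]


class FederatedDatastore(Datastore):
    """Aggregates heterogeneous datastores behind the same interface."""

    def __init__(self):
        self.members = []

    def add(self, datastore):
        self.members.append(datastore)

    def search(self, keyword):
        results = []
        for member in self.members:
            results.extend(member.search(keyword))
        return results


new_york = DictDatastore("New York", {"ny-1": "Q3 sales report", "ny-2": "Hiring plan"})
london = RowDatastore("London", [("ldn-1", "Q3 sales report, EMEA"), ("ldn-2", "Office lease")])

federation = FederatedDatastore()
federation.add(new_york)
federation.add(london)

# The corporate officer searches the federation; a division manager searches one member.
report = federation.search("sales")
```

Because the federation implements the same interface as its members, the calling code does not need to know whether it is searching one datastore or many.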
Content management datastores provide modeling constructs to represent and manipulate document and folder metaphors. Such a model is referred to as a content management data model. Within the content management system, a folder has some attributes and may contain other documents and folders. A document represents a physical document such as an insurance claim and has some attributes as well as zero or more textual or binary contents. These contents are also known as parts. Examples of textual content could be an insurance policy in an XML format, or an abstract of a technical report. Binary contents could be the JPEG picture of an automobile involved in an accident, or a video clip of a news report.
A federated datastore shares the same basic interface as other datastores such that the interface provides access transparency within the datastore family. Consequently, the federated datastore also provides the same modeling constructs consistent with other datastores, which is sometimes referred to as a federated content management data model or federated content model for short.
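The document and folder metaphors can be sketched as simple data structures. The class and field names below (Part, Document, Folder, attributes, parts, members) and the helper count_documents are assumptions chosen for illustration; they are not taken from any particular content management product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Part:
    """One textual or binary content of a document."""
    mime_type: str
    content: Union[str, bytes]


@dataclass
class Document:
    """A document has attributes and zero or more parts."""
    attributes: Dict[str, str]
    parts: List[Part] = field(default_factory=list)


@dataclass
class Folder:
    """A folder has attributes and may contain documents and other folders."""
    attributes: Dict[str, str]
    members: List[Union[Document, "Folder"]] = field(default_factory=list)


def count_documents(folder):
    """Count the documents in a folder, including those in nested folders."""
    total = 0
    for member in folder.members:
        if isinstance(member, Folder):
            total += count_documents(member)
        else:
            total += 1
    return total


claim = Document(
    attributes={"type": "insurance claim", "claim_no": "C-1001"},
    parts=[
        Part("text/xml", "<policy id='P-77'/>"),
        Part("image/jpeg", b"\xff\xd8...jpeg bytes..."),
    ],
)
photos = Folder(attributes={"name": "Accident photos"}, members=[Document({"type": "photo"})])
case_file = Folder(attributes={"name": "Case C-1001"}, members=[claim, photos])
```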
Applications using federated searches and updates require a seamless extension of the federated modeling constructs of documents and folders. That is, the users wish to extend the document and folder concepts to span across disparate datastores and be able to search and manipulate them transparently. Specifically, the extensions should include the ability to perform federated search and update on federated folders and documents.
While the federated system has proven to be quite useful, additional enhancements would extend the capabilities of such a system. For example, the results of the search performed by the federated system are not persistent; i.e., the search results are not saved for later use by the client of the federated system. Saving the results of the search would allow the client to manipulate the search results, forward the search results to other users, use the search results in a report, save the results for later comparison, route the results into a document workflow, and so forth. The extension of the capability of the federated system does not need to be limited to support persistency of the search results alone. In general, the extension covers the persistency aspect of the federated content model which includes the capability to manage federated folders.
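What persistence of search results means in practice can be illustrated with a small sketch. It is not the approach of any particular system: the JSON file format, the function names save_results and load_results, and the sample result records are all assumptions made for the example.

```python
import json
import os
import tempfile


def save_results(path, query, results):
    """Write the query and references to the matching items to a file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"query": query, "results": results}, handle)


def load_results(path):
    """Read a previously saved result set back into memory."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical search results: each entry names the datastore and the item found there.
results = [
    {"datastore": "london", "item_id": "doc-17"},
    {"datastore": "new_york", "item_id": "doc-42"},
]

path = os.path.join(tempfile.mkdtemp(), "claims_search.json")
save_results(path, "claims 2001", results)

# Later, possibly in another session, the saved results can be reloaded and reused.
restored = load_results(path)
```

Storing references to the items, rather than copies of the items themselves, keeps the saved result set small and leaves the objects in their original datastores.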
What is therefore needed is a system, a service, a computer program product, and an associated method that would allow users to save the search results in a persistent manner, to manage federated folders, and to fully support a persistent federated content model. The need for such an implementation has heretofore remained unsatisfied.