“Heterogeneous” is a term used to describe groups of data sources where the data sources are of different form or structure. The data sources can include databases and a variety of structured documents.
Existing systems allow views over individual data sources to be created only by database designers/administrators or knowledgeable users who have extensive knowledge of the data contained in or referred to by the respective source. Existing systems either do not support the creation of views of data over heterogeneous data sources or are incapable of learning the relationships among the data components that are used to create those views. For instance, traditional relational databases allow database designers/administrators to create views using only data components defined in the schema of the database.
An editor or report generator may allow data components from various sources to be imported into a compound document (creating a specific view of the data), but it makes no attempt to learn the relationships among the imported data components. More recently, enterprise portals have allowed users to navigate across multiple components from a variety of data sources. They provide an environment that allows developers to quickly build the logic needed to link the data sources. Typically, a developer can import the schemas from all of the data sources within an enterprise to build logical joins from one data source to another. Nevertheless, as in traditional databases, the relationships among the data components are pre-defined for average users. The system does not attempt to deduce unknown relationships from views of data created by average users (that is, people who are neither responsible for, nor necessarily skilled in, the administration of computer systems and databases).
Allowing an average user to create views of data across heterogeneous data sources presents many problems. The user does not have extensive knowledge of all the data that is available, the definitions of the data, or the relationships among the data components. Nevertheless, the user is typically familiar with data from a few data sources. As in enterprise portals, a system is required to track the available data and to store the definitions of the data and their relationships. At the same time, the views of data created by users typically include joins among the data sources with which they are familiar. These user-defined joins are additional information that could provide new insight into the data and could be used to establish new logical joins between the data sources.
The World Wide Web (or simply, the “Web”) provides users with networked access to large amounts of information from a large number of information sites. However, much of that information remains technically and/or practically inaccessible due to being stored in database systems of varying forms. Also, it is difficult for users to collate information from many different data sources, where the desired data may be stored on the Web in some combination of database systems or structured documents.
The collated information can be referred to as a “view” of the data. Commonly used relational database management systems (RDBMS), such as those provided by Oracle™, often provide users with a graphical user interface (GUI) for designing views across tables in the database system. These GUIs are designed to remove the need for the user to create views by writing SQL queries directly. However, these views are typically limited to tables in the RDBMS.
More general reporting systems, such as Brio Intelligence™ (Brio Software, Inc.) and Crystal Reports™ (Crystal Decisions), allow users to design views or reports across known sets of data sources, where the data sources are generally accessed via proprietary wrappers. Users can design reports by selecting data sources and viewing all the data components (or fields) that can be used in the reports. Data components of interest can then be selected for inclusion in the report. However, this method requires that the report designer understand the relationships between data components of different data sources. No automated use is made of relationships learned from reports designed by others. Such a method is typically suited to smaller corporate environments, where the people who generate the reports are usually very familiar with the different corporate data sets.
What is desirable is a means for users of information to effortlessly create views across heterogeneous data sources (i.e., data sources of different form and structure) without having to “personally” know or understand the relationships between data components of the different data sources. For example, data is often duplicated in different data sources in a corporate environment, as different departments tend to “manage” their own data. Knowledge of what data components represent the same information is often critical in the design of new views of data that incorporate data from these disparate data sources.
Another limitation of prior art arrangements is that the procedures users must follow to create new reports or views in existing report generation systems are typically designed for users who have a good understanding of database terminology and of the procedures frequently used for report generation. This class of user is often satisfied with an approach in which the data is collected first and the graphical form of the report (e.g., table, line graph, scatter plot) is decided afterwards. However, users increasingly expect to move more directly to their desired end result. For example, if a user already knows that the report should be presented as a line graph, there should be no need for the user to first collect the data in a table and then create a graph from the table.