Querying over heterogeneous data sources is the challenge of performing a search over data sources having different data models. The challenge also presents itself where disparate data sources have the same data model. In order to query over multiple data sources with multiple data models, a multiplicity of query execution engines is normally required. The input query is normally split up by one monolithic processor which decides a priori which attached execution engine should get which portion of the original input query. The original query is thus monolithically processed to divide up the query into distinct pieces for execution. Each execution engine corresponds to a particular data model or data source. The individual query execution engines then execute their portion of the query and return the results to the monolithic processor. The monolithic processor then has the task of combining the individual query results from each of the query execution engines and stringing them together to form a complete set of query results.
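The conventional arrangement described above can be pictured with a minimal sketch. All names here (`ListEngine`, `MonolithicProcessor`, the clause format) are hypothetical and chosen only for illustration. The sketch assumes, for simplicity, that a query is a list of (source name, predicate) clauses and that each engine filters an in-memory list of rows.

```python
class ListEngine:
    """Hypothetical execution engine bound to one data source (a list of rows)."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, predicate):
        # Return only the rows of this source that satisfy the predicate.
        return [row for row in self.rows if predicate(row)]


class MonolithicProcessor:
    """Splits a query, dispatches the pieces, and combines the results."""

    def __init__(self, engines):
        # Mapping of source name -> engine; the processor must know every source.
        self.engines = engines

    def split(self, query):
        # Decide a priori which engine receives which clause of the query.
        pieces = []
        for source, predicate in query:
            if source not in self.engines:
                raise KeyError(f"no execution engine for source {source!r}")
            pieces.append((self.engines[source], predicate))
        return pieces

    def run(self, query):
        # Execute each piece on its engine and string the results together.
        combined = []
        for engine, predicate in self.split(query):
            combined.extend(engine.execute(predicate))
        return combined


def demo():
    # Assumed sample data: one "relational" source and one "document" source.
    processor = MonolithicProcessor({
        "orders": ListEngine([{"id": 1, "total": 40}, {"id": 2, "total": 90}]),
        "notes": ListEngine([{"id": 7, "text": "urgent"}, {"id": 8, "text": "ok"}]),
    })
    query = [
        ("orders", lambda row: row["total"] > 50),
        ("notes", lambda row: row["text"] == "urgent"),
    ]
    return processor.run(query)
```

In this sketch, a clause that names a source the processor was not built to know about fails in `split`, which is one way of seeing why every new data source forces a change to the central processor.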
This approach to heterogeneous data querying has the disadvantage of requiring a monolithic processor that can identify and manipulate all possible data sources. This is an onerous task because different data sources have very different APIs or models for interacting with their data, and it is not generally feasible or desirable to build a monolithic processor that has knowledge of all data models and can manipulate all possible data sources. For example, if one wished to query over a SQL database and an XML file, the XML file would typically be accessed through the Document Object Model (DOM), whereas the database would be accessed through SQL commands. As a result, one would require different code to work with the database and the XML file. The problem is exacerbated if one attempts to build a monolithic processor capable of handling additional data model types as those types rise to importance in the field. Under such conditions, the capacity of the monolithic engine may well be exceeded by changing requirements, and the engine may require a redesign if any additional data model types are added or if an existing data model type is significantly changed.
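The need for different code per data model can be illustrated with a short sketch using Python's standard `sqlite3` and `xml.dom.minidom` modules. The customer records, the table and element names, and the two function names are assumptions made up for this example; the point is only that the same logical question ("which customers are in a given city?") is answered by a SQL command in one case and by walking a DOM tree in the other.

```python
import sqlite3
from xml.dom import minidom

# Hypothetical sample data: the same two customer records in XML form.
XML_TEXT = """<customers>
  <customer><name>Ada</name><city>London</city></customer>
  <customer><name>Grace</name><city>New York</city></customer>
</customers>"""


def names_from_sql(city):
    """Relational path: the question is expressed as a SQL command."""
    conn = sqlite3.connect(":memory:")
    try:
        # Build a small in-memory table holding the assumed sample records.
        conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [("Ada", "London"), ("Grace", "New York")],
        )
        rows = conn.execute(
            "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


def names_from_xml(city):
    """XML path: the same question is answered by walking the DOM tree."""
    doc = minidom.parseString(XML_TEXT)
    names = []
    for customer in doc.getElementsByTagName("customer"):
        city_node = customer.getElementsByTagName("city")[0]
        name_node = customer.getElementsByTagName("name")[0]
        # firstChild is the text node inside each element.
        if city_node.firstChild.data == city:
            names.append(name_node.firstChild.data)
    return sorted(names)
```

Both functions return the same kind of result, yet they share no query logic, so a single processor covering both sources has to carry both code paths.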
An additional problem in creating a heterogeneous data source query mechanism is virtual querying. If a data source can be queried easily in one data model type, yet it is desirable to structure the query in the query language of a second data model, then a conversion from one data model query language to the other may be needed. This need may lead to multiple query language conversions, requiring multiple sets of hardware and software modules and a corresponding number of optimizers to ensure efficient coding of the queries.
Thus there is a need for an architecture which avoids the problem of designing and building a monolithic query processor and which is adaptable to changing query language requirements. Additionally, there is a need for an architecture that avoids the problems associated with converting multiple query languages from one form into another. The present invention addresses the aforementioned needs with an inventive architecture which is adaptable to changing query environment needs.