1. Field of the Invention
The present invention generally relates to data processing and more particularly to accessing data in data repositories with a system comprising user defined table functions (UDTFs) and a metadata processor.
2. Description of the Related Art
Data repositories are subject oriented, integrated, time-variant collections of relatively large volumes of transactional and archived data that facilitate decision support services. Data repositories are often implemented as databases that are managed by database management systems. Databases are computerized information storage and retrieval systems. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses. Data repositories may also be implemented as simply a collection of text files that are formatted according to a particular protocol. Alternatively, data repositories may also be implemented as XML based databases.
A relational database management system (RDBMS) is a database management system that manages relational databases and is capable of storing and retrieving large volumes of data. Further, large scale relational database management systems can be implemented to support thousands of users accessing databases via a wide assortment of applications. An RDBMS can be structured to support a variety of different types of operations for a requesting entity (e.g., an application, the operating system or an end user). Such operations can be configured to retrieve, add, modify and delete data being stored and managed by the RDBMS. Standard database access methods support these operations using high-level query languages, such as the Structured Query Language (SQL). One of the primary operations performed with SQL is querying (also referred to herein as retrieving or selecting) data from data structures within a database.
The standard manner of retrieving data from a relational database is by directly querying the data structures in which the data resides. In the case of relational databases, these data structures may be tables or indexes. However, for a variety of reasons including those pertaining to security and optimal system performance, it is not preferred to allow database users to directly access database objects (particularly tables) in order to retrieve data. One alternative to providing direct access to database objects is to utilize user defined table functions (UDTFs).
UDTFs, like database views, are programmatically defined objects that directly reference database tables and appear to users of an RDBMS to be conventional tables. UDTFs can be based on a query or other appropriate logic and return a two-dimensional array structure as though it were a set of rows and columns, thereby allowing RDBMS users to utilize the UDTF as though it were a table. When referenced (or “called”), UDTFs are configured to execute a query that retrieves data from underlying tables and then provide the retrieved data as an array. With a UDTF, it is possible to treat data stored in proprietary file systems, data returned from an API or data generated by a program, as a table. In addition, in certain environments, UDTFs may be used to precompute and provide aggregated representations of base data. They can also be used to precompute joins between two or more tables, with or without aggregations. In many instances, a UDTF optimizes resource usage associated with expensive joins and aggregations for queries involving large volumes of data spread across multiple data structures in a distributed database environment.
Additionally, UDTFs can be used to create an abstraction layer between database users and the actual database tables being selected. Abstraction layers can be used to provide additional security by providing a strictly controlled manner in which data can be retrieved from a database. Further, a UDTF based abstraction layer can be used to provide users with the ability to query data that is sourced from multiple tables belonging to multiple schemas and residing in multiple databases without the user requiring detailed knowledge of the physical layout of the distributed database environment.
In order to maximize the benefit from the functionality described above, distinct UDTFs can be created to represent each table within a distributed database environment. Those skilled in the art will acknowledge, however, that tables exist in wide variety of forms and have vastly different attributes. For instance, in terms of the number of columns, certain tables may be configured with five columns while others may be configured with ninety columns. One problem with using UDTFs is that, as conventionally implemented, each respective UDTF would require a unique query defined to properly retrieve data from the underlying tables. Therefore, during the handling of a query request, for each UDTF referenced, a unique piece of code utilized for coupling the respective UDTF to a corresponding table would have to be invoked to provide the necessary results. This is a resource intensive and inefficient solution for utilizing UDTFs in a distributed database environment.
Another problem is that, as conventionally defined, UDTFs are dependent on the physical structure of the database remaining stable. In other words, the table and column definitions on which a UDTF depends, should not change. Currently, a UDTF typically needs to be re-defined if the underlying objects referenced by UDTF are changed.
Therefore, there is a need for an improved system and method for defining and using UDTFs. There is a further need for the solution to interface with and support applications that are configured to retrieve data from the repository directly.