1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and system for improving a relational database system. Still more particularly, the present invention relates to a method, apparatus, and computer instructions for direct linkage of relational database tables and other files for data preparation.
2. Description of Related Art
In order to perform analysis on data currently existing in a relational database, system analysts utilize various data preparation tools in order to prepare data for utilization. For example, to “prepare” data for utilization, system analysts might clean, transform, and consolidate data into a format that is most suitable for analysis. These steps are examples of steps used to “prepare” the data for analysis. An example of a data preparation tool is SAS, available from SAS Institute, Inc. SAS Institute offers a set of enterprise software that allows system analysts to perform business functions, such as data warehousing, data mining, human resources management, financial management, and decision support, etc. SAS offerings include Base SAS and SAS/ACCESS software. Base SAS provides a SAS language, procedures and a macro facility for data analysis and reporting. SAS/ACCESS software provides an interface between Base SAS and various types of relational databases from different vendors, so that different relational database data may be accessed. One type of relational database includes DB2, a product available from International Business Machines Corporation.
In order to analyze data in SAS, input data has to be in SAS dataset format. A SAS dataset includes data values of a data object and variables that define characteristics of the data object. For example, ‘Name’ is a variable having data values of ‘David’ or ‘John’. Currently, SAS includes functionality to load data from an existing SAS dataset into a relational database table by using a DATA step and a PROC step. The DATA step takes raw input data, for example, a SAS dataset, and computes values or applies conditional logic to the dataset to generate an output, such as a report. The PROC step takes a SAS dataset as an input, calls a group of SAS statements, and executes a procedure to analyze data in order to generate a report. The PROC step includes a DBLOAD function that creates a new relational database table and loads a SAS dataset into the newly created table.
The above functionality moves data only in one direction, that is, SAS can load a SAS dataset into a database table. If a user wants to move data in the other direction, that is, load database table data into a SAS dataset, then some other process must be used to take input data from a database table and create an SAS dataset for data preparation. A common process is for the user to manually write database table data into a flat file and write SAS code to read the file into a SAS dataset. This process requires a user to perform multiple steps including writing data to and reading data from an output flat file. These steps increase the time and effort needed to analyze data.
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions to directly link relational database table(s) and SAS dataset(s) in a simplified solution that moves data in both directions without the need to perform multiple steps to read and write data to an output flat file.