1. Field of the Invention
The present invention generally relates to generation of suitable data for statistical analysis and, more particularly, to generating query output which is suitable as input to statistical analysis routines.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, a DBMS can be structured to support a variety of different types of operations for a requesting entity (e.g., an application, the operating system or an end user). Such operations can be configured to retrieve, add, modify and delete information being stored and managed by the DBMS. Standard database access methods support these operations using high-level query languages, such as the Structured Query Language (SQL). The term “query” denotes a set of commands that cause execution of operations for processing data from a stored database. For instance, SQL supports four types of query operations, i.e., SELECT, INSERT, UPDATE and DELETE. A SELECT operation retrieves data from a database, an INSERT operation adds new data to a database, an UPDATE operation modifies data in a database and a DELETE operation removes data from a database.
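By way of illustration only, the four query operations named above can be sketched against an in-memory SQLite database using Python's standard sqlite3 module. The table name measurements, its columns and the sample values are hypothetical and are chosen merely to anticipate the medical example discussed below; they are not taken from any particular database.

```python
import sqlite3


def demo_query_operations():
    """Run INSERT, UPDATE, DELETE and SELECT on a hypothetical table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    # Hypothetical schema: one row per measurement of one patient.
    cur.execute(
        "CREATE TABLE measurements ("
        "patient_id INTEGER, day INTEGER, size_mm REAL)"
    )
    # INSERT adds new data to the database.
    cur.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [(1, 0, 30.0), (1, 28, 26.0), (2, 0, 41.0)],
    )
    # UPDATE modifies data already in the database.
    cur.execute(
        "UPDATE measurements SET size_mm = ? WHERE patient_id = ? AND day = ?",
        (25.5, 1, 28),
    )
    # DELETE removes data from the database.
    cur.execute("DELETE FROM measurements WHERE patient_id = ?", (2,))
    # SELECT retrieves data from the database.
    rows = cur.execute(
        "SELECT patient_id, day, size_mm FROM measurements "
        "ORDER BY patient_id, day"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_query_operations())
```

In this sketch only the SELECT operation returns a query result; the other three operations change the stored data without producing rows.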
Data that is collected and stored in a database can be used for various purposes including know-how management, decision making and statistical analysis. Statistical analysis on data in an underlying database is generally performed by executing suitable analysis routines on query results obtained in response to execution of corresponding queries against the underlying database. Such analysis routines normally require a set of variables as input, which are often measurements that are carried out at specific points in time. However, in some cases the required data may not be available. For example, in retrospective studies, which are performed after all data required as input to corresponding analysis routines has been collected, data with respect to a given field may not be chronologically standardized, i.e., certain events for various instances of a given entity did not occur with the same frequency. In other words, the available data in the underlying database may not match, from a chronological perspective, the data that corresponding analysis routines require as input to perform a required statistical analysis.
For instance, assume an analysis routine that is configured to analyze medical data in order to determine whether a given drug X produces satisfactory results in cancer treatment. More specifically, assume that in the context of a medical test series the drug X was administered to 10 patients having a particular tumor which is presumed to be treatable using the drug X. At the time of administration of the drug X, the tumor size is initially measured for each of the 10 patients. Then, the tumor size of each of the 10 patients is measured in follow-up examinations at various intervals in order to track the progress of the tumors. Corresponding tumor size measurements are collected and stored in a database. These tumor size measurements can be retrieved from the database for statistical analysis purposes by issuing a suitable query against the database.
Assume now that in the given example the analysis routine is configured to determine an average tumor reduction for all patients one month, two months and one year after administration of the drug X. However, for various reasons tumor size measurements were not carried out for all patients exactly one month, two months and/or one year after administration of the drug X. Accordingly, if the analysis routine is run on a query result which only returns a list of available tumor size measurements for each patient from the database, the results produced by the analysis routine can be inaccurate or invalid.
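The mismatch described above can be sketched as follows. This is a minimal illustration using invented sample data for three hypothetical patients, not data from any actual study; the table name tumor_size, its columns and the day counts used to stand for one month, two months and one year (30, 60 and 365) are assumptions made for the example. The sketch counts, for each point in time required by the analysis routine, how many patients actually have a measurement on exactly that day.

```python
import sqlite3

# Hypothetical rows: (patient_id, days since administration, tumor size in mm).
SAMPLE_ROWS = [
    (1, 0, 30.0), (1, 30, 27.0), (1, 60, 24.0),
    (2, 0, 40.0), (2, 34, 38.0), (2, 75, 33.0),
    (3, 0, 25.0), (3, 21, 25.0),
]


def coverage_by_timepoint(rows, targets=(30, 60, 365)):
    """Map each target day to (patients measured on that day, all patients)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tumor_size ("
        "patient_id INTEGER, day INTEGER, size_mm REAL)"
    )
    conn.executemany("INSERT INTO tumor_size VALUES (?, ?, ?)", rows)
    total = conn.execute(
        "SELECT COUNT(DISTINCT patient_id) FROM tumor_size"
    ).fetchone()[0]
    coverage = {}
    for day in targets:
        # A plain query only finds measurements taken on exactly this day.
        measured = conn.execute(
            "SELECT COUNT(DISTINCT patient_id) FROM tumor_size WHERE day = ?",
            (day,),
        ).fetchone()[0]
        coverage[day] = (measured, total)
    conn.close()
    return coverage


if __name__ == "__main__":
    for day, (measured, total) in coverage_by_timepoint(SAMPLE_ROWS).items():
        print(f"day {day}: {measured} of {total} patients measured")
```

With this sample data, only one of the three patients has a measurement on day 30 and on day 60, and none has a measurement on day 365, so an average computed for those points in time would not reflect all patients.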
Therefore, there is a need for an effective technique for generating query output which is suitable as input to statistical analysis routines.