1. Field of the Invention
The present invention generally relates to data processing in databases and, more particularly, to constructing queries capable of returning classified information related to data in a database.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, a DBMS can be structured to support a variety of different types of operations for a requesting entity (e.g., an application, the operating system or an end user). Such operations can be configured to retrieve, add, modify and delete information being stored and managed by the DBMS. Standard database access methods support these operations using high-level query languages, such as the Structured Query Language (SQL). The term “query” denotes a set of commands that cause execution of operations for processing data from a stored database. For instance, SQL supports four types of query operations, i.e., SELECT, INSERT, UPDATE and DELETE. A SELECT operation retrieves data from a database, an INSERT operation adds new data to a database, an UPDATE operation modifies data in a database and a DELETE operation removes data from a database.
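By way of illustration only, the four query operations can be sketched as follows. The sketch assumes an in-memory SQLite database accessed through Python's standard sqlite3 module; the function name, the table layout and the sample values are hypothetical and are not taken from any particular DBMS described herein.

```python
import sqlite3


def run_four_operations():
    """Exercise INSERT, SELECT, UPDATE and DELETE on a throwaway table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, used only for this illustration.
    conn.execute(
        "CREATE TABLE EMPLOYEE ("
        "EMP_NO INTEGER PRIMARY KEY, LASTNAME TEXT, AGE INTEGER)"
    )

    # INSERT adds new data to the database.
    conn.execute(
        "INSERT INTO EMPLOYEE (EMP_NO, LASTNAME, AGE) VALUES (?, ?, ?)",
        (1, "Smith", 70),
    )
    # SELECT retrieves data from the database.
    after_insert = conn.execute(
        "SELECT EMP_NO, LASTNAME, AGE FROM EMPLOYEE"
    ).fetchall()

    # UPDATE modifies data already in the database.
    conn.execute("UPDATE EMPLOYEE SET AGE = ? WHERE EMP_NO = ?", (71, 1))
    after_update = conn.execute(
        "SELECT AGE FROM EMPLOYEE WHERE EMP_NO = ?", (1,)
    ).fetchall()

    # DELETE removes data from the database.
    conn.execute("DELETE FROM EMPLOYEE WHERE EMP_NO = ?", (1,))
    after_delete = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

    conn.close()
    return after_insert, after_update, after_delete
```

Calling run_four_operations() returns the table contents after the insertion, the age after the update, and the row count after the deletion, so that the effect of each operation can be inspected separately.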
To retrieve data from a database, e.g., using a SELECT operation in the case of an SQL query, one or more result fields are specified. The result fields are the fields for which the user is requesting data. For example, consider the following SQL query issued against an EMPLOYEE database table:
SELECT EMP_NO, FIRSTNAME, MIDINIT, LASTNAME FROM EMPLOYEE WHERE Age > 65
In this example, the SELECT clause specifies a unique identification number (EMP_NO), a first name (FIRSTNAME), a middle initial (MIDINIT), and a last name (LASTNAME). The SQL query further includes a FROM clause indicating that the result fields are found in the EMPLOYEE table. Moreover, the SQL query includes a WHERE clause which specifies the conditions of the query. In this example, only one condition is specified, i.e., that the employees for which records are returned be over 65 years of age.
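For illustration, the example query can be executed against a small set of sample records. The following is a minimal sketch that assumes an in-memory SQLite database accessed through Python's standard sqlite3 module; the sample rows, the column types and the function name are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (EMP_NO, FIRSTNAME, MIDINIT, LASTNAME, AGE).
SAMPLE_ROWS = [
    (1001, "Ann", "B", "Clark", 67),
    (1002, "Raj", "K", "Patel", 42),
    (1003, "Lee", "M", "Wong", 71),
]

# The example query, with its single WHERE condition.
QUERY = (
    "SELECT EMP_NO, FIRSTNAME, MIDINIT, LASTNAME "
    "FROM EMPLOYEE WHERE AGE > 65"
)


def employees_over_65(rows=SAMPLE_ROWS):
    """Return the result field names and the matching records."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumptions made for this illustration.
    conn.execute(
        "CREATE TABLE EMPLOYEE (EMP_NO INTEGER PRIMARY KEY, "
        "FIRSTNAME TEXT, MIDINIT TEXT, LASTNAME TEXT, AGE INTEGER)"
    )
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", rows)
    cursor = conn.execute(QUERY)
    # cursor.description carries one entry per result field.
    result_fields = [column[0] for column in cursor.description]
    # Sort by EMP_NO so the output order does not depend on the engine.
    records = sorted(cursor.fetchall())
    conn.close()
    return result_fields, records
```

Only the records satisfying the WHERE condition are returned, and each returned record contains only the result fields named in the SELECT clause; the AGE column is used for filtering but is not itself returned.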
In the example above, only a handful of result fields are requested. Such a request is well within the limits of conventional database applications. However, database applications do have limits on the number of columns that can be returned for a given request, and some environments are now pushing those limits. One environment in which the limits of databases are being strained is research in which data in a tabular format is required for input into analysis routines. For example, MAGE, or Microarray Gene Expression, is a method of obtaining information about genes. One of the central principles of MAGE is that data objects are regarded as 3-dimensional matrices, with bioassays (experimental steps or conditions) along a first dimension, design elements (spots) along a second dimension and quantitation types (e.g., signal intensity, background intensity) along a third dimension. Bioassay data objects can be represented in one of two ways: as a set of vectors (in the form: value, dimension1, dimension2, dimension3), or as a 3-D matrix (BioDataCube). Transformations (e.g., filtering, normalization) can be applied to one or more bioassay data objects, resulting in derived data objects. A transformation involves computing values of the resulting 3-D matrix from the values of source matrices, and it also transforms dimensions.
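The two representations of a bioassay data object, and the effect of a transformation on its dimensions, can be sketched as follows. This is a minimal illustration using plain nested Python lists; it is not the MAGE object model itself. The function names, the sample values and the background-subtraction transformation are assumptions chosen only to make the idea concrete.

```python
def cube_to_vectors(cube):
    """Flatten a 3-D matrix into (value, bioassay, design_element,
    quantitation_type) vectors, one vector per cell."""
    return [
        (value, b, d, q)
        for b, plane in enumerate(cube)
        for d, row in enumerate(plane)
        for q, value in enumerate(row)
    ]


def vectors_to_cube(vectors, shape):
    """Rebuild the 3-D matrix from its vectors, given its dimensions."""
    n_bioassays, n_elements, n_quantitations = shape
    cube = [
        [[None] * n_quantitations for _ in range(n_elements)]
        for _ in range(n_bioassays)
    ]
    for value, b, d, q in vectors:
        cube[b][d][q] = value
    return cube


def subtract_background(cube):
    """Hypothetical transformation: derive one quantitation type
    (signal minus background) from two source quantitation types."""
    return [[[row[0] - row[1]] for row in plane] for plane in cube]


# Hypothetical data: 2 bioassays x 2 design elements x 2 quantitation
# types, where each innermost pair is (signal, background).
EXAMPLE_CUBE = [
    [[100, 10], [200, 20]],
    [[110, 15], [210, 25]],
]
```

In this sketch, cube_to_vectors(EXAMPLE_CUBE) yields eight vectors, vectors_to_cube restores the original matrix from them, and subtract_background produces a derived data object whose third dimension has shrunk from two quantitation types to one, illustrating that a transformation changes the dimensions as well as the values.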
Storing multiple Microarray Gene Expression experiments, each having bioassay data objects that exploit all three dimensions of the matrix and multiple experimental steps that each produce normalized bioassay data, could result in massive amounts of data spread over many columns and tables. Query statements composed to mine and search the results of such analyses could therefore select excessive numbers of columns.
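The growth in the number of columns can be made concrete with a short calculation. The sketch below assumes one possible tabular layout, which is an assumption and not a layout prescribed herein: one row per design element and one column per combination of bioassay and quantitation type, plus a key column. The function names, the column-naming scheme and the numeric limit are hypothetical; the limit is a placeholder and does not correspond to any particular database product.

```python
def wide_column_names(n_bioassays, n_quantitation_types):
    """Name one column per (bioassay, quantitation type) pair,
    preceded by a key column identifying the design element."""
    names = ["DESIGN_ELEMENT"]
    for b in range(n_bioassays):
        for q in range(n_quantitation_types):
            names.append(f"B{b}_Q{q}")
    return names


def exceeds_limit(n_columns, column_limit):
    """Report whether a request for n_columns exceeds the limit."""
    return n_columns > column_limit


# Placeholder value only; not the actual limit of any database.
HYPOTHETICAL_COLUMN_LIMIT = 1000
```

Under these assumptions, 500 bioassays with 6 quantitation types each require 1 + 500 × 6 = 3001 columns, so a query selecting every column of such a table would exceed the placeholder limit roughly threefold.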
The limit on the number of columns that a database application can return is a substantial limitation that prevents certain functions from being performed. As the need to store and retrieve more and more data increases, this limitation will render database applications incapable of performing critical functions and will therefore require implementation of alternative solutions.
Therefore, there is a need for a database environment capable of accommodating requests for voluminous data.