1. Field of the Invention
The present invention relates to the field of data processing. More particularly, the present invention relates to a method, system and article of manufacture for processing queries on database record multi-state fields.
2. Background
A database is an organized collection of data records that can be readily searched, sorted and updated. The database is often logically arranged as tables identified by table names, with columns identified by column names and rows of records. Access to the database is typically managed by a software application known as a database management system (DBMS). RDBMS are a particular type of database management system (DBMS) applying a tuple and field architecture and have grown to be the dominant form of DBMS.
Before the advent of networks, RDBMS were employed on standalone computers, providing limited location service of data needs. However, RDBMS applications have continued to grow with and on the different hardware and network architectures to provide the organization of data, storage, updating, programmed and on demand retrieval of requested information based on the data in storage. For example, in a distributed computing environment, an application running on a first computer may require access to a database residing on a second computer. The computers each include some form of memory and a processor for the storage and execution of instructions, but they are also interconnected through communications equipment to collectively form a computer network so that they may share information. A three-tier model for database access is a common computer network architecture. As the name implies, the three-tier model includes three layers of processing, namely (1) a client, (2) a server and (3) a database server. A person, using a client application program that needs to access a database, will be using the client. For example, the client application may be an Internet browser and the database may be a directory of an Internet search engine. The client host communicates a request to the server host. The request typically includes a uniform resource locator (URL) that identifies the server host, the communications protocol, and the desired resource. The server determines the location of the database and establishes a connection to the database host, to include any handshaking necessary to secure the connection. Thereafter, the server machine relays the request to the database server. The request is managed by a DBMS also located on the database server. The requested data is transmitted from the database host to the server host, and it is thereafter relayed to the client where it is used by the client application program.
Computer communication occurs over one or more of the seven network layers provided using many public and proprietary protocols. A common method to exploit RDBMS applications evolved in the form of a 4th Generation Language (4GL) called Structured Query Language (SQL). RDBMS application level requests and responses are submitted in SQL format syntax, usually at the application layer. The functions provided by a RDBMS are driven partly by user commands sent to it in SQL. SQL is not itself a DBMS nor a stand-alone product, but an integral product of an RDBMS which facilitates communication with the DBMS. Structured query language is also a standard protocol for accessing databases and records. SQL can be used to formulate a database interface statement for subsequently executing a database transaction. SQL includes features that allow users to query, update, insert data into, and delete data from, a database. User requests to a DBMS are made in the form of an SQL command or statement.
SQL is not a “complete” computer language in that is does not have a complete complement of constructs normally attributed to a computer language. Inside the RDBMS application, SQL commands are generally parsed and translated into intermediate languages, which can then embed and expand requests/commands in a language less user friendly but richer in lower or system level constructs. Moreover, the intermediate language can differ with implementation without changes to the SQL.
Because databases can be extremely large containing millions of records and more, even one SQL command requesting a result set can produce an operation requirement in the RDBMS that includes the access, retrieval, manipulation or storage of many millions of records. Accordingly, even small changes which reduce the time of an operational step repetitively employed over millions of times in the overall process of obtaining results can significantly affect the performance of an RDBMS and consequently user response time. Response time is a function of the RDBMS internal tasks of parsing, interpreting, searching, finding, retrieving, calculating, formulating and returning results. These programming tasks are typically accomplished in high level programming languages such as C, C++, JAVA and others. Any of these tasks may use common routines in these languages, which can strategically be made more efficient through algorithmic streamlining. These incremental efficiencies, even when conceptually or programmatically small, can have significant impacts on overall RDBMS performance.
The SQL statement can have four basic root “data manipulation language” commands (SELECT, UPDATE, INSERT, and DELETE)) with a myriad of qualifiers also called clauses, aggregates and logic. These produce expressions, which allow users to define what information in the database is to be accessed and how the access or results are to be viewed. SQL expressions can be arbitrarily complex and various. The software implementing the SQL parser and downstream programmable source code must be able to dynamically interpret and execute the query. This is typically done by a compiled program parsing the SQL expression and encoding this expression into an intermediate language instruction stream. However, the bookkeeping involved with this implementation can be substantial and ways to improve performance at any of these levels are needed. Such bookkeeping may include identifying tokens, buffering of temporary variables and data, function call ordering and placing or program stack, instantiating objects for intermediate component operations, executing stack operations, keeping track of relationships between intermediate components and operations, executing the intermediate operations and applying the requested logical operations, etc.
The four basic SQL commands use clauses and qualifiers to filter out record or rows from a query and include only those wanted in the query results. For example, an SQL WHERE clause is used to specify in more detail those records of interest. The following are illustrative SQL statements using a SELECT command with a WHERE clause:    SELECT city, sales, target FROM offices WHERE sales>target;    SELECT name, sales, quota FROM salesreps WHERE empl_num=100;
City, sales, target, name, quota and empl_num are typically associated with fields or attributes in a database table. Thus, a query with fields which can have different values or attributes is called a multi-state variable query. Offices and salesreps are tables in a database. Even one SQL statement can have many logical comparisons, range tests, membership tests and pattern matching, relationship tests and/or arithmetic tests. From these simple SELECT statements, it is clear that this process can be a significant task when multiplied over millions of records.
WHERE clauses as well as other SQL constructs, can have many field comparisons, logical operations, range tests, membership tests, pattern matches and arithmetic operations. These may have to be done repeatedly for many records. Some of these records may have a null field (missing data) or one that can give the SQL statement cause to return an error due to anomalous data such as data causing a divide by zero or out of range error. This can occur at anytime and wastes previous work done by not processing the remaining records or otherwise intelligently handling the nullity or error. Therefore, ways are needed to transform nulls and errors produced by field variables to manageable True/False logic without impairing the process progress or process integrity.
User SQL commands, statements, requests or clauses are parsed and processed by computer programs built from software applications running on various computers and computer systems. Although built to handle the SQL statements for a particular common industry standard syntax and grammar, software applications are uniquely architectured from a programming standpoint to achieve different goals and to comply with different programming philosophies and methodologies. Many SQL parsers use the interpretive method of accepting and processing SQL statements sequentially. Using the interpretive method, each statement is completely translated and executed before the next statement without building an object file and executable program.
One problem with the foregoing approach is the resulting substantial bookkeeping overhead required for such an implementation. Once the instruction stream is generated, much of the program logic is structured and an interpretation phase is setup to respond along prescribed paths. For example, when encountering Boolean logical operators such as AND and OR the execution logic may be set to only handle True and False. The generated instruction stream poses limitations to implementations which force a less than optimal approach when encountering nulls or errors. These approach limitations dictate function calls, returns and return handling, temporary buffers, larger working sets etc. What is needed is a more streamlined implementation of this process which reduces function calls, stack space use, cycle time, returned call handling, path length and all around working set size while operating under the dynamic input response requirements of the application.