The present invention relates generally to information processing systems and, more particularly, to assisting a user of such systems, such as a Database Management System (DBMS).
Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a "computer program," direct the operation of the computer. Expectedly, the computer must understand the instructions which it receives before it may undertake a specified activity.
Owing to their digital nature, computers essentially only understand "machine code," i.e. the low-level, minute instructions for performing specific tasks--the sequence of ones and zeros that are interpreted as specific instructions by the computer's microprocessor. Since machine language or machine code is the only language computers actually understand, all other programming or computer languages represent ways of structuring human language so that humans can get computers to perform specific tasks.
Computer languages are fundamental to modern-day computing. Although it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available computer languages. The most widely used of these are the "high-level" programming languages, such as C or Pascal. These languages allow data structures and algorithms to be expressed in a style of writing which is easily read and understood by fellow programmers.
Central to all high-level computer languages is the notion of evaluating expressions. An expression may be thought of in terms of its constituents: operands and operators. Operators are the members of the expression which define action--what type of operation is to occur upon execution of the expression. In the C programming language, for instance, a "+" operator specifies an addition operation. An "&&" operator, on the other hand, specifies a logical "And" operation. The former is an example of an arithmetical operator; the latter is a logical or Boolean one.
Operators typically require one or more operands--the data members which operators are to operate on. The addition operator (+), for instance, is a binary operator requiring two operands. Thus the expression 3+2 contains two operands--3 and 2--separated by the + operator. In this simple example, the expression evaluates to or "returns" a value of 5. Thus, the two numeric operands have been combined by an arithmetic operator to give (return) a numeric expression (i.e., an expression of a numeric data type).
An expression can be used as part of a larger expression, and so on without limit. This is usually thought of as combining and evaluating operands (such as literal constants, named constants, and variables) with arithmetical, logical, and other operators. To distinguish the levels of nesting, one uses the terms subexpression, subsubexpression, and so on. Thus the expression (3+2)*(1+(7+8)) contains two subexpressions: (3+2) and (1+(7+8)). The latter, in turn, contains a nested subexpression of (7+8). By convention, special parentheses operators--"(" and ")"--are used to group a sequence of operands and operators, as is known in both the computer and mathematical arts.
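By way of illustration only, the nesting of subexpressions described above may be sketched in C++ as a small tree in which each node is either an operand or an operator with two subexpressions. The names Expr, num, bin, eval, and example are hypothetical and are introduced solely for this sketch; only the + and * operators are handled.

```cpp
#include <memory>
#include <utility>

// Hypothetical node type: a leaf operand (op == 0) or a binary operator
// ('+' or '*') applied to two subexpressions.
struct Expr {
    char op = 0;
    int value = 0;                    // meaningful only for a leaf operand
    std::unique_ptr<Expr> lhs, rhs;   // meaningful only for an operator
};

// Builds a leaf operand such as 3.
std::unique_ptr<Expr> num(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

// Builds an operator node from two already-built subexpressions.
std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Evaluates the innermost subexpressions first, then combines the results.
int eval(const Expr& e) {
    if (e.op == 0) return e.value;
    const int l = eval(*e.lhs);
    const int r = eval(*e.rhs);
    return e.op == '+' ? l + r : l * r;
}

// The expression (3+2)*(1+(7+8)) from the text, built as a tree.
int example() {
    auto tree = bin('*',
                    bin('+', num(3), num(2)),
                    bin('+', num(1), bin('+', num(7), num(8))));
    return eval(*tree);
}
```

Here the subsubexpression (7+8) is evaluated before (1+(7+8)), which in turn is evaluated before the final multiplication.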
Each computer language provides rules or "syntax" for constructing valid expressions. These rules are usually recursive in the sense that valid subexpressions can be legally combined to give valid expressions. Expression validation therefore starts by evaluating the lowest level subexpressions, checking that these are correctly combined at the next level, and so on until the whole expression has been vetted. Consider the following (illegal) expression:

    (3+2)*/(1+(7+8))
Here, the subsubexpression (7+8) and the subexpressions (1+(7+8)) and (3+2) would each pass the syntactical test, but the complete expression would fail because the resulting expression <numeric>*/<numeric> contains an illegal combination of the multiplication and division operators--*/.
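The recursive character of such validation may be sketched, under simplifying assumptions, as a recursive-descent checker in C++. The grammar assumed here (an expression is a term followed by zero or more operator-term pairs; a term is an unsigned integer or a parenthesized expression) and the names Validator and is_valid_expression are hypothetical; whitespace and unary operators are deliberately not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical syntax checker for the assumed grammar:
//   expr := term (op term)*      op := '+' | '-' | '*' | '/'
//   term := digits | '(' expr ')'
class Validator {
public:
    explicit Validator(const std::string& text) : s_(text) {}

    // The whole input must be consumed by a single valid expression.
    bool valid() {
        pos_ = 0;
        return expr() && pos_ == s_.size();
    }

private:
    static bool is_op(char c) {
        return c == '+' || c == '-' || c == '*' || c == '/';
    }

    bool term() {
        if (pos_ < s_.size() && s_[pos_] == '(') {
            ++pos_;                                   // consume '('
            if (!expr()) return false;                // validate the nested subexpression
            if (pos_ >= s_.size() || s_[pos_] != ')') return false;
            ++pos_;                                   // consume ')'
            return true;
        }
        const std::size_t start = pos_;
        while (pos_ < s_.size() &&
               std::isdigit(static_cast<unsigned char>(s_[pos_]))) {
            ++pos_;
        }
        return pos_ > start;                          // at least one digit is required
    }

    bool expr() {
        if (!term()) return false;
        while (pos_ < s_.size() && is_op(s_[pos_])) {
            ++pos_;                                   // consume the operator
            if (!term()) return false;                // an operand must follow it
        }
        return true;
    }

    std::string s_;
    std::size_t pos_ = 0;
};

bool is_valid_expression(const std::string& text) {
    return Validator(text).valid();
}
```

With this sketch, (7+8), (1+(7+8)), and (3+2) are each accepted, while (3+2)*/(1+(7+8)) is rejected because no operand follows the * operator.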
Also attendant to the construction and evaluation of expressions is the notion of data types. In an effort to reduce programming errors, most computer languages require that constant and variable identifiers be assigned to a particular data-type attribute. Thus, a data type restricts the range of legal values that can be assumed by an identifier. Consider, for instance, the foregoing example of 3+2=5, restated in terms of variables (using the C++ programming language):

    int a=3, b=2, c;
    c=a+b; // evaluates to 5, also int data type
Here, both operands are numeric data types (e.g., C's integer or "int" type); the result of the expression (5) is also a numeric data type. Typically, a strongly-typed language demands the explicit declaration of an identifier's data type before the identifier is used in an expression. For the example at hand, the variables are declared to be of type int (integer) before they are used. In this manner, an identifier's usage may be monitored throughout the program.
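As a brief illustration of such monitoring, a C++ compiler can be asked to confirm the data type of an expression at compile time. The function name sum_of_ints is hypothetical and serves only to wrap the example.

```cpp
#include <type_traits>

// Wraps the a+b example so that the declared types can be checked.
int sum_of_ints() {
    int a = 3, b = 2, c;
    c = a + b;

    // Verified by the compiler: adding two int operands yields an int.
    static_assert(std::is_same<decltype(a + b), int>::value,
                  "int + int is expected to yield int");

    // c = "three";   // would be rejected at compile time: a string literal
    //                // cannot be converted to int

    return c;
}
```

Because the declarations precede use, a mismatched assignment such as the commented-out line is reported when the program is compiled rather than when it is run.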
Many computer languages do not enforce data types at the programming level. Database manipulation languages (DMLs) often allow a user programmer to mix data types. In dBASE®, for instance, one may program:

    a=3          && assigns numeric quantity to "a"
    a="three"    && assigns text string to "a"
Weakly-typed languages deduce the identifier's data type from the context so that, for example, the assignment statement a="three" gives "a" the implied data type of String. As the above example also shows, some languages allow an identifier's data type to change during program execution.
Undoubtedly the weakly-typed approach is flexible. Without data type information, however, ambiguities arise. Consider, for instance, the following:

    a=3      && assigns numeric quantity to "a"
    b="2"    && assigns text to "b"
    a=a+b    && what result???
How the different data types are to be combined is unclear. The final expression could resolve to a numeric or a string value, or simply be flagged as an error. Because data types play a major role in the validation of expressions, a programmer must contend with the particular data-typing philosophy of the computer language employed.
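The three possible outcomes noted above may be made concrete with the following C++17 sketch, in which a value holds either a number or a string and the caller selects how a mixed addition is resolved. The names Value, MixedPolicy, as_text, as_number, and add are hypothetical; the sketch does not purport to reproduce the rules of dBASE or of any other particular language.

```cpp
#include <exception>
#include <optional>
#include <string>
#include <variant>

// A weakly-typed value: its data type is whatever it currently holds.
using Value = std::variant<int, std::string>;

// Hypothetical policies for adding a number to a string.
enum class MixedPolicy { Numeric, String, Error };

std::string as_text(const Value& v) {
    if (const int* n = std::get_if<int>(&v)) return std::to_string(*n);
    return std::get<std::string>(v);
}

std::optional<int> as_number(const Value& v) {
    if (const int* n = std::get_if<int>(&v)) return *n;
    try {
        // Note: std::stoi is lenient and accepts a numeric prefix such as "2abc".
        return std::stoi(std::get<std::string>(v));
    } catch (const std::exception&) {
        return std::nullopt;          // the text is not numeric
    }
}

// Returns std::nullopt when the combination is flagged as an error.
std::optional<Value> add(const Value& a, const Value& b, MixedPolicy policy) {
    const bool a_num = std::holds_alternative<int>(a);
    const bool b_num = std::holds_alternative<int>(b);
    if (a_num && b_num) return Value(std::get<int>(a) + std::get<int>(b));
    if (!a_num && !b_num) {
        return Value(std::get<std::string>(a) + std::get<std::string>(b));
    }
    switch (policy) {                 // mixed data types: the ambiguous case
        case MixedPolicy::Numeric: {
            const auto x = as_number(a);
            const auto y = as_number(b);
            if (!x || !y) return std::nullopt;
            return Value(*x + *y);
        }
        case MixedPolicy::String:
            return Value(as_text(a) + as_text(b));
        case MixedPolicy::Error:
            break;
    }
    return std::nullopt;
}
```

For a=3 and b="2", the Numeric policy yields the number 5, the String policy yields the text "32", and the Error policy yields no value at all. Which of these a given language chooses is precisely the data-typing philosophy the programmer must know.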
Another problem facing the programmer is the overwhelming plethora of "functions" and "reserved words" which must somehow be correctly managed. Each function is defined by a prototype mandating a particular set of arguments and a return type. Expectedly, a significant proportion of human programming effort is devoted to inputting the correct formulation of expressions. And to make matters worse, many programming systems do not report invalid expressions until later in the development cycle.
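The role of a prototype may be illustrated by the following C++ sketch, which checks the data types of the arguments in a call against a table of prototypes. The function names LENGTH and REPEAT and their signatures are invented for this illustration and are not taken from any particular language; Type, Prototype, prototypes, and call_matches are likewise hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

enum class Type { Numeric, String };

// A prototype mandates the argument types and the return type.
struct Prototype {
    std::vector<Type> params;
    Type result;
};

// Hypothetical prototype table; the entries are invented for illustration.
const std::map<std::string, Prototype>& prototypes() {
    static const std::map<std::string, Prototype> table = {
        {"LENGTH", Prototype{{Type::String}, Type::Numeric}},
        {"REPEAT", Prototype{{Type::String, Type::Numeric}, Type::String}},
    };
    return table;
}

// True when the function is known and the argument types match its
// prototype exactly, in both number and order.
bool call_matches(const std::string& name, const std::vector<Type>& args) {
    const auto it = prototypes().find(name);
    return it != prototypes().end() && it->second.params == args;
}
```

A call supplying the wrong number of arguments, or arguments of the wrong data type, fails the check and can be reported as soon as it is entered.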
What is needed is immediate feedback regarding the legality, evaluation, and data-typing of an expression as it is being composed. Not only would considerable programming time and effort be saved, but overall quality of the resulting code would be improved. The present invention fulfills this and other needs.