This invention relates generally to computers, and more particularly to names in a computer programming language.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © Microsoft Corporation, 2000. All Rights Reserved.
A natural language, such as English, is a form of expression that humans use to communicate with one another. Natural languages are highly effective at compressing and unambiguously expressing complex concepts. Words, such as names, provide a concise encoding that achieves significant compression with little loss of information.
Compression is achieved in natural languages in two ways: large vocabularies and pronouns. Natural languages have very limited forms of user-defined names (proper nouns) and instead support great expressiveness by providing large fixed vocabularies. Further compression is achieved by providing pronouns whose referent is context dependent. For example, most people would consider the sentence "The Archbishop of Canterbury entered the pub where the Archbishop of Canterbury ordered a pint of ale," too long. Substituting the pronoun "he" for the second occurrence of "the Archbishop of Canterbury" improves the sentence considerably, making it easier to read (and write). Note that the use of a pronoun does not require creation of a new name in order to shorten the sentence.
Unlike natural languages, computer programming languages (expressions that computers understand) typically have a small fixed vocabulary, such as built-ins and keywords, and a larger user-defined vocabulary, such as function names, types, and variables. As a result, a significant part of the effort of writing a computer program is deciding what things to name and what to name them. While programmers have many naming decisions to make, languages typically provide few mechanisms beyond definition facilities to help them make these decisions.
Every additional name added to a program has associated costs and adds to the difficulty of writing the program. The programmer has the burden of choosing an appropriate name, declaring the entity being named, and ensuring that the name does not conflict with pre-existing names. As more names are introduced, the mental task of remembering all names and their scopes becomes increasingly difficult. Likewise, a person reading a program with many unfamiliar names has the burden of first learning and then remembering each name's meaning.
In the early days of computers, programming languages forced names to be short, and thus cryptic, because the name itself took up computer memory, which was expensive. This increased the burden on the programmer and reader in knowing and remembering the meaning of the name. Now that memory is inexpensive, names in programming languages are much longer, which potentially helps readability, but long names are difficult and annoying to write, especially when multiple programming-language statements use the same long name repeatedly.
From the earliest designs, prior programming languages have attempted to simplify naming and make programs easier to write and read with mixed success. For example:
1. The Fortran programming language has implicit type declarations based on the starting character of a variable name. But this solution only deals with declaring a variable and does not help with using a variable.
2. Many programming languages have macro processors that allow one code statement to be substituted with another statement or statements. But macros require the creation of a new name, the macro name, which complicates rather than simplifies naming. Also, macros are preprocessing transformations; hence, they have syntactic effect but do not perform semantic analysis, which limits their usefulness. Further, macros are often awkward to use and hard to read.
3. Many programming languages have predefined symbols that refer to predefined objects or functions. Also, shorthand notations for naming aggregates (plural values) are common in programming languages. Array assignment, list and array comprehensions, and array slicing notations are all examples of plural shorthands. Examples of predefined symbols include:
a) The Java programming language uses "this" to refer to the current object within a method.
b) The AWK programming language uses "$1" to refer to the first field of a parsed input record.
c) The Perl programming language provides a number of predefined symbols. First, Perl allows referring to a sub-match of a regular expression by putting the sub-expression inside parentheses and then referring to the matched value as "$n" for the nth such sub-expression. Second, Perl provides "@_" for accessing a subroutine's parameter array. Third, Perl provides the variable "$_", which refers, depending on context, to the current input record, the current pattern string, or the current foreach loop iterator variable, among other things. Finally, many Perl built-in functions have arguments whose defaults are determined by the context.
d) The Pascal programming language provides a "with" construct, which eliminates the need to repeat references to the same structure. Pascal also provides a "write" procedure, which takes an optional first argument to specify the output file; if it is missing, it defaults to "output."
e) Object-oriented languages such as C++, Smalltalk, and Java provide shorthand forms for referring to the instance object inside methods of the object's class. For example, foo() may be a shorthand for this.foo().
f) Most languages with package mechanisms, such as Ada, provide a "use" declaration that eliminates the need to fully qualify external references to symbols in other packages.
g) The C programming language provides the shorthand "X++" for "X=X+1."
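Item 2's point that macros act on text rather than meaning can be made concrete with a minimal sketch, written in Python purely for illustration. The toy expander below pastes a macro's argument into its body without parsing it, which is the behavior behind the classic C pitfall of `#define SQUARE(x) x*x` applied to `SQUARE(a+1)`. The `MACROS` table, the `SQUARE` macro, and the `expand` function are hypothetical names chosen for this example, not any real preprocessor's API.

```python
import re

# Hypothetical macro table: the programmer must invent and remember the name SQUARE.
# Each entry maps a macro name to (parameter name, body text).
MACROS = {"SQUARE": ("x", "x*x")}

def expand(source: str) -> str:
    """Expand NAME(arg) by pasting the raw argument text into the macro body,
    as a purely textual preprocessor would: nothing is parsed or type-checked."""
    def replace(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        param, body = MACROS[name]
        # Substitute the argument text for every whole-word use of the parameter.
        return re.sub(rf"\b{re.escape(param)}\b", lambda _: arg, body)

    # Match a macro name followed by a parenthesized argument without nested parens.
    pattern = r"\b(" + "|".join(map(re.escape, MACROS)) + r")\(([^()]*)\)"
    return re.sub(pattern, replace, source)

expanded = expand("y = SQUARE(a+1)")
print(expanded)  # y = a+1*a+1

# Operator precedence now binds 1*a first, so with a = 2 the result is 5, not 9.
print(eval(expanded.split("=", 1)[1].strip(), {"a": 2}))  # 5
```

The expansion is syntactically valid but semantically wrong, and the expander has no way to notice, because it never analyzes the expression it produces.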
Unfortunately, all of these predefined symbols suffer from the problem that the predefined objects or functions are defined by the programming language and not the programmer, which restricts their usefulness. Thus, in order to boost programmer productivity, a solution is needed that will increase the ease of writing and reading computer programs, achieve conciseness in programs without resorting to creating new names, and ease the difficulties in using long names in repeated statements.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification. The present invention encompasses programming language constructs called pronouns and referents, and a method, system, and apparatus for translating computer source code that contains the pronouns and referents.
A referent is any semantic or syntactic construct in the source code (e.g., a statement, a portion of a statement, an expression, or a value) to which a pronoun refers. A pronoun is a programming-language defined source-code symbol or a sequence of symbols that refers to the referent. As a result, pronouns eliminate the need to define new names or macros for repeated program segments. When a translator encounters the pronoun in the source code, the translator searches the source code for the referent and substitutes the referent for the pronoun. Thus, by using pronouns and referents, the programmer can write programs faster and more easily and eliminate program redundancy without losing readability.
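The search-and-substitute step performed by the translator can be illustrated with a minimal sketch in Python, under assumptions that are not taken from this specification. The pronoun is spelled `it` (a hypothetical spelling); its referent is assumed to be the target of the nearest preceding assignment statement, which is only one possible resolution rule; and matching is done on raw text with regular expressions rather than on a parse tree, so a pronoun inside a string literal would also be replaced. The names `PRONOUN`, `ASSIGN_TARGET`, and `translate` are illustrative.

```python
import re

PRONOUN = "it"  # hypothetical pronoun spelling, assumed reserved by the language
# Target of an assignment statement, e.g. "total_count = ..." (but not "==").
ASSIGN_TARGET = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=(?!=)")
# Whole-word occurrences only, so identifiers like "item" or "edit" are untouched.
PRONOUN_TOKEN = re.compile(rf"\b{PRONOUN}\b")

def translate(lines: list[str]) -> list[str]:
    """Replace each pronoun with its referent, assumed here to be the target
    of the nearest preceding assignment in the already-translated source."""
    out: list[str] = []
    for lineno, line in enumerate(lines, start=1):
        if PRONOUN_TOKEN.search(line):
            referent = None
            # Search backward through earlier statements for the referent.
            for earlier in reversed(out):
                m = ASSIGN_TARGET.match(earlier)
                if m:
                    referent = m.group(1)
                    break
            if referent is None:
                raise SyntaxError(f"line {lineno}: pronoun '{PRONOUN}' has no referent")
            # Substitute the referent for every occurrence of the pronoun on this line.
            line = PRONOUN_TOKEN.sub(lambda _: referent, line)
        out.append(line)
    return out

source = [
    "archbishop_of_canterbury_pints = 0",
    "it = it + 1",
    "print(it)",
]
print("\n".join(translate(source)))
# archbishop_of_canterbury_pints = 0
# archbishop_of_canterbury_pints = archbishop_of_canterbury_pints + 1
# print(archbishop_of_canterbury_pints)
```

Unlike the macro approach, the programmer introduces no new name: `it` is supplied by the (hypothetical) language, and the long name is written once. Because the search runs over already-translated lines, a pronoun whose own statement is an assignment becomes a valid referent for later pronouns.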