5.1 Field of the Invention
There are many styles of programming language grammars in use today. Two of these styles are brought together in the programming language grammar of the present invention. The first style encompasses all languages which derive from the C programming language, introduced some 20 years ago, including Java, C++, and JavaScript, and now the grammar described herein. The second style includes a large group of languages whose usefulness derives from offering the programmer the ability to create “regular expressions” for searching, recognizing, and tokenizing text. The grammar of the present invention combines these two styles by making regular expressions a primitive data type of a C-style language.
The grammar of the present invention introduces several new forms of regular-expressions, not seen in the art, including theoretical textbooks on regular-expressions. The virtual machine language of the present invention is a variant of FORTH designed expressly for the purpose of building the engine for its interpreter. The interpreter for this language has the normal components of an entry-level C interpreter known in the art.
5.2 Description of Related Art
The related art falls into two general categories: (1) those programming language grammars including or similar to C or C++ and (2) those programming languages including or similar to Perl.
References in this document such as “similar to C/C++” indicate a general style for statement blocks, function declarations, loop constructs, and if-statements and, most importantly, the exact C-style grammar for expressions, including the C operators and the C rules of associativity and precedence for binding sub-expressions. This is particularly important because regular expressions are indeed expressions, and the grammar of the present invention is the first to “overload” the C-expression as a way to encapsulate regular expressions.
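The idea of treating regular expressions as ordinary expressions can be sketched by analogy in Python, whose operator precedence for `+` and `|` follows the C ordering. This is only an illustration under stated assumptions: the class `Pattern` and its methods `literal`, `star`, and `matches` are hypothetical names, and the choice of `+` for concatenation and `|` for union is an assumption made for the sketch, not the operator set of the grammar of the present invention.

```python
import re


class Pattern:
    """Hypothetical wrapper: a regular expression that composes like any other value."""

    def __init__(self, source):
        self.source = source  # regex source text in Python `re` syntax

    @staticmethod
    def literal(text):
        # Escape the text so it matches itself literally.
        return Pattern(re.escape(text))

    def __add__(self, other):
        # Assumed meaning for this sketch: concatenation.
        return Pattern(f"(?:{self.source})(?:{other.source})")

    def __or__(self, other):
        # Assumed meaning for this sketch: union (alternation).
        return Pattern(f"(?:{self.source}|{other.source})")

    def star(self):
        # Zero or more repetitions.
        return Pattern(f"(?:{self.source})*")

    def matches(self, text):
        # True only if the whole text is matched.
        return re.fullmatch(self.source, text) is not None


# Sub-expressions bind by the host language's precedence rules:
# `+` binds tighter than `|`, as it does in C.
digit = Pattern("[0-9]")
sign = Pattern.literal("+") | Pattern.literal("-") | Pattern("")
integer = sign + digit + digit.star()
```

With these definitions, `integer.matches("-42")` and `integer.matches("7")` hold, while `integer.matches("+")` and `integer.matches("4a")` do not.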
References such as “similar to Perl/Tcl” encompass those languages which allow programmers to create powerful constructs, called regular expressions, that are used to help solve problems involving searching, recognizing, and tokenizing text. At the core of these languages are implementation engines based on two varieties of finite automata: non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs). Likewise, the engine upon which the present grammar is implemented is also based on the principles of NFAs and DFAs. In addition, the present invention introduces several regular expression grammar forms that are, practically speaking, not reducible to Perl regular expressions.
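For readers unfamiliar with finite automata, the following minimal Python sketch simulates an NFA for the textbook expression (a|b)*abb by tracking the set of states the automaton could occupy after each character. The transition table and the names `NFA`, `START`, `ACCEPT`, and `nfa_accepts` are illustrative assumptions and are not taken from the engine of the present invention.

```python
# Hypothetical NFA for (a|b)*abb.
# Keys are (state, symbol); values are the sets of possible next states.
NFA = {
    (0, "a"): {0, 1},  # non-deterministic: stay in the loop or begin "abb"
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPT = {3}


def nfa_accepts(text):
    """Return True if the NFA accepts the entire text."""
    current = {START}
    for ch in text:
        # Follow every arc labeled `ch` out of every state currently held.
        current = {nxt for s in current for nxt in NFA.get((s, ch), set())}
        if not current:
            return False  # no live state remains, so no match is possible
    return bool(current & ACCEPT)
```

A DFA for the same expression can be obtained by precomputing these state sets ahead of time (the subset construction), so that each input character requires a single table lookup.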
The present invention overlaps slightly in functionality with Lex-Yacc, which is a compiler-building language. Lex-Yacc is a language (actually a combination of two grammars) which combines the concepts of tokenizing via regular expressions and using production rules to associate side-effects with recognized tokens. Similarly, the grammar of the present invention also involves production rules, as well as the generation of instructional (statement-level) “side-effects”, but the similarity is only on the surface. First, the engine for the grammar of the present invention (as described by examples and diagrams and as implemented in the source code) does not implement recursive production rules (modeled in theoretical texts via Pushdown Automata, or PDA). Problems modeled by recursive production rules cannot, in general, be solved by Non-Deterministic Finite Automata (NFA), whereas all of the grammar forms of the present invention can ultimately be modeled by NFAs, so long as recursive production rules are disallowed. The second major difference between the engine of the present invention and that of Lex-Yacc is in how statement-level side-effects are generated. The Lex-Yacc engine uses a tokenizing phase (based on classical regular expressions) and a subsequent phase to determine side-effects based on production rules. In contrast, side-effects of the present invention are compiled directly into the NFA, a feat made more manageable by the creation of a variant of FORTH for use as the “virtual machine” language of the present invention.
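The notion of side-effects carried inside the automaton itself can be illustrated with a small Python sketch. It is a simplification under stated assumptions and not the engine of the present invention: instructions here are plain Python callables rather than FORTH-variant virtual machine code, the names `bump`, `ARCS`, and `run` are hypothetical, and when two threads reach the same state the sketch simply keeps the first. Each thread accumulates the instructions found on the arcs it traverses, and only the thread that ends in an accepting state has its instructions executed.

```python
def bump(name):
    """Build a hypothetical instruction that increments a counter in `env`."""
    def instr(env):
        env[name] = env.get(name, 0) + 1
    return instr


# NFA for (a | ab)* with an instruction attached to some arcs.
# Keys are (state, symbol); values are lists of (next_state, instruction or None).
ARCS = {
    (0, "a"): [(0, bump("single_a")), (1, None)],
    (1, "b"): [(0, bump("ab_pair"))],
}
START = 0
ACCEPT = {0}


def run(text):
    """Return the environment produced by the accepting thread, or None on no match."""
    threads = {START: ()}  # state -> instructions pending on that thread
    for ch in text:
        next_threads = {}
        for state, pending in threads.items():
            for nxt, instr in ARCS.get((state, ch), []):
                if nxt not in next_threads:  # simplification: first thread wins
                    extra = (instr,) if instr is not None else ()
                    next_threads[nxt] = pending + extra
        threads = next_threads
    for state, pending in threads.items():
        if state in ACCEPT:
            env = {}
            for instr in pending:  # execute only the winning thread's side-effects
                instr(env)
            return env
    return None
```

For the input "aab", the thread that reads "a" followed by "ab" is the only one to reach the accepting state, so `run("aab")` returns `{"single_a": 1, "ab_pair": 1}`; the instructions gathered by abandoned threads are never executed.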
The grammar of the present invention is fundamentally different from that of Lex-Yacc because it integrates general statement-level (instructional) side-effects directly into the regular-expression grammar. Logically, the support diagrams (of this document) show that instructions are embedded directly into the NFAs, something not seen in the art of “finite automata”. In contrast, Lex-Yacc separates the tokenizing phase and the production rule “analysis” phase in order to solve the problem of recursive production rules. Lex-Yacc was designed to solve the problem of recursive rules because it is a compiler-builder for compiling programming languages, and programming languages generally offer a syntax for recursively defined expressions. The grammar of the present invention, by comparison, solves the same class of problems for which the Perl/Python grammars are normally chosen. Such grammars, including that of the present invention, make it easier for the programmer to solve problems of searching, recognizing, and extracting text in documents, as opposed to compiler-building. Therefore, solving the problems imposed by recursive production rules is not required to achieve the primary goal of the present invention. Further, as discussed in section 8.6, it was realized that the novel subjunctive grammar form (so important in allowing the present invention to extend the art) cannot be combined with recursive rules without additional support algorithms, not known in the art, that are best covered outside the scope of this patent. Thus the purpose of the production rules of the present invention, in contrast with the production rule feature of Lex-Yacc, is to enhance readability and re-usability of a regular expression by allowing it to have parameters.
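The benefit of a non-recursive rule with parameters can be approximated in Python with a function that builds a regular expression from its arguments. This is an analogy only: the function name `delimited` and its parameters are hypothetical, and the sketch uses Python's `re` module rather than the production rule syntax of the present invention.

```python
import re


def delimited(open_ch, close_ch):
    """Hypothetical parameterized rule: capture the text between two delimiters."""
    o = re.escape(open_ch)
    c = re.escape(close_ch)
    # Opening delimiter, any run of characters other than the closing
    # delimiter (captured), then the closing delimiter.
    return f"{o}([^{c}]*){c}"


# The same rule is reused with different arguments instead of being rewritten.
angle = re.compile(delimited("<", ">"))
paren = re.compile(delimited("(", ")"))

tags = angle.findall("<a> and <bc>")   # ['a', 'bc']
args = paren.findall("f(x) + g(yz)")   # ['x', 'yz']
```

Because the rule never refers to itself, each use expands to an ordinary regular expression, so it remains within what a finite automaton can model. Nested delimiters such as "((x))" would require a recursive rule and fall outside this sketch.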
Therefore, in terms of allowing a programmer to create regular expressions, the present invention matches the intent of languages such as Tcl/Perl/Python.
5.3 List of Prior Art
Languages:
Perl
Python
Tcl
C

Other Publications:
Compilers: Principles, Techniques, and Tools, by Aho, Sethi, Ullman, Addison-Wesley, 1986
Introduction to Automata Theory, Languages, and Computation, by Hopcroft, Motwani, Ullman, Addison-Wesley, 2001, 2nd Edition