5.1 Field of the Invention
The field of the present invention encompasses principles of (1) object-oriented design and polymorphism based on virtual class members, (2) C++ and derivative languages (Java™, etc.) which are reductions to practice of the former, (3) regular expressions and production rules, and (4) Perl-like and Lex-Yacc-like languages which are reductions to practice of regular expression grammars and production rule grammars.
Little commonality is presently found between fields (1)/(2) and fields (3)/(4). The present invention applies the principles of polymorphism and virtual member derivation to its production rules (which are embodiments of regular expressions), thereby introducing commonality and allowing regular expression programmers to gain the benefits of virtual member polymorphism, previously available only in the various “OO” grammars for member function derivations.
5.2 Description of Related Art
The related art includes C++, Perl, Lex-Yacc, and the inventor's previously filed U.S. provisional application Ser. No. 60/469,036 [filed May 6, 2003] and related, pending U.S. non-provisional application Ser. No. 10/691,414 [filed Oct. 22, 2003], hereinafter referred to as ‘first invention’. The ‘first invention’ shows how to integrate regular expressions into a C-style language, by overloading/adapting the C-expression syntax for regular expressions, and, additionally, shows how parameterizable production rules can be used to create re-usable regular expressions.
In terms of Perl, related art aspects involve regular expressions, and how they can be incorporated in solutions to problems of document recognition and tokenization. In terms of Lex-Yacc (i.e. its usefulness in building parsers), related art aspects involve the use of regular expressions to create “atomic” tokens, and how Lex-Yacc production rules are used to combine tokens into higher-level tokens, implying a parsing tree for all of the tokens of the document (normally a very “well-formed” document conforming to a particular grammar). In terms of C++ and its derivatives such as Java™, related art aspects involve the concept of virtual function members of class definitions, and how virtual function members relate to the concepts of polymorphism. The term “polymorphism” implies in the art that an instance of a structured object can be assigned to and used/viewed as multiple (poly) data-types (morph), but invocation of its virtual functions will always call the most-derived function within the class hierarchy of the object's actual type.
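The dispatch behavior described above can be illustrated with a minimal C++ sketch. The names Document, HtmlDocument, kind and describe are hypothetical and chosen only for illustration; they do not appear in the related art or in the ‘first invention’.

```cpp
#include <string>

// Hypothetical base class with one virtual function member.
struct Document {
    virtual ~Document() = default;
    // Derived classes may override this member.
    virtual std::string kind() const { return "generic"; }
};

// Hypothetical derived class that overrides the virtual member.
struct HtmlDocument : Document {
    std::string kind() const override { return "html"; }
};

// The argument is viewed through the base type, yet the call is
// dispatched to the override belonging to the object's actual type.
std::string describe(const Document& doc) {
    return doc.kind();
}
```

Passing an HtmlDocument to describe selects the HtmlDocument override even though the parameter is declared as a Document reference, which is the sense of “most-derived function” used above.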
The present invention extends the ‘first invention’ by incorporating the essential elements of object-oriented class declarations, and it will be seen that the same object-oriented design goals are applied to parameterizable production rules. Production rules, previously available in the ‘global scope’ of the scripts of the ‘first invention’, are now also available polymorphically as virtual members of class/struct definitions, with the same advantages that normally accrue to virtual function members in the (object-oriented) art. This novelty is core to the present invention: the production rules described by the ‘first invention’ (re-usable parameterizable regular expressions) are additionally offered to the programmer of the present invention in a form consistent with the principles of object-oriented languages, and can therefore be used polymorphically.
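As a rough analogy only, the idea of a regular expression offered as an overridable class member can be approximated in standard C++ with std::regex. This sketch is not the grammar of the present invention or of the ‘first invention’; the names Tokenizer, IdentifierTokenizer, wordRule and countWords are hypothetical, and the “rule” here is simply a pattern string returned by a virtual function.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical base class whose token rule is a virtual member.
struct Tokenizer {
    virtual ~Tokenizer() = default;

    // Stands in for a virtual production rule: the pattern for one token.
    virtual std::string wordRule() const { return "[A-Za-z]+"; }

    // A higher-level routine written once against the virtual rule;
    // it counts the non-overlapping matches of wordRule() in the text.
    std::size_t countWords(const std::string& text) const {
        const std::regex word(wordRule());
        auto first = std::sregex_iterator(text.begin(), text.end(), word);
        auto last = std::sregex_iterator();
        return static_cast<std::size_t>(std::distance(first, last));
    }
};

// Hypothetical derived class that redefines only the token rule.
struct IdentifierTokenizer : Tokenizer {
    std::string wordRule() const override {
        return "[A-Za-z_][A-Za-z0-9_]*";
    }
};
```

Under these assumptions, countWords is inherited unchanged while its behavior follows whichever wordRule the actual object supplies, which mirrors, in a limited way, the re-use that virtual production rule members are intended to provide.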
Additionally, it will be demonstrated that the present invention's collection of grammar forms and engine optimizations offer a novel and preferred design approach for solving problems of document tokenization, hereinafter referred to as the ‘document design pattern’. This design pattern allows for a more effective approach to problem decomposition and modeling, an approach encouraging object-oriented principles of encapsulation, behavior derivation, and improved library design and re-use, ultimately allowing solutions of fewer lines of code, without performance disadvantages relative to solutions given in the grammar of the ‘first invention’ or in other related art.
Further, it will be demonstrated that by adoption of the ‘document design pattern’ in the present invention, a reliable estimate of performance can be made by the programmer at design time, and that in general, the performance of a ‘document-level production rule’ is within the same order of magnitude as ‘DFA-maximum speed’, as demonstrated by examples.
Just as programmers have learned a variety of optimal techniques to leverage virtual class member functions in C++ and Java™ for functional decomposition, programmers of the present invention will learn analogous advantages and techniques of formulating regular expressions as (polymorphic) virtual production rule members of struct/class definitions.
5.3 List of Prior Art
Languages: Perl, Java™, C++, Lex/Yacc
Other Publications: The ‘first invention’