1. Field of the Invention
The present invention relates to the field of lexical analysis of source code and static code review, and more particularly to source code parsing.
2. Description of the Related Art
Lexical analysis refers to the conversion of a sequence of characters in a body of text into tokens. Once a sequence of characters has been converted into tokens, the tokens can be characterized according to function in order to provide meaning and context to the body of text. The initial stage of lexical analysis generally involves applying a finite state machine, often referred to as a scanner, to an ordered sequence of text in order to emit finite character strings according to the configuration of the finite state machine. Subsequently, in the tokenization stage, a tokenizer demarcates these character strings and classifies them into tokens. Finally, an evaluator attaches meaning to the tokens by applying rules to them.
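The three stages described above can be illustrated with a minimal Python sketch. The state set, the keyword table, the token kinds (KEYWORD, NUMBER, IDENTIFIER, PUNCT) and the function names scan, tokenize and evaluate are illustrative assumptions rather than part of any particular lexer; the evaluator's only rule here assigns integer values to numeric literals.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    IDENT = auto()
    NUMBER = auto()


def scan(text):
    """Scanner: a small finite state machine that emits raw character strings."""
    lexemes, buf, state = [], "", State.START
    for ch in text + "\0":  # the NUL sentinel flushes the final lexeme
        # Stay in the current state while the character extends the lexeme.
        if state is State.IDENT and (ch.isalnum() or ch == "_"):
            buf += ch
            continue
        if state is State.NUMBER and ch.isdecimal():
            buf += ch
            continue
        # Any other character ends the current lexeme and returns to START.
        if buf:
            lexemes.append(buf)
            buf = ""
        state = State.START
        if ch.isalpha() or ch == "_":
            state, buf = State.IDENT, ch
        elif ch.isdecimal():
            state, buf = State.NUMBER, ch
        elif not ch.isspace() and ch != "\0":
            lexemes.append(ch)  # single-character operator or punctuation
    return lexemes


KEYWORDS = {"int", "if", "return"}  # illustrative keyword table


def tokenize(lexemes):
    """Tokenizer: classifies each character string into a (kind, text) token."""
    tokens = []
    for lx in lexemes:
        if lx in KEYWORDS:
            kind = "KEYWORD"
        elif lx[0].isdecimal():
            kind = "NUMBER"
        elif lx[0].isalpha() or lx[0] == "_":
            kind = "IDENTIFIER"
        else:
            kind = "PUNCT"
        tokens.append((kind, lx))
    return tokens


def evaluate(tokens):
    """Evaluator: applies a rule per token kind to attach a value to each token."""
    rules = {"NUMBER": int}  # numeric literals receive an integer value
    return [(kind, rules.get(kind, str)(text)) for kind, text in tokens]
```

For example, `evaluate(tokenize(scan("int x = 42;")))` yields `[("KEYWORD", "int"), ("IDENTIFIER", "x"), ("PUNCT", "="), ("NUMBER", 42), ("PUNCT", ";")]`.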
Lexical analysis, which the skilled artisan often treats as the first stage of parsing, forms an integral part of software development as a fundamental stage of code building. In the context of code building, parsing is the process of analyzing a sequence of tokens in source code to determine its grammatical structure with respect to the formal grammar of a given programming language. Parsing transforms the input text of source code into a data structure, usually a tree, that captures the implied hierarchy of the input and is suitable for later processing during the compilation phase of code building.
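As an illustration of parsing a token sequence against a formal grammar, the following sketch is a recursive-descent parser for a small, hypothetical arithmetic grammar (given in the Parser docstring) that produces a tree of Num and BinOp nodes. The grammar, the node classes and the function names are assumptions chosen for brevity; the grammars of real programming languages are far larger and are often handled by parser generators.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

TOKEN_RE = re.compile(r"\s*(\d+|[()+\-*/])")


def lex(text):
    """Split text into number and operator tokens."""
    pos, tokens, text = 0, [], text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Recursive-descent parser for the grammar:
         expr   := term (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = BinOp(op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return Num(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

Here `parse("1 + 2 * 3")` returns a value equal to `BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))`: the nesting records that multiplication binds more tightly than addition, a hierarchy the flat text leaves implicit.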
Parsing finds particular application in static code review. Static code review refers to the parsing of source code to identify program code constructs for the purpose of optimizing the source code and detecting programmatic and syntactical errors within it. In static code review, source code is parsed and compared against existing rules in order to flag portions of the source code of concern. More advanced static code review tools also modify the source code automatically according to predefined rules included with the tool.
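A minimal sketch of rule-based static review follows, assuming the source under review is Python so that the standard-library ast module can serve as the parser. The two rules (no-eval and no-bare-except) and the Finding record are hypothetical examples of predefined rules, not rules taken from any particular tool.

```python
import ast
from dataclasses import dataclass


@dataclass
class Finding:
    line: int
    rule: str
    message: str


def no_eval(node):
    """Rule: flag direct calls to eval()."""
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "eval"):
        return "call to eval() on possibly untrusted input"
    return None


def no_bare_except(node):
    """Rule: flag 'except:' clauses that catch every exception."""
    if isinstance(node, ast.ExceptHandler) and node.type is None:
        return "bare 'except:' hides unexpected errors"
    return None


# Predefined rule set: rule name -> predicate over a single AST node.
RULES = {"no-eval": no_eval, "no-bare-except": no_bare_except}


def review(source, rules=RULES):
    """Parse the source, then compare every node against every rule."""
    tree = ast.parse(source)  # raises SyntaxError for syntactically invalid code
    findings = []
    for node in ast.walk(tree):
        for name, rule in rules.items():
            message = rule(node)
            if message:
                findings.append(Finding(node.lineno, name, message))
    return sorted(findings, key=lambda f: f.line)
```

Reviewing the four-line snippet `try:` / `    x = eval(input())` / `except:` / `    pass` reports no-eval on line 2 and no-bare-except on line 3; source that is not syntactically valid Python is rejected earlier, with a SyntaxError from ast.parse.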
Source code often must be ported from one platform to another. Historically, porting source code involved manually reviewing and modifying the source code to account for the particular nuances of a target platform. Given the complexity of modern software design, manually porting source code is tedious at best and, more often than not, practically impossible. To address this difficulty, static code review tools have been configured to apply a set of predefined rules in order to parse source code and make the required changes according to those rules. Other tools merely suggest manual changes where an automatic change is not appropriate.
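The split between automatic changes and suggested manual changes might look like the following sketch, again using Python's ast module (ast.unparse requires Python 3.9 or later). The module names oldplat and newplat and the two rule tables are hypothetical placeholders for a source platform API and a target platform API.

```python
import ast

# Hypothetical porting rules; "oldplat" and "newplat" are placeholder module names.
AUTO_RENAMES = {"oldplat.read_config": "newplat.load_config"}  # safe 1:1 rewrites
MANUAL_REVIEW = {"oldplat.spawn"}  # no safe automatic equivalent


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for any other node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def build_name(dotted):
    """Build the Name/Attribute chain for a dotted name such as 'newplat.load_config'."""
    first, *rest = dotted.split(".")
    node = ast.Name(id=first, ctx=ast.Load())
    for attr in rest:
        node = ast.Attribute(value=node, attr=attr, ctx=ast.Load())
    return node


class Porter(ast.NodeTransformer):
    """Rewrites calls covered by an automatic rule; records the rest as suggestions."""

    def __init__(self, renames, manual):
        self.renames = renames
        self.manual = manual
        self.suggestions = []

    def visit_Call(self, node):
        self.generic_visit(node)  # port calls nested in the arguments first
        name = dotted_name(node.func)
        if name in self.renames:
            node.func = build_name(self.renames[name])
        elif name in self.manual:
            self.suggestions.append((node.lineno, f"{name}: port manually"))
        return node


def port(source, renames=AUTO_RENAMES, manual=MANUAL_REVIEW):
    """Return (ported source, list of manual-change suggestions)."""
    porter = Porter(renames, manual)
    tree = ast.fix_missing_locations(porter.visit(ast.parse(source)))
    return ast.unparse(tree), porter.suggestions
```

Porting `cfg = oldplat.read_config('app.ini')` followed by `oldplat.spawn(cfg)` rewrites the first call to newplat.load_config, leaves the second call unchanged, and reports it on line 2 as needing a manual change. The rule tables are plain dictionaries here purely for brevity.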
The development task of porting source code from one platform to the next can vary in difficulty and complexity; consequently, different developers approach the problem differently. Yet static code review tools can be inflexible, because the rules they incorporate are hard-coded without regard to the specific task of porting source code from one particular platform to another. Modifying these hard-coded rules requires language-specific coding skills as well as access to the code base of the static code review tool itself.