American and European businesses have billions of lines of production software written in legacy computer languages such as COBOL, RPG, PL/I, Fortran and Natural. These businesses are highly motivated to modernize their software, but the process is often either extremely expensive or produces low-quality results. The available tools are often not optimized for complex software systems that can contain tens of millions of lines of code. The first step in an application modernization project is parsing and analyzing all the existing software.
When parsing, a parser analyzes a string of symbols in source code according to the rules of a language, as defined by a grammar. On the basis of this analysis, the parser produces, for example, abstract syntax trees (ASTs). Based on the information within the abstract syntax trees, a semantic analyzer creates a database that includes data flow information (typically in the form of symbol tables) and control flow information (indicating, for example, who calls whom). An analysis tool can be used to traverse the abstract syntax trees looking for specific named entities. Analysis tools depend on the names of entities as listed in the grammar. If someone changes a name listed in the grammar, an analysis tool that still searches for the old name can fail to find that entity.
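The dependence of an analysis tool on grammar names can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Node` class, the `build_tree` stand-in for a parser, and the rule names `PerformStatement` and `PerformStmt` are assumptions, not part of any particular grammar or tool.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One AST node; `kind` is the entity name taken from the grammar."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_by_kind(root: Node, kind: str) -> List[Node]:
    """Analysis step: collect every node whose grammar name equals `kind`."""
    return [n for n in walk(root) if n.kind == kind]


def build_tree(call_rule_name: str) -> Node:
    """Stand-in for a parser (hypothetical): node names come from grammar rule names."""
    return Node("Program", children=[
        Node("Paragraph", "MAIN", [
            Node(call_rule_name, "PERFORM READ-FILE"),
            Node(call_rule_name, "PERFORM WRITE-REPORT"),
        ]),
    ])


# The tool was written against a grammar that names the rule "PerformStatement".
tree_v1 = build_tree("PerformStatement")
calls_v1 = [n.text for n in find_by_kind(tree_v1, "PerformStatement")]

# The rule is later renamed to "PerformStmt", but the tool still searches the old name.
tree_v2 = build_tree("PerformStmt")
calls_v2 = [n.text for n in find_by_kind(tree_v2, "PerformStatement")]
```

In this sketch the second search raises no error; it simply returns an empty list, which is why a renamed entity can go unnoticed until the analysis results are inspected.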
Unfortunately, grammars are often changed to account for variations in hardware, operating systems and business-specific conventions. To avoid problems, the analysis tools need to be kept up to date with these changes. If changes in the grammar are not properly communicated and taken into account in the operation of the analysis tools, correct analysis of the original source programs can become seriously difficult. There is ample opportunity for analysis tools to get out of sync with a grammar when many different people make many changes to the grammar. For this reason, having only a few people maintain a grammar, a parser and the associated analysis tools can decrease the possibility of a loss of synchronization between the grammar and the analysis tools. However, such a small team makes it difficult to scale up to millions of lines of source code.
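What it means for an analysis tool to be out of sync with a grammar can be made concrete with a short Python sketch. It assumes, purely for illustration, that the grammar's rule names and the names each tool searches for are available as plain sets; the function `stale_names`, the tool names and the rule names are all hypothetical.

```python
from typing import Dict, Set


def stale_names(grammar_rules: Set[str],
                tool_names: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per tool, the entity names it uses that the grammar does not define."""
    report: Dict[str, Set[str]] = {}
    for tool, names in tool_names.items():
        missing = names - grammar_rules
        if missing:
            report[tool] = missing
    return report


# Hypothetical grammar after a rename from "PerformStatement" to "PerformStmt".
grammar_rules = {"Program", "Paragraph", "PerformStmt", "MoveStatement"}

# Hypothetical tools and the entity names each one looks for.
tool_names = {
    "call_graph": {"Paragraph", "PerformStatement"},
    "data_flow": {"MoveStatement"},
}

report = stale_names(grammar_rules, tool_names)
```

Here only the `call_graph` tool is reported, because it still refers to a name the grammar no longer contains. Such a comparison shows the kind of mismatch that accumulates when many people change a grammar independently.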