Large-scale software development requires effective tool support, such as source code browsers, bug finders, and automated refactorings. This need is especially pressing for C, since it is the language of choice for critical software infrastructure, including the Linux kernel and the Apache web server. However, building tools for C presents a special challenge. Not only is C low-level and unsafe, but its source code also mixes two languages: the C language proper and the preprocessor. Tools therefore need to process both C itself and the preprocessor. The preprocessor adds facilities lacking from C itself. Notably, file includes (#include) provide rudimentary modularity, macros (#define) enable code transformation with a function-like syntax, and static conditionals (#if, #ifdef, and so on) capture variability. At the same time, the preprocessor is oblivious to C constructs and operates only on individual tokens. Real-world C code reflects both points: preprocessor usage is widespread and often violates C syntax.
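To make the three facilities concrete, the following minimal sketch combines a file include, a function-like macro, and a static conditional. The names `SQUARE`, `CONFIG_BOUNDS_CHECK`, and `square_at` are hypothetical and chosen only for illustration. Note that the text guarded by the conditional ends in a dangling `else`, so it is not a complete C statement on its own; the code is well-formed only after preprocessing, in either configuration.

```c
#include <stddef.h>   /* file include: brings in size_t */

/* Macro: function-like syntax, but plain token substitution. */
#define SQUARE(x) ((x) * (x))

/* Hypothetical configuration variable; a build system would normally set it. */
#ifndef CONFIG_BOUNDS_CHECK
#define CONFIG_BOUNDS_CHECK 1
#endif

/* Returns buf[i] squared, or -1 if checking is enabled and i is out of range. */
int square_at(const int *buf, size_t len, size_t i)
{
#if CONFIG_BOUNDS_CHECK
    /* This branch stops after "else": an incomplete statement by itself. */
    if (i >= len) {
        return -1;
    } else
#endif
    {
        return SQUARE(buf[i]);
    }
}
```

A tool that parses the unpreprocessed text as C would have to make sense of an `if` statement whose `else` branch lies outside the conditional, which is the kind of token-level usage the paragraph above describes.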
Existing C tools do not process both languages. Rather, they either process one configuration at a time (e.g., the Cxref source browser, the Astrée bug finder, and Xcode refactorings), rely on a single, maximal configuration (e.g., the Coverity bug finder), or build on incomplete heuristics (e.g., the LXR source browser and Eclipse refactorings). Processing one configuration at a time is infeasible for large programs such as Linux, which has over 10,000 configuration variables. Maximal configurations cover only part of the source code, mainly due to static conditionals with more than one branch. For example, Linux's allyesconfig enables less than 80% of the code blocks contained in conditionals. Finally, heuristic algorithms prevent programmers from utilizing the full expressivity of C and its preprocessor.
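The coverage gap of a maximal configuration can be illustrated with a small sketch. The variable `CONFIG_SMP` and the function `max_cpus` below are hypothetical stand-ins, not taken from any particular source file. Defining the variable, as a maximal configuration would, selects the first branch of the conditional, so a tool working on the preprocessed output never sees the `#else` branch.

```c
/* Mimic a maximal configuration: every variable is enabled. */
#define CONFIG_SMP 1

/* Returns the CPU limit for the selected configuration. */
int max_cpus(void)
{
#ifdef CONFIG_SMP
    return 64;   /* visible under the maximal configuration */
#else
    return 1;    /* dropped by the preprocessor; never analyzed */
#endif
}
```

Covering both branches with a configuration-at-a-time tool requires at least two runs for this one variable. With n independent Boolean variables the number of configurations can grow up to 2^n, which suggests why enumerating them one by one does not scale.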