In many industries, such as telecommunications, defense, aerospace and manufacturing, software applications have become increasingly large and complex. Applications can extend to many millions of lines of code. The code may be written by many developers, compounding the difficulty of managing and understanding the applications. Navigating paper listings to understand program structure, file interdependencies, and the like is cumbersome and inefficient. As a result, sophisticated software engineering environments, including integrated development environments (IDEs) and source code analysis tools, have been developed to aid developers in coping with the complexity.
One aspect of code visualization and navigation tools, sometimes referred to as source code browsing tools, is file-referencing. File references, like a book index, provide information about where in a collection of files a source code identifier is referenced and, preferably, how it is referenced. Source code identifiers comprise information symbols (names or tokens) assigned by programmers to variables, constants, functions, procedures, classes and other constructs within the source code. The reference may classify the type of use, showing where the identifier is defined or declared, where its value is modified, or where it is otherwise referenced. At a more detailed level, the cross-reference system may pinpoint the relative location of the identifier in the file by line number and possibly column position.
Often, source code browsing tools present cross-reference information gleaned from a collection of files in a hierarchy of views. In a first level, the identifiers may be presented, while in a second or “Global” level, the files where each identifier is referenced and the way the identifier is referenced in those files are illustrated. In a third or “Local” level, detailed information is presented to pinpoint the line and possibly the column where each reference to an identifier is made in a file. The partitioning of cross-reference information into global and local levels permits user queries to be performed at various levels of resolution, selecting more detailed views only when desired. This approach is important when browsing large scale systems because there may be thousands of files in which a given identifier is used. Presenting detailed information for all such files may overwhelm a user. Partitioning also aids in query performance.
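The three levels of views described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and does not represent any particular tool's storage format: the names `Use`, `Reference` and `CrossReferenceIndex`, the three use categories, and the sample file names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    """Hypothetical classification of how an identifier is used."""
    DEFINE = "define"
    MODIFY = "modify"
    READ = "read"


@dataclass(frozen=True)
class Reference:
    """One occurrence of an identifier in a file (assumed record layout)."""
    identifier: str
    file: str
    use: Use
    line: int
    column: int


class CrossReferenceIndex:
    def __init__(self):
        # Global level: identifier -> file -> set of uses seen in that file.
        self._global = defaultdict(lambda: defaultdict(set))
        # Local level: (identifier, file) -> list of exact occurrences.
        self._local = defaultdict(list)

    def add(self, ref):
        self._global[ref.identifier][ref.file].add(ref.use)
        self._local[(ref.identifier, ref.file)].append(ref)

    def identifiers(self):
        """First level: the known identifiers."""
        return sorted(self._global)

    def global_view(self, identifier):
        """Global level: which files reference the identifier, and how."""
        files = self._global.get(identifier, {})
        return {f: sorted(u.value for u in uses) for f, uses in files.items()}

    def local_view(self, identifier, file):
        """Local level: line, column and use of each reference in one file."""
        refs = self._local.get((identifier, file), [])
        return sorted((r.line, r.column, r.use.value) for r in refs)


# Sample data, invented for illustration only.
index = CrossReferenceIndex()
index.add(Reference("count", "main.c", Use.DEFINE, 3, 5))
index.add(Reference("count", "main.c", Use.MODIFY, 10, 9))
index.add(Reference("count", "util.c", Use.READ, 7, 12))
```

In this sketch a query such as `index.global_view("count")` touches only the per-file summary, and `index.local_view("count", "main.c")` is consulted only when the user asks for line and column detail in one file, which mirrors the coarse-to-fine querying described above.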
File-reference information is typically gleaned by analyzing the source code, such as by parsing. For large applications, file-reference information poses storage and retrieval issues that must be addressed to ensure a suitable level of performance. Often a balance must be struck between storage conservation and run-time retrieval performance. Some source code tools use standard relational databases to store all “detailed” or local cross-reference information. However, these databases can grow very large and become difficult to manage. Response times for information retrieval can degrade as the database grows. In addition, these large databases become costly to manage and store for systems that provide version control functionality.
Other file cross-referencing applications store only the less detailed, global file-reference information. When additional information is needed for a given source code file, the detailed information is reconstructed from the source code and the stored summary information.
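The summary approach can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any existing tool: the function names `reconstruct_positions` and `local_detail`, the layout of `summary` and `sources`, and the sample source text are hypothetical, and a whole-word regular expression search stands in for the parsing a real tool would perform.

```python
import re


def reconstruct_positions(identifier, source_text):
    """Rescan one file's text and return (line, column) pairs, 1-based.

    Assumption: a whole-word regex match approximates a real parser.
    """
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    positions = []
    for line_no, line in enumerate(source_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            positions.append((line_no, match.start() + 1))
    return positions


def local_detail(identifier, summary, sources):
    """Rebuild detailed positions only for files the summary names.

    summary: identifier -> set of file names (the stored global information)
    sources: file name -> file text (stands in for reading files from disk)
    """
    return {
        name: reconstruct_positions(identifier, sources[name])
        for name in sorted(summary.get(identifier, ()))
    }


# Sample data, invented for illustration only.
sources = {
    "main.c": "int count = 0;\ncount = count + 1;\nint recount;\n",
    "util.c": "void reset(void);\n",
}
summary = {"count": {"main.c"}}
detail = local_detail("count", summary, sources)
```

Only the summary is stored, so storage stays small, but every request for detail pays the cost of rescanning the relevant source files at query time.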
The retrieval of detailed information from a large database system or the construction of detailed information in a summary system may result in run-time performance delays that annoy users and contribute to software maintenance overhead.
It is therefore desirable to have a method and data structure for storing and accessing file-reference information that conserves storage space and permits rapid responses to user information queries.