A conventional compiler may save limited source code information in a symbol table for use by a debugging tool after a source code listing is compiled. The ctags and/or etags programs, often found in “UNIX®-like” operating systems, generate an index (or “tag”) file for a variety of language objects found in one or more source code files. Tag index files help editors such as Vi and Emacs locate the construct associated with a name or symbol appearing in a source code file and jump to the file and line that define the name. However, a tag index file preserves only symbol definitions and references, and may occasionally contain inaccurate tags. A source code presentation tool such as an IDE (Integrated Development Environment) presents source code with typical features such as easy access to referenced symbols, syntax highlighting, an outline of symbol definitions, and collapsing and expanding of source code constructs. Symbol tables and/or tag index files do not supply enough information to support all of those features. U.S. Pat. No. 4,931,928 provides a method for analyzing source code with a dedicated parser to extract source code information for insertion into a database. In general, source code presentation, source code metrics collection, software reverse engineering, and other analysis tools require a parser to process the source code listings in order to obtain the source code information of interest. Thus, there is a need to preserve lexical, syntactic, and semantic information of source code listings for source code presentation as well as analysis, especially after compilation.
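The tag index lookup described above can be illustrated with a minimal sketch. It is not the ctags or etags file format or algorithm. The regular expression, the function names `build_tag_index` and `lookup`, and the sample files are all hypothetical and chosen only to show how a name can be mapped to the file and line that define it.

```python
import re
from collections import defaultdict

# Hypothetical pattern for this sketch: recognizes only unindented
# "def name" and "class name" lines. A real tag generator handles
# many more constructs and languages.
DEFINITION = re.compile(r"(?:def|class)\s+([A-Za-z_]\w*)")


def build_tag_index(files):
    """Map each defined name to a list of (filename, line_number) pairs.

    `files` is a dict of filename -> source text (an assumption made so
    the sketch is self-contained; a real tool would read from disk).
    """
    index = defaultdict(list)
    for filename, text in files.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            match = DEFINITION.match(line)
            if match:
                index[match.group(1)].append((filename, line_number))
    return dict(index)


def lookup(index, symbol):
    """Return the recorded definition sites for `symbol`, or an empty list."""
    return index.get(symbol, [])


# Illustrative input only.
SAMPLE_FILES = {
    "shapes.py": "class Circle:\n    pass\n\ndef area(shape):\n    return 0\n",
    "main.py": "from shapes import area\n\ndef main():\n    print(area(None))\n",
}

TAG_INDEX = build_tag_index(SAMPLE_FILES)
```

An editor could call `lookup(TAG_INDEX, "area")` to find where to jump. The index in this sketch holds only a name and a location, with no scope, type, or syntactic structure, which is why such an index alone cannot drive features like outlining or collapsing of constructs.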
Source code listings of computer software are likely to be a mixture of the syntaxes of one or more programming languages, preprocessing directives, and documentation, and each part is therefore supplied to its respective language or syntax processor. For example, a source code listing in JavaServer Pages™ (JSP) is a mixture of HTML and Java™. An AST (Abstract Syntax Tree) is typically used to represent source code during compilation or source code analysis. DATRIX™ ASG (Abstract Semantic Graph) is an extension of AST, and offers a method to save source code syntax as well as semantics in flat files using data records. However, neither AST nor ASG is suited to representing multiple syntaxes. Extending a programming language by means of a preprocessor has both merits and drawbacks. A preprocessor allows certain language extensions such as macro substitution, file inclusion, and conditional compilation. However, source code written in a computer language with preprocessing syntax has a syntax that depends on another syntax (the preprocessing syntax), and is often context sensitive. As a result, a source code analyzer or a software reverse engineering tool based on an AST for C or C++ often has to impose restrictions on the use of preprocessing. Preprocessing is an important feature of C and C++; however, there is no standard way of recording macro definitions and expansions in a datastore. For programming languages such as Java™ and C# that use little or no preprocessing, a source code file is often a mixture of the syntax of the programming language and structured documentation in comments. For example, Javadoc is a documentation standard for generating Java™ API documents from Java™ source code, and Doxygen is a documentation system for C, C++, Java™ and many other languages.
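To make the idea of recording macro definitions and expansions concrete, the following is a minimal sketch under strong assumptions. It handles only object-like `#define NAME replacement` directives, ignores string literals, comments, function-like macros, file inclusion, and conditional compilation, and uses a plain dict as a stand-in for a datastore. The function name `preprocess` and the record layout are hypothetical and do not represent any standard or any tool named above.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
DEFINE = re.compile(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)")


def preprocess(source, datastore):
    """Expand object-like macros and record definitions and expansions.

    `datastore` is a dict with "definitions" and "expansions" lists; it
    stands in for a real datastore in this sketch.
    """
    macros = {}
    output_lines = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        define = DEFINE.match(line)
        if define:
            name, replacement = define.group(1), define.group(2).strip()
            macros[name] = replacement
            # Record where the macro was defined.
            datastore["definitions"].append((name, replacement, line_number))
            continue

        def expand(match, line_number=line_number):
            name = match.group(0)
            if name not in macros:
                return name
            # Record the macro name, the line, and the column of the use.
            datastore["expansions"].append((name, line_number, match.start()))
            return macros[name]

        output_lines.append(IDENTIFIER.sub(expand, line))
    return "\n".join(output_lines)


# Illustrative input only.
STORE = {"definitions": [], "expansions": []}
SOURCE = "#define SIZE 16\nint buffer[SIZE];\nint n = SIZE * 2;"
EXPANDED = preprocess(SOURCE, STORE)
```

After the call, `STORE` holds one definition record and one expansion record per use of `SIZE`, so a later tool could relate the expanded text back to the original listing without rerunning the preprocessor.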
Browsing source code through a web browser typically takes one of two approaches: static HTML pages or dynamically generated pages. An example of the former is described in U.S. Pat. No. 5,940,615, which provides a method to generate static HTML pages from source code listings. A method using static HTML pages does not support user preferences and selections. In the latter approach, upon a request from a web browser, an HTML page is dynamically generated from a datastore maintaining source code information. Dynamically generating web pages allows control of the page content on demand, display of source code listings with preferred user settings, and drawing of dynamically generated graphs from the source code information datastore. Examples of graphs for source code listings are class relationship graphs, method/function call graphs, and reverse engineered design graphs.
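The dynamic approach can be sketched as a function that builds a page per request from stored source code information and the requesting user's settings. This is only an illustration: the in-memory `DATASTORE`, its token layout, and the `render_page` function are hypothetical, and a real system would sit behind a web server and a persistent datastore.

```python
import html

# Hypothetical in-memory datastore: filename -> list of (token_kind, text).
DATASTORE = {
    "hello.c": [
        ("keyword", "int"), ("plain", " "), ("symbol", "main"),
        ("plain", "(void) { "), ("keyword", "return"), ("plain", " 0; }"),
    ],
}


def render_page(filename, preferences):
    """Build an HTML page for `filename` using per-request preferences.

    `preferences` maps a token kind to a CSS color; token kinds without
    an entry are emitted unstyled.
    """
    parts = []
    for kind, text in DATASTORE[filename]:
        escaped = html.escape(text)
        color = preferences.get(kind)
        if color:
            parts.append(f'<span style="color:{color}">{escaped}</span>')
        else:
            parts.append(escaped)
    title = html.escape(filename)
    body = "".join(parts)
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><pre>{body}</pre></body></html>"
    )
```

Calling `render_page("hello.c", {"keyword": "blue"})` and `render_page("hello.c", {})` yields different pages from the same stored information, which is the kind of per-user control that a set of pre-generated static pages cannot offer.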
Open source web sites (such as SourceForge.net, Tigris.org and GNU.org) manage software release packages for download and version control. Some of the sites provide links to view individual source files. However, it is not possible to browse symbol definitions and references across a large number of files, nor is it possible to show program structure or design through various graphs. In addition, a user cannot conveniently search for the usage of a symbol across many packages.
At present, there are web sites, such as Google™'s source code engine and Koders.com, for searching open source software. In Google™'s source code engine, the source code browsing page does not provide syntactic and semantic information such as symbol references. Koders.com is a site on which all packages are installed on or copied to a local system, and the source code files are then processed locally with a parser to extract source code information; it is not implemented for distributed servers hosting source code packages. For a distributed source code search engine, the search engine and the hosting servers are not integrated, and the search engine does not have to perform syntactic and semantic analysis of source code packages on distributed hosting servers.
Integrated development environments (IDEs) such as Eclipse, Redhat Source Navigator™, Microsoft® Visual Studio®, and JetBrains IntelliJ® IDEA are used to manage projects for software development. Browsing and presentation of source code are often limited to the source files of managed projects. These IDEs are stand-alone tools and are not designed to search and browse software over the Internet. In addition, they are not intended to manage thousands of software packages. There is a need to provide a method to search and present source code packages through an IDE from a network of distributed servers as if those packages were managed projects.