Software projects contain a number of components including, for example: modules and packages; functions; types; variables; dependencies (relationships specifying conditions, frequently the installed presence of other software projects, required to install and/or run code components); comments (annotations of human-readable and/or machine-readable information about the components); and exports/public application programming interface (API) definitions (specifications describing how code components may be used from/with external components).
The structure of a software project is defined by the way that its components are configured. By way of example, a simple software project using modular programming for generating random numbers may include a nested structure for a module named “random.” The module “random” depends on an external software project called “rand” and contains a function named “getRandomNumber.” The function “getRandomNumber” returns a numeric value of type “integer,” contains a local variable named “n” of the type “integer,” and includes one or more comments (i.e., human-readable documentation) describing how to use the function. A software project defined by the same structural representation of components could be written in almost any computer language in use today.
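By way of illustration only, the nested structure of the “random” example can be sketched as plain data. The class names (Module, Function, Variable), the field names, and the comment text below are hypothetical choices made for this sketch and are not prescribed by the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    name: str
    type: str


@dataclass
class Function:
    name: str
    return_type: str
    local_variables: List[Variable] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


@dataclass
class Module:
    name: str
    dependencies: List[str] = field(default_factory=list)
    functions: List[Function] = field(default_factory=list)


# The "random" module from the example: one dependency and one function.
random_module = Module(
    name="random",
    dependencies=["rand"],
    functions=[
        Function(
            name="getRandomNumber",
            return_type="integer",
            local_variables=[Variable(name="n", type="integer")],
            # Hypothetical comment text; the example only says comments exist.
            comments=["Returns a random integer."],
        )
    ],
)
```

Nothing in this representation is tied to a particular programming language, which is the sense in which the same structure could describe projects written in many different languages.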
However, the specific code required (e.g., the amount of explicit code) would vary greatly depending on the language. Some languages require full specification of all components (e.g., every function must specify the types of its parameters and return values); some languages can automatically infer specifications for some components (e.g., type inference); and some languages do not require any specification of types or other attributes of components. In the example discussed above, some languages require express specification that “getRandomNumber” returns a numeric value of type “integer,” whereas other languages allow the return value type to be inferred or left unspecified.
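The difference in explicitness can be seen even within a single language that makes type specifications optional. The following sketch, written in Python purely as an illustration, defines the same function twice: once with the return type stated expressly and once with no type specified. The function names and the constant value are hypothetical.

```python
from typing import get_type_hints


def get_random_number_explicit() -> int:
    # The return type and the local variable type are stated in the source.
    n: int = 4
    return n


def get_random_number_unspecified():
    # Same structure, but no types are written down anywhere.
    n = 4
    return n


# Only the explicit version carries a recorded return type.
explicit_hints = get_type_hints(get_random_number_explicit)
unspecified_hints = get_type_hints(get_random_number_unspecified)
```

Both functions have the same structure (a function returning an integer held in a local variable “n”), but only the first one records that fact in a form a tool can read directly from the source.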
To recognize components in programming code, a system contains a set of rules and patterns (e.g., syntactic and semantic rules) that describe each component and how to recognize it. As an example, the rules for a programming language may specify that a function component is defined wherever the string “function NAME( ) { }” appears, where NAME becomes the name of the function.
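A minimal sketch of such a rule, assuming the simplified “function NAME( ) { }” form from the example, is shown below. The regular expression and the helper name find_function_names are assumptions made for illustration; a practical recognizer would rely on a full grammar for the language rather than a single pattern.

```python
import re
from typing import List

# Matches "function NAME( ) {" and captures NAME. Parameters and nested
# braces are deliberately not handled in this simplified rule.
FUNCTION_PATTERN = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(\s*\)\s*\{")


def find_function_names(source: str) -> List[str]:
    """Return the names of all functions recognized in the source text."""
    return FUNCTION_PATTERN.findall(source)


example_source = "function getRandomNumber( ) { }\nfunction seed( ) { }"
names = find_function_names(example_source)
```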
“Compilers” and “interpreters” are programs (or sets of programs) that use the rules and patterns for a specific language to convert source code written in that language into machine-readable code (the target language, often having a binary form known as object code). A compiler typically produces code files that can be executed later by a computer, whereas an interpreter typically executes the source code immediately, often using an ephemeral, intermediate machine code (non-human-readable) representation.
However, because certain information is not required to execute the source code, conventional compilers and interpreters may omit or alter information in the source code (e.g., the names of components, the types of components, comments/annotations, the original locations where components were defined, the dependencies, etc.).
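As one concrete illustration of such omission, the sketch below parses a small piece of Python source with the standard-library ast module and then regenerates source text from the parsed form. The comment in the original source is not part of the parsed representation and therefore does not survive the round trip. The sample source is hypothetical, and ast.unparse assumes Python 3.9 or later.

```python
import ast

SOURCE = '''
def get_random_number():
    # Pick the number to return.
    n = 4
    return n
'''

# Parse the source into a syntax tree, then turn the tree back into text.
tree = ast.parse(SOURCE)
round_tripped = ast.unparse(tree)

comment_survived = "Pick the number" in round_tripped
name_survived = "get_random_number" in round_tripped
```

Here the function name is preserved while the comment is dropped; other toolchains may discard more (for example, names or type information) once code is lowered further toward machine code.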
Furthermore, at a macro level, software components are organized into distinct units known as “packages” and “modules,” as briefly discussed above. A package contains a collection of related, individual components (e.g., types, functions, constants) that, together, implement a higher level capability or behavior. For instance, a function that can respond to Hypertext Transfer Protocol (HTTP) requests, a type that represents an HTTP request, and a type that represents an HTTP response can be grouped into a single HTTP package.
In addition to serving as a unit of organization, a package also provides an API. Every package has a name that can be referenced and used by other packages. For example, a package to handle HTTP requests might reference a string parsing package that is able to parse Uniform Resource Locators (URLs). The API is the set of components that a package makes accessible to code from other packages. A reference is any instance of the name of a component (e.g., package, function, type, etc.) other than the component definition. A cross-reference occurs when a package refers to another package, either directly via its name or via a reference to one of its sub-components. Stated another way, a cross-reference is any reference that refers to an entity outside the current code's package (i.e., any instance where one package makes use of the logic and behavior of another package).
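The distinction between a reference and a cross-reference can be sketched as a simple comparison of package names. The Reference record, its field names, and the package and component names used below (“http”, “urlparse”, “parseURL”, “Request”) are hypothetical and serve only to illustrate the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    from_package: str  # package containing the code that uses the name
    to_package: str    # package in which the named component is defined
    component: str     # name of the referenced component


def is_cross_reference(ref: Reference) -> bool:
    """A reference is a cross-reference when it leaves its own package."""
    return ref.from_package != ref.to_package


# The HTTP package uses a URL parser defined in another package.
external = Reference(from_package="http", to_package="urlparse", component="parseURL")
# The HTTP package uses one of its own types.
internal = Reference(from_package="http", to_package="http", component="Request")
```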
To cross-reference and make use of another package, the referenced package first must be imported into the code currently being written. The import process includes resolving the package name into a location and fetching the package from that location. Most computer languages have at least one package manager (also known as a dependency management system) that is responsible for the import process. Depending on the language, a package's name can be universally unique (e.g., “github.com/gorilla/mux”). However, if the language does not guarantee that package names are unique (e.g., a package named simply “django”), the package manager must be manually configured, or enhanced, to ensure the proper resolution of package names to unique locations.
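The name-resolution step of the import process might be sketched as follows. This is a simplified illustration under stated assumptions: the rule that a name whose first path segment contains a dot already identifies a location, the function name resolve_package_location, and the example.com registry entry are all hypothetical, and the fetching step is not shown.

```python
from typing import Dict


def resolve_package_location(name: str, manual_registry: Dict[str, str]) -> str:
    """Resolve a package name to a location without fetching anything."""
    first_segment = name.split("/", 1)[0]
    if "." in first_segment:
        # Assumption: a host-like first segment makes the name itself unique.
        return "https://" + name
    if name in manual_registry:
        # Non-unique names need a manually configured mapping.
        return manual_registry[name]
    raise LookupError("no location configured for package " + repr(name))


# Hypothetical manual configuration for a name that is not globally unique.
registry = {"django": "https://example.com/packages/django"}

unique_location = resolve_package_location("github.com/gorilla/mux", registry)
configured_location = resolve_package_location("django", registry)
```

Without the registry entry, the call for “django” would raise LookupError, which mirrors the need for manual configuration or enhancement of the package manager described above.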
Nevertheless, despite the variances introduced across multiple programming languages, the software project's structure maintains a level of consistency that can be beneficial across various platforms. Unfortunately, conventional methods for understanding software projects (e.g., compilers discussed above) are rarely language-independent and may omit/alter useful information. Accordingly, a need exists for an improved system and method for developing a representation of software project structure in an effort to overcome the aforementioned obstacles and deficiencies of prior art systems.