In the field of computer software, the process of “building” a software product involves taking as input the product's source files (and potentially other types of files such as pre-compiled libraries, data, content, etc.) and converting those input files into output files that can be used by end-users to run the software product on their respective computing devices. For input files that are source files or libraries, this build process can include, e.g., compiling and/or linking the source files/libraries to generate executable binaries for the product. The build process can also merge, split, compress, copy and/or otherwise manipulate the various input files (as well as intermediate files created from the input files) so that they are in an appropriate format for product release. Typically, the total set of files that may be provided as input to the build process is maintained in a directory structure known as a “build tree.”
As software has become more complex, it has become increasingly common for software developers to incorporate “external files” into their software products (i.e., files that originate, either wholly or in part, from entities that are different from the developer that produces a given product). Examples of external files include source/library files from open source projects, content files (e.g., images, audio clips, etc.) from stock content agencies, and so on. While the use of external files can speed up and ease software development, in many cases such files are subject to third-party property rights or restrictions (e.g., licenses, copyrights, patents, etc.) that can affect the property rights owned by a developer in its overall software product. Thus, it is important for software developers to monitor which external files are used in their products so that they can understand and comply with those third-party rights/restrictions. This monitoring generally involves (1) determining the files in a build tree that are external files, and (2) identifying, for a given product build/release, which of those external files actually contribute to one or more of the output files of the build process.
For very small-scale software projects, it is possible to carry out steps (1) and (2) above manually. However, this manual approach quickly becomes unworkable as project size and complexity increase. For instance, a large software product may have tens of thousands of files in its build tree, of which a significant percentage are external files. Similarly, the software product may include tens of thousands of output files in its released form. In such a scenario, manually tracking the external files in the build tree and mapping the files from build output to build input can be extremely time-consuming, cumbersome, and error-prone.
There are existing tools (referred to as “static build tree analysis tools”) that can automate step (1) to an extent. In particular, these tools can analyze a build tree and generate a list of files in the build tree that originate, wholly or in part, from an external source (such as files/code subject to an open source license). However, static build tree analysis tools generally do not help with respect to step (2) (i.e., identifying which external files are actually used/incorporated in a released product). To understand this, note that the set of files in a build tree and the set of files that are used to generate build output are not necessarily the same; there are many reasons why a file in the build tree may not contribute to any of the output files of a build process. As one example, the file may have been placed in the build tree for testing/prototyping purposes, and thus may be excluded from the build specification for a final product release. Accordingly, static build tree analysis tools, which simply determine the external files in a build tree, do not address the problem of tracking which of those external files actually make it into the product that reaches end-users.
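The gap between "external files in the build tree" and "external files that contribute to build output" can be illustrated with a minimal Python sketch. It assumes that a record of which inputs each generated file was built from is available; the names `build_graph`, `external_files`, and `contributing_externals`, and all file names, are hypothetical and chosen only for illustration.

```python
from collections import deque


def contributing_externals(build_graph, outputs, external_files):
    """For each output file, list the external files it was derived from.

    build_graph maps a generated file to the files used to create it
    (hypothetical structure; a real build system would have to supply it).
    """
    result = {}
    for out in outputs:
        seen = set()
        queue = deque([out])
        # Walk backwards from the output through intermediate files.
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(build_graph.get(node, ()))
        result[out] = sorted(seen & external_files)
    return result


# Hypothetical build: app.exe is linked from main.o and libz.a.
build_graph = {
    "app.exe": ["main.o", "libz.a"],
    "main.o": ["main.c"],
    "libz.a": ["zlib/inflate.c"],
}
# Step (1) result: external files found anywhere in the build tree.
external_files = {"zlib/inflate.c", "proto/experiment.c"}

report = contributing_externals(build_graph, ["app.exe"], external_files)
print(report)
```

In this example, `proto/experiment.c` is an external file present in the build tree, but it never appears in the report because no output file is derived from it. A static analysis of the build tree alone would list both external files.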
Further, there are some open source licenses where the nature of the license depends not only on whether a particular source file/code portion is used in a product release, but also on how the file/code is incorporated. For instance, one type of open source license may allow for unrestricted use of source code if the code is linked as a dynamic library, but may include restrictions if the same code is linked in a static fashion. For external files/code that are subject to these and other similar licenses, software developers have the added burden of tracking not just if, but also how, they make use of such files/code when building their products.
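As a rough illustration of tracking how, and not just whether, an external file is used, the following Python sketch records a link mode alongside each usage and flags the combinations that a license restricts. The license identifiers, file names, and the `restricted_modes` table are hypothetical placeholders and do not describe the terms of any actual license.

```python
def flag_link_restrictions(usages, restricted_modes):
    """Return usages whose link mode is restricted under their license.

    usages: list of (external_file, license_id, link_mode) tuples.
    restricted_modes: maps a license_id to the set of link modes that
    carry restrictions (hypothetical policy table).
    """
    flagged = []
    for path, license_id, mode in usages:
        if mode in restricted_modes.get(license_id, set()):
            flagged.append((path, license_id, mode))
    return flagged


# Hypothetical policy: EXAMPLE-LICENSE restricts static linking only.
restricted_modes = {"EXAMPLE-LICENSE": {"static"}}

usages = [
    ("libfoo.a", "EXAMPLE-LICENSE", "static"),
    ("libfoo.so", "EXAMPLE-LICENSE", "dynamic"),
    ("libbar.a", "OTHER-LICENSE", "static"),
]

flagged = flag_link_restrictions(usages, restricted_modes)
print(flagged)
```

Here the same hypothetical library is flagged when linked statically but not when linked dynamically, which is why a record of file usage alone would be insufficient for such licenses.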