Typically, a software build process takes source code and other configuration data as input and produces artifacts, also referred to as derived objects, as output. Software artifacts form a superset of such output and include, but are not limited to, specifications, architecture and design models, source and executable code, configuration, test data, scripts, process models, project plans, and documentation. The exact number and definition of steps in the software build process depend greatly on the types of inputs (e.g., Java, C/C++, or Perl/Python/Ruby source code), on the type of desired output, such as the goals of the build process (e.g., creating an intermediate output for test and verification, working or continuous production, or full production build) for a desired version of the software, and even on the desired output environment (e.g., CD image, downloadable zip file, or self-extracting binary for a particular system). If the source code is written in a compiled language, the build process would likely include a compilation step and perhaps a linking step.
The identification of the inputs to the build process is itself part of the process. Often selection of inputs is automatic and/or implicit and is therefore hidden from view of users, making the selection step easy to overlook. For example, when a developer runs a development build, the selection of inputs (e.g., source code and configuration data) is implicit, but still occurs. The sources that are present on that particular developer's development environment are effectively selected as inputs to the build process. When the build process is performed in a clean environment, the identification of the inputs has to be explicit. For example, when a build takes place on a build server, one of the first steps is to retrieve explicitly specified source code, such as from a source code repository or other artifact storage system. The retrieved source code is used as the input for the build.
A controlled build process is a build process that defines its inputs explicitly and comprises a series of explicitly defined steps used to produce the output (e.g., artifacts or derived objects). A controlled build process is repeatable; given the same inputs, it should produce the exact same outputs. A controlled build process also provides traceability from the sources to the artifacts and from the artifacts back to the sources, thereby ensuring that test results can be related simultaneously to both the artifacts and the sources from which the build process produced them. In order to provide this traceability, each invocation of the controlled build process is uniquely identifiable. In one embodiment, an identifier such as a build number is used to identify separate builds. Controlled build processes are not necessarily automated, although automation certainly facilitates the process and helps ensure consistency.
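The properties described above (explicit inputs, explicitly ordered steps, repeatable output, and a unique identifier per invocation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `BuildRecord`, `digest_inputs`, `compile_step`, `link_step`, and `controlled_build` are hypothetical, and the "compile" and "link" steps are trivial stand-ins rather than real tool invocations.

```python
import hashlib
import itertools
from dataclasses import dataclass

# Hypothetical names throughout; none of them come from the original text.
_build_numbers = itertools.count(1)  # source of unique build identifiers


@dataclass(frozen=True)
class BuildRecord:
    build_number: int   # uniquely identifies this invocation of the build
    input_digest: str   # fingerprint of the explicitly defined inputs
    artifacts: dict     # artifact name -> artifact bytes


def digest_inputs(sources: dict) -> str:
    """Fingerprint the explicit inputs (path -> contents) in a stable order."""
    h = hashlib.sha256()
    for path in sorted(sources):
        h.update(path.encode())
        h.update(b"\0")
        h.update(sources[path])
        h.update(b"\0")
    return h.hexdigest()


def compile_step(sources: dict) -> dict:
    # Stand-in for compilation: "compile" each source by upper-casing it.
    return {path + ".o": data.upper() for path, data in sources.items()}


def link_step(objects: dict) -> dict:
    # Stand-in for linking: concatenate objects in sorted (deterministic) order.
    return {"app.bin": b"".join(objects[name] for name in sorted(objects))}


STEPS = [compile_step, link_step]  # explicitly defined, ordered steps


def controlled_build(sources: dict) -> BuildRecord:
    current = dict(sources)
    for step in STEPS:
        current = step(current)
    return BuildRecord(next(_build_numbers), digest_inputs(sources), current)


sources = {"a.c": b"int a;", "b.c": b"int b;"}
first = controlled_build(sources)
second = controlled_build(sources)
print(first.artifacts == second.artifacts)        # same inputs, same outputs
print(first.build_number != second.build_number)  # each invocation is identifiable
```

The input digest plays the role of the trace from artifacts back to sources, and the build number distinguishes two invocations even when their inputs and outputs are identical.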
In general, two primary types of builds take place during a software development project: developer builds and authoritative builds. Developer builds are performed by the developers in the course of development in their software development environments. By their very nature, these are non-controlled builds. There is no explicit definition of the inputs to the build process. The inputs are the sources and configurations that happen to exist in the developer's environment at the time of the software build. Typically, there is no way to track these inputs; therefore, the historical revision of any of the source files is unknown. In fact, some of the source files may include changes that have not been committed to the Source Code Management (SCM) system, or worse yet, they may include files that were modified outside the knowledge of the SCM and without permission or authority to modify those files.
Authoritative builds, on the other hand, are the embodiments of the controlled build process. Authoritative builds take place in a clean environment where all inputs are explicitly defined and controlled. Typically, an authoritative build takes the source code directly from the SCM and uses only explicitly defined environmental dependencies, which can also be defined by the SCM. During the authoritative build, explicitly defined steps are executed in a deterministic order. In general, a mechanism such as a build number and/or an SCM label is provided to allow for traceability. The build number uniquely identifies each particular authoritative build and its artifacts (derived objects), and the SCM label (or baseline) provides traceability back to the inputs.
A build management system (build management system 102) is a software tool typically used by developers in conjunction with an SCM to perform and manage authoritative builds. The build management system 102 is responsible for providing a clean environment for the authoritative build, determining what inputs are to be used for a build, and then obtaining those inputs from a known source, such as an SCM. The build management system 102 also performs and manages the steps required to execute the build process and generate the artifacts.
The artifacts generated by the build process must also be stored and maintained. For example, the artifacts are sometimes stored in the source control management (SCM) system, or they can be stored in various locations in the system 100, including in a flat file storage system or on the file system of a host computer. SCMs, however, are designed to store source code and are thus optimized for use cases related to the storage and retrieval of source code, not build artifacts. In particular, SCM systems have difficulty tracing artifacts to the build that produced them because the structure of an SCM does not provide direct mechanisms for code traceability. Instead, SCMs rely on mechanisms that merely imply traceability. For example, labels or tags; snapshots or baselines; branches or streams, depending on the SCM; and the file system namespace organization can all be used in an attempt to trace the build that produced a particular artifact. Further, SCMs are typically not designed to optimize access to the artifacts or the storage of artifacts. In particular, storing artifacts in an SCM is burdensome because SCMs are designed to track the evolution of file contents, not to efficiently store sets of files and provide traceability to a build. Furthermore, access to artifacts is not optimized because most SCMs optimize access to the latest revision of a file. The build management system 102, in contrast, requires access to artifacts that were not generated by the latest build. Finally, storage of artifacts in an SCM is often not optimized because most SCMs do not optimize the storage of binary files, and most build artifacts are binary.
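The access pattern that the paragraph above says an SCM handles poorly, namely retrieving the artifacts of an arbitrary earlier build and tracing them back to their inputs, can be illustrated with a small in-memory index keyed by build number. This is a minimal sketch and not a description of how the build management system 102 stores artifacts; the class `ArtifactStore`, its methods, and the label strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactStore:
    """Hypothetical in-memory store that indexes artifact sets by build number."""

    _builds: dict = field(default_factory=dict)

    def publish(self, build_number: int, scm_label: str, artifacts: dict) -> None:
        # Each build's artifact set is stored once and never overwritten.
        if build_number in self._builds:
            raise ValueError(f"build {build_number} is already published")
        self._builds[build_number] = {
            "scm_label": scm_label,
            "artifacts": dict(artifacts),
        }

    def fetch(self, build_number: int, name: str) -> bytes:
        # Any build can be addressed directly, not only the latest one.
        return self._builds[build_number]["artifacts"][name]

    def scm_label_for(self, build_number: int) -> str:
        # Trace a build (and therefore its artifacts) back to its inputs.
        return self._builds[build_number]["scm_label"]


store = ArtifactStore()
store.publish(41, "label-build-41", {"app.bin": b"\x00old"})
store.publish(42, "label-build-42", {"app.bin": b"\x00new"})
print(store.fetch(41, "app.bin"))  # artifact of an earlier build
print(store.scm_label_for(41))     # SCM label that identifies its inputs
```

Keying on the build number makes the link between an artifact and its build direct, rather than implied through labels, branches, or directory names.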
In a conventional software development environment, projects utilize multiple types of builds, such as local development builds, continuous integration builds, nightly builds, and release builds. In this conventional context, the build process includes the transformation of source code to a program object and other artifacts associated with a given build, as well as additional processing, such as testing or deployment of a given build to a desired environment or target build purpose. It is common for organizations to utilize large numbers of different build types, varying, for example, according to build scope, purpose, targets, and environment, during the development of complex software.
Build types are separated into two distinct classes of builds: local development builds and authoritative builds, where authoritative builds take place on a build management server. Authoritative builds such as continuous integration, nightly, or release builds are not truly different types of builds, but rather different stages of the build life. A build may start out as a continuous integration build, designed to run only the most important automated tests to verify specific aspects of the build. For a continuous integration build, the focus is on providing quick feedback to the development team. The last successful build for any day may be run through a more exhaustive test suite during the night and thus become the nightly build. Based upon a large number of factors, such as the features included in a build or the timeline of the project, a build may in some instances become a release build. A build may even be deployed or promoted to the quality assurance (QA) environment or production.
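The idea that continuous integration, nightly, and release are stages in the life of one build, rather than separate build types, can be sketched as a single build object that is promoted over time. The sketch below is illustrative only: the `Stage` enumeration, its strict ordering, and the `promote` method are assumptions introduced here, not details given in the text.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    # Hypothetical stage names and ordering, loosely following the text.
    CONTINUOUS_INTEGRATION = 1
    NIGHTLY = 2
    RELEASE = 3


@dataclass
class Build:
    build_number: int
    stage: Stage = Stage.CONTINUOUS_INTEGRATION
    history: list = field(default_factory=list)

    def promote(self, target: Stage, checks_passed: bool) -> bool:
        """Advance this same build to a later stage if its checks passed."""
        if target <= self.stage:
            raise ValueError("a build can only move to a later stage")
        if not checks_passed:
            return False
        self.history.append(self.stage)
        self.stage = target
        return True


build = Build(build_number=107)
build.promote(Stage.NIGHTLY, checks_passed=True)   # exhaustive overnight suite passed
build.promote(Stage.RELEASE, checks_passed=False)  # not selected as a release
print(build.build_number, build.stage.name, [s.name for s in build.history])
```

The build number stays the same throughout, so whatever is learned about the build at one stage still applies to the same artifacts at the next.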
In conventional build systems, each build is a single event in time. Once a conventional pure build is completed, nothing else can be added to that build. It is impossible to add a process to run automated functional tests to a traditional build after that build has completed. To deal with this problem, multiple build types are created. For example, a continuous integration build performs a pure build, namely getting the source from the SCM and transforming it, and then runs continuous integration tests on that pure build. Similarly, a conventional nightly build separately gets the source from the SCM, transforms it, and then runs functional tests on that separate pure build. Then, for a release build, the source is once again separately obtained from the SCM and separately built as a new pure build, followed by deployment of the separately built software and artifacts to the target environment. In this manner, the pure build 200 process is repeated multiple times, even if the underlying code has not been modified or changed in any manner.
During software development, source code files are in a constant state of change. This means that the files that were used by the continuous integration build 302 are most likely different than the files used by the nightly build 304, which are likely to be different than the files used by the release build 306.
Additionally, every build process in a conventional system produces a separate set of artifacts. The continuous integration build produces a first set of artifacts that it uses for continuous integration tests. The nightly build then produces a second, distinct set of artifacts, even if the nightly build utilizes the exact same source code as the continuous integration build. Furthermore, the release build creates yet another set of artifacts, regardless of whether it also uses the same source code as the other builds. Consequently, the artifacts being deployed by the release build are not the same as those that were used in the exhaustive test suite run as part of the nightly build.
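The conventional arrangement described above, in which each build type repeats the pure build and therefore yields its own artifact set, can be made concrete with a short sketch. The function names `pure_build` and `conventional_build` and the artifact-set counter are hypothetical, and the "transformation" is a trivial stand-in for compilation.

```python
import itertools

# Hypothetical counter: one new id for every artifact set that gets produced.
_artifact_set_ids = itertools.count(1)


def pure_build(source: str) -> dict:
    """Stand-in for 'get the source from the SCM and transform it'."""
    return {"artifact_set": next(_artifact_set_ids), "binary": source.upper()}


def conventional_build(build_type: str, scm_head: str) -> dict:
    # Every build type repeats the pure build instead of reusing earlier artifacts.
    return {"type": build_type, "source": scm_head, **pure_build(scm_head)}


scm_head = "int main() { return 0; }"  # unchanged between the three builds
ci = conventional_build("continuous integration", scm_head)
nightly = conventional_build("nightly", scm_head)
release = conventional_build("release", scm_head)

for b in (ci, nightly, release):
    print(b["type"], "-> artifact set", b["artifact_set"])
```

Even with identical source, three artifact sets exist, and the set that is deployed is not the set that the nightly tests exercised. When the source changes between builds, as it usually does during development, the sets also differ in content.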