Software engineers often use agile development techniques, which include incremental and iterative design and development, to provide new software functionality to end users. Agile development may require fast turn-around between an end user's request for functionality and the delivery of software changes. Additionally, new software may need to be deployed to multiple datacenters since the software may be multi-homed, meaning that the software is designed for large-scale distributed systems and architected to execute from multiple locations simultaneously.
Agile development cycles and multi-homed software may create challenges for release engineers who make new software functionality available to end users. Releasing new software is often done manually and can be very time-consuming since release procedures may be complex and software code may be released in multiple locations. For example, as shown in FIG. 1, release procedures may include finding changes to the software and building the software with the new changes. The software may then be pushed to a test environment where regression tests may be performed to check for errors. If the software runs in the test environment with no problems, the software may then be moved into a production environment where it can be provided to real end users. After putting the software in production, release engineers may still check the health of the production environment to ensure that the software is functioning properly. Several kinds of software problems may block the software release, including: health check failures; regression test failures; transient software faults; or permanent software faults. If release procedures fail, a release engineer may roll back the process and restart the procedures using previously deployed software packages. Because conventional release workflows may be complex, it may be difficult to determine where errors occur and how to tolerate them once they are detected.
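The release procedure described above can be pictured as an ordered list of steps with a rollback on the first failure. The following Python sketch is illustrative only: the step names, the `run_release` function, and the placeholder step actions are hypothetical and are not taken from FIG. 1 or from any particular release tool.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_release(steps: List[Step], rollback: Callable[[], None]) -> Tuple[bool, List[str]]:
    """Run steps in order; on the first failure, roll back and stop.

    Returns (succeeded, log) so the failing step can be identified.
    """
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # Restore the previously deployed package (hypothetical callback).
            rollback()
            log.append("rollback: previous package restored")
            return False, log
    return True, log


# Hypothetical placeholder actions standing in for real release operations.
steps: List[Step] = [
    ("find_changes", lambda: True),
    ("build", lambda: True),
    ("push_to_test", lambda: True),
    ("regression_tests", lambda: False),  # simulate a regression test failure
    ("push_to_production", lambda: True),
    ("health_check", lambda: True),
]

if __name__ == "__main__":
    succeeded, log = run_release(steps, rollback=lambda: None)
    for line in log:
        print(line)
```

Even in this simplified form, a failure at any step forces the whole sequence to be restarted, which suggests why a manual release may be slow and why locating the failing step matters.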
An entire release process may take upwards of several hours to complete. This is approximately the time of one engineer fully dedicated to releasing software for two days. If software is being developed using agile techniques, this release process may need to be repeated frequently. Additionally, for multi-homed software, these release procedures may need to be repeated across multiple datacenters in multiple locations. Such repetitions are not always parallelizable because a certain number of replicas may need to remain alive at all times (e.g., a majority if a Paxos-based consensus algorithm is used to manage the multi-homed software). Therefore, it may be necessary for a software development team to have at least one release engineer whose full-time job is to release software.
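The majority constraint mentioned above limits how many replicas can be updated at once. The Python sketch below works through that arithmetic under two assumptions that are not stated in the text: one replica per datacenter, and a strict-majority quorum. The function names and datacenter labels are hypothetical.

```python
from typing import List


def max_concurrent_restarts(replicas: int) -> int:
    """Largest number of replicas that may be down while a strict majority stays up."""
    majority = replicas // 2 + 1
    return replicas - majority


def plan_batches(datacenters: List[str]) -> List[List[str]]:
    """Split datacenters into sequential batches that never break the majority."""
    size = max_concurrent_restarts(len(datacenters))
    if size == 0:
        # With one or two replicas, no replica can be taken down safely.
        raise ValueError("cannot update any replica without losing the majority")
    return [datacenters[i:i + size] for i in range(0, len(datacenters), size)]


if __name__ == "__main__":
    # Hypothetical datacenter labels.
    for batch in plan_batches(["dc-a", "dc-b", "dc-c", "dc-d", "dc-e"]):
        print(batch)
```

Under these assumptions, five replicas allow at most two to be updated at a time, so the release runs as three sequential batches rather than one parallel push.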
As recognized by the inventor, managing software releases efficiently, with less manual effort from release engineers, calls for a workflow framework for release operations that automates release processes, tolerates transient failures at runtime, and makes it easier to determine where errors occur.