Engineers working on software projects such as browsers or graphics libraries can optimize their code and test new features by running benchmarks, scripts, and other tasks across multiple computing machines, using a repository of archived web pages as test data.
Conventional systems allow engineers to manually capture reusable archives of small sets of web pages to test code changes in continuous builds. However, such a small sample set has limited benefit, since a code change may affect different web pages differently. Expanding a small set of web pages into a massive repository with thousands to millions of archived web pages is difficult due to constraints such as disk space, processing power, and time. Additionally, determining the number of machines needed to perform a given benchmark, script, or task can itself be difficult.
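As an illustrative sketch only (all names here are hypothetical and not taken from the source; `run_benchmark` stands in for a real browser benchmark), such a workflow might distribute a repository of archived web pages across a fixed pool of worker machines:

```python
# Hypothetical sketch: models each worker machine as one slot in a
# thread pool and hands archived pages to idle workers.
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(page_archive: str) -> float:
    # Placeholder for a real benchmark metric (e.g., page-load time).
    return float(len(page_archive))

def benchmark_repository(archives, num_machines):
    # The executor distributes archives across num_machines workers;
    # a larger repository simply yields more tasks per worker.
    with ThreadPoolExecutor(max_workers=num_machines) as pool:
        results = list(pool.map(run_benchmark, archives))
    return dict(zip(archives, results))

repo = [f"page-{i}.archive" for i in range(10)]
scores = benchmark_repository(repo, num_machines=4)
```

The sketch makes the scaling tension above concrete: the repository size drives total work, while `num_machines` must be chosen large enough to finish in acceptable time without over-provisioning.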