Benchmarks are widely used across computer science to evaluate processes and tools. Specialized benchmarks let researchers and practitioners evaluate individual components of a system while limiting the influence of unrelated components. In program analysis and testing, open-source and commercial programs serve as benchmarks for evaluating many aspects of tools, including scalability, test coverage, code translation, optimization, loading, and refactoring. Unfortunately, these programs are written by programmers who introduce their own biases. As a result, it is very difficult to find programs that can serve as benchmarks whose results are reproducible on other programs.