The present disclosure relates to the fields of molecular biology, molecular evolution, bioinformatics, and digital systems. More specifically, the disclosure relates to methods for computationally predicting the activity of a biomolecule and/or guiding directed evolution. Systems, including digital systems, and system software for performing these methods are also provided. Methods of the present disclosure have utility in the optimization of proteins for industrial and therapeutic use.
Protein design has long been recognized as a difficult task, if for no other reason than the combinatorial explosion of possible molecules that constitute searchable sequence space: a protein of N residues drawn from the 20 canonical amino acids has 20^N possible sequences. The sequence space of proteins is therefore immense and cannot be explored exhaustively using methods currently known in the art. Because of this complexity, many approximate methods have been used to design better proteins; chief among them is directed evolution. Today, directed evolution of proteins is dominated by various high-throughput screening and recombination formats, often performed iteratively.
In parallel, various computational techniques have been proposed for exploring sequence-activity space. While each computational technique has advantages in certain contexts, new ways to efficiently search sequence space to identify functional proteins would be highly desirable.