Because of the immense size of sequence space, there is no effective way to systematically screen all possible sequences of a polymeric biological molecule such as a nucleic acid or protein for a desired property. Testing each possible amino acid at each position in a protein, or each possible nucleotide base at each position in a gene, rapidly leads to so many molecules that no available method of synthesis or testing is feasible, even for a polymer of modest length. Furthermore, most molecules generated in such a way would lack any measurable level of the desired property. Total sequence space is very large, and the functional solutions in this space are sparsely distributed.
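The scale of the problem can be illustrated with a short calculation. The following is a minimal sketch; the function name and the example lengths (a 100-residue protein and the 300-nucleotide gene that would encode it) are assumptions chosen for illustration and do not come from the text above. For a polymer built from an alphabet of k monomers, the number of possible sequences of length n is k raised to the power n, which for the 100-residue protein is roughly 10^130.

```python
def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences of a given length.

    Each position can independently hold any of `alphabet_size` monomers,
    so the count is alphabet_size ** length.
    """
    if alphabet_size < 1 or length < 0:
        raise ValueError("alphabet_size must be >= 1 and length must be >= 0")
    return alphabet_size ** length


# Illustrative lengths only (assumed, not taken from the source text).
protein_space = sequence_space_size(20, 100)  # 20 amino acids, 100 residues
gene_space = sequence_space_size(4, 300)      # 4 bases, 300 nucleotides

if __name__ == "__main__":
    # Python integers are arbitrary precision, so the exact values are kept;
    # the 'e' format converts to float only for display.
    print(f"100-residue protein: {protein_space:.2e} possible sequences")
    print(f"300-nucleotide gene: {gene_space:.2e} possible sequences")
```

Even at a hypothetical throughput of billions of variants per experiment, numbers of this magnitude remain far out of reach, which is the point the paragraph above makes.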
Two primary approaches have been used to date to identify polymeric biological molecules with desired properties: mechanistic and empirical. Both have significant limitations. The mechanistic approach is often hampered by insufficient knowledge of the system to be improved, meaning either that considerable resources must be devoted to characterizing the system (for example by obtaining high quality protein crystal structures and relating these to the properties of interest), or that meaningful predictions cannot be made. In contrast, the empirical approach requires no mechanistic understanding, but relies upon direct measurements of a biopolymer's properties to select those variants that are improved. This strength is also its weakness: the approach is only as good as the measurements it relies upon, and those measurements are usually surrogates for the properties of interest rather than tests under the conditions of the final application.
Empirical engineering of nucleic acids, proteins and other biopolymers relies upon creating and testing sets of variants, then using this information to design and synthesize subsequent sets of variants that are enriched for components that contribute to the desired activity. A key limitation of any empirical biopolymer engineering effort is the difficulty of developing a good assay for biopolymer function. The assay must measure biopolymer properties that are relevant to the final application, but must also be capable of testing enough variants to identify what may be only a small fraction that are actually improved. The difficulty of creating such an assay is particularly acute when optimizing biopolymers for complex functions that are difficult to measure in high throughput. Examples include proteins or nucleic acids for therapeutic purposes and catalysts for the synthesis or degradation of polymers or chiral molecules.
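The create, test and redesign cycle described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a method taken from the text: `toy_assay` is a hypothetical stand-in for a real functional assay (it simply counts positions that match a hidden target sequence), the sequences are invented, and the second round assumes that beneficial substitutions combine additively, which real biopolymers often violate.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_assay(seq: str, target: str = "MKVLA") -> int:
    """Hypothetical assay: number of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def single_substitutions(parent: str):
    """Yield (position, residue, variant) for every single-residue change."""
    for pos, old in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa != old:
                yield pos, aa, parent[:pos] + aa + parent[pos + 1:]


def engineer(parent: str, assay):
    """Two rounds: test single substitutions, then combine the improvements."""
    baseline = assay(parent)

    # Round 1: measure every single substitution and remember, per position,
    # the best-scoring residue that beats the parent.
    best = {}  # position -> (score, residue)
    for pos, aa, variant in single_substitutions(parent):
        score = assay(variant)
        if score > baseline and score > best.get(pos, (baseline, None))[0]:
            best[pos] = (score, aa)

    # Round 2: design a variant enriched for the beneficial substitutions
    # (assumes additivity, which is an assumption of this sketch only).
    combined = list(parent)
    for pos, (_, aa) in best.items():
        combined[pos] = aa
    beneficial = {pos: aa for pos, (_, aa) in best.items()}
    return "".join(combined), beneficial


if __name__ == "__main__":
    improved, beneficial = engineer("MAVLG", toy_assay)
    print(improved, beneficial, toy_assay(improved))
```

The sketch tests 19 variants per position rather than the full sequence space, which is only useful if the assay tracks the property that matters in the final application. A misleading surrogate assay would steer the second round toward the wrong substitutions.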
Large numbers of variants cannot typically be tested under conditions that are identical to those of the final application. High throughput screens are widely used to provide surrogate measurements of the properties of interest, but these are often inadequate. For example, binding of a protein to a receptor in a phage display assay can have little bearing on its ultimate usefulness as a therapeutic protein, and the activity of an enzyme in a microtitre plate can be unrelated to its activity in a biocatalytic reactor.
Limitations in current methods for searching through biopolymer sequences for specific commercially relevant functionalities create a need in the art for methods that can design and synthesize small numbers of variants for functional testing and that can use the resulting sequence and functional information to design and synthesize small numbers of variants improved for a desired commercially useful activity. Likewise, limitations in current methods for choosing surrogate screens appropriate for empirical biopolymer engineering create a need in the art for methods that can design and create small numbers of variants that can then be tested directly for specific commercially relevant functionalities.