One of the continuing objectives of molecular biology research is to clone genes for proteins and then to characterize the domains and activities in the proteins. Generally, if one has an antibody that recognizes a protein of interest, or a ligand to which the protein binds, it is possible to isolate and purify the gene that encodes the protein of interest. Several techniques exist for using antibodies to screen proteins encoded by cloned DNA insert libraries in plasmid or phage expression vectors in a host. When the foreign proteins encoded by the cloned DNA are accessible to the known antibody or ligand, the clones expressing them may, for example, be separated from a population of uninteresting insert-containing phage using standard affinity techniques, such as chromatography with bound antibody or ligand. Using such screening techniques, one may isolate the bacterial clones that contain the gene of interest and then undertake a wide array of molecular biological analyses of that gene.
At a finer scale, it is often useful to determine which amino acid sequences of a protein bind to a ligand or known antibody. Such sequences are referred to here as ligand binding domains and include antigenic determinants, or epitopes, as well as domains that bind to biological receptors. A ligand binding domain is a three-dimensional region of a protein molecule whose ability to bind a ligand or antibody is a function of three attributes. Of foremost importance is the linear sequence of amino acids that form the ligand binding domain of the protein. Secondly, the proper folding and twisting of a linear amino acid chain into a three-dimensional structure can form a ligand binding domain. Finally, ligand binding domains may form in crevices created during the interaction of several amino acid chains in a multi-chain protein.
One method for determining a priori which amino acid sequences form ligand binding domains is to use the antibody or ligand of interest to challenge a library of short amino acid sequences expressed as peptides in a host cell. Along these lines, in efforts to generate diverse epitope libraries, collections of synthetic oligonucleotides encoding all possible hexapeptides (6-mers) and pentadecapeptides (15-mers) have been produced and cloned into gene III of filamentous bacteriophage expression vectors such as FUSE5, M13LP67 and fAFF1. Gene III encodes pIII, a minor virion coat protein that tolerates short insertions between its internal structural domain and its external functional domain. See Scott, J. K. and G. P. Smith, "Searching for Peptide Ligands with an Epitope Library," 249 Science 386-390 (1990); Devlin, J. J. et al., "Random Peptide Libraries: A Source of Specific Protein Binding Molecules," 249 Science 404-406 (1990); and Cwirla, S. E. et al., "Peptides on Phage: A Vast Library of Peptides for Identifying Ligands," 87 P.N.A.S. 6378-6382 (1990). One may identify bacterial clones harboring phage that encode antibody- or ligand-binding peptides by screening nitrocellulose-bound bacterial colonies for antibody or ligand binding. The DNA sequence encoding the selected peptide or peptides can then easily be determined by standard DNA sequencing techniques.
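The paragraph above does not say how a synthetic oligonucleotide can encode "all possible" peptides. One widely used scheme, assumed here rather than taken from this text, builds the insert from repeated degenerate NNK codons (N = any base, K = G or T). The sketch below expands NNK against the standard genetic code to check that its 32 codons cover all 20 amino acids while admitting only one stop codon. The helper `degenerate_codons` and the small subset of IUPAC ambiguity letters it understands are illustrative choices.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order
# at each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Subset of IUPAC nucleotide ambiguity codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

def degenerate_codons(pattern: str) -> list[str]:
    """Expand a degenerate codon such as 'NNK' into every concrete codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in pattern))]

codons = degenerate_codons("NNK")
amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
stops = [c for c in codons if CODON_TABLE[c] == "*"]
print(len(codons), len(amino_acids), stops)  # 32 20 ['TAG']
```

Because each NNK position contributes 32 codons, a six-codon insert has 32⁶ (about 1.07×10⁹) possible DNA sequences, more than the number of distinct hexapeptides, since several codons encode the same amino acid.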
Of course, mere binding of a hexapeptide to an antibody does not guarantee that the naturally occurring epitope is identical or even related to the short peptide. For that reason, such synthetic epitopes are often referred to as mimetopes, because they merely mimic the behavior of natural epitopes. While epitopes isolated in this manner may prove useful in the development of synthetic drugs and the like, they do not necessarily help a researcher discover epitope sites on genuine proteins of interest. The random oligonucleotide approach is further limited by the fact that naturally occurring ligand binding domains may be longer than fifteen amino acids. As the length of the tested sequence increases, the number of possible epitopes increases exponentially. For instance, there are approximately 6.4×10⁷ (20⁶) different hexapeptides and 3×10¹⁹ (20¹⁵) possible 15-residue peptides. In general, the present practical limit on the creation and screening of random octapeptide libraries is a library containing approximately 2.5×10¹⁰ clones. For still longer test sequences, it becomes difficult to generate and screen a representative library with a sufficient number of distinct insert-containing clones, which is a problem whenever the desired ligand binding domain is longer than just a few amino acids. It is also time-consuming and expensive to generate very long oligonucleotides. Furthermore, since binding sites are not necessarily encoded by contiguous bases, it may be important to consider longer peptides when searching for these ligand binding sites.
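The exponential growth described above follows from simple arithmetic: with 20 standard amino acids there are 20ⁿ distinct peptides of length n. The sketch below compares that space with the 2.5×10¹⁰-clone practical library size quoted in the paragraph. The function name `peptide_space` is illustrative, and the coverage figure is an upper bound, because a randomly generated library contains duplicate clones and therefore fewer distinct sequences than clones.

```python
def peptide_space(n: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides of length n over the standard amino acids."""
    return alphabet_size ** n

# Practical library size quoted in the text (number of clones).
SCREENING_LIMIT = 2.5e10

for n in (6, 8, 15):
    size = peptide_space(n)
    # Fraction of the sequence space a library of this size could at best contain.
    coverage = min(1.0, SCREENING_LIMIT / size)
    print(f"{n:>2}-mers: {size:.2e} possible, best-case coverage {coverage:.1e}")
```

For 6-mers the quoted capacity exceeds the whole sequence space, for 8-mers it is roughly equal to it, and for 15-mers it reaches less than one part in 10⁹.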
Determining the location of a ligand binding domain on a protein has often been a difficult undertaking. Typically, one would express deletion mutants of cloned genes and test them for loss of activity, such as antibody binding or enzymatic function. After broadly localizing a binding site to a particular domain, it would be necessary to chemically synthesize individual peptides from that protein domain and to demonstrate binding of a synthesized peptide to the antibody or ligand of interest. Furthermore, construction of deletion mutants has frequently required conveniently located restriction enzyme sites within the protein coding region. Moreover, when preparing deletion mutants by removing restriction enzyme fragments, one always runs the risk of cutting a ligand binding site in two.
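The risk of cutting a binding site in two can be checked mechanically once the cut positions are known. The sketch below finds the sites of one enzyme in a coding sequence and reports which cuts fall strictly inside a given codon range. EcoRI (recognition site GAATTC, cutting between G and A) is used only as a familiar example; the 30-nucleotide sequence and the codon ranges standing in for a binding domain are hypothetical. Only the top strand is searched, which is sufficient for a palindromic site like EcoRI's.

```python
def cut_positions(dna: str, site: str, cut_offset: int) -> list[int]:
    """Indices where an enzyme cuts the top strand: each cut lies between
    dna[pos - 1] and dna[pos], with cut_offset counted from the site start."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = dna.find(site, start + 1)
    return positions

def splits_domain(cuts: list[int], first_codon: int, last_codon: int) -> list[int]:
    """Cuts falling strictly inside codons first..last (0-based, inclusive)."""
    lo, hi = 3 * first_codon, 3 * (last_codon + 1)
    return [c for c in cuts if lo < c < hi]

# Hypothetical in-frame coding sequence containing one EcoRI site (G^AATTC).
orf = "ATGGCTAAAGAATTCGGTCTGCCGTACTGG"
cuts = cut_positions(orf, "GAATTC", cut_offset=1)
print(cuts)                       # [10]
print(splits_domain(cuts, 2, 5))  # [10]: a domain at codons 2-5 would be split
print(splits_domain(cuts, 6, 9))  # []: a domain at codons 6-9 stays intact
```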
Other methods have involved cleaving purified proteins into constituent peptides by protease digestion and then determining the ligand binding region by immunological assay of the digestion products. This approach is extremely time-consuming and has two major disadvantages. First, cleavage of the protein is often incomplete and difficult to control, so that the fragments are irregular. Second, once the target epitope is identified, the amount of peptide available, which must be isolated, purified, and sequenced to obtain useful information, can be extremely small.
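Why incomplete cleavage produces irregular fragments can be seen by enumerating the peptides a protease could yield when some of its sites are skipped. The text names no protease; the sketch below assumes trypsin-like specificity (cleavage after Lys or Arg, but not before Pro), which is a simplification of real trypsin behavior. The sequence and the helper name `digest` are hypothetical.

```python
import re

def digest(protein: str, max_missed: int) -> list[str]:
    """All peptides from trypsin-like cleavage, allowing up to max_missed
    uncleaved internal sites per peptide."""
    # Cleavage points: just after K or R, unless the next residue is P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to (max_missed + 1) consecutive fully cleaved segments.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides

seq = "AKPLRGEKWS"  # hypothetical; the K before P is not a cleavage site
for missed in (0, 1, 2):
    print(missed, digest(seq, missed))
# 0 ['AKPLR', 'GEK', 'WS']
# 1 ['AKPLR', 'AKPLRGEK', 'GEK', 'GEKWS', 'WS']
# 2 ['AKPLR', 'AKPLRGEK', 'AKPLRGEKWS', 'GEK', 'GEKWS', 'WS']
```

Even this ten-residue example doubles its set of possible products once missed cleavages are allowed, and an actual digest is a mixture of these products in uncontrolled proportions.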
In summary, then, while it is possible to isolate genes using antibodies directed to a gene product, and to ask a priori which short amino acid sequence binds a known antibody or ligand, no existing convenient system permits one to quickly and easily determine which amino acids on a known protein of interest are antigenic or which nucleotide bases on a gene of interest encode a ligand binding site. No convenient systematic approach exists for routinely producing peptide libraries from all regions of a known protein-encoding gene. Typically, peptide analysis of known coding regions is performed by chemically synthesizing individual peptides, an expensive and time-consuming task. As a result, detailed peptide analysis of entire coding regions has been limited to a few proteins of major economic importance, such as insulin. What is desired is a system that permits rapid subcloning of entire protein coding regions for subsequent fine-scale mapping of the amino acids that form ligand binding domains.
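To put the scale in perspective, the set of peptides drawn from a single known protein is tiny compared with a random library. The sketch below enumerates overlapping windows of a chosen length across a protein sequence, which is the set one would otherwise have to synthesize peptide by peptide. It only illustrates the counting and is not a description of any particular cloning method; the window length, step, and sequences are arbitrary, hypothetical choices.

```python
def overlapping_peptides(protein: str, length: int, step: int = 1) -> list[tuple[int, str]]:
    """(start, peptide) windows of the given length tiled across the protein.
    With step > 1, trailing residues are skipped unless
    (len(protein) - length) is a multiple of step."""
    if length > len(protein):
        return []
    return [(i, protein[i:i + length])
            for i in range(0, len(protein) - length + 1, step)]

print(overlapping_peptides("MKTAYIAK", 4, step=2))
# [(0, 'MKTA'), (2, 'TAYI'), (4, 'YIAK')]
```

For a hypothetical 300-residue protein, every overlapping 15-mer amounts to just 286 peptides (58 if windows are spaced five residues apart), whereas the full space of 15-mers is about 3×10¹⁹.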