This application describes discrete populations of oligopeptides of random sequences, polypeptides comprising those oligopeptides, oligonucleotides encoding those oligopeptides and recombinant vectors comprising those oligonucleotide sequences. The population of oligopeptides represents the universe of peptide epitopes. Also disclosed are discrete populations of antibodies (or hybridomas) capable of binding to the populations of oligopeptides. The disclosure of the present application relates to the identification and characterization of peptide epitopes, or recognition sites, of antibodies. More particularly, the disclosure herein enables the determination of the linear amino acid sequence recognized by the antibody and of a nucleic acid sequence encoding that amino acid sequence.
The clonal selection theory of Burnet, which explains the general basis of antibody production, has gained virtually complete acceptance. Burnet, M. (1961) Sci. Am. 204:58; Jerne, N. K. (1976) Harvey Lecture 70:93. The theory is based on several premises: (1) as individual cells, i.e., lymphocytes, in the immune system differentiate, each becomes capable of producing only one species of antibody molecule; (2) the entire spectrum of possible antibody-producing cells is present within the lymphoid tissues prior to stimulation by any antigen; that is, the step in which each lymphocyte becomes specified to produce only one type of antibody molecule occurs in the absence of a potential antigen for that antibody; and (3) lymphocytes capable of producing an antibody specific to a particular antigen are induced, by the presence of that antigen, to proliferate and to produce large quantities of the antibody. An enormous range of genetically unique lymphoid cells is present in the lymphoid organs, e.g., the spleen, of each mammal. The spleen can be considered a library of cells, each of which can manufacture a unique antibody, and the library is so large that for any particular antigen, at least one lymph cell exists within the library that is capable of recognizing the antigen and producing antibodies specific to the antigen.
Heretofore, the production of an antibody that will recognize an antigen of interest has required the antigenic stimulation of a laboratory animal. Typically, the antigen is injected into a laboratory animal, and, after a suitable incubation period, a second injection is given. The spleen cells of the animal are then harvested and fused to myeloma cells. When fused to a spleen cell, the myeloma cell confers to the spleen cell its ability to grow in culture. Surviving colonies of fused cells, i.e., hybridomas, are then screened to identify clones that produce antibodies that specifically recognize the antigen. This procedure must be repeated each time it is desired to produce an antibody to a particular antigen. For each antigen of interest, it is necessary to (1) antigenically stimulate an animal, (2) remove its spleen and hybridize the spleen cells with myeloma cells, and (3) dilute, culture, and screen clones for specific antibody production. Though antibodies that recognize the antigen are produced, this technique does not identify the epitope, i.e., the specific site on the antigen that an antibody recognizes; and one cannot direct the development of antibodies specific to a particular predetermined site or region of the antigen. Also, hybridoma techniques are not effective in the direct development of monoclonal antibodies that recognize haptens, i.e., molecules that contain or constitute antibody recognition sites, but which do not elicit an antigenic reaction when injected without a carrier into a laboratory animal. Since antigenic stimulation and antibody production are potentially hazardous to the host, the use of human hosts has been precluded in the development of monoclonal antibodies.
The universe of antibody binding specificities may be open or closed. If the universe of antibody binding specificities is closed, then the following basic tenets apply:
a) one can design and prepare any given epitope and isolate any antibody (for example, a monoclonal antibody produced by a member of a random set of hybridomas) from a universe of antibodies without having first immunized an experimental animal with an antigen containing that epitope. A self-addressing sorting scheme can be used to screen to identify the proper paired correspondence between antibody and epitope;
b) the universe of epitopes can be specified in at least a theoretical fashion, and in principle, can be synthesized; and
c) one can independently isolate and identify an antibody-producing hybridoma with the same epitopic specificity as one previously isolated and identified. Such a repeated isolation occurs in a "second hit" experiment, and can be used to estimate the effective size of the universe of antibody specificities. Such an approach is similar in logic to defining a complementation group in genetics.
Even if the universe of epitopes is large, if it is closed, it can be defined by rules, algorithms or iterative analyses.
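The "second hit" logic of tenet (c) can be illustrated numerically. The following is a minimal sketch, not a method stated in this disclosure: it assumes that each isolated hybridoma is an independent, uniformly random draw from a closed universe of N specificities, in which case the expected number of coinciding pairs among n isolates is n(n-1)/(2N). The function name and the example labels are hypothetical.

```python
from collections import Counter


def estimate_universe_size(specificities):
    """Estimate the number of distinct specificities from repeat isolations.

    ASSUMPTION (not from the source): isolates are independent, uniform
    draws from a closed universe, so E[matching pairs] = n(n-1) / (2N).
    """
    n = len(specificities)
    counts = Counter(specificities)
    # Count pairs of isolates that share the same epitopic specificity.
    matching_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    if matching_pairs == 0:
        return None  # no second hit observed; no finite estimate possible
    return n * (n - 1) / (2 * matching_pairs)


# Hypothetical example: four isolates, one repeated specificity.
print(estimate_universe_size(["A", "B", "A", "C"]))
```

Under an open universe, the function would be expected to return None no matter how many isolates are examined, which is the contrast the two sets of tenets draw.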
In the alternative, if the universe of antibody specificities is open, the following principles apply:
a) one cannot isolate an antibody specific for an epitope without prior immunization with an antigen containing that epitope;
b) the universe of epitopes cannot be specified or synthesized; and
c) one should not be able to independently isolate more than one antibody with the same target specificity.
The binding domain of a monoclonal antibody specific to a malaria parasite surface protein has been identified as being no more than 40 amino acids long. Cochrane, A. H. et al. Proc. Natl. Acad. Sci. U.S.A. 79:5651 (1982), inserted a 340 base pair sequence from a Plasmodium knowlesi gene into the pBR322 vector. The engineered vector produced in E. coli a beta-lactamase fusion polypeptide that reacted with a monoclonal antibody specific for a P. knowlesi circumsporozoite (CS) protein. This finding indicated that the binding domain of the monoclonal antibody was limited to a region of the CS protein encoded by the inserted sequence, or approximately 110 amino acids. Lupski, J. R. et al., Science 220:1285 (1983), used the same system and, employing transposition mapping techniques, further localized the binding domain to a 40-amino acid region of the CS protein.
Green, N. et al., published PCT application 84/00687, produced antibodies by inoculating laboratory animals with synthetic peptides. Antibodies produced in response to peptides having a length of 8 to 40 amino acid residues and corresponding to sequences in an influenza virus protein were cross-reactive with the virus in vitro.
Dame, J. B. et al., Science 225:593 (1984), sequenced the CS gene of Plasmodium falciparum and discovered 41 tandem repeats of a tetrapeptide, with some minor variations. Using synthetic peptides of 4, 7, 11, and 15 amino acid residues of the predominant repeating amino acid sequence, Dame et al. then conducted competitive binding assays to determine what length of peptide would inhibit the binding of the CS protein with a monoclonal antibody specific to that protein. Dame et al. found that the synthetic 4 amino acid sequence did not significantly inhibit binding, but the 7, 11 and 15 amino acid sequences did inhibit binding. These results suggest that this monoclonal antibody to the CS protein recognizes a 5 to 7 amino acid sequence comprising the repeating tetrapeptide.
The known crystal structures of the Fab fragment and lysozyme show that there are two contact points on the lysozyme molecule for the antibody combining site, and each contact point spans about five amino acids. Earlier work on antibody binding to carbohydrate antigens and glycosidase cleavage protection experiments show that 5-6 sugar residues are protected from glycosidase cleavage. Studies of antibody binding to haptens also suggest that antibody sites are small. Peptide competition experiments, also called epitope mapping experiments, show that oligopeptides 4 to 5 amino acids in length can specifically compete for antibody binding.
In addition, linear sequences which differ in only one amino acid can compete for antibody binding with varying degrees of specificity (see, e.g., Geysen et al. (1986) in Synthetic Peptides as Antigens; Ciba Foundation Symposium 119, R. Porter and J. Wheelan, Eds. (New York, Wiley) pp. 130-149).
While five amino acids is a representative length of peptide sequence which can bind with differential specificity to an antibody, five amino acid residues is not necessarily the size of an immunogenic peptide. Generally, when an oligopeptide is the desired immunogen, it is first conjugated to a larger carrier molecule. The actual operational relationship between the immunizing entity and the binding entity can only be resolved when an in vitro immunization-dependent antibody synthesis system is developed.
In one aspect the invention features a discrete population of oligonucleotides, each comprising the same length of from about 4 to about 12 nucleic acid coding triplets in random order. Each oligonucleotide encodes a corresponding oligopeptide of from about 4 to about 12 L-amino acid residues, and the entire population represents at least about 10% of all oligopeptide sequences of the selected length. In preferred embodiments, each member of the oligonucleotide population has a single copy of the random sequence of nucleotide triplets, the oligonucleotide sequence has between 4 and 7 triplets, and the oligonucleotide population can be generated by random shearing of mammalian genetic material or is chemically synthesized from the component nucleotides.
It is particularly preferred that each oligonucleotide sequence comprises five coding triplets. The oligonucleotide population may also be composed of members, each of which contains the same number of tandem repeats of each peptide coding sequence, where the number of tandem repeats is from two to about fifty. It is particularly preferred that the oligonucleotide population be sufficiently redundant so that each of all possible encoded oligopeptide sequences is present at least 10 times on average.
In a second aspect the invention features a discrete population of oligopeptides each of random amino acid sequence of the same length, of about 4 to about 12 L-amino acid residues, and the population makes up at least 10% of all peptide sequences of the predetermined length. In preferred embodiments each member of the population has a single copy of the peptide sequence, the oligopeptide sequence has between 4 and 7 L-amino acid residues, and the population can be generated by shearing of proteins, by chemical synthesis from the component L-amino acids, or by the translation of the oligonucleotides of random coding sequences.
It is particularly preferred that there be five amino acid residues in each oligopeptide. It is particularly preferred that the population of oligopeptides is sufficiently large so that each sequence is represented at least 10 times on average. The peptide population can also be composed of member peptides, each of which contains the same number of tandem repeats of the amino acid sequence, where the number of repeats is from two to about fifty.
In a third aspect, the invention features a discrete recombinant vector population of substantially identical autonomously replicating nucleic acid sequences including a structural gene and a population of oligonucleotide inserts therein, each insert containing a uniform length selected from between about 4 to about 12 nucleic acid coding triplets, preferably between 4 and 7, and most preferably five. Each insert is recombinantly inserted in frame into the structural gene of one of the nucleotide sequences, and preferably the oligonucleotide population encodes all oligopeptide sequences of the predetermined length. Preferably the recombinant vector population is redundant, i.e., contains a sufficient number of random oligonucleotide members so that all possible members are represented at least once. It is particularly preferred that the population is sufficiently redundant so that the population contains at least 10 copies of oligonucleotides encoding each possible peptide sequence, on average. In preferred embodiments each member of the insert population has a single copy of the sequence of nucleotide triplets, and the insert has coding triplets; the replicating sequence can be a plasmid such as pBR322, a virus such as λgt11 or vaccinia, or a filamentous bacteriophage, such as f1, fd or M13. The recombinant vector population can also be made up of individual vectors each containing the same number of tandem repeats of an oligonucleotide sequence as defined above. The number of tandem repeats can be from two to about fifty.
In a fourth aspect, the invention features a discrete heterogeneous population of antibodies comprising member antibodies capable of binding to substantially all members of the discrete oligopeptide population featured in the second aspect of the invention, above.
In a fifth aspect, the invention features a discrete population of binding pairs that includes the discrete population of peptide sequences all of the same length selected from about 4 to about 12 L-amino acid residues and the heterogeneous population of antibodies capable of binding to substantially all the peptide sequences, where substantially every member of the peptide population is bound to a corresponding antibody.
In a sixth aspect, the invention features a matrix including a discrete population of random peptide sequences and a heterogeneous population of antibodies.
In a seventh aspect, the invention features a method for constructing a matrix including the steps of (1) obtaining a population of peptides or polypeptides comprising peptides as described above, having a uniform length of between about 4 and about 12 L-amino acid residues of random sequence and including at least about 10% of all peptide sequences of the predetermined length; (2) obtaining a discrete heterogeneous population of antibodies capable of binding to substantially every member of the polypeptide population; and (3) contacting the antibodies with the antigens for a sufficient amount of time and under appropriate conditions so that binding occurs. Preferably, the peptide length is 4 to 7 amino acids, and most preferably, 5 amino acids. In preferred embodiments: each of the peptides and each of the antibodies is isolated and each is contacted individually with each of the antibodies until at least one peptide antibody binding pair is identified; the peptides can be immobilized on an appropriate substrate and the antibodies can be labeled; the antibodies can be immobilized and the peptide sequences can be labeled; or the peptide sequences can be excised from the polypeptides.
It is preferred in all of the foregoing aspects of the invention that the populations be sufficiently large so as to contain all theoretical members of the population, and it is particularly preferred that each population of the invention is sufficiently redundant so that it is statistically unlikely that sampling for a particular member will fail, as is understood in the art.
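The statement that sufficient redundancy makes a sampling failure statistically unlikely can be made concrete. The sketch below is illustrative only and rests on a simplifying assumption that is not stated in this disclosure: every one of the N possible sequences is equally likely in each independently generated clone. The function names are hypothetical.

```python
import math


def prob_sequence_missing(n_sequences, n_clones):
    """Probability that one particular sequence is absent from the library.

    ASSUMPTION: each clone is an independent, uniform draw over n_sequences.
    Equals (1 - 1/N)^M, which is close to exp(-M/N) for large N.
    """
    return math.exp(n_clones * math.log1p(-1.0 / n_sequences))


def clones_needed(n_sequences, max_miss_prob):
    """Smallest library size keeping the miss probability at or below the target."""
    return math.ceil(math.log(max_miss_prob) / math.log1p(-1.0 / n_sequences))


n_pentapeptides = 20 ** 5  # all sequences of five residues
# Ten-fold average redundancy, as particularly preferred in the text.
print(prob_sequence_missing(n_pentapeptides, 10 * n_pentapeptides))
print(clones_needed(n_pentapeptides, 0.01))
```

Under this uniform model a ten-fold redundant pentapeptide population misses any given sequence with a probability near exp(-10). Unequal codon usage would make rare sequences harder to capture than this estimate suggests.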
The invention provides an efficient and convenient means for the identification and production of monoclonal antibodies to any specific region of any antigen or hapten of interest. Monoclonal antibody production, according to the invention, does not require antigenic stimulation of a host animal. This is a critical concept of the present invention. Such antigenic stimulation can be employed to increase the frequency for cognate hybridoma formation, but there will be a member of an antibody population (of a sufficiently large number of members) which will recognize the particular epitope even in the absence of such stimulation.
The invention involves the antibody binding properties of a test species, e.g., a peptide, but is totally independent of the ability of the test species to induce an antigenic response in vivo. The invention permits the identification of the specific peptide sequence on a protein that is recognized by an antibody, i.e., the epitope. The specificity of antibodies recognizing distinct sequences, or epitopes, on the same antigen can be differentiated. In addition, the invention permits the characterization and the localization on a chromosome of the nucleotide sequence encoding the amino acid sequence recognized by an antibody.
Using conventional monoclonal techniques, one can produce antibodies that might react, for example, with an undetermined site on a particular Plasmodium circumsporozoite protein or a particular influenza virus. Using the present invention, one can identify all the epitopes on that molecule or organism and obtain antibodies recognizing each of these epitopes. By judiciously combining a number of distinct antibodies, each of which recognizes a different epitope on the surface of a particular antigen, a material with any desired degree of specificity can be obtained. Also using the invention, one can identify epitopic sequences that are common to, e.g., the circumsporozoite proteins of several Plasmodium species or common to several strains of influenza, and screen for antibodies recognizing these common sequences, thereby identifying a single set of antibodies, each of which is effective against a broad range of malarial or influenza infections.
Certain viruses, such as the LAV or HTLV-III virus, contain on their surfaces both highly mutable regions and constant regions. The viruses' ability to alter their surface characteristics has hampered the development, through standard monoclonal techniques, of antibodies to these viruses. Any antibody that recognizes a mutable region of a virus would become ineffective as the virus mutated to produce strains having altered configurations in the region recognized by the antibody. Once the constant regions of a virus have been identified and characterized, the invention permits the identification and production of antibodies that recognize these constant regions, even if the peptide sequences comprising these constant regions would not themselves elicit an immunogenic response in vivo. Such antibodies would be effective against various mutated strains of the virus.
Other features and advantages of the invention will be apparent from the following description of the preferred embodiments and from the claims.
It is believed that an epitope has limited dimensions of between about 30 and 50 angstroms. An antibody that recognizes a specific peptide sequence or configuration or carbohydrates on the surface of an antigen will recognize that same configuration if it is duplicated or closely approximated on a different antigen. This phenomenon underlies the cross-reactivity sometimes encountered with monoclonal antibodies.
The size of the antibody recognition site corresponds to a peptide sequence in the range of between about 4 and 7 amino acid residues with the majority of recognition sites spanning about 4 to 6 amino acids. Mammalian proteins and polypeptides are composed almost exclusively of the twenty naturally occurring amino acids, i.e., glycine and the L-isomers of alanine, valine, leucine, isoleucine, proline, phenylalanine, tyrosine, tryptophan, serine, threonine, aspartic acid, glutamic acid, asparagine, glutamine, cysteine, methionine, histidine, lysine, and arginine. There are about three million (20^5) different possible sequences of the twenty amino acid residues taken five at a time, and about sixty million (20^6) if the amino acid residues are taken six at a time. This finite number of peptide sequences represents the full range of possible antibody recognition sites which can be represented or mimicked by linear peptide epitopes. Production and maintenance of a representative sample of the full range of antibodies and of a representative sample of the peptide sequences of the appropriate length provides the means (1) to screen any antibody of interest in order to determine the precise epitopic peptide sequence it binds to and (2) to screen any protein in order to find an antibody specific to that protein.
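The counts quoted above follow from raising the number of amino acids to the power of the peptide length. A short illustrative calculation, using the conventional one-letter amino acid codes (the function name is hypothetical), is:

```python
# One-letter codes for the twenty naturally occurring amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_peptides(length):
    """Number of distinct linear peptide sequences of the given length."""
    return len(AMINO_ACIDS) ** length


# Lengths within the 4 to 7 residue range discussed in the text.
for length in range(4, 8):
    print(length, count_peptides(length))
```

For five residues this gives 3,200,000 sequences and for six residues 64,000,000, consistent with the approximate figures of three million and sixty million.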
The present invention identifies epitopic (antibody-binding) sites that comprise a primary peptide sequence. The identified linear epitope may mimic a discontinuous peptide epitope or a non-peptide epitope, e.g., a carbohydrate sequence that can be closely approximated by a peptide sequence with respect to antibody recognition.
In view of these considerations, the invention provides the means and methods for the identification and characterization of peptide epitopes, and of the antibodies that bind to them.
Antibody Production
According to the clonal selection theory, an unchallenged mammalian host has the capacity to produce antibodies to a vast array of foreign antigens. The presence of an antigen triggers the proliferation of those lymphocytes already present having the ability to produce antibodies specific for that antigen. Since there is a finite number of linear peptide sequences of the length that is recognized by antibodies, it can be expected that each mammal has the capability to produce antibodies that will recognize most, if not all of these sequences. Thus, the spleen of a mouse or another laboratory animal can serve as an appropriate source for a full range of antibodies. The spleen can be harvested from a laboratory animal, and, using standard techniques, the individual cells are fused to myeloma cells and hybridoma strains are developed.
Depending on the desired characteristics of the resulting hybridoma population, either antigenically stimulated animals can be used, or animals that have not been specifically challenged with the antigenic material of interest can be used.
If antigenically stimulated animals are used, then a higher proportion of the resulting hybridomas will produce antibodies specific to the antigen used. If, on the other hand, unchallenged animals are used, then it can be expected that the antibodies retrieved from the resulting population of hybridomas will represent a broader range of the antibodies that the animals are capable of producing. The predominant antibodies produced by a mature animal raised under standard laboratory conditions will reflect and be limited by its individual exposure history. If spleens are harvested from several (at least about 10) unchallenged mature animals and combined together, and the spleen cells fused to myeloma cells, then the resulting discrete population of hybridomas will produce a more complete range of antibodies than would hybridomas from any single individual. Antibodies produced by the hybridomas derived from the spleen cells of mature animals that were raised aseptically or from fetal or neonatal animals will not reflect any exposure history and can be expected to represent a random sample of the full range of antibodies that the animals are capable of producing.
Since this procedure does not require antigenic stimulation of donor animals before harvesting the spleens, it is now possible to develop antibodies derived from human cells. Normal spleen cells can be collected from one or a number of human donors and the harvested cells fused to myeloma cells and cultured as described above. Alternatively, a library of human antibodies can be developed over time by obtaining cell cultures from, e.g., a large number of myeloma patients, each patient having a distinctive tumor.
It is now possible to use a recombinant library to generate the universe of antibody binding specificities instead of a hybridoma library. Huse et al. (1989) Science 246:1275-1281, describe the generation of a large combinatorial library of mouse Fab fragments. Alting-Mees et al. (1990) Strategies in Molecular Biology 3:1-2,9 describe bacteriophage lambda (λ) expression libraries for antibody production.
Production of Peptide Sequences
Numerous methods are available for the production of the desired population of peptide sequences. For certain embodiments of the invention these peptide sequences can be produced directly either by randomly shearing proteins and then recovering by electrophoresis the peptide sequences of the appropriate length, or by synthesizing the desired random peptide sequences from the component amino acids.
Alternatively, these peptides can be produced through genetic engineering techniques. Peptides produced according to this general method can be termed coded peptides. A population of nucleotide sequences of the correct length to encode random peptide sequences of the desired length is generated. This can be accomplished either by random cleavage of biological genetic material followed by electrophoresis to recover those nucleotide sequences that were cut or sheared to the desired length, or by chemical synthesis from the component nucleotides or codons.
Depending on the desired characteristics of the resulting population of nucleotide sequences and ultimately, of the peptide sequences to be produced, different techniques are used to obtain the population of nucleotides. If a random population of nucleotide sequences is desired, then the nucleotides can be synthesized by adding the four nucleotides with equal frequency at each position of the growing nucleotide chains. If it is desired that the synthesized nucleotide triplets more closely reflect the distribution of naturally occurring triplets, then the frequency of each nucleotide employed at the first, second, or third position of each triplet can be manipulated to approximate the frequencies at which each nucleotide residue appears at each position in nature, as suggested in Crick F. H. C. et al., Origin of Life, 7:389-397 (1976). Any of several sources of genetic material can be selected to obtain by shearing nucleotide sequences of the desired length, e.g., cellular DNA or cDNA. cDNA, of course, would provide a closer representation of the naturally occurring coding sequences. Alternatively, chemically synthesized oligonucleotides of tandem sequence may be used.
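The synthesis scheme described above, in which nucleotides are added with equal or position-specific frequencies, can be simulated in a few lines. This is a minimal sketch for illustration, not a description of the chemistry: it assumes the standard genetic code, and the function names and the optional position_weights parameter are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def random_oligo(n_triplets, rng, position_weights=None):
    """Build a random coding oligonucleotide of n_triplets codons.

    position_weights (hypothetical parameter): three lists of four weights,
    one list per codon position, in the order of BASES. The default gives
    equal frequency to all four nucleotides at every position.
    """
    if position_weights is None:
        position_weights = [[1, 1, 1, 1]] * 3
    bases = [
        rng.choices(BASES, weights=position_weights[i % 3])[0]
        for i in range(3 * n_triplets)
    ]
    return "".join(bases)


def translate(oligo):
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[oligo[i:i + 3]] for i in range(0, len(oligo), 3))


rng = random.Random(0)  # fixed seed so the run is repeatable
oligo = random_oligo(5, rng)
print(oligo, translate(oligo))
```

With equal nucleotide frequencies, three of the 64 triplets are stop codons, so a fraction of random inserts will terminate translation early. Adjusting the per-position weights, as the text suggests, changes that fraction and the amino acid distribution.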
When the desired population of nucleotide sequences has been obtained, the population can then be treated to facilitate the insertion of each sequence into a vector and to facilitate the subsequent recovery of the desired peptide sequence from the culture of host cells incorporating the engineered vector. For example, using known techniques, AUG sequences can be ligated to each end of each member of the population of nucleotide sequences. When each nucleotide sequence is translated, the desired peptide sequence will be flanked by methionine residues. The translated protein can then be treated with cyanogen bromide, which cleaves peptides at methionine sites, to excise the desired peptide sequence from the protein. The cleavage product can then be purified by electrophoresis. Preferably, a restriction endonuclease recognition sequence can be ligated to each end of each member of the population of nucleotide sequences and then the population of nucleotide sequences can be treated with the endonuclease recognizing the ligated sequence to produce "sticky ends" which facilitate the insertion of the nucleotide sequence at the restriction site in a vector recognized by the endonuclease. When the population of nucleotide sequences is chemically synthesized, flanking restriction sites may be designed into the oligonucleotide sequence, as understood in the art.
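The methionine-flanking strategy can be illustrated with a simple string model of cyanogen bromide cleavage. The sketch assumes the usual behavior of the reagent, cleavage on the carboxyl side of each methionine, and ignores reaction efficiency and side reactions. The fusion protein sequence and the function name are hypothetical.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M).

    Models idealized cyanogen bromide cleavage: each fragment other than
    the last ends with the methionine at which the chain was cut.
    """
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


# Hypothetical fusion: carrier residues, Met, five-residue insert, Met, carrier.
fusion = "GSAK" + "M" + "ACDEF" + "M" + "LLKR"
print(cnbr_fragments(fusion))  # ['GSAKM', 'ACDEFM', 'LLKR']
```

In this model the excised insert carries the downstream flanking residue at its carboxyl end, and any insert that itself contains methionine is cut internally, so such inserts would not be recovered intact by this route.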
Each nucleotide sequence is then inserted into an appropriate vector. The ratio of nucleotide sequences to vectors can be controlled to ensure that, on the average, no more than one nucleotide sequence is inserted into any vector. The nucleotide sequence must be inserted at a location in the vector where it will be translated in phase when the vector is transferred into an appropriate host cell, and where it will not interfere with the replication of the vector under the experimental conditions employed, i.e., the nucleotide sequence must be inserted into a non-essential region of the vector. Pieczenik, U.S. Pat. Nos. 4,359,535 and 4,528,266, hereby incorporated by reference, disclose a method for inserting foreign DNA into a non-essential region of a vector.
Smith (1985) Science 228:1315-1317 describes the insertion of heterologous coding sequences into the unique BamHI site within the minor coat protein (pIII) gene (gene III) of f1 and immunological screening for recombinant phage expressing the heterologous coding sequence. Parmley and Smith (1988) Gene 73:305-318 describe an f1 derivative which allows for the insertion of heterologous coding sequences at an engineered cloning site, allowing for the expression of a heterologous coding sequence near the mature N-terminus of pIII. Immunoaffinity purification can be used to purify recombinant phage expressing a desired epitopic sequence(s).
The nucleotide sequence is advantageously inserted in such a way that the peptide sequence encoded by the nucleotide sequence is expressed on the outside surface of the bacteriophage or the host cells with plasmids containing the nucleotide sequence. To prepare inserts having these characteristics, a vector, e.g., a phage or plasmid, with an appropriate cloning site, is first selected.
A suitable position for a cloning site may be determined empirically by performing an experiment to identify an insertion site in a structural gene which will allow expression of an inserted oligonucleotide coding sequence, and which will result in the expression of the encoded oligopeptide as an epitope within or at one end of a structural gene product such that recognition of the epitope in the recombinant virus or genetically modified host cell or protein is possible. That oligopeptide sequence can be detected using an antibody specific for an epitope of that sequence (or specific for an epitope mimicked by the conformation of that sequence).
The vector can then be cleaved at random sites according to the method disclosed in U.S. Pat. Nos. 4,359,535 and 4,528,266 to yield a population of linear DNA molecules having circularly permuted sequences, where the breakpoint in the circular molecule is at a random location in each molecule. After the cleavage steps, a synthetic oligonucleotide linker bearing a unique nucleotide sequence not present on the original unmodified vector can be attached to both ends of each linearized vector by blunt end ligation. The random linear DNA molecules can then be treated with the restriction endonuclease specific to the attached sequences, to generate cohesive ends.
All such recombinant vectors which allow immunologic detection of the encoded oligopeptide express that epitope in a context-insensitive fashion. For the purposes of this invention, context-insensitive means that the milieu in which the oligopeptide is expressed does not prevent recognition by the cognate antibody. The actual insertion site on the vector can be determined by sequence analysis, as understood in the art, and that site can be modified to contain an appropriate cloning site. As understood in the art, the insertion and immunological detection should be repeated to confirm functionality in context-insensitive expression of an epitopic sequence. Such an engineered vector can be used in the practice of the invention. The immunological detection of an inserted oligonucleotide sequence encoding a context-insensitive epitope is to be called a "topological mapping" of the surface of the vector. The topological mapping of a vector allows the optimum design of an expression vector.
DNA sequences encoding a gene product, e.g., human hemoglobin, where these sequences are not naturally present in the vector, can be cleaved by any method known to the art and fractionated to the desired size, e.g., fifteen nucleotides long, and the nucleotide sequences ligated to the same type of linker used with the random linears. The fractionated nucleotide sequences are then inserted into the random linears, and the modified vectors are transferred into appropriate host cells. The host cells are diluted and plated, and the individual colonies (or plaques) are grown up. On replica plates, the colonies (or plaques) are screened with a monoclonal or polyclonal antibody specific to the gene product. A suitable control to ensure that selected colonies or plaques express epitopes of the desired specificity is the host cell into which unmodified vector has been introduced, as understood by the skilled artisan.
A positive reaction with the antibody identifies a colony wherein the inserted nucleotide sequence is translated in phase, and the encoded peptide sequence is on the outside surface of the polypeptide or protein, or otherwise accessible to the antibody screening assay. If a monoclonal antibody is employed in the screening step, then this procedure will identify only those colonies where the specific peptide sequence comprising the site recognized by that antibody is inserted on the outside surface of the polypeptide or protein unless appropriate pretreatment has been carried out. If a polyclonal antibody is employed, or a mixture of several monoclonals, then any colony, virus, polypeptide or protein expressing a cognate epitope in a manner accessible for antibody binding will be identified. This procedure identifies recombinant vectors which can be advantageously used in the present invention.
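The requirement that the inserted nucleotide sequence be translated in phase can be illustrated with a short sketch. It assumes the standard genetic code and uses a hypothetical fifteen-nucleotide insert chosen only for illustration; the function name and the example sequence are not part of the disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each
# position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate the complete codons of `dna`, starting at offset `frame`."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Hypothetical 15-nucleotide insert (illustrative only).
insert = "ATGGCTAAAGGTTGG"
for frame in range(3):
    print(frame, translate(insert, frame))
```

Only frame 0 yields the intended five-residue peptide; the other two frames give a different, shorter peptide or run into a stop codon, which is why only in-phase insertions are expected to react with the screening antibody.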
The insertion step creates a discrete population of vectors, each member of the population containing an oligonucleotide insert encoding a different peptide from a population of random amino acid sequences, each encoded peptide sequence containing the same desired number of amino acid residues, preferably five. The discrete population of vectors is then transferred into a population of appropriate host cells. Concentrations of vectors and of host cells can be controlled to ensure that, on the average, no more than one vector is transferred into any individual host cell. Cells are plated and cultured, and the translated proteins are harvested therefrom.
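The size of such a population, and the effect of controlling vector and host cell concentrations, can be estimated with a brief calculation. The sketch below assumes twenty naturally occurring amino acids and, as a modeling assumption not stated in the disclosure, that the number of vectors entering each host cell follows a Poisson distribution; the function names are illustrative.

```python
import math


def library_size(length, alphabet=20):
    """Number of distinct sequences of `length` units over `alphabet` symbols."""
    return alphabet ** length


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_with_multiple_vectors(mean):
    """Among cells that took up at least one vector, the fraction with two or more.

    Assumes Poisson-distributed uptake with the given mean vectors per cell.
    """
    p0 = poisson_pmf(0, mean)
    p1 = poisson_pmf(1, mean)
    return (1.0 - p0 - p1) / (1.0 - p0)


print(library_size(5))               # distinct five-residue peptides
print(library_size(15, alphabet=4))  # distinct fifteen-nucleotide inserts
for mean in (1.0, 0.5, 0.1):
    print(mean, round(fraction_with_multiple_vectors(mean), 3))
```

Under this assumption, lowering the average number of vectors per cell reduces the fraction of transformed cells that carry more than one insert, at the cost of leaving more cells without any vector.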
The population of recombinant f1 bacteriophage, as described in Example IV, with random oligonucleotides inserted, will express fusion proteins containing the heterologous peptides of random amino acid sequence. In this embodiment, the heterologous peptides are located within the pIII minor coat protein; other insertion sites may be utilized, as understood by the skilled artisan, for particular desired purposes. For example, Parmley and Smith (1988) Gene 73:305-318 demonstrates the expression of foreign epitopes at the N-terminal end of pIII of f1. Devlin et al. (1990) Science 249:404-406 describes a novel expression vector (M13LP67) derived from M13mp19; foreign epitopes were expressed near the N-terminus of the processed form of β-galactosidase. Cwirla et al. (1990) Proc. Natl. Acad. Sci. USA 87:6378-6382 reports the expression of a population of peptides fused at the N-terminus of pIII of modified bacteriophage fd.
Creating the Matrix
The particular construction of the matrix created from the full range of antibodies or from the peptide sequences described above depends on its use. Either the antibodies or the peptide sequences are immobilized on a solid support substrate or an immobile phase, e.g., nitrocellulose if a two dimensional support is desired or material which can be incorporated in a column if a three dimensional support best serves its purpose, as will be understood by the ordinary skilled artisan. The immobilization can be accomplished by covalently linking the antibodies or peptide sequences to the substrate. Each site on the matrix is occupied by a single chemical species, i.e., a monoclonal antibody or a purified peptide. The source of each individual immobilized species is maintained as a separate culture. In general, the antibodies, the peptide sequences, or the test species are labeled with an appropriate label, such as a fluorescent compound, an enzyme, or a radioactive tracer, as known in the art. The peptide sequence itself can serve as a sensitive biological tag where it occurs on the surface of a protein, virus or modified host cell.
Where the antibodies are immobilized, the peptide sequences or polypeptides comprising those peptide sequences are then contacted with the antibodies under appropriate conditions and for a sufficient amount of time so that each immobilized antibody binds to the peptide sequence to which it is specific. Where the peptide sequences are immobilized, the antibodies are then contacted with the peptide sequences so that each immobilized peptide sequence is recognized and bound by an antibody specific for that particular sequence. Each complex of peptide sequence and its bound antibody can be termed a binding pair. In some cases, the antibodies or peptide sequences themselves are immobilized on the substrate; in other cases the cell cultures producing the antibodies or the modified host cells expressing the peptides are immobilized. Binding pairs are created in a single step, taking advantage of the natural affinity of antibodies for the peptide sequences to which they are specific. If a sample of peptides is contacted with a population of immobilized antibodies, then the peptides will self-sort and each will bind to its corresponding antibody. Similarly, if a sample of antibodies is contacted with a population of immobilized peptides, then the antibodies will self-sort and each will bind to its cognate peptide. The sorting will occur notwithstanding that there is no prior knowledge as to the functional characteristics of any of the individual antibodies or peptides.
A matrix where the antibodies are immobilized on the substrate will be designated an antibody-immobilized matrix, or AIM. Where each immobilized antibody forms a binding pair with a corresponding peptide sequence, the matrix will be designated P-AIM. Similarly, a matrix where the peptide sequences are immobilized on the substrate will be designated a peptide-immobilized matrix, or PIM. Where each immobilized peptide sequence forms a binding pair with a corresponding antibody, the matrix will be designated A-PIM.
Generally, the method of the invention involves contacting a test species with an intact P-AIM or an intact A-PIM, the specific characteristics of the matrix depending on the nature of the information sought as the skilled artisan will readily understand. Considering the large number of different hybridomas, recombinant vectors and genetically modified host cells that are involved in the practice of the invention, the antibodies or peptide sequences can be immobilized very densely on the substrate. Areas of competitive binding are identified when the test species is contacted with the matrix.
Recombinant vectors or modified host cells or colonies from these areas of competitive binding can then be retrieved and replated less densely, and the competitive binding step with the test species repeated, in order to specifically identify the individual colony producing the antibody or amino acid sequence where pairing was disturbed.
Screening an Antibody or Test Species of Interest
A P-AIM is used both to identify and obtain antibody clones that are specific to a test species of interest and to identify the specific peptide sequence recognized by an antibody of interest. The test species can be, for example, a virus, a bacteriophage, a virus coat protein, a surface protein of a viral or bacterial pathogen, a protein on the surface of a malignant cell, an enzyme, or a peptide having the sequence of a selected portion of a protein of interest. The test species need not contain peptides, but may be, e.g., a drug or carbohydrate having a three dimensional structure that is closely approximated by a peptide sequence.
The test species is contacted with a P-AIM in a competitive binding assay with each of the completed binding pairs. Each binding pair occupies a unique site on the matrix. Where these pairs have been labeled, any pairings disturbed by the presence of the test species can be identified.
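The bookkeeping behind this assay can be sketched as a toy model. The sketch assumes, as a deliberate simplification, that a test protein disturbs a binding pair only when it contains the exact five-residue sequence of the bound peptide; real competition depends on affinity and conformation. All clone identifiers, sequences, and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingPair:
    site: tuple    # (row, column) position on the matrix
    antibody: str  # hypothetical hybridoma clone identifier
    peptide: str   # peptide sequence bound to that antibody


def epitope_windows(protein, length=5):
    """All contiguous subsequences of `length` residues in a test protein."""
    return {protein[i:i + length] for i in range(len(protein) - length + 1)}


def disturbed_pairs(matrix, test_protein, length=5):
    """Pairs whose bound peptide also occurs in the test protein (toy criterion)."""
    windows = epitope_windows(test_protein, length)
    return [pair for pair in matrix if pair.peptide in windows]


# Hypothetical P-AIM with three occupied sites.
example_matrix = [
    BindingPair((0, 0), "mAb-001", "MAKGW"),
    BindingPair((0, 1), "mAb-002", "GWLKV"),
    BindingPair((1, 0), "mAb-003", "PPPPP"),
]

for pair in disturbed_pairs(example_matrix, "QMAKGWLKVT"):
    print(pair.site, pair.antibody, pair.peptide)
```

Because each pair is recorded against a unique site, the sites reported as disturbed point directly back to the cultures that supply the corresponding antibody and peptide.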
A particularly sensitive labeling technique is obtained where the peptide sequences bound to the immobilized antibodies are on the surface of a protein or vector. After the P-AIM is created and the binding pairs are established, the P-AIM is thoroughly washed to remove any unbound peptide sequences. The test species is then contacted with the P-AIM. Any peptide sequences that are displaced from their corresponding antibodies by the presence of the test species can be directly titered off the P-AIM. Available techniques are sufficiently sensitive to detect the presence of as few as ten molecules of protein, recombinant vector or modified host cells in the titered supernatant.
Where the test species is labeled, its binding can be detected directly. Each clone producing an antibody that binds to a test species is identified and cultured to provide a source of the antibody. Each culture producing a peptide sequence displaced by the presence of an antibody of interest is identified and cultured to provide a source of that peptide sequence.
A PIM is used both to identify the specific sequences on a test protein or polypeptide that can be recognized by antibodies and to identify the specific peptide sequences recognized by an antibody of interest. Each clone or peptide in a PIM represents the expression or presence of at least 10^4 to 10^7 copies of the individual peptide sequence so that detection of labeled antibody binding or of the displacement of bound labeled antibody is readily accomplished using techniques known to the art. The procedure for screening on a PIM is analogous to the procedure, above, for screening on an AIM. The test protein or peptide sequence, or the test antibody, is contacted with an intact A-PIM in a competitive binding assay with each of the antibody-peptide sequence pairs. The pairings disturbed by the presence of the test protein or polypeptide or test antibody are noted, and the clones producing the amino acid sequence to which pairing was disturbed are identified and cultured. By this method, not only is it possible to determine the amino acid sequence recognized by the antibody, but it is now possible as well to identify a nucleic acid sequence encoding this amino acid sequence, as the oligonucleotide insert in the vector contained in the clone that produces the recognized amino acid sequence.