1. Field of the Invention
This invention relates to compositions, methods and kits for generating libraries of recombinant expression vectors and using these libraries in screening of affinity-binding pairs, and, more particularly, for generating libraries of recombinant human antibodies and screening for their affinity binding with target antigens.
2. Description of Related Art
Antibodies are a diverse class of molecules. Delves, P. J. (1997) “Antibody production: essential techniques”, New York, John Wiley and Sons, pp. 90-113. It is estimated that even in the absence of antigen stimulation a human makes at least 10^15 different antibody molecules, which constitute its preimmune antibody repertoire. The antigen-binding sites of many antibodies can cross-react with a variety of related but different antigenic determinants, and the preimmune repertoire is apparently large enough to ensure that there will be an antigen-binding site to fit almost any potential antigenic determinant, albeit with low affinity.
Structurally, antibodies or immunoglobulins (Igs) are composed of one or more Y-shaped units. For example, immunoglobulin G (IgG) has a molecular weight of 150 kDa and consists of just one of these units. Typically, an antibody can be proteolytically cleaved by the proteinase papain into two identical Fab (fragment antigen binding) fragments and one Fc (fragment crystallizable) fragment. Each Fab contains one binding site for antigen, and the Fc portion of the antibodies mediates other aspects of the immune response.
A typical antibody contains four polypeptides: two identical copies of a heavy (H) chain and two copies of a light (L) chain, with the general formula H2L2. Each L chain is attached to one H chain by a disulfide bond. The two H chains are also attached to each other by disulfide bonds. Papain cleaves N-terminal to the disulfide bonds that hold the H chains together. Each of the resulting Fabs consists of an entire L chain plus the N-terminal half of an H chain; the Fc is composed of the C-terminal halves of two H chains. Pepsin cleaves at numerous sites C-terminal to the inter-H disulfide bonds, resulting in the formation of a divalent fragment [F(ab′)2] and many small fragments of the Fc portion. IgG heavy chains contain one N-terminal variable (VH) plus three C-terminal constant (CH1, CH2 and CH3) regions. Light chains contain one N-terminal variable (VL) and one C-terminal constant (CL) region each. The different variable and constant regions of either heavy or light chains are of roughly equal length (about 110 amino acid residues per region). Fabs consist of one VL, VH, CH1, and CL region each. The VL and VH portions contain hypervariable segments (complementarity-determining regions or CDRs) that form the antibody combining site.
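By way of illustration only, the domain arithmetic described above can be checked against the stated molecular weight of IgG (about 150 kDa). The sketch below uses the figure of about 110 residues per region from the text; the average residue mass of 110 Da is an assumption (a common rule of thumb), and the hinge region and glycosylation are ignored.

```python
# Rough consistency check of the H2L2 domain arithmetic.
RESIDUES_PER_REGION = 110   # "about 110 amino acid residues per region" (from the text)
AVG_RESIDUE_MASS_DA = 110   # ASSUMPTION: rule-of-thumb average mass of one residue

HEAVY_REGIONS = ("VH", "CH1", "CH2", "CH3")  # IgG heavy chain regions
LIGHT_REGIONS = ("VL", "CL")                 # light chain regions


def chain_residues(regions):
    """Approximate residue count of one chain from its number of regions."""
    return len(regions) * RESIDUES_PER_REGION


def igg_residues():
    """H2L2: two heavy chains plus two light chains."""
    return 2 * chain_residues(HEAVY_REGIONS) + 2 * chain_residues(LIGHT_REGIONS)


def fab_residues():
    """One Fab contains one VL, VH, CH1 and CL region."""
    return 4 * RESIDUES_PER_REGION


def approx_mass_kda(residues):
    """Convert a residue count to an approximate mass in kDa."""
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    print(igg_residues(), approx_mass_kda(igg_residues()))
    print(fab_residues(), approx_mass_kda(fab_residues()))
```

Under these assumptions the estimate for a whole IgG comes out slightly below 150 kDa, which is in line with the stated value given that carbohydrate and the hinge are not counted.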
The VL and VH portions of a monoclonal antibody have also been linked by a synthetic linker to form a single chain protein (scFv) which retains the same specificity and affinity for the antigen as the monoclonal antibody itself. Bird, R. E., et al. (1988) “Single-chain antigen-binding proteins” Science 242:423-426. A typical scFv is a recombinant polypeptide composed of a VL tethered to a VH by a designed peptide, such as (Gly4-Ser)3, that links the carboxyl terminus of the VL to the amino terminus of the VH sequence. The DNA sequence encoding a scFv can be constructed by polymerase chain reaction (PCR) using a universal primer encoding the (Gly4-Ser)3 linker. Lake, D. F., et al. (1995) “Generation of diverse single-chain proteins using a universal (Gly4-Ser)3 encoding oligonucleotide” Biotechniques 19:700-702.
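As a minimal sketch of the scFv layout just described, the following assembles a VL-linker-VH polypeptide at the amino acid level, with the (Gly4-Ser)3 linker joining the carboxyl terminus of the VL to the amino terminus of the VH. The function name build_scfv and the short input strings are hypothetical placeholders, not real variable-region sequences.

```python
# (Gly4-Ser)3 linker in one-letter amino acid code: GGGGS repeated three times.
LINKER = "GGGGS" * 3


def build_scfv(vl, vh, linker=LINKER):
    """Return a single-chain sequence in the order VL - linker - VH.

    vl and vh are one-letter amino acid strings (hypothetical inputs).
    """
    for name, seq in (("VL", vl), ("VH", vh)):
        if not seq or not seq.isalpha():
            raise ValueError(f"{name} must be a non-empty amino acid string")
    return vl.upper() + linker + vh.upper()


if __name__ == "__main__":
    # Placeholder fragments for illustration only.
    print(build_scfv("AAA", "CCC"))
```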
The mammalian immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way by joining separate gene segments together before they are transcribed. For each type of Ig chain (κ light chains, λ light chains, and heavy chains) there is a separate pool of gene segments from which a single peptide chain is eventually synthesized. Each pool is on a different chromosome and usually contains a large number of gene segments encoding the V region of an Ig chain and a smaller number of gene segments encoding the C region. During B cell development a complete coding sequence for each of the two Ig chains to be synthesized is assembled by site-specific genetic recombination, bringing together the entire coding sequences for a V region and the coding sequence for a C region. In addition, the V region of a light chain is encoded by a DNA sequence assembled from two gene segments: a V gene segment and a short joining or J gene segment. The V region of a heavy chain is encoded by a DNA sequence assembled from three gene segments: a V gene segment, a J gene segment and a diversity or D segment.
The large number of inherited V, J and D gene segments available for encoding Ig chains makes a substantial contribution on its own to antibody diversity, but the combinatorial joining of these segments greatly increases this contribution. Further, imprecise joining of gene segments and somatic mutations introduced during the V-D-J segment joining at the pre-B cell stage greatly increase the diversity of the V regions.
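The effect of combinatorial joining can be illustrated with simple multiplication: a light chain V region draws on one V and one J segment, a heavy chain V region on one V, one D and one J segment, and any light chain may pair with any heavy chain. The segment counts used below are hypothetical round numbers chosen only for illustration; they are not taken from this document, and junctional imprecision and somatic mutation would add further diversity on top of these products.

```python
def light_chain_combinations(n_v, n_j):
    """Number of V-J combinations for a light chain V region."""
    return n_v * n_j


def heavy_chain_combinations(n_v, n_d, n_j):
    """Number of V-D-J combinations for a heavy chain V region."""
    return n_v * n_d * n_j


def paired_combinations(n_light, n_heavy):
    """Number of light/heavy pairings if any light chain can pair with any heavy chain."""
    return n_light * n_heavy


if __name__ == "__main__":
    # HYPOTHETICAL segment counts, for illustration only.
    light = light_chain_combinations(n_v=40, n_j=5)
    heavy = heavy_chain_combinations(n_v=40, n_d=25, n_j=6)
    print(light, heavy, paired_combinations(light, heavy))
```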
After immunization against an antigen, a mammal goes through a process known as affinity maturation to produce antibodies with higher affinity toward the antigen. Such antigen-driven somatic hypermutation fine-tunes antibody responses to a given antigen, presumably due to the accumulation of point mutations specifically in both heavy- and light-chain V region coding sequences and a selective expansion of high-affinity antibody-bearing B cell clones.
Great efforts have been made to mimic such natural maturation of antibodies against various antigens, especially antigens associated with diseases such as autoimmune diseases, cancer, AIDS and asthma. In particular, phage display technology has been used extensively to generate large libraries of antibody fragments by exploiting the capability of bacteriophage to express and display biologically functional protein molecules on its surface. Combinatorial libraries of antibodies have been generated in bacteriophage lambda expression systems which may be screened as bacteriophage plaques or as colonies of lysogens (Huse et al. (1989) Science 246: 1275; Caton and Koprowski (1990) Proc. Natl. Acad. Sci. (U.S.A.) 87: 6450; Mullinax et al (1990) Proc. Natl. Acad. Sci. (U.S.A.) 87: 8095; Persson et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 2432). Various embodiments of bacteriophage antibody display libraries and lambda phage expression libraries have been described (Kang et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 4363; Clackson et al. (1991) Nature 352: 624; McCafferty et al. (1990) Nature 348: 552; Burton et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 10134; Hoogenboom et al. (1991) Nucleic Acids Res. 19: 4133; Chang et al. (1991) J. Immunol. 147: 3610; Breitling et al. (1991) Gene 104:147; Marks et al. (1991) J. Mol. Biol. 222: 581; Barbas et al. (1992) Proc. Natl. Acad. Sci. (U.S.A.) 89: 4457; Hawkins and Winter (1992) J. Immunol. 22: 867; Marks et al. (1992) Biotechnology 10: 779; Marks et al. (1992) J. Biol. Chem. 267: 16007; Lowman et al (1991) Biochemistry 30: 10832; Lerner et al. (1992) Science 258: 1313). Also see review by Rader, C. and Barbas, C. F. (1997) “Phage display of combinatorial antibody libraries” Curr. Opin. Biotechnol. 8:503-508.
Various scFv libraries displayed on bacteriophage coat proteins have been described. Marks et al. (1992) Biotechnology 10: 779; Winter G and Milstein C (1991) Nature 349: 293; Clackson et al. (1991) op.cit.; Marks et al. (1991) J. Mol. Biol. 222: 581; Chaudhary et al. (1990) Proc. Natl. Acad. Sci. (USA) 87: 1066; Chiswell et al. (1992) TIBTECH 10: 80; and Huston et al. (1988) Proc. Natl. Acad. Sci. (USA) 85: 5879.
Generally, a phage library is created by inserting a library of random oligonucleotides or a cDNA library encoding antibody fragments such as VL and VH into gene 3 of M13 or fd phage. Each inserted gene is expressed at the N-terminus of the gene 3 product, a minor coat protein of the phage. As a result, peptide libraries that contain diverse peptides can be constructed. The phage library is then affinity screened against an immobilized target molecule of interest, such as an antigen, and specifically bound phages are recovered and amplified by infection into Escherichia coli host cells. Typically, the target molecule of interest such as a receptor (e.g., polypeptide, carbohydrate, glycoprotein, nucleic acid) is immobilized (e.g., by covalent linkage to a chromatography resin to enrich for reactive phage by affinity chromatography) and/or labeled to screen plaques or colony lifts. This procedure is called biopanning. Finally, amplified phages can be sequenced to deduce the specific peptide sequences. Owing to the inherent nature of phage display, the antibodies displayed on the surface of the phage may not adopt, under such in vitro selection conditions, the native conformation they would have in a mammalian system. In addition, bacteria do not readily process, assemble, or express/secrete functional antibodies.
Transgenic animals such as mice have been used to generate fully human antibodies by using the XENOMOUSE™ technology developed by companies such as Abgenix, Inc., Fremont, Calif. and Medarex, Inc., Annandale, N.J. Strains of mice are engineered by suppressing mouse antibody gene expression and functionally replacing it with human antibody gene expression. This technology utilizes the natural power of the mouse immune system in surveillance and affinity maturation to produce a broad repertoire of high affinity antibodies. However, the breeding of such strains of transgenic mice and selection of high affinity antibodies can take a long period of time. Further, the antigen against which the pool of human antibodies is selected has to be recognized by the mouse as a foreign antigen in order to mount an immune response; antibodies against a target antigen that is not immunogenic in a mouse may not be selectable by using this technology. In addition, there may be regulatory issues regarding the use of transgenic animals, such as transgenic goats (developed by Genzyme Transgenics, Framingham, Mass.) and chickens (developed by Geneworks, Inc., Ann Arbor, Mich.), to produce antibodies, as well as safety issues concerning containment of transgenic animals infected with recombinant viral vectors.
Antibodies and antibody fragments have also been produced in transgenic plants. Plants, such as corn plants (developed by Integrated Protein Technologies, St. Louis, Mo.), are transformed with vectors carrying antibody genes, which results in stable integration of these foreign genes into the plant genome. In comparison, most microorganisms transformed with plasmids can lose the plasmids during a prolonged fermentation. Transgenic plants may be used as a cheaper means to produce antibodies on a large scale. However, due to the long growth cycles of plants, screening for antibodies with high binding affinity toward a target antigen may not be efficient or feasible for high throughput screening in plants.
The present invention provides compositions, methods and kits for efficiently generating and screening for protein-protein or protein-DNA binding pairs in vivo. The production and screening of the binding pairs can be adapted for high throughput screening in vivo.
In one aspect of the present invention, compositions are provided. These compositions may be used for screening affinity binding pairs between a tester protein and a target molecule including protein, peptide, DNA, RNA, and small molecules in vitro or in vivo.
In one embodiment, a library of yeast expression vectors is provided. The yeast expression vectors forming the library comprise a first nucleotide sequence encoding a first polypeptide subunit; a second nucleotide sequence encoding a second polypeptide subunit; and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The first polypeptide subunit, the second polypeptide subunit, and the linker polypeptide are expressed as a single fusion protein. In addition, the first and second nucleotide sequences each independently vary within the library of expression vectors.
According to the embodiment, the yeast expression vector may be a 2μ plasmid vector, preferably a yeast-bacterial shuttle vector which contains a bacterial origin of replication.
In another embodiment, a library of expression vectors is provided. The expression vectors forming the library comprise: a transcription sequence encoding an activation domain or a DNA binding domain of a transcription activator; a first nucleotide sequence encoding a first polypeptide subunit; a second nucleotide sequence encoding a second polypeptide subunit; and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The activation domain or the DNA binding domain of the transcription activator, the first polypeptide subunit, the second polypeptide subunit, and the linker polypeptide are expressed as a single fusion protein. In addition, the first and second nucleotide sequences each independently vary within the library of expression vectors.
According to this embodiment, the expression vector may be a bacterial, phage, yeast, mammalian or viral expression vector, preferably a yeast expression vector, and more preferably a 2μ plasmid yeast expression vector.
Also according to this embodiment, the transcription activator sequence may be located 5′ relative to the first nucleotide sequence, the linker sequence, and the second nucleotide sequence. Alternatively, the transcription activator sequence may be located 3′ relative to the first nucleotide sequence, the linker sequence, and the second nucleotide sequence.
In yet another embodiment, a library of transformed yeast cells is provided. The library of yeast cells comprises a library of yeast expression vectors. The expression vectors in the library of transformed yeast cells comprise: a transcription sequence encoding an activation domain or a DNA binding domain of a transcription activator; a first nucleotide sequence encoding a first polypeptide subunit; a second nucleotide sequence encoding a second polypeptide subunit; and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The activation domain or the DNA binding domain of the transcription activator, the first polypeptide subunit, the second polypeptide subunit, and the linker polypeptide are expressed as a single fusion protein. In addition, the first and second nucleotide sequences each independently vary within the library of expression vectors.
According to this embodiment, the yeast cells may be diploid yeast cells. Alternatively, the yeast cells may be haploids such as the a and α strains of yeast haploid cells.
In another aspect of the present invention, methods are provided for generating a library of yeast expression vectors that may be used for screening protein-protein or protein-DNA binding pairs.
In one embodiment, the method comprises: transforming into yeast cells a linearized yeast expression vector having a 5′- and 3′-terminus sequence at the site of linearization and a library of insert nucleotide sequences that are linear and double-stranded. The insert sequences comprise a first nucleotide sequence encoding a first polypeptide subunit, a second nucleotide sequence encoding a second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first and second polypeptide subunits. Each of the insert sequences also comprises a 5′- and 3′-flanking sequence at the ends of the insert sequence. The 5′- and 3′-flanking sequences of the insert sequence are sufficiently homologous to the 5′- and 3′-terminus sequences of the linearized yeast expression vector, respectively, to enable homologous recombination to occur. The homologous recombination occurring between the vector and the insert sequence results in inclusion of the insert sequence into the vector in the transformed yeast cells.
In this embodiment, the first polypeptide subunit, the second polypeptide subunit, and the linker polypeptide are expressed as a single fusion protein. Also, the first and second nucleotide sequences each independently vary within the library of expression vectors.
According to the embodiment, the 5′- or 3′-flanking sequence of the insert nucleotide sequence may be preferably between about 30-120 bp in length, more preferably between about 40-90 bp in length, and most preferably between about 60-80 bp in length.
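The insertion by homologous recombination described above can be sketched as a toy string model: the insert is accepted only when its 5′ and 3′ flanks match the vector termini at the site of linearization, and the recombinant then carries the insert body between the two vector arms. The function names, the requirement of an exact match (the text only requires sufficient homology), and the very short demonstration flanks are all simplifying assumptions made for illustration; a second helper simply classifies a flank length against the three stated ranges.

```python
def flank_preference(length_bp):
    """Classify a flank length against the ranges stated in the text."""
    if 60 <= length_bp <= 80:
        return "most preferred"
    if 40 <= length_bp <= 90:
        return "more preferred"
    if 30 <= length_bp <= 120:
        return "preferred"
    return "outside stated ranges"


def gap_repair(vector_5p_arm, vector_3p_arm, insert, flank_len):
    """Toy model of insertion by homologous recombination.

    vector_5p_arm / vector_3p_arm: vector sequence on either side of the
    linearization site. The insert's first and last flank_len bases must
    match the adjacent vector termini exactly (ASSUMPTION: exact identity
    stands in for "sufficiently homologous"). Returns the recombinant
    sequence, or None if the flanks do not match.
    """
    if flank_len <= 0 or len(insert) < 2 * flank_len:
        raise ValueError("insert must contain two flanks of flank_len bases")
    five_flank = insert[:flank_len]
    three_flank = insert[-flank_len:]
    body = insert[flank_len:len(insert) - flank_len]
    if vector_5p_arm.endswith(five_flank) and vector_3p_arm.startswith(three_flank):
        return vector_5p_arm + body + vector_3p_arm
    return None


if __name__ == "__main__":
    # Short hypothetical sequences; real flanks would be tens of bp long.
    print(gap_repair("AAAACCCC", "GGGGTTTT", "CCCC" + "ATG" + "GGGG", 4))
    print(flank_preference(70))
```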
In another embodiment, a method is provided for generating a library of yeast expression vectors. The method comprises:
a) transforming into yeast cells
i) a linearized yeast expression vector having a 5′- and 3′-terminus sequence at a first site of linearization, and
ii) a library of first insert nucleotide sequences that are linear, double stranded, each of the first insert sequences comprising a first nucleotide sequence encoding a first polypeptide subunit, a 5′- and 3′-flanking sequence at the ends of the first insert sequence which are sufficiently homologous to the 5′- and 3′-terminus sequences of the vector at the first site of linearization, respectively, to enable homologous recombination to occur;
b) having homologous recombination occur between the vector and the first insert sequence in the transformed yeast cells, such that the first insert sequence is included in the vector;
c) isolating from the transformed yeast cells the vectors that contain the library of the first insert sequences;
d) linearizing the vectors containing the library of the first insert sequences to generate a 5′- and 3′-terminus sequence at a second site of linearization;
e) transforming into the transformed yeast cells
i) the linearized yeast expression vectors in step d), and
ii) a library of second insert nucleotide sequences that are linear, double stranded, each of the second insert sequences comprising a second nucleotide sequence encoding a second polypeptide subunit, a 5′- and 3′-flanking sequence at the ends of the second insert sequence which are sufficiently homologous to the 5′- and 3′-terminus sequences of the vector at the second site of linearization, respectively, to enable homologous recombination to occur; and
f) having homologous recombination occur between the linearized yeast expression vector at the second linearization site and the second insert sequences in the transformed yeast cells, such that the second insert sequence is included in the vector and the first and second nucleotide sequences are linked by a linker sequence.
The expression vectors formed by this method express the first polypeptide subunit, the second polypeptide subunit, and the linker polypeptide as a single fusion protein. Also, the first and second nucleotide sequences each independently vary within the library of expression vectors formed by this method.
According to the embodiment, the 5′- or 3′-flanking sequence of the insert nucleotide sequence is preferably between about 30-120 bp in length, more preferably between about 40-90 bp in length, and most preferably between about 60-80 bp in length.
In a variation of the above-described method, the diversity of the library of expression vectors formed by this method may be increased by chain shuffling via site-specific recombination. Accordingly, the method may further comprise: causing site-specific recombination between the members of the library of the yeast expression vectors at the 5′- and 3′-recombination sites, the recombination resulting in exchange of the first or second nucleotide sequences between the members of the library of the yeast expression vectors.
According to this variation, the 5′- and 3′-flanking sequences at the ends of the first or second insert nucleotide sequence comprise a 5′- and 3′-recombination site, respectively, that are recognized by a site-specific recombinase.
Also according to the variation, the 5′- and 3′-site-specific recombination sites are preferably different site-specific recombination sites, more preferably sites which are each independently selected from the group consisting of SEQ ID Nos: 1-13, and most preferably one being loxP of coliphage P1 and the other being a mutant loxP sequence.
Also according to this variation, the site-specific recombinase may be constitutively or inducibly expressed in the yeast cells. The site-specific recombinase may be CRE recombinase, which causes the site-specific recombination.
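The effect of chain shuffling on library diversity can be illustrated abstractly. In the sketch below each library member is modeled as a (first sequence, second sequence) pair, and one recombination event swaps the first sequences of two members; the recombinase and its recognition sites are not modeled mechanistically. The names exchange_first and reachable_combinations and the placeholder labels are hypothetical.

```python
from itertools import product


def exchange_first(member_a, member_b):
    """Swap the first sequences of two library members.

    Each member is a (first, second) tuple; the pair of recombinant
    members is returned.
    """
    (a_first, a_second), (b_first, b_second) = member_a, member_b
    return (b_first, a_second), (a_first, b_second)


def reachable_combinations(library):
    """All (first, second) pairings obtainable by repeated exchanges."""
    firsts = {first for first, _ in library}
    seconds = {second for _, second in library}
    return set(product(firsts, seconds))


if __name__ == "__main__":
    # Placeholder labels standing in for nucleotide sequences.
    library = [("first_1", "second_1"),
               ("first_2", "second_2"),
               ("first_3", "second_3")]
    print(exchange_first(library[0], library[1]))
    print(len(library), "members ->", len(reachable_combinations(library)), "combinations")
```

In this idealized model a library of n members with distinct first and second sequences can in principle reach n × n combinations, which is the sense in which shuffling may increase diversity.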
In yet another aspect of the present invention, methods are provided for selecting tester proteins capable of binding to a target peptide, protein, or DNA.
In one embodiment where the target molecule is a target peptide or protein, the method comprises: expressing a library of tester proteins in yeast cells, each tester protein being a fusion protein comprised of a first polypeptide subunit whose sequence varies within the library, a second polypeptide subunit whose sequence varies within the library independently of the first polypeptide, and a linker peptide which links the first and second polypeptide subunits; expressing one or more target fusion proteins in the yeast cells expressing the tester proteins, each of the target fusion proteins comprising a target peptide or protein; and selecting those yeast cells in which a reporter gene is expressed, the expression of the reporter gene being activated by binding of the tester fusion protein to the target fusion protein.
According to this embodiment, expression of the reporter gene may be activated by a functional transcription activator being formed by the binding of the tester protein to the target peptide or protein as in a yeast two-hybrid system.
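The selection logic of the two-hybrid arrangement can be expressed as a simple predicate: the reporter is expressed only when the tester and target fusions carry complementary domains of the transcription activator (one the activation domain, the other the DNA binding domain) and the tester binds the target. The sketch below is a logical model only; the function names, the "AD"/"BD" labels and the binding lookup supplied by the caller are hypothetical.

```python
def reporter_expressed(tester_domain, target_domain, binds):
    """True when binding reconstitutes a functional transcription activator.

    tester_domain / target_domain: "AD" (activation domain) or
    "BD" (DNA binding domain). binds: whether tester binds target.
    """
    complementary = {tester_domain, target_domain} == {"AD", "BD"}
    return bool(binds) and complementary


def select_cells(testers, target, binds_fn, tester_domain="AD", target_domain="BD"):
    """Return the testers whose cells would express the reporter.

    binds_fn(tester, target) is a caller-supplied stand-in for the
    biological binding event (HYPOTHETICAL).
    """
    return [t for t in testers
            if reporter_expressed(tester_domain, target_domain, binds_fn(t, target))]


if __name__ == "__main__":
    known_binders = {("tester_2", "target_X")}  # placeholder data
    hits = select_cells(["tester_1", "tester_2"], "target_X",
                        lambda t, g: (t, g) in known_binders)
    print(hits)
```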
Accordingly, in a variation of the embodiment involving the yeast two-hybrid system, the step of expressing the library of tester fusion proteins may include transforming a library of tester expression vectors into the yeast cells which contain a reporter construct comprising the reporter gene whose expression is under transcriptional control of a transcription activator comprising an activation domain and a DNA binding domain. Each of the tester expression vectors comprises a first transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. Optionally, the step of expressing the target fusion proteins includes transforming a target expression vector into the yeast cells simultaneously or sequentially with the library of tester expression vectors. The target expression vector comprises a second transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator which is not expressed by the library of tester expression vectors; and a target sequence encoding the target protein or peptide.
In another variation of the embodiment involving the yeast two-hybrid system, the steps of expressing the library of tester fusion proteins and expressing the target fusion protein include causing mating between first and second populations of haploid yeast cells of opposite mating types. The first population of haploid yeast cells comprises a library of tester expression vectors for the library of tester fusion proteins. Each of the tester expression vectors comprises a first transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The second population of haploid yeast cells comprises a target expression vector. The target expression vector comprises a second transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator which is not expressed by the library of tester expression vectors; and a target sequence encoding the target protein or peptide. Either the first or second population of haploid yeast cells comprises a reporter construct comprising the reporter gene whose expression is under transcriptional control of the transcription activator.
In this variation, the haploid yeast cells of opposite mating types may preferably be α and a type strains of yeast. The mating between the first and second populations of haploid yeast cells of α and a type strains may be conducted in a rich nutritional culture medium.
Optionally, a plurality of target fusion proteins may be expressed and screened against the library of tester proteins at the same time. According to this variation, the steps of expressing the library of tester fusion proteins and expressing the plurality of the target fusion proteins include causing mating between first and second populations of haploid yeast cells of opposite mating types. The first population of haploid yeast cells comprises a library of tester expression vectors for the library of tester fusion proteins. Each of the tester expression vectors comprises a first transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The second population of haploid yeast cells comprises a plurality of target expression vectors. The target expression vectors comprise a second transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator which is not expressed by the library of tester expression vectors; and a target sequence encoding the target protein or peptide. Either the first or second population of haploid yeast cells comprises a reporter construct comprising the reporter gene whose expression is under transcriptional control of the transcription activator.
According to this variation, the haploid yeast cells of opposite mating types may preferably be α and a type strains of yeast. The mating between the first and second populations of haploid yeast cells of α and a type strains may be conducted in a rich nutritional culture medium.
Also according to this variation, members of the library of tester expression vectors may be arrayed as individual yeast clones in one or more multiple-well plates.
Also according to this variation, the plurality of the target expression vectors may be arrayed as individual yeast clones in one or more multiple-well plates.
Also according to this variation, the mating may be based on clonal mating in which each yeast clone containing a member of the tester expression vectors is mated individually with each of the plurality of target expression vectors.
Also according to this variation, the plurality of the target expression vectors may be a library of expression vectors containing a collection of human EST clones or a collection of domain structures.
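For the arrayed, clonal mating described above, the set of individual matings is simply every tester clone paired with every target clone. The following sketch enumerates those pairs and assigns them to wells; the 8 × 12 (96-well) plate geometry is an assumption, since the text specifies only multiple-well plates, and all names are hypothetical.

```python
from itertools import product
import string


def well_labels(rows=8, cols=12):
    """Well names in row-major order, e.g. A1..A12, B1..H12 for 8 x 12."""
    return [f"{string.ascii_uppercase[r]}{c + 1}"
            for r in range(rows) for c in range(cols)]


def clonal_matings(testers, targets):
    """Every tester clone mated individually with every target clone."""
    return list(product(testers, targets))


def assign_wells(matings, rows=8, cols=12):
    """Map each mating to (plate number, well, tester, target)."""
    labels = well_labels(rows, cols)
    per_plate = len(labels)
    return [(i // per_plate + 1, labels[i % per_plate], tester, target)
            for i, (tester, target) in enumerate(matings)]


if __name__ == "__main__":
    layout = assign_wells(clonal_matings(["tester_1", "tester_2"],
                                         ["target_A", "target_B", "target_C"]))
    for entry in layout:
        print(entry)
```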
According to any of the above-described methods for selecting protein-protein binding pairs, the target fusion protein comprises an antigen associated with a disease state, such as a tumor-surface antigen. Optionally, the target fusion protein may comprise a human growth factor receptor, such as a receptor for epidermal growth factors, transferrin, insulin-like growth factor, transforming growth factors, interleukin-1, or interleukin-2.
In another embodiment, a method is provided for screening protein-DNA binding pairs in a yeast one-hybrid system.
The method comprises: expressing a library of tester fusion proteins in yeast cells which contain a reporter construct comprising a reporter gene whose expression is under transcriptional control of a target DNA sequence; and selecting the yeast cells in which the reporter gene is expressed, the expression of the reporter gene being activated by binding of the tester fusion protein to the target DNA sequence. Each of the tester fusion proteins comprises an activation domain of a transcription activator, a first polypeptide subunit whose sequence varies within the library, a second polypeptide subunit whose sequence varies within the library independently of the first polypeptide subunit, and a linker peptide that links the first polypeptide subunit to the second polypeptide subunit.
In a variation of the embodiment, the step of expressing the library of tester fusion proteins includes transforming into the yeast cells a library of tester expression vectors for the library of tester fusion proteins. Each of the tester expression vectors comprises a transcription sequence encoding the activation domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence.
In another variation of the embodiment, the step of expressing a library of tester fusion proteins in yeast cells includes causing mating between first and second populations of haploid yeast cells of opposite mating types. The first population of haploid yeast cells comprises a library of tester expression vectors for the library of tester fusion proteins, each tester expression vector comprising a transcription sequence encoding the activation domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The second population of haploid yeast cells comprises the reporter construct.
According to the variation, the haploid yeast cells of opposite mating types may preferably be α and a type strains of yeast. The mating between the first and second populations of haploid yeast cells of α and a type strains is preferably conducted in a rich nutritional culture medium.
According to any of the above-described methods for selecting protein-DNA binding pairs, the target DNA sequence in the reporter construct is preferably positioned in 2-6 tandem repeats 5′ relative to the reporter gene.
The target DNA sequence in the reporter construct is preferably between about 15-75 bp in length and more preferably between about 25-55 bp in length.
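As a small illustration of the reporter design preferences stated above, the sketch below builds a tandem array of a target DNA site and reports whether the repeat number and site length fall within the stated preferred ranges. The function names and the 20 bp example site are hypothetical, and the checks are reported rather than enforced because the ranges are stated only as preferences.

```python
def tandem_repeats(target_site, n_repeats):
    """Concatenate n_repeats head-to-tail copies of the target site."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return target_site * n_repeats


def within_preferred_ranges(target_site, n_repeats):
    """Report which of the stated preferences a design satisfies."""
    length = len(target_site)
    return {
        "repeats_2_to_6": 2 <= n_repeats <= 6,
        "length_15_to_75_bp": 15 <= length <= 75,
        "length_25_to_55_bp": 25 <= length <= 55,
    }


if __name__ == "__main__":
    site = "ACGT" * 5  # HYPOTHETICAL 20 bp placeholder site
    upstream = tandem_repeats(site, 3)
    print(len(upstream), within_preferred_ranges(site, 3))
```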
In yet another embodiment, a method is provided for screening protein-protein binding pairs in a yeast one-hybrid system. The method comprises: expressing a library of tester fusion proteins in yeast cells which contain a reporter construct comprising a reporter gene whose expression is under transcriptional control of a specific DNA binding site; expressing a target protein in the yeast cells expressing the tester fusion proteins, where the target protein binds to the specific DNA binding site; and selecting the yeast cells in which the reporter gene is expressed, the expression of the reporter gene being activated by binding of the tester fusion protein to the target protein. Each of the tester fusion proteins comprises an activation domain of a transcription activator, a first polypeptide subunit, a second polypeptide subunit, and a linker peptide that links the first polypeptide subunit to the second polypeptide subunit, wherein the sequences of the first and second polypeptide subunits each independently vary within the library of the tester fusion proteins.
In a variation of the embodiment, the step of expressing the library of tester fusion proteins includes transforming into the yeast cells a library of tester expression vectors for the library of tester fusion proteins. Each of the tester expression vectors comprises a transcription sequence encoding the activation domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence.
In another variation of the embodiment, the steps of expressing the library of tester fusion proteins and expressing the target protein include causing mating between first and second populations of haploid yeast cells of opposite mating types. The first population of haploid yeast cells comprises a library of tester expression vectors for the library of tester fusion proteins. Each of the tester expression vectors comprises a transcription sequence encoding the activation domain of the transcription activator, a first nucleotide sequence encoding the first polypeptide subunit, a second nucleotide sequence encoding the second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The second population of haploid yeast cells comprises a target expression vector comprising a target sequence encoding the target protein. Either the first or second population of haploid yeast cells comprises the reporter construct.
In any of the above-described methods for selecting tester proteins capable of binding to a target peptide, protein, or DNA, the method may further comprise isolating the tester expression vectors from the selected yeast cells; and mutagenizing the first and second nucleotide sequences in the isolated tester expression vectors to form a library of mutagenized expression vectors.
Examples of mutagenesis methods include, but are not limited to, error-prone PCR mutagenesis, site-directed mutagenesis, DNA shuffling and combinations thereof. The library of mutagenized expression vectors may be screened against the same or different target peptide, protein or DNA by following similar procedures used for screening the tester expression vectors.
In yet another aspect of the present invention, methods are provided for producing a library of single chain antibodies. In an embodiment, the method comprises: expressing in yeast cells a library of yeast expression vectors. Each of the yeast expression vectors comprises a first nucleotide sequence encoding an antibody heavy chain variable region, a second nucleotide sequence encoding an antibody light chain variable region, and a linker sequence encoding a linker peptide that links the antibody heavy chain variable region and the antibody light chain variable region. The antibody heavy chain variable region, the antibody light chain variable region, and the linker peptide are expressed as a single fusion protein. Also, the first and second nucleotide sequences each independently vary within the library of expression vectors to generate a library of single-chain antibodies with a diversity of at least 10^6.
According to the embodiment, the diversity of the library of single-chain antibodies is preferably between 10^6-10^16, more preferably between 10^8-10^16, and most preferably between 10^10-10^16.
In yet another aspect of the present invention, a kit is provided for selecting tester proteins capable of binding to a target peptide, protein, or DNA.
In an embodiment, the kit comprises: a library of tester expression vectors and a yeast cell line. Each of the tester expression vectors comprises a first transcription sequence encoding either an activation domain or a DNA binding domain of a transcription activator, a first nucleotide sequence encoding a first polypeptide subunit, a second nucleotide sequence encoding a second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The first and second nucleotide sequences each independently vary within the library of expression vectors. A reporter construct may be contained in the yeast cell line. The reporter construct comprises a reporter gene whose expression is under transcriptional control of a specific DNA binding site.
Optionally, the kit may further comprise a target expression vector which comprises a second transcription sequence encoding either the activation domain or the DNA binding domain of the transcription activator which is not expressed by the library of tester expression vectors; and a target sequence encoding the target protein or peptide.
In another embodiment, the kit comprises: first and second populations of haploid yeast cells of opposite mating types. The first population of haploid yeast cells comprises a library of tester expression vectors for the library of tester fusion proteins. Each of the tester expression vectors comprises a first transcription sequence encoding either an activation domain or a DNA binding domain of a transcription activator, a first nucleotide sequence encoding a first polypeptide subunit, a second nucleotide sequence encoding a second polypeptide subunit, and a linker sequence encoding a linker peptide that links the first nucleotide sequence and the second nucleotide sequence. The second population of haploid yeast cells comprises a target expression vector. The target expression vector encodes either the activation domain or the DNA binding domain of the transcription activator which is not expressed by the library of tester expression vectors, and a target sequence encoding the target protein or peptide. Either the first or second population of haploid yeast cells comprises a reporter construct comprising a reporter gene whose expression is under transcriptional control of the transcription activator.
Optionally, the second population of haploid yeast cells comprises a plurality of target expression vectors. Each of the target expression vectors encodes either the activation domain or the DNA binding domain of the transcription activator which is not expressed by the library of tester expression vectors; and a target sequence encoding the target protein or peptide. Either the first or second population of haploid yeast cells comprises a reporter construct comprising a reporter gene whose expression is under transcriptional control of the transcription activator.
According to any of the above-described compositions, methods and kits, the diversity of the first and/or the second polypeptide subunit encoded by the first and second nucleotide sequences within the library of expression vectors is preferably between 10^3-10^8, more preferably between 10^4-10^8, and most preferably between 10^5-10^8.
Also according to any of the above-described compositions, methods and kits, the diversity of the fusion proteins encoded by the library of expression vectors may be preferably at least 10^6-10^8, more preferably at least 10^9-10^18, and most preferably at least 10^10-10^18.
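Because the first and second nucleotide sequences vary independently, the number of distinct pairings is at most the product of the two subunit diversities. The following minimal Python sketch works through that arithmetic, reading the subunit ranges as powers of ten (10^3 to 10^8 and so on). The helper name `combined_diversity` is hypothetical, and the calculation assumes every first-subunit variant can pair with every second-subunit variant; it gives a theoretical ceiling, not the number of clones actually recovered.

```python
def combined_diversity(first: int, second: int) -> int:
    """Upper bound on distinct fusions when two subunit pools pair freely."""
    if first < 1 or second < 1:
        raise ValueError("diversities must be positive integers")
    return first * second


# Per-subunit diversity ranges as stated in the text (low end, high end).
SUBUNIT_RANGES = {
    "preferred": (10**3, 10**8),
    "more preferred": (10**4, 10**8),
    "most preferred": (10**5, 10**8),
}

for label, (low, high) in SUBUNIT_RANGES.items():
    # Pair the low ends together and the high ends together.
    floor = combined_diversity(low, low)
    ceiling = combined_diversity(high, high)
    print(f"{label}: {floor:.0e} to {ceiling:.0e} possible pairings")
```

With 10^3 variants of each subunit the product is 10^6, and with 10^8 of each it is 10^16, which corresponds to the 10^6 to 10^16 range given above for the single-chain antibody library.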
Also according to any of the above-described compositions, methods and kits, the diversities of the first and second polypeptide subunits may be each independently derived from libraries of precursor sequences that are not specifically designed for the target peptide or protein.
Also according to any of the above-described compositions, methods and kits, the diversities of the first and second polypeptide subunits optionally are not derived from one or more proteins that are known to bind to the target peptide or protein.
Also according to any of the above-described compositions, methods and kits, the diversities of the first and second polypeptide subunits optionally are not generated by mutagenizing one or more proteins that are known to bind to the target peptide or protein.
Also according to any of the above-described compositions, methods and kits, the first and the second polypeptide subunits may be subunits of a multimeric protein whose sequence varies within a library of multimeric proteins. Examples of multimeric proteins include, but are not limited to, growth factor receptors, T cell receptors, cytokine receptors, tyrosine kinase-associated receptors, and MHC proteins.
Also according to any of the above-described compositions, methods and kits, the first nucleotide sequence may be 5′ relative to the second nucleotide sequence. The first nucleotide sequence in the library of expression vectors comprises a coding sequence of an antibody heavy-chain variable region, and the second nucleotide sequence comprises a coding sequence of an antibody light-chain variable region. The source of the coding sequences of the antibody light-chain and heavy-chain variable regions may be from human, non-human primate, or rodent. Optionally, the source of the coding sequences of the antibody light-chain and heavy-chain variable regions may be from one or more non-immunized animals. Preferably, the source of the coding sequences of the antibody light-chain and heavy-chain variable regions may be from human fetal spleen, lymph nodes or peripheral blood cells.
Also according to any of the above-described compositions, methods and kits, the linker peptides expressed by the library of expression vectors may provide a substantially conserved conformation between the first and second polypeptide subunits across the fusion proteins expressed by the library of expression vectors. This may be achieved by having the sequence of the linker peptides be substantially conserved across the library.
Also according to any of the above-described compositions, methods and kits, the conformation of the fusion protein having the first and second polypeptide subunits linked by the linker peptide may mimic a conformation of a single chain antibody. This may be achieved by selection of a linker peptide sequence comprising a Gly-Gly-Gly-Gly-Ser peptide in 3 or 4 tandem repeats.
Also according to any of the above-described compositions, methods and kits, the linker sequences in the library of expression vectors are preferably between 30-120 bp in length, more preferably between 45-102 bp in length, and most preferably between 45-63 bp in length. The linker sequences in the library of expression vectors may optionally comprise a nucleotide sequence encoding an amino acid sequence of Gly-Gly-Gly-Gly-Ser in 3 or 4 tandem repeats.
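The relationship between the Gly-Gly-Gly-Gly-Ser repeat count and the linker length in base pairs can be checked with a short Python sketch. The codons used here (GGT for Gly, TCT for Ser) are an arbitrary assumption made for illustration; any synonymous codons give the same length, since each amino acid is encoded by three nucleotides.

```python
# Assumed codon choices for illustration only; the text does not specify codons.
CODONS = {"G": "GGT", "S": "TCT"}
UNIT = "GGGGS"  # one Gly-Gly-Gly-Gly-Ser repeat in one-letter code


def linker_peptide(repeats: int) -> str:
    """Return the linker peptide made of `repeats` tandem GGGGS units."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return UNIT * repeats


def linker_dna(repeats: int) -> str:
    """Return one possible coding sequence for the linker peptide."""
    return "".join(CODONS[aa] for aa in linker_peptide(repeats))


for n in (3, 4):
    dna = linker_dna(n)
    in_most_preferred = 45 <= len(dna) <= 63
    print(f"{n} repeats: {len(linker_peptide(n))} aa, {len(dna)} bp, "
          f"within 45-63 bp: {in_most_preferred}")
```

Three repeats encode 15 amino acids (45 bp) and four repeats encode 20 amino acids (60 bp), so both fall inside the most preferred 45-63 bp range.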
Also according to any of the above-described compositions, methods and kits, each of the expression vectors may further comprise a sequence encoding an affinity tag. Examples of affinity tags include, but are not limited to, polyhistidine tags, polyarginine tags, glutathione-S-transferase, maltose binding protein, staphylococcal protein A tag, and EE-epitope tags.
Also according to any of the above-described compositions, methods and kits, the transcription activator may be any transcription activator having separable DNA-binding and transcriptional activation domains. Examples of transcription activators include, but are not limited to, GAL4, GCN4, and ADR1 transcription activators.
Also according to any of the above-described compositions, methods and kits, the reporter protein encoded by the reporter gene may be any reporter protein, expression of which shows a distinct genotype or phenotype in a cell. Examples of such a reporter protein include, but are not limited to, β-galactosidase, α-galactosidase, luciferase, β-glucuronidase, chloramphenicol acetyl transferase, secreted embryonic alkaline phosphatase, green fluorescent protein, enhanced blue fluorescent protein, enhanced yellow fluorescent protein, and enhanced cyan fluorescent protein.