The DNA and protein sciences have made great strides over the past two decades. Researchers have accomplished the previously unthinkable by sequencing the entire genomes of several microorganisms. The genomes of several higher eukaryotes, including mammals, are nearly completely sequenced and available in a variety of databases. Although these genomes have been sequenced, the function of the gene products encoded by much of the genome remains a mystery. An important area of genetic research is matching nucleotide sequences with a cellular function or activity.
Similarly, researchers have developed technology to enable rapid determination of the amino acid sequence of a selected protein or peptide. Peptide sequencing reactions that previously took days to accomplish can now be completed in mere hours, and the results are presented in a meaningful format. Large amounts of information regarding nucleic acid sequences and amino acid sequences have been entered into databases around the world. This rapid dissemination of information has enabled those in the art to associate a function with similar proteins found in different species of organisms. It is further possible to associate a function with short amino acid motifs of these proteins. Functional motifs, that is, nucleic acid sequences or amino acid sequences commonly associated with a particular function, can aid in predicting the function or activity of a peptide. For example, many proteins associated with carbohydrate metabolism may comprise a similar active site. An unknown protein that also comprises the amino acid sequence of the identified active site might be predicted to be involved in carbohydrate metabolism. However, proposed functional motifs can vary in activity depending on surrounding sequences, location of the peptide in a cell, and the type of host cell, thereby complicating any assumptions regarding peptide function.
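The motif-based prediction described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from this document: the pattern shown follows the form [AG]-x(4)-G-K-[ST], in the style of the well-known Walker A (P-loop) nucleotide-binding motif, and stands in for whatever active-site motif is of interest; the sequences and the names `MOTIF`, `find_motif`, `candidate_1`, and `candidate_2` are hypothetical.

```python
import re

# Illustrative motif only: [AG]-x(4)-G-K-[ST], written as a regular expression.
# It stands in for any short active-site pattern; it is not from this document.
MOTIF = re.compile(r"[AG].{4}GK[ST]")


def find_motif(sequence, motif=MOTIF):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in motif.finditer(sequence.upper())]


# Hypothetical, made-up amino acid sequences (one-letter codes).
sequences = {
    "candidate_1": "MTSLAGESGSGKSTLLR",
    "candidate_2": "MKVLAAGIVPELWQ",
}

# Map each sequence name to the list of motif hits found in it.
hits = {name: find_motif(seq) for name, seq in sequences.items()}

for name, found in hits.items():
    label = "motif present" if found else "no motif"
    print(name, label, found)
```

A hit of this kind is only a hint. As noted above, the activity of a proposed motif can depend on surrounding sequences, cellular location, and host cell type, so a pattern match alone does not establish function.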
The potential uses of the sequence information collected to date are limitless if links between genetic sequence and cell function can be established. Therapeutic peptides, disease indicators, regulatory mechanisms, and the like are waiting to be discovered in the sequence code. In addition, the compiled sequence information may provide insight into the functional relationships between gene products. Such information will be sought after as biotechnology companies search for combinations of biological factors to be used as treatment modalities in the next generation of gene therapy products.
In order to capitalize on the seemingly endless supply of sequenced genomes, researchers have developed genetic libraries that can be screened to associate a nucleic acid sequence with a protein, peptide, or cellular function. In many instances, detection involves hybridizing a probe specific for a desired sequence to the unknown DNA sequence. Yet, as discussed above, peptide function cannot be accurately predicted from the mere presence of motifs. Alternatively, nucleic acid sequences are incorporated into a vector and introduced into a host cell. The gene product encoded by the nucleic acid is expressed and detected. Often, screening is accomplished in vitro (see, for example, DeGraaf et al., Gene, 128 (1), 13–17 (1993)). Some methods of screening proteins or peptides involve the formation of fusion proteins. However, fusing, for example, a marker peptide to an unknown peptide of interest may interfere with the normal functioning of both peptides.
Other methods of using libraries require a physical association between the peptide of interest and the nucleic acid that encodes the peptide. For example, U.S. Pat. No. 5,270,170 (Schatz et al.) describes a method of generating and screening random peptides comprising putative ligands that bind to target receptor molecules. A random peptide library is prepared such that the random peptide is expressed as a fusion protein comprising the random peptide and a DNA binding protein. The DNA binding protein will bind to the recombinant DNA expression vector that encodes the fusion product containing the peptide of interest. Therefore, once a peptide is identified, the corresponding expression vector is readily available. However, the nucleic acid molecule that encodes the fusion product must comprise a binding site for the DNA binding protein. Furthermore, fusion of a random peptide with a DNA binding protein can interfere with the functioning of the random peptide by altering or blocking active sites, disrupting protein folding, and the like. Phage peptide display libraries are also used to express and screen proteins for binding to a target molecule. Phage display libraries have been used to screen proteins in vitro by association of the expressed peptide with a target ligand. However, the utility of phage display libraries to associate function with a genetic sequence in vitro is limited in that few target molecules have been identified, much less successfully expressed in their native conformation. Phage display libraries also have been utilized to identify peptides in vivo (see, for example, U.S. Pat. No. 5,622,699 (Ruoslahti et al.)). Yet, gene products identified by function in the context of phage may not necessarily have similar function or activity in other contexts or environments. For example, phage have limited utility in screening in vitro and in vivo for ligands that are efficiently internalized within a cell.
Aside from the technical difficulties associated with screening library-encoded gene products, construction of genetic libraries as described in the prior art is time- and labor-intensive. For example, methods that require formation of fusion proteins necessitate an understanding of the nucleic acid sequence encoding the random peptide. Extensive manipulation of the nucleic acid sequence is also required. Moreover, the transduction efficiency of vectors commonly used to generate genetic libraries, such as plasmids, is low, further complicating the expression and screening of encoded gene products. Vectors with greater transduction efficiency are routinely constructed via homologous recombination. Yet, homologous recombination is time-consuming and difficult to perform on a large scale. If a skilled artisan attempts to construct expression vectors using homologous recombination and does not obtain the desired vector, the artisan cannot readily determine whether the construction technique needs to be modified or whether the vector is not viable or does not encode the selected function. When working with a multiplicity of genetic elements encoding unknown products, this dilemma is even more complicated. Current vector construction techniques lack the flexibility and means of selection required to efficiently produce expression vectors used to generate a genetic library.
In view of the above, there is a need in the art for an efficient and reliable method for using a genetic library to associate functions with nucleic acid sequences. In particular, there remains a need for a method of identifying functionally related coding sequences. There also remains a need in the art for a reliable method for constructing a genetic library. The present invention seeks to satisfy at least some of these needs. The present invention is directed to multi-gene genetic libraries and methods of identifying functionally related coding sequences. The present invention is further directed to a method of constructing a genetic library comprising or consisting of a multiplicity of vectors. These and other advantages of the present invention, as well as additional inventive features, will be apparent from the description of the invention provided herein.