The present invention relates generally to screening of mixed populations of organisms and more specifically to sequence-based profiling of environmental samples.
A central tenet of modern biology is that genetic information resides in a nucleic acid genome, and that the information embodied in such a genome (i.e., the genotype) directs cell function. This occurs through the expression of various genes in the genome of an organism and the regulation of the expression of such genes. The expression of genes in a cell or organism defines the cell or organism's physical characteristics (i.e., its phenotype). This is accomplished through the translation of genes into proteins.
In order to more fully understand and identify potential therapeutics, antibiotics and biologics for various organisms, efforts have been made to sequence the genomes of a number of organisms. For example, the Human Genome Project began with the specific goal of obtaining the complete sequence of the human genome and determining the biochemical function(s) of each gene. To date, the project has resulted in sequencing a substantial portion of the human genome (J. Roach, available on the internet at weber.u.Washington.edu/~roach/human_genome_progress2.html) (Gibbs, 1995). At least twenty-one other genomes have already been sequenced, including, for example, M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997), and yeast (S. cerevisiae) (Mewes et al., 1997). Significant progress has also been made in sequencing the genomes of model organisms, such as mouse, C. elegans, Arabidopsis sp. and D. melanogaster. Several databases containing genomic information annotated with some functional information are maintained by different organizations, and are accessible via the internet, for example, at the addresses tigr.org/tdb; genetics.wisc.edu; stanford.edu/~ball; hiv-web.lanl.gov; ncbi.nlm.nih.gov; ebi.ac.uk; Pasteur.fr/other/biology; and genome.wi.mit.edu. The raw nucleic acid sequences in a genome can be converted by one of a number of available algorithms to the amino acid sequences of proteins, which carry out the vast array of processes in a cell. Unfortunately, these raw protein sequence data describe neither how the proteins function in the cell nor their relationships and roles in biological samples. Understanding the details of various cellular processes (e.g., metabolic pathways, signaling between molecules, cell division, etc.) and which proteins carry out which processes is a central goal in modern cell biology.
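By way of illustration only, the conversion of a raw nucleic acid sequence to an amino acid sequence mentioned above can be sketched as follows. The sketch assumes the standard genetic code, a single forward reading frame, and termination at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical and are not taken from any particular algorithm referred to in this specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each
# position (assumption: no organism-specific variant code is needed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame=0):
    """Translate a DNA or RNA string into one-letter amino acid codes.

    Reading starts at offset `frame` and stops at the first stop codon.
    Codons containing ambiguous bases are reported as 'X'.
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino == "*":  # stop codon ends the open reading frame
            break
        protein.append(amino)
    return "".join(protein)


example_protein = translate("ATGGCCAAATAA")
```

As the surrounding text notes, the resulting protein string carries no functional annotation by itself; it is only the input to subsequent comparison against known sequences.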
Accordingly, determining the organism, protein and nucleic acid sequence profiles present in an environmental sample can provide valuable information about the role of these organisms or proteins in the environment. In addition, such information can help in the development of biologics, diagnostics, therapeutics, and compositions for industrial applications.
The present invention overcomes many of the problems in the art by providing a method of obtaining a nucleic acid profile of a sample. The method includes obtaining a plurality of nucleic acid sequences from the sample, wherein the sample includes a mixed population of organisms, creating a DNA library from the plurality of nucleic acid sequences, and sequencing at least one clone in the DNA library. The sequence information is used to perform a database search using an algorithm to compare the sequence of the at least one clone with a database containing a plurality of nucleic acid sequences from a plurality of organisms, and to identify sequences in the database which have homology to the at least one clone. This is performed repetitively as needed to obtain a nucleic acid profile of the sample. In one embodiment, the mixed population of organisms can be derived from uncultivated or cultivated microorganisms, such as those in an environmental sample. In another embodiment, the nucleic acids can be RNA or DNA (e.g., genomic DNA or fragments thereof).
The present invention also provides a method of obtaining a nucleic acid profile of a sample, wherein the sample includes a mixed population of plants. The method includes obtaining a plurality of nucleic acid sequences from the sample, creating a DNA library from the plurality of nucleic acid sequences, and sequencing at least one clone in the DNA library. The sequence information is used to perform a database search using an algorithm to compare the sequence of the at least one clone with a database containing a plurality of nucleic acid sequences from a plurality of organisms, and to identify sequences in the database which have homology to the at least one clone. This is performed repetitively as needed to obtain a nucleic acid profile of the sample. In one embodiment, the mixed population of plants can be derived from uncultivated or cultivated plants, such as those in an environmental sample. In another embodiment, the nucleic acids can be RNA or DNA (e.g., genomic DNA or fragments thereof).
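By way of illustration only, the repeated compare-and-identify step of the methods summarized above can be sketched as follows. The specification does not name a particular search algorithm, so this sketch substitutes a simple shared k-mer fraction as a stand-in for a homology search; in practice a dedicated alignment tool would be used instead. All names (`kmers`, `best_hit`, `profile_sample`), the toy sequences, the k-mer length and the score threshold are hypothetical assumptions chosen for the example.

```python
from collections import Counter


def kmers(sequence, k=8):
    """Return the set of all length-k substrings of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def best_hit(clone, database, k=8, min_score=0.3):
    """Return the database entry sharing the largest fraction of the
    clone's k-mers, or None if no entry reaches `min_score`.

    Assumption: shared k-mer fraction is only a crude proxy for homology.
    """
    clone_kmers = kmers(clone, k)
    if not clone_kmers:
        return None
    best_name, best_score = None, 0.0
    for name, reference in database.items():
        score = len(clone_kmers & kmers(reference, k)) / len(clone_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


def profile_sample(clones, database, k=8, min_score=0.3):
    """Repeat the search for every sequenced clone and tally the hits."""
    profile = Counter()
    for clone in clones:
        hit = best_hit(clone, database, k, min_score)
        profile[hit or "unassigned"] += 1
    return profile


# Toy data (hypothetical sequences, not from any real organism).
toy_database = {
    "organism_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACA",
    "organism_B": "TTGACCGGTAAACCCGGGTTTAAACCGGATGC",
}
toy_clones = [
    "GCGTACGTTAGCCTAGG",  # fragment of organism_A
    "CCGGTAAACCCGGGTTT",  # fragment of organism_B
    "AAAAAAAAAAAAAAAA",   # matches nothing in the toy database
]
toy_profile = profile_sample(toy_clones, toy_database)
```

The resulting tally is one possible representation of a nucleic acid profile: a count of clones per matched database entry, with clones lacking any sufficiently similar entry grouped as unassigned.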