Genome sequencing
Although biological science finds its roots in a grand tradition of exploratory investigation, for many years, basic research in biology and medicine has focused on a constructionist approach. With the advent of powerful manipulative techniques in molecular biology, most researchers in recent decades have focused on constructing new biological “scenarios” rather than merely observing existing systems. They have done this by perturbing various parameters of otherwise naturally-occurring systems and observing the effect on system dynamics, functional characteristics, etc.
The federally sponsored Human Genome Project (HGP) has recently re-legitimized the exploratory approach for life scientists. The new availability of complete genome sequence information for a variety of species has motivated many large new projects focused entirely on “mining” these data in order to learn more about the basic functions of biological structures and their development through time.
Early progress in the HGP took a directed approach. The federally funded sequencing centers concentrated on the targeted sequencing of specific important genes, working out the gene sequence from start to finish. This approach promised a long and difficult road to completing the entire genome.
Craig Venter, a former NIH researcher, advocated a different approach: splitting the entire genome into small fragments and working on them en masse. This involved dividing the sequencing task among many automatic sequencing machines and attacking it in parallel, determining large numbers of short sequences and then proceeding to process further batches of short fragments. Computer scientists then reconstructed the fragments' proper order using algorithmic overlap-analysis methods first proposed by Leroy Hood. This method came to be called “shotgun sequencing” and, although persistently derided by the established authorities in the HGP, it proved to be extremely effective in making rapid progress toward the goal of sequencing an entire genome. This work led to the joint announcement on Jun. 26, 2000 by J. Craig Venter, president of Celera Genomics, and National Human Genome Research Institute director Francis S. Collins of completion of “the first survey of the entire human genome.” The “survey” comprises the “working draft” of the human genome produced by the publicly funded international consortium HGP and the “first assembly of the human genome” produced by privately funded Celera Genomics.
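The overlap-based reconstruction of fragment order described above can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy merge; the function names and the `min_len` parameter are hypothetical, and this is not the assembly method actually used by Celera Genomics or the HGP, which had to contend with sequencing errors, repeats, and both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    # Remove duplicates and fragments wholly contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three short reads taken from the toy sequence ATGCGTACGTTAGC
reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
print(greedy_assemble(reads))
```

When no pair of fragments overlaps by at least `min_len` bases, the sketch returns several separate contigs rather than forcing a single sequence.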
With the sequencing of the genome nearly complete, the major focus of research is changing. Since gene sequences code for amino acids, the basic building blocks for proteins, many molecular biologists feel that the best place to focus is on creating large libraries of the specific proteins that are coded for by the known genes in the genomic sequence. This field of research is referred to as proteomics [Pandey, A. and M. Mann, Nature, 405(6788):837-46 (2000)]. Other scientists are focused on the task of computational prediction of the 3-dimensional structure of protein molecules directly through analysis of the primary genomic sequences. This area of work is called structural genomics.
Still other scientists, recognizing that the ultimate goal for most life scientists is understanding biological function in normal and diseased states, are focusing more directly on the task of attempting to find specific correlations between gene systems and phenotypic patterns, linking gene sequences directly to clinically-relevant effects. This work is part of what is called functional genomics [Eisenberg, D., et al., Nature, 405(6788):823-6 (2000)]. Functional genomics begins with all available sequence information in pursuit of biological understanding [Lockhart, D. J. and E. A. Winzeler, Nature, 405(6788):827-36 (2000)].
A primary focus of functional genomics is gene expression analysis. This involves the use of a variety of techniques to detect the presence of mRNA sequences within specific tissues. These techniques take advantage of an effect first observed by Southern: the tendency of free nucleotide sequence fragments to hybridize with their complementary mates (see [Southern, E. et al., Nat Genet, 21(1 Suppl):5-9 (1999)] for a recent review). By attaching these sequence fragments to solid supports, and by taking advantage of the binding of various marker molecules to solubilized mRNA, researchers are able to image specific gene expression activity.
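The complementarity on which hybridization relies can be shown with a short sketch. It assumes sequences written in the DNA alphabet (A, C, G, T) and perfect base pairing; the function names are hypothetical, and real hybridization also tolerates mismatches and depends on temperature and salt conditions, which are not modeled here.

```python
# Watson-Crick pairing for the DNA alphabet: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with `seq`, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridize(probe: str, target: str) -> bool:
    """True if the probe's exact complementary mate occurs within the target."""
    return reverse_complement(probe) in target.upper()


print(reverse_complement("ATGC"))
print(can_hybridize("ATGC", "TTGCATAA"))
```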
cDNA Microarrays
Since that early work, DNA hybridization technology has taken a tremendous leap forward with the ability to screen a broad spectrum of gene messages at once, through the use of cDNA microarrays [Eisen, M. B. and P. O. Brown, Methods Enzymol, 303:179-205 (1999); Brown, P. O. and D. Botstein, Nat Genet, 21(1 Suppl):33-7 (1999); Cheung, V. G., et al., Nat Genet, 21(1 Suppl):15-9 (1999)]. “Gene chips” consist of a solid support to which is attached a regular array of DNA fragments. They are generally created through the use of a robotic system, which coordinates the laying down of a “raster” grid of the DNA probe fragments. The robot deposits this regular grid of pre-determined DNA sequence “spots” onto a fixed substrate, such as a specially-coated glass slide.
These broad-spectrum cDNA chips are organized so that a wide assortment of probes is arrayed in a geometric grid layout, allowing a computer system to use the x,y grid coordinate of each spot to keep track of which probe is at each location.
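One way a computer system might keep track of which probe sits at each grid coordinate is sketched below. The `ArrayLayout` class, its row-major ordering, and the probe names are assumptions made for illustration; actual arraying software may record the layout differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayLayout:
    """Hypothetical record of a spotted array: probes stored in row-major order."""
    columns: int
    probes: tuple[str, ...]

    def probe_at(self, x: int, y: int) -> str:
        """Look up the probe deposited at grid coordinate (x, y)."""
        rows = -(-len(self.probes) // self.columns)  # ceiling division
        if not (0 <= x < self.columns and 0 <= y < rows):
            raise IndexError(f"({x}, {y}) is outside the grid")
        index = y * self.columns + x
        if index >= len(self.probes):
            raise IndexError(f"no probe was spotted at ({x}, {y})")
        return self.probes[index]

    def location_of(self, probe: str) -> tuple[int, int]:
        """Return the (x, y) coordinate at which a probe was deposited."""
        index = self.probes.index(probe)
        return index % self.columns, index // self.columns


layout = ArrayLayout(columns=3,
                     probes=("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"))
print(layout.probe_at(1, 1))
print(layout.location_of("geneF"))
```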
The basic steps of a typical microarray analysis are as follows: 1) The tissue to be studied is selected and prepared for RNA extraction. This typically involves homogenization of the tissue to free the desired macromolecules into solution. 2) The mRNA is extracted using standard techniques and is then subjected to reverse transcription in order to produce complementary strands of cDNA molecules. 3) The cDNA molecules are usually synthesized using labeled nucleotides. Use of different labels allows for easy comparison of different mRNA populations. 4) The cDNA probes are then tested by hybridizing them to a DNA microarray. Arrays with more than 250,000 oligonucleotides or 10,000 different cDNAs per square centimeter can now be mass-produced [Lockhart, D. J. and E. A. Winzeler, Nature, 405(6788):827-36 (2000)]. 5) Finally, computer-based image acquisition, processing and analysis are used to quantitate the strength of the fluorescent signal at each of the microarray grid locations, thereby providing evidence of the presence and concentration of mRNA corresponding to each of the genes associated with the microarray chip.
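The final quantitation step can be sketched as follows for the case in which two differently labeled mRNA populations are hybridized to the same array. The use of a background-subtracted log2 ratio is a common convention assumed here rather than something specified above, and the function name, the `background` and `floor` parameters, and the intensity values are all hypothetical.

```python
import math

Spot = tuple[int, int]  # (x, y) grid coordinate of a microarray spot


def log2_ratios(sample: dict[Spot, float],
                reference: dict[Spot, float],
                background: float = 0.0,
                floor: float = 1.0) -> dict[Spot, float]:
    """Compare two labeled populations spot by spot.

    Intensities are background-subtracted and clamped to `floor` so that
    the ratio and its logarithm are always defined.
    """
    ratios = {}
    for spot, signal in sample.items():
        s = max(signal - background, floor)
        r = max(reference[spot] - background, floor)
        ratios[spot] = math.log2(s / r)
    return ratios


# Made-up fluorescence intensities for two spots
sample = {(0, 0): 900.0, (1, 0): 150.0}
reference = {(0, 0): 300.0, (1, 0): 500.0}
print(log2_ratios(sample, reference, background=100.0))
```

A positive value indicates a stronger signal in the sample channel at that grid location, and a negative value a stronger signal in the reference channel.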
Laser Capture Microdissection
Since the gene expression activity of organs and tissues can be quite complex, it is desirable to use a technique which allows analysis of the gene expression, but which permits the morphologic localization of the area to be studied, thus avoiding the loss of morphological detail that results from the homogenization process. Laser capture microdissection (LCM) allows this to be done with great specificity [Bonner, R. F., et al., Science, 278(5342):1481, 1483 (1997); Cole, K. A. et al., Nat Genet, 21(1 Suppl):38-41 (1999); Emmert-Buck, M. R., et al., Science, 274(5289):998-1001 (1996)].
Microdissection-based gene expression analysis begins with nonaldehyde fixation of the tissue to be studied, using a fixative such as 70% ethanol, since aldehyde fixatives disrupt RNA structure. A low-temperature embedding medium, such as polyethylene glycol distearate, is used to embed the tissue in preparation for histological sectioning. Thin tissue sections are cut, at a thickness of 8 μm, for example, and are then mounted on uncovered glass slides. A thin membrane is typically applied to the section surface to prevent cross-contamination of macromolecules. A UV laser is then used to perform cold ablation of thin lines of tissue, creating an incision around a specific area of the tissue section without disturbing surrounding tissue. A specialized adhesive carrier film is used to transfer the incised portion of the tissue section to an Eppendorf microfuge tube containing lysis buffer. The cells are lysed in the buffer and can be used for mRNA analysis.
3D localization
The above microdissection technique has been used by Cole, et al. [Cole, K. A. et al., Nat Genet, 21(1 Suppl):38-41 (1999)], to study the cellular-level gene expression activity associated with prostate cancer. These investigators used serial-section histological techniques to precisely identify and then excise specific tumor cells within the prostate gland for microarray analysis of expression activity. The investigators then interactively annotated 3D volume reconstructions of gland section images to overlay expression data relating to the specific cells that had been microdissected. It should be noted that this study focused on only small groups of specific tissue areas, since the microdissection approach requires a skilled operator and is extremely exacting work. Tissue that is not used for expression analysis is stained for anatomical reconstruction of the gland architecture, rendering it unusable for further expression analysis. Since this approach is targeted to specific areas of the tissue, it is most useful for specifically targeted studies, and is poorly suited for survey-based exploratory analysis.
Volumetric reconstruction is well known for the macroscopic-level medical imaging techniques of MRI and CT scanning. These 3-dimensional raster-imaging techniques provide useful volumetric surveys for specific anatomical features, but are typically suited only for imaging specific sorts of biologic activity. In order to increase the usefulness of these methods, various researchers have investigated the combination of multiple imaging modalities, such as MRI and PET scanning, in order to take advantage of the anatomical structure imaging features of the MRI approach, while exploiting the functional data yielded by the PET scanning approach. These multiple datasets are sometimes superimposed upon the same 3-dimensional coordinate space in order to aid in visualization of the functional and structural details.
A similar capability can be provided at a microscopic histological level, through the use of multi-modal imaging of serial microscopic sections for 3D reconstruction and analysis. Alternating serial sections are placed on separate glass slides, with one set of alternating sections stained and coverslipped for histological detail, and the other set of adjacent alternating sections left uncovered for further processing. For each structure seen in a stained coverslipped section, the adjacent section can then be easily processed using other techniques. This method is described in detail in Doyle [Doyle, M. D., The intraorgan lymphatic system of the rat left ventricle in normalcy and aging, Univ. of Illinois at Urbana-Champaign, University Microfilms, order number 9210786 (1991)], where it was used to coordinate light microscopic and electron microscopic examination of the three-dimensional aspects of tissue specimens.
Various tools are available for the interactive volume visualization of 3-D biomedical image data. One example is given by the MultiVIS client-server Internet-based distributed visualization system developed by Doyle, et al. [Doyle, M. et al., The Visible Embryo Project: A Platform for Spatial Genomics. in 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision Making (2000); Doyle, M., et al., MultiVIS: A Web-based interactive remote visualization environment and navigable volume imagemap system. in 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision Making (2000)]. The MultiVIS system is also a good example of a system which allows for the mapping of both volume image data and other types of data, such as object identity information, onto a single x,y,z coordinate space. This system has been used for a variety of purposes, such as for providing an interactive online 3-D atlas of the Visible Human Project male dataset [Doyle, M., et al., MultiVIS: A Web-based interactive remote visualization environment and navigable volume imagemap system. in 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision Making (2000)]. All the references listed in this paragraph are hereby incorporated by reference for all purposes.
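The idea of mapping both image data and object identity information onto a single x,y,z coordinate space can be illustrated with a minimal sketch. The `LabeledVolume` class, its flat-list storage, and the example object name are assumptions made for illustration only; they do not describe how the MultiVIS system is implemented.

```python
class LabeledVolume:
    """Image intensities and object labels that share one x,y,z voxel grid."""

    def __init__(self, width: int, height: int, depth: int) -> None:
        self.shape = (width, height, depth)
        n = width * height * depth
        self.intensity = [0] * n        # image data, one value per voxel
        self.label = [0] * n            # object identifier, one per voxel
        self.names = {0: "unlabeled"}   # identifier -> object name

    def _index(self, x: int, y: int, z: int) -> int:
        """Convert an x,y,z coordinate to a position in the flat lists."""
        w, h, d = self.shape
        if not (0 <= x < w and 0 <= y < h and 0 <= z < d):
            raise IndexError(f"({x}, {y}, {z}) is outside the volume")
        return (z * h + y) * w + x

    def set_voxel(self, x: int, y: int, z: int,
                  intensity: int, label: int = 0) -> None:
        i = self._index(x, y, z)
        self.intensity[i] = intensity
        self.label[i] = label

    def query(self, x: int, y: int, z: int) -> tuple[int, str]:
        """Return the intensity and the object name at one coordinate."""
        i = self._index(x, y, z)
        return self.intensity[i], self.names.get(self.label[i], "unknown")


vol = LabeledVolume(4, 4, 2)
vol.names[1] = "left ventricle"   # hypothetical object name
vol.set_voxel(2, 1, 1, intensity=180, label=1)
print(vol.query(2, 1, 1))
```

Because both lists are indexed by the same coordinate, a single lookup returns the image value and the identity of the object at that location, which is the behavior a navigable volume imagemap needs.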
Unsolved problems
Although the above-described existing technologies have enabled numerous advances in biomedical science and industry, there are several long-felt but unsolved needs for which a solution has not been obvious before the present invention. One need is to gather gene expression data in a manner that supports the types of exploratory research that can take advantage of the broad-spectrum types of biologic activity analysis enabled by today's microarray tools. Further, there is a serious need for methods to visualize the spatial distribution of the biologic activity of a wide range of genes, across a wide array of species and tissue types. There is a great need for technology to allow the collection of large volumes of these types of data, to enable exploratory investigations into patterns of biologic activity that may provide insights into both normal and abnormal biologic states. And there is certainly a need to correlate gene expression data with morphological structure in a useful and easy to understand manner, such as in a volume visualization environment.
Each of these needs is evident across all species and ages; however, there is a particular need for these problems to be solved in order to enable researchers to make significant progress in the study of early development. Many breakthroughs in biomedical science will only occur through study of organism growth and development. Deciphering the delicate interplay between the spatial expression patterns of various genes and the timings of these biological events is among the most difficult of biomedical research questions. In order to solve such problems, tools are needed to allow the collection of larger volumes of expression data across a wider spectrum of gene types than ever before.