1. Field of the Invention
The present invention relates generally to nucleic acid arrays and to methods for designing and using nucleic acid arrays. In particular, the present invention relates to informative nucleic acid arrays and methods for making the same, with an emphasis on the selection criteria for the individual gene sequences to be included in informative nucleic acid arrays.
2. Description of the Related Art
In general, most of the cells of an organism contain the same gene sequences. In eucaryotic organisms (i.e., those having a nucleus) the number of individual genes typically is in the range of tens of thousands. Not all of these genes, however, are used or expressed by all cells all of the time. Some genes are activated, or expressed, only at a specific time, at a specific level, at a specific developmental stage, and/or in a specific cellular, physiologic, and/or tissue context. Determining when a gene is expressed, and what causes the gene to be expressed, may be key to better understanding the effects of various agents (e.g., potential pharmaceuticals, toxins, chemicals, temperature, pressure, or electromagnetic radiation) on cellular responses. In addition, determining when a gene is expressed may yield a better understanding of the effects of various normal or variant genes on disease pathogenesis (e.g., induced or hereditary disorders).
To this end, a variety of molecular biology methods for classifying, indexing, and/or quantifying nucleic acid with regard to gene expression have been proposed. For example, U.S. Pat. No. 5,707,807 relates to a method for classifying cDNA that has been reverse-transcribed from tissue- or cell-derived RNA. The method is also applicable to the search and isolation of genes of physiologically active substances that are potential pharmaceuticals or causative genes of hereditary diseases, as well as the isolation of those genes that are useful for improving agricultural products.
Another example of gene expression analysis is in U.S. Pat. No. 5,968,784. This patent relates to a method for tagging and identifying all of the expressed genes in a given cell population. By comparing gene expression profiles among cells, the method may be used to identify individual genes whose expression is associated with a pathological phenotype. U.S. Pat. No. 5,968,784 also relates to methods for identifying gene expression patterns in an mRNA population to identify differential gene expression patterns among two or more cells or tissues.
Another useful contemporary methodology for simultaneously analyzing the expression levels of a plurality of genes utilizes nucleic acid arrays (or microarrays or macroarrays, hereinafter collectively referred to as arrays). DNA arrays typically consist of hundreds to thousands of immobilized DNA sequences present on a surface of an object the size of a business card or smaller. The nucleic acids for the selected individual gene sequences are immobilized on the surfaces of nylon filters, glass, plastic, or gene chips, etc. Robotic technologies may be employed in the production of arrays. Labeled probe samples are prepared from RNA from biological samples. The probes are hybridized to the immobilized nucleic acids on the arrays, and a detector instrument collects the intensities of hybridization of the bound labeled probe sample to the individual gene sequences. Then, computer software typically analyzes the results. For example, U.S. Pat. No. 5,807,522 relates to a method and apparatus for forming arrays of biological samples on a support in an automated fashion. U.S. Pat. No. 5,922,617 relates to a product having arrays of samples in tracks, wherein light emitting labels are excited and emitted light is detected. For example, the arrays may be located in circular tracks on a compact disc-like support. In addition, U.S. Pat. No. 5,922,617 relates to a reader for determining the occurrence of events on the array.
With regard to the methodologies for selecting individual gene sequences for inclusion on arrays, currently available gene expression arrays typically rely on one of two gene selection methods. In one case, gene sequences that are merely available to the designer are immobilized on an array, without regard to the biological significance of the selected gene sequences and without selection based on the results of experimentation. In the other case, gene sequences that are expressed in one cell type or tissue type are chosen for inclusion on the array. Both of these conventional gene selection methodologies produce general utility arrays that have limited “informative” capabilities. This disadvantage is especially pronounced if the general arrays contain a relatively small number of immobilized gene sequences, which reduces the likelihood that one will discover differentially expressed gene sequences during experimentation. Although arrays are becoming more widely known and used, very little attention has been paid to the creation or use of informative nucleic acid arrays.
Thus, a need has arisen for informative nucleic acid arrays, and for methods for selecting the individual gene sequences for inclusion on the informative arrays, and for making the same.
As embodied and broadly described herein, the present invention is directed to methods for making and using informative nucleic acid arrays (e.g., DNA including cDNA, RNA, PNA) for research and other applications in various disciplines or areas of interest. Examples of such disciplines include, without limitation, dermatology, pharmacology, toxicology, oncology, gynecology, urology, gastroenterology, as well as studies of sentinel gene discovery, signature gene discovery, mechanism of action, drug screening, drug metabolism, etc. The informative nucleic acid arrays of the present invention may contain only the gene sequences that are of interest in a particular area of interest or application, and may exclude other gene sequences.
According to one embodiment of the present invention, a method for identifying genes that are differentially expressed between dissimilar biological samples for use in an informative array is disclosed. The method includes the steps of (1) providing a first set of heterogeneous nucleic acid probes derived from a first biological sample; (2) providing a second set of heterogeneous nucleic acid probes derived from a second biological sample wherein the first and second biological samples are different and have a common biological process; (3) hybridizing a nucleic acid array comprising a plurality of sequences derived from genes of the biological process with the first set of probes and determining a first level of expression for sequences of the array; (4) hybridizing the array with the second set of probes and determining a second level of expression for sequences of the array; and (5) identifying a plurality of genes that are differentially expressed in the biological process by comparing the first level of expression with the second level of expression for hybridized sequences. The biological process may include the processes of absorption, distribution, metabolism, and excretion of drugs, toxins and chemicals, and the biological process may affect the behavior of endogenous or exogenous chemicals in cells. The biological process may also be related to dermatology, pharmacology, toxicology, pathology, oncology, gastroenterology, urology, or gynecology.
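The comparison in step (5) can be illustrated with a minimal sketch. The text above does not specify how the two levels of expression are compared, so the log2 ratio, the two-fold threshold (min_fold), the intensity floor, the function name, and the gene names and intensity values below are all assumptions chosen only for illustration.

```python
import math


def differentially_expressed(first, second, min_fold=2.0, floor=1.0):
    """Return {gene: log2(second / first)} for genes whose expression
    differs by at least min_fold between the two samples.

    first, second: dicts mapping a gene name to a hybridization intensity.
    min_fold and floor are hypothetical parameters, not values from the text.
    """
    threshold = math.log2(min_fold)
    hits = {}
    # Only genes measured with both probe sets can be compared.
    for gene in first.keys() & second.keys():
        a = max(first[gene], floor)   # floor avoids division by zero
        b = max(second[gene], floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= threshold:
            hits[gene] = log_ratio
    return hits


# Hypothetical intensities from the first and second hybridizations.
first_levels = {"GENE_A": 100.0, "GENE_B": 400.0, "GENE_C": 50.0, "GENE_D": 800.0}
second_levels = {"GENE_A": 110.0, "GENE_B": 100.0, "GENE_C": 200.0, "GENE_D": 790.0}

result = differentially_expressed(first_levels, second_levels)
for gene in sorted(result):
    print(gene, result[gene])
```

With these made-up values, GENE_B (four-fold lower in the second sample) and GENE_C (four-fold higher) pass the threshold, while GENE_A and GENE_D do not. A symmetric log ratio is used here so that increases and decreases of the same magnitude are treated alike; other comparison statistics could be substituted.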
In one embodiment, the first and second biological samples may originate from skin, skin appendages, oral tissue, gastrointestinal tissue, neural tissue, renal tissue, hepatic tissue, and/or urogenital tissue.
In one embodiment, a database may be created that includes the differentially expressed genes identified by the method above. In another embodiment, an informative array of sequences on a solid support may be created including sequences that are derived from the differentially expressed genes identified by the method above.
In another embodiment of the present invention, a method for selecting genes for an informative nucleic acid array for a biological process is provided. The method includes the steps of (1) identifying genes that are differentially expressed in the biological process; (2) establishing a ranking of the differentially expressed genes, wherein genes having a moderate level of expression are ranked over genes having a lower level of expression and genes having a higher level of expression; and (3) placing sequences derived from ranked differentially expressed genes on the informative array. The genes may be expressed in the biological process.
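The ranking in step (2) can be sketched as follows. The text above states only that moderately expressed genes are ranked over genes with lower or higher expression; the numeric cutoffs that define "moderate", the tie-breaking by magnitude of differential expression, the function name, and the sample data are assumptions made for this illustration.

```python
def rank_for_array(genes, low_cutoff=100.0, high_cutoff=1000.0):
    """Rank differentially expressed genes for placement on an array.

    genes: list of (name, expression_level, log2_ratio) tuples.
    Genes whose expression level lies within [low_cutoff, high_cutoff]
    are treated as moderately expressed and ranked first. Both cutoffs
    are hypothetical. Within each tier, genes with a larger absolute
    log2 ratio come first (also an assumption), then by name.
    """
    def sort_key(item):
        name, level, log_ratio = item
        is_moderate = low_cutoff <= level <= high_cutoff
        return (0 if is_moderate else 1, -abs(log_ratio), name)

    return [name for name, _, _ in sorted(genes, key=sort_key)]


# Hypothetical candidates: (gene, expression level, log2 ratio).
candidates = [
    ("GENE_B", 250.0, -2.0),
    ("GENE_C", 20.0, 2.0),     # low expression
    ("GENE_E", 5000.0, 3.0),   # high expression
    ("GENE_F", 600.0, 1.5),
]

ranking = rank_for_array(candidates)
selected = ranking[:2]  # sequences to place on the array (step 3)
print(ranking)
print(selected)
```

With these made-up values, the moderately expressed GENE_B and GENE_F rank ahead of GENE_E and GENE_C even though GENE_E shows the largest change, which reflects the stated preference for moderate expression over both extremes.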
In one embodiment, the step of identifying may include the steps of (1) providing a first set of nucleic acid probes derived from a first biological sample; (2) providing a second set of nucleic acid probes derived from a second biological sample, wherein the second biological sample is dissimilar from the first biological sample; (3) hybridizing the first set of probes to a nucleic acid array and determining a first level of expression for hybridized genes; (4) hybridizing the second set of probes to the array and determining a second level of expression for hybridized genes; and (5) identifying genes that are differentially expressed by comparing the first level of expression with the second level of expression for hybridized genes.
According to another embodiment of the present invention, a method for converting a nucleic acid array into an informative array is disclosed. The method includes the steps of (1) providing a first set of heterogeneous nucleic acid probes derived from a first biological sample; (2) providing a different, second set of heterogeneous nucleic acid probes derived from a second biological sample; (3) hybridizing a nucleic acid array comprising a plurality of sequences with the first set of probes and determining a first level of expression for sequences of the array; (4) hybridizing the array with the second set of probes and determining a second level of expression for sequences of the array; (5) identifying a plurality of genes that are differentially expressed in the biological process by comparing the first level of expression with the second level of expression for hybridized sequences; and (6) selecting moderately expressed genes from the plurality of identified differentially expressed genes for inclusion on the informative array.
According to yet another embodiment of the present invention, an informative nucleic acid array for gene expression analysis is disclosed. The informative nucleic acid array includes sequences derived from genes of cells that are substantially relevant to a biological process; sequences derived from a plurality of differentially expressed genes that are relevant to the biological process as identified by the method for identifying genes that are differentially expressed between dissimilar biological samples for use in an informative array discussed above; and a platform for the sequences which is selected from the group consisting of a filter surface, a glass surface, a plastic surface, a solid bead surface, and a gene chip surface.
It is a technical advantage of the present invention for informative arrays to contain immobilized gene sequences that have been carefully selected from a larger set of candidate genes. It is another technical advantage of the present invention for informative arrays to increase the likelihood that the gene sequences immobilized on them will be informative (e.g., differentially expressed) in a desired application, relative to a general array lacking a similar level of informative potential. It is another technical advantage of the present invention for informative arrays to increase the likelihood of identifying biomarkers consisting of differentially expressed genes. Furthermore, it is another technical advantage of the present invention for informative arrays to permit a reduction in the total number of gene sequences immobilized on the informative array. Because non-informative genes are excluded from the list of candidate genes during the gene selection process, fewer gene sequences need to be immobilized, which may in turn reduce the size of the informative array.