The present invention is in the field of bioinformatics, particularly as it pertains to determining the associations of biological elements. More specifically, the present invention relates to the determination of associations among a set of biological elements using an algorithm that is capable of generating a Steiner tree.
Recent advances across the spectrum of the biological sciences have allowed researchers to compile large amounts of biological data from a myriad of organisms. For example, advances in techniques for sequencing long stretches of genomic deoxyribonucleic acid (DNA) have allowed investigators to collect vast nucleic acid sequence data rapidly. Similarly, advances in RNA transcript profiling have facilitated the rapid acquisition of large amounts of data on the relative rates of transcription of genes in varying conditions.
The relationships among the discrete elements within the data collected, however, are often difficult to ascertain. For example, an RNA transcript profiling assay will often produce results that indicate that a set of genes is transcribed at a relatively high rate under a certain environmental condition. After acquisition of the data, however, the operative associations that resulted in the higher rate of transcription of the set of genes are often poorly understood.
The difficulty of determining the associations among biological elements is not limited to genes, however. For example, correlations among seemingly unrelated enzymes, enzymatic pathways, non-enzyme proteins, substrates, or other biological characteristics are often easier to demonstrate than to explain.
One conventional method for determining associations among a group of biological characteristics involves the use of graphs that show relationships among those biological characteristics (biological elements). These graphs are networks comprising vertices and edges. The vertices, which can be represented by discrete shapes such as circles, represent the biological elements. A relationship between any two of the biological elements is shown by connecting the two vertices that represent those elements with an edge, which can be represented as a line segment. A single vertex can be connected to multiple other vertices with multiple edges. Multiple vertices connected by multiple edges form a network.
FIG. 1 shows an illustrative graph of a simple network of vertices and edges generally at 10. A first vertex 12 is shown connected to a second vertex 14 with an edge 16. The vertices of the network 10 are labeled with the letters A through P for illustrative purposes.
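A network of vertices and edges of the kind described above can be held in memory as an adjacency map. The following is a minimal sketch, not a representation taken from the present disclosure: the function name `build_graph` and the specific edge list are hypothetical, and only the vertex labels follow the lettering convention of FIG. 1.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (vertex, vertex) pairs.

    Each vertex stands for a biological element; each pair stands for
    a relationship (edge) between two elements.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical relationships, chosen only for illustration.
example_edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
example_graph = build_graph(example_edges)

# A single vertex can be connected to several others by several edges.
print(sorted(example_graph["B"]))
```

A set-valued adjacency map keeps each relationship symmetric and ignores repeated edges, which suits an undirected network of this kind.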
Graphs such as the one shown in FIG. 1 have been described. Examples of graphs of enzymatic and genetic networks can be found in the Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.ad.jp/kegg/). KEGG graphs represent enzymatic relationships among various proteins. Such graphs can be used, for example, by researchers who have information indicating that two or more enzymes are somehow related, but who are unsure in which enzymatic pathways the enzymes function and how those pathways connect. By examining the KEGG graphs associated with the enzymes, researchers can examine multiple known pathways for potential relationships.
Although KEGG graphs (see, for example, Eisenberg et al., Protein Function in the Post-Genomic Era, Nature, Volume 405, Number 6788, Pages 823-826 (2000); Uetz et al., A Comprehensive Analysis of Protein-Protein Interactions in Saccharomyces cerevisiae, Nature, Volume 403, Number 6770, Pages 623-627 (2000), each of which is herein incorporated by reference in its entirety) are useful for viewing associations, they are limited in their applicability. A researcher will likely have to examine significant amounts of information in an attempt to determine the associations that exist. Although KEGG graphs allow a researcher to examine the entire set of graphs known to contain the enzymes of interest for associations, they do not filter out certain unwanted and unrelated information.
One proposed solution to the problem of reducing the irrelevant or less relevant information in graphs with multiple enzymatic pathways is to input one or more enzymes and extract any valid pathways in which the enzymes of interest occur (Fellenberg and Mewes, Interpreting Clusters of Gene Expression Profiles in Terms of Metabolic Pathways, MIPS, Max-Planck-Institut f. Biochemie, http://www.bioinfo.de/isb/gcb99/poster/fellenberg/). This approach, however, is restricted to valid metabolic pathways, i.e., pathways with no unaccounted-for intermediates.
What is needed in the art are refined methods for determining the associations among specified biological elements within a larger set of elements with known biorelationships.
The present invention includes and provides a method for analyzing biological elements, comprising: a) providing a first set of biological elements; b) providing a graph representing relationships among a second set of biological elements, wherein the biological elements of the second set of biological elements are represented as vertices of the graph and biorelationships between the biological elements of the second set of biological elements are represented as edges of the graph, and wherein the second set of biological elements comprises the first set of biological elements; and, c) applying an algorithm capable of generating a Steiner Tree to the first set of biological elements and the graph to create a Steiner subgraph, wherein the Steiner subgraph comprises vertices from the graph corresponding to the first set of biological elements and further comprises edges and vertices from the graph connecting the vertices from the graph corresponding to the first set of biological elements.
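Steps a) through c) can be illustrated with a short sketch. The text above does not name a particular Steiner tree algorithm, so the code below swaps in a simple shortest-path heuristic: it repeatedly attaches the nearest unconnected element of the first set to the growing tree. It assumes an unweighted, undirected graph and returns an approximation, not a guaranteed minimum Steiner tree. The function name `steiner_subgraph`, the example graph, and the chosen first set are all hypothetical.

```python
from collections import deque


def steiner_subgraph(graph, terminals):
    """Approximate a Steiner tree on an unweighted, undirected graph.

    graph: dict mapping each vertex to the set of its neighbours.
    terminals: the first set of elements, all of which must be connected.
    Returns (vertices, edges); each edge is a frozenset of two vertices.
    """
    terminals = list(dict.fromkeys(terminals))  # drop duplicates, keep order
    if not terminals:
        return set(), set()
    tree_vertices = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        # Breadth-first search started from every vertex already in the tree,
        # so the first terminal reached is the one closest to the tree.
        parent = {v: None for v in tree_vertices}
        queue = deque(sorted(tree_vertices))
        found = None
        while queue:
            v = queue.popleft()
            if v in remaining:
                found = v
                break
            for w in sorted(graph.get(v, ())):
                if w not in parent:
                    parent[w] = v
                    queue.append(w)
        if found is None:
            raise ValueError("the first set is not connected within the graph")
        # Walk back to the tree, adding the connecting vertices and edges.
        v = found
        while parent[v] is not None:
            tree_edges.add(frozenset((v, parent[v])))
            tree_vertices.add(v)
            v = parent[v]
        remaining -= tree_vertices
    return tree_vertices, tree_edges


# Hypothetical second set (the full graph) and first set (elements of interest).
full_graph = {
    "A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D", "G"}, "D": {"C"},
    "E": {"B", "F"}, "F": {"E"}, "G": {"C", "H"}, "H": {"G"},
}
first_set = ["A", "D", "F"]
sub_vertices, sub_edges = steiner_subgraph(full_graph, first_set)
print(sorted(sub_vertices))
```

In this example the resulting subgraph keeps the connecting vertices B, C, and E but leaves out G and H, which lie on no path between the elements of interest. Vertices are visited in sorted order only to make the result reproducible, which requires vertex labels that can be compared with one another.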
The present invention includes and provides a method for analyzing genes, comprising: a) providing a first set of genes; b) providing a graph representing relationships among a second set of genes, wherein the genes of the second set of genes are represented as vertices of the graph and biorelationships between the genes of the second set of genes are represented as edges of the graph, and wherein the second set of genes comprises the first set of genes; and, c) applying an algorithm capable of generating a Steiner Tree to the first set of genes and the graph to create a Steiner subgraph, wherein the Steiner subgraph comprises vertices from the graph corresponding to the first set of genes and further comprises edges and vertices from the graph connecting the vertices from the graph corresponding to the first set of genes.
The present invention includes and provides a program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps to analyze biological elements, the method steps comprising: a) providing a first set of biological elements; b) providing a graph representing relationships among a second set of biological elements, wherein the biological elements of the second set of biological elements are represented as vertices of the graph and biorelationships between the biological elements of the second set of biological elements are represented as edges of the graph, and wherein the second set of biological elements comprises the first set of biological elements; and, c) applying an algorithm capable of generating a Steiner Tree to the first set of biological elements and the graph to create a Steiner subgraph, wherein the Steiner subgraph comprises vertices from the graph corresponding to the first set of biological elements and further comprises edges and vertices from the graph connecting the vertices from the graph corresponding to the first set of biological elements.
The present invention includes and provides a program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps to analyze genes, the method steps comprising: a) providing a first set of genes; b) providing a graph representing relationships among a second set of genes, wherein the genes of the second set of genes are represented as vertices of the graph and biorelationships between the genes of the second set of genes are represented as edges of the graph, and wherein the second set of genes comprises the first set of genes; and, c) applying an algorithm capable of generating a Steiner Tree to the first set of genes and the graph to create a Steiner subgraph, wherein the Steiner subgraph comprises vertices from the graph corresponding to the first set of genes and further comprises edges and vertices from the graph connecting the vertices from the graph corresponding to the first set of genes.
The present invention includes and provides a method for analyzing biological elements, comprising: a) providing a first set of biological elements; b) providing a graph representing relationships among a second set of biological elements, wherein the biological elements of the second set of biological elements are represented as vertices of the graph and biorelationships between the biological elements of the second set of biological elements are represented as edges of the graph, and wherein the second set of biological elements comprises the first set of biological elements; and, c) applying an algorithm capable of generating a Steiner Tree to the first set of biological elements and the graph to create a Steiner subgraph.