The present invention relates generally to the field of bioinformatics. In particular, the invention relates to methods, media and systems for graphically displaying computer-based biomolecular sequence information.
Informatics is the study and application of computer and statistical techniques to the management of information. Bioinformatics includes the development of methods to search computer databases of biomolecular sequence information (e.g., nucleic acid and protein) quickly, to analyze and display biomolecular sequence information, and to predict protein sequence, structure and function from DNA sequence data.
Increasingly, molecular biology is shifting from the laboratory bench to the computer desktop. Today's researchers require advanced quantitative analyses, database comparisons, and computational algorithms to explore the relationships between sequence and phenotype. Thus, by all accounts, researchers cannot and will not be able to avoid using computer resources to explore gene sequencing, gene expression, and molecular structure.
One use of bioinformatics involves studying an organism's genome to determine the sequence and placement of its genes and their relationship to other sequences and genes within the genome or to genes in other organisms. Such information is of significant interest in biomedical and pharmaceutical research, for instance to assist in the evaluation of drug efficacy and resistance. To make genomic information manipulation easy to perform and understand, sophisticated computer database systems have been developed. Incyte Pharmaceuticals, Inc. of Palo Alto, Calif., has developed several such databases, including some in which genomic sequence data is electronically recorded and annotated with information available from public sequence databases. Examples of such public sequence databases include GenBank (NCBI) and SWISSPROT. The resulting information is stored in a relational database that may be employed to determine relationships between sequences and genes within and among genomes.
While genetic data processing and relational database systems such as those developed by Incyte Pharmaceuticals, Inc. provide great power and flexibility in analyzing genetic information, further improvements in these systems will help accelerate biological research for numerous applications.
One area of interest in this regard is the display of biomolecular sequence information. As noted above, an important goal of genome research is to determine the sequence and placement of an organism's genes and their relationship to other sequences and genes within the genome, to genes in other organisms, and to related protein sequences. The ability to clearly and effectively display gene loci information for a given organism or organisms would greatly assist this task.
Accordingly, the development of a display tool which allows a user to clearly and effectively display gene loci information for a given organism or organisms and/or other biomolecular sequence information is desirable.
The present invention meets this need by providing methods, media and systems for graphically displaying computer-based biomolecular sequence information. Generally, biomolecular sequence information may be graphically depicted in a variety of different forms in accordance with the present invention. The sequence information may be composed of nucleotide or amino acid sequence information or both. The graphical depictions may be in several different formats providing different information relating to the sequences, and may be displayed in one or more screens of a computer user interface.
A graphical viewer in accordance with the present invention preferably has a plurality of panels, each panel displaying information about the biomolecular sequence data of interest in a different way on a single screen or page. For example, a first panel could show a graphical representation of the entire biomolecular sequence, or the portion of the sequence of interest, with the locations of particular subsequences of interest indicated. A second panel could show a more detailed graphical representation of all or a selected portion of the sequence represented in the first window, allowing a user to focus on a particular subsequence of interest. This second panel view could depict additional information, such as annotations, relating to the particular subsequences of interest. A third panel could show information graphically representing the confidence level or origination, for example, of the biomolecular sequence data represented in one or more of the other panels. Additional panels on the same or additional screens could show, for example, the actual nucleotide or amino acid sequence of or relating to a selected subsequence of interest represented in one or more of the other panels, or other information relating to the biomolecular sequence data.
In one preferred embodiment, a graphical viewer in accordance with the present invention provides a graphical representation of all or a selected portion of an organism's genome with its individual loci indicated. The viewer allows the user to focus on a particular region or locus of interest and have it also be graphically represented with additional information, such as annotations. A graphical depiction of sequence coverage for the sequence regions represented in the viewer may also be provided.
The viewer may also provide for the display of related loci from other portions of the organism's genome (i.e., paralogs), and allow for the retrieval of information about the loci, such as actual nucleotide sequences or detailed annotations, from an associated relational database system. In addition, a graphical viewer in accordance with the present invention may provide for the graphical representation and comparison of multiple portions of the genome of one or more organisms based on a locus of interest and its corresponding paralogs and homologs (related loci from another organism's genome).
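As a minimal sketch of the distinction drawn above, and assuming each related locus is stored with a label naming its source organism, the related loci for a locus of interest could be separated into paralogs (same organism) and homologs (another organism) before display. The names `RelatedLocus` and `partition_related` and the sample values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RelatedLocus:
    """Hypothetical record for a locus related to the locus of interest."""
    name: str
    organism: str


def partition_related(
    query_organism: str, related: List[RelatedLocus]
) -> Tuple[List[RelatedLocus], List[RelatedLocus]]:
    """Split related loci into (paralogs, homologs).

    Follows the usage in the text: paralogs come from the same organism's
    genome, homologs from another organism's genome.
    """
    paralogs = [r for r in related if r.organism == query_organism]
    homologs = [r for r in related if r.organism != query_organism]
    return paralogs, homologs
```

A viewer built along these lines might draw the paralogs alongside the locus of interest and place each homolog in a separate row per organism, though the layout is a design choice not fixed by the description.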
A graphical viewer in accordance with a preferred embodiment of the present invention preferably provides graphical representations of the genomic data in a plurality of panels, each panel displaying information about the genomic data of interest in a different way. In a particularly preferred embodiment of the invention, the graphical viewer has three main panels on a single screen: a legend viewer, which shows the entire portion of the genome under consideration; a target viewer, which allows a user to focus ("zoom in") on areas of the genome portion of particular interest; and a sequence depth viewer, which contains graphical information illustrating the depth of coverage over the length of the genome portion under consideration.
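The relationship between the legend viewer and the target viewer can be illustrated with a short sketch. It assumes loci are stored as half-open coordinate intervals on the genome portion and that the target viewer is a fixed number of pixels wide; `Locus`, `loci_in_window` and `to_pixels` are hypothetical names chosen for this example, not terms from the invention.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Locus:
    """Hypothetical locus record; coordinates are half-open [start, end)."""
    name: str
    start: int
    end: int


def loci_in_window(loci: List[Locus], win_start: int, win_end: int) -> List[Locus]:
    """Return the loci that overlap the zoom window [win_start, win_end)."""
    return [l for l in loci if l.start < win_end and l.end > win_start]


def to_pixels(pos: int, win_start: int, win_end: int, width_px: int) -> int:
    """Map a sequence position to a horizontal pixel offset in the zoomed panel.

    Positions outside the window are clamped to the panel edges.
    """
    clamped = min(max(pos, win_start), win_end)
    return round((clamped - win_start) * width_px / (win_end - win_start))
```

In this sketch the legend viewer would draw every locus at full scale, while the target viewer would draw only the result of `loci_in_window`, with each endpoint placed by `to_pixels`.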
In one aspect, the present invention provides a method implemented in a computer system for presenting biomolecular sequence data. The method involves retrieving biomolecular sequence data from a database in response to a user query, and graphically depicting elements of the biomolecular sequence data in a user interface for the computer system. The graphical depiction may include a plurality of panels representing different aspects of the biomolecular sequence data in a single frame.
In a preferred embodiment, the biomolecular sequence data may include gene locus data and be graphically depicted in three panels, the first panel graphically depicting at least a portion of a contig and its associated loci, the second panel graphically depicting at least a portion of the contig depicted in the first panel and annotated loci associated with the portion, and the third panel graphically depicting information indicating the number of sequencing operations conducted to determine the sequence data depicted in the second panel. The third panel may graphically depict sequences used to assemble the portion of the contig depicted in the second panel, or depth of coverage information for the portion of the contig depicted in the second panel.
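One way the depth of coverage information for the third panel might be derived is sketched below. It assumes the sequences used to assemble the contig are available as half-open (start, end) intervals in contig coordinates, and it uses a difference array so that each sequence is processed once. The function name `coverage_depth` and the interval representation are assumptions made for illustration.

```python
from typing import List, Tuple


def coverage_depth(
    reads: List[Tuple[int, int]], start: int, end: int
) -> List[int]:
    """Return the number of reads covering each position in [start, end).

    Each read is a half-open (read_start, read_end) interval in contig
    coordinates. Reads are clipped to the window before counting.
    """
    length = end - start
    diff = [0] * (length + 1)  # +1 marks where a read begins, -1 where it ends
    for read_start, read_end in reads:
        lo = max(read_start, start)
        hi = min(read_end, end)
        if lo < hi:
            diff[lo - start] += 1
            diff[hi - start] -= 1

    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth
```

Calling `coverage_depth` with the same window as the second panel yields one value per position, which a viewer could draw as a histogram beneath the zoomed contig.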
In another aspect, the invention provides another method implemented in a computer system for presenting biomolecular sequence data. The method involves retrieving biomolecular sequence data for a plurality of homologous loci from a database in response to a user query, and graphically depicting at least some of the homologous loci in a user interface for the computer system.
In yet another aspect, the invention provides a computer system. The computer system includes a database including biomolecular sequence data, and a user interface. The user interface is capable of receiving a query relating to the biomolecular sequence data, and graphically displaying the results of the query.
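The retrieval step of such a system can be sketched with Python's standard `sqlite3` module. The table name `locus`, its columns, and the sample rows are hypothetical; the actual relational schema is not specified here, and the in-memory database stands in for whatever database the system uses.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical locus table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locus ("
        "name TEXT, contig TEXT, start_pos INTEGER, end_pos INTEGER, "
        "annotation TEXT)"
    )
    conn.executemany(
        "INSERT INTO locus VALUES (?, ?, ?, ?, ?)",
        [
            ("locus1", "contigA", 0, 100, "kinase"),
            ("locus2", "contigA", 150, 300, "transporter"),
            ("locus3", "contigB", 10, 90, "unknown"),
        ],
    )
    return conn


def query_loci(
    conn: sqlite3.Connection, contig: str, win_start: int, win_end: int
) -> List[Tuple[str, int, int, str]]:
    """Return loci on a contig that overlap the window [win_start, win_end)."""
    cur = conn.execute(
        "SELECT name, start_pos, end_pos, annotation FROM locus "
        "WHERE contig = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos",
        (contig, win_end, win_start),
    )
    return cur.fetchall()
```

The rows returned by `query_loci` are the kind of result the user interface would then draw graphically; parameterized queries are used so that user-supplied values are never spliced into the SQL text.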
In still another aspect, the invention provides a computer-readable medium containing programmed instructions arranged to graphically display biomolecular sequence data. The computer-readable medium includes programmed instructions for retrieving biomolecular sequence data from a computer system database in response to a user query, and graphically depicting elements of the biomolecular sequence data in a user interface for the computer system.