Field of the Invention
The present invention generally relates to genomic visualization.
Description of the Related Art
Genomic visualization tools have been devised to help researchers, laboratories, and other users visually display and understand genomic data. The genomic data is often in the form of individual samples having chromosomal data (including measurements of at least one event at a particular location on the chromosomes). An event here indicates some measurement related to the genome. Examples of such measurements include the expression of a gene or an exon at a particular location, the number of copies of a portion of the genome that have been gained or lost, the extent of methylation of the genome at a particular location, the affinity of certain promoters to bind to a particular area on the genome, etc. In some cases, users may calculate a frequency of an event based on how often the event occurs in the selected samples. For example, it may be desirable to calculate the frequency of aberration, such as the frequency of a gain or loss of chromosomal copies, relative to a reference sample, in a selected population of samples. In other circumstances, it may be desirable to review an annotation containing specific information related to a particular region of a chromosome. Such information might include which genes are present in a location and whether there are known copy number polymorphisms in that area (including a list of such polymorphisms). Other items might include information pertaining to the presence of microRNAs and potential Single Nucleotide Polymorphisms (SNPs) in the area, etc.
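As an illustration of the frequency-of-aberration calculation described above, the following is a minimal sketch only. It assumes each sample has already been reduced to a per-region call of "gain", "loss", or "neutral" relative to a reference sample; the function name, the region labels, and the data layout are hypothetical and are not taken from any particular tool.

```python
from collections import Counter

def aberration_frequency(calls_by_sample, region):
    """Return the fraction of samples showing a gain or a loss at `region`.

    `calls_by_sample` maps a sample name to a dict of {region: call}, where
    each call is "gain", "loss", or "neutral" (a hypothetical layout). A
    region with no call for a sample is counted as "neutral".
    """
    counts = Counter()
    for calls in calls_by_sample.values():
        counts[calls.get(region, "neutral")] += 1

    total = len(calls_by_sample)
    if total == 0:
        # No samples selected: report zero frequency rather than divide by zero.
        return {"gain": 0.0, "loss": 0.0}
    return {"gain": counts["gain"] / total, "loss": counts["loss"] / total}

# Hypothetical population of four samples and one region label.
population = {
    "sample_a": {"chr8:q24": "gain"},
    "sample_b": {"chr8:q24": "gain"},
    "sample_c": {"chr8:q24": "loss"},
    "sample_d": {},  # no call recorded, treated as neutral
}
frequencies = aberration_frequency(population, "chr8:q24")
```

In this sketch the denominator is the full selected population, so samples with no recorded call lower the reported frequency; a tool might instead exclude them, which is a design choice the text above does not specify.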
Genomic data are available from public or private databases and academic or commercial diagnostic laboratories. Genomic data can also be obtained by sequencing the entire genome of an individual, or a portion thereof. Suitable methods of DNA sequencing include Sanger sequencing, polony sequencing, pyrosequencing, ion semiconductor sequencing, single molecule sequencing, and the like. Sequenced genomic data can be provided as electronic text files, HTML files, XML files, and various other conventional database formats.
Existing systems available for visualization of chromosomal or genomic annotations, such as the University of California, Santa Cruz (UCSC) Genome Browser and the Ensembl Genome Browser, display various annotations for a specific region of the genome. Ensembl is a joint project between the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and the Wellcome Trust Sanger Institute.
The molecular data to be processed in a bioinformatics-based platform typically concerns genomic data, such as deoxyribonucleic acid (DNA) data. For example, a well-known method for generating DNA data involves DNA sequencing. DNA sequencing can be performed manually, such as in a lab, or may be performed by an automated sequencer, such as at a core sequencing facility, for the purpose of determining the genetic makeup of a sample of an individual's DNA. The person's genetic information may then be compared to a referent, e.g., a reference genome, so as to determine its variance therefrom. Such variant information may then be subjected to further processing and used to determine or predict the occurrence of a diseased state in the individual.
Manual or automated DNA sequencing may be employed to determine the sequence of nucleotide bases in a sample of DNA, such as a sample obtained from a subject. Using various bioinformatics techniques, these sequences may then be assembled together to generate the genomic sequence of the subject, and/or mapped and aligned to genomic positions relative to a reference genome. This sequence may then be compared to a reference genomic sequence to determine how the genomic sequence of the subject varies from that of the reference. Such a process involves determining the variants in the sampled sequence and presents a central challenge to bioinformatics methodologies. Genomic data includes sequences of the DNA bases adenine (A), guanine (G), cytosine (C) and thymine (T). Genomic data also includes sequences of the RNA bases adenine (A), guanine (G), cytosine (C) and uracil (U), as well as epigenetic information such as DNA methylation patterns, histone deacetylation patterns, and the like.
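The comparison of a subject's sequence to a reference sequence can be illustrated with a deliberately simplified sketch. It assumes the two sequences have already been aligned and have equal length, and it reports only single-base substitutions; practical variant detection must also handle insertions, deletions, read depth, and base quality, none of which is modeled here. The function name is hypothetical.

```python
DNA_BASES = set("ACGT")

def find_substitutions(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both inputs are assumed to be pre-aligned DNA
    strings of equal length containing only A, C, G, and T.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    if not (set(reference) <= DNA_BASES and set(sample) <= DNA_BASES):
        raise ValueError("sequences must contain only the bases A, C, G, T")

    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]

# Two mismatches: G>C at position 3 and C>G at position 6.
variants = find_substitutions("ACGTAC", "ACCTAG")
```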
“Phenotypic traits” are an organism's observable characteristics, including but not limited to its morphology, development, biochemical or physiological properties, behavior, and products of behavior (such as a bird's nest). Phenotypic traits also include diseases, such as various cancers, heart disease, Age-related Macular Degeneration, and the like.
Non-limiting general definitions for terms utilized in the pertinent art are set forth below.
Allele is any one of two or more alternative forms of the same gene that have the same relative position on homologous chromosomes.
BAM format is a binary alignment map format, which is the binary version of SAM.
Chromosome is a strand of DNA that is encoded with genes.
DNA is deoxyribonucleic acid, which contains the genetic code. It consists of two nucleotide chains arranged in a double helix and joined by hydrogen bonds between the complementary bases adenine and thymine, and cytosine and guanine.
Exome is part of the genome formed by exons, the sequences which when transcribed remain within the mature RNA after the introns are removed by RNA splicing.
Genome is the full set of chromosomes, the genetic material of an organism, and includes genes and non-coding sequences of DNA/RNA.
Hypertext Transfer Protocol (“HTTP”) is a set of conventions for controlling the transfer of information via the Internet from a web server computer to a client computer, and also from a client computer to a web server. Hypertext Transfer Protocol Secure (“HTTPS”) is a communications protocol for secure communication via a network from a web server computer to a client computer, and also from a client computer to a web server, by at a minimum verifying the authenticity of a web site.
Internet is the worldwide, decentralized totality of server computers and data-transmission paths which can supply information to a connected and browser-equipped client computer, and can receive and forward information entered from the client computer.
Nucleic acid library is a plurality of polynucleotide molecules that are prepared, assembled and/or modified for a specific process.
Phenotype is the composite of an organism's observable characteristics or traits, such as its morphology, development, biochemical or physiological properties, phenology, behavior, and products of behavior. A phenotype results from the expression of an organism's genes as well as the influence of environmental factors.
SAM is the sequence alignment map format, a text format for mapping sequence reads (sequence information from a fragment whose physical genomic position is unknown) to a matching sequence in a reference genome.
Single Nucleotide Polymorphism (“SNP”) is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species (or between paired chromosomes in an individual).
URL or Uniform Resource Locator is an address on the World Wide Web.
User Interface or UI is the junction between a user and a computer program. An interface is a set of commands or menus through which a user communicates with a program. A command-driven interface is one in which the user enters commands. A menu-driven interface is one in which the user selects command choices from various menus displayed on the screen.
Variant calling is a method of identifying factual differences between sequence reads of test samples and a reference sequence. Variant calling is used to identify somatic variants with a high degree of confidence.
Web-Browser is a complex software program, resident in a client computer, that is capable of loading and displaying text and images and exhibiting behaviors as encoded in HTML (HyperText Markup Language) from the Internet, and also from the client computer's memory. Major browsers include MICROSOFT INTERNET EXPLORER, NETSCAPE, APPLE SAFARI, MOZILLA FIREFOX, and OPERA.
Web-Server is a computer able to manage many Internet information-exchange processes simultaneously. Normally, server computers are more powerful than client computers, and are administratively and/or geographically centralized. An interactive-form information-collection process generally is controlled from a server computer, to which the sponsor of the process has access.
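To make the SAM definition above more concrete, the following sketch reads one alignment line of the text format. It assumes the eleven mandatory tab-separated fields of the SAM format (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) and ignores any optional fields that follow; the class and function names and the example read are hypothetical.

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str   # read (query) name
    flag: int    # bitwise flags describing the alignment
    rname: str   # reference sequence name, e.g. a chromosome
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description string
    rnext: str   # reference name of the mate or next read
    pnext: int   # position of the mate or next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # per-base quality string

UNMAPPED_FLAG = 0x4  # flag bit set when the read has no alignment

def parse_sam_line(line):
    """Parse one non-header SAM alignment line into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )

def is_mapped(record):
    """Return True when the unmapped flag bit is not set."""
    return not (record.flag & UNMAPPED_FLAG)

# Hypothetical read aligned to position 100 of a reference named chr1.
example = parse_sam_line("read1\t0\tchr1\t100\t60\t6M\t*\t0\t0\tACGTAC\tIIIIII\n")
```

Header lines, which begin with "@" in SAM files, are not handled by this sketch and would need to be skipped before parsing.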
There is a need for distributing genomic data from a source to a recipient in a secure and efficient manner.