1. Field of the Invention
The present invention relates to a novel computer program product to extract and gather peak information from an automated sequencer or bioinformatics tool into a peak database, and to manipulate and analyze the peak information within the database.
2. Discussion of the Background
The recent conclusion of several genome sequencing projects, including those for yeast (Nature 1997; 387:suppl. 3-105), human (Venter et al., Science 2001; 291:1304-1351), C. elegans (Science 1998; 282:2012-8), and rice (J. Yu et al., Science 2002; 296:79 and S. A. Goff et al., Science 2002; 296:92), together with on-going sequencing efforts, has generated a deluge of DNA sequence information. These DNA sequences encode the basic “message of life.” Cataloguing and probing the vast numbers of genes and the proteins that they encode can provide novel insights into cell biology, drug design, and therapeutic strategies.
Accordingly, many new analytical methods have been developed to digest the flood of genome sequence data, including analysis of the transcriptome, proteome and metabolites. High-throughput analysis of protein targeting and other methods will ascribe new information to proteins and create important links with other large datasets. To fulfill the potential revealed by this genomic information, many challenges have to be met. Among these are the indexing and cataloguing of raw DNA and RNA sequence data, the identification of genes and of the regulation of their expression, and the characterization of protein activity and of protein-protein, protein-ligand, or protein-DNA/RNA interactions.
One commonly employed tool is the DNA automatic sequencer. DNA automatic sequencers are used to determine DNA fragment lengths in a wide array of applications: DNA sequencing, microsatellite analysis, Single Nucleotide Polymorphism, Restriction or Amplified Fragment Length Polymorphism, Single Strand Conformation Polymorphism, gene expression quantification, and analyses of immune receptor diversity. All of these applications require access to raw data (peak area and nucleotide length). Because the raw data are stored in one file per lane, studies rapidly give rise to hundreds of files. However, despite the increasing number of samples analyzed, no tool is currently available to allow the extensive and efficient retrieval of these raw data.
Accordingly, there remains a critical need for a novel program for handling the extensive amounts of raw data provided by automated sequencers. In addition, there remains a critical need for a novel program for efficient retrieval of this raw data. Moreover, there remains a critical need for a novel program to analyze the extensive amounts of raw data.
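By way of illustration only, the data-handling problem described above (raw peak data, namely peak area and nucleotide length, scattered across one file per lane) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the program of the invention: the tab-separated layout with `length_nt` and `area` columns, the table name `peaks`, the lane names, and the functions `create_peak_db`, `load_lane`, and `peaks_in_range` are all hypothetical and are not taken from any sequencer's actual output format.

```python
import csv
import io
import sqlite3


def create_peak_db(conn):
    """Create a single table holding the peaks of every lane (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS peaks ("
        "lane TEXT NOT NULL, length_nt REAL NOT NULL, area REAL NOT NULL)"
    )


def load_lane(conn, lane, text):
    """Insert all peaks from one lane's tab-separated text; return the row count.

    The 'length_nt' and 'area' column names are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = [(lane, float(r["length_nt"]), float(r["area"])) for r in reader]
    conn.executemany("INSERT INTO peaks VALUES (?, ?, ?)", rows)
    return len(rows)


def peaks_in_range(conn, lo, hi):
    """Return (lane, length_nt, area) for peaks whose length lies in [lo, hi]."""
    cur = conn.execute(
        "SELECT lane, length_nt, area FROM peaks "
        "WHERE length_nt BETWEEN ? AND ? ORDER BY lane, length_nt",
        (lo, hi),
    )
    return cur.fetchall()


# Two made-up lanes stand in for the hundreds of per-lane files of a real study.
lanes = {
    "lane01": "length_nt\tarea\n100.2\t1500\n103.1\t800\n",
    "lane02": "length_nt\tarea\n100.4\t1200\n250.0\t300\n",
}

conn = sqlite3.connect(":memory:")
create_peak_db(conn)
for lane, text in lanes.items():
    load_lane(conn, lane, text)

# One query now spans every lane instead of requiring each file to be opened in turn.
result = peaks_in_range(conn, 100, 101)
```

Once the per-lane data are gathered into a single table, a retrieval such as "all peaks between 100 and 101 nucleotides, across all lanes" becomes one query rather than a pass over every file.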