Biochemical separations play an important role in the analytical study of countless complex biological, environmental, and industrial samples. Separation procedures reveal the constituent components of a sample, thus shedding light on its nature and purity. Separation methods are simply tools by which we increase the quantity of information available about complex mixtures and enhance the quality of that information. Most complex mixtures, especially those of biochemical origin, contain constituents with similar underlying structures and common functional groups. Thus, it is difficult, if not impossible, to identify or quantify the constituents so long as they remain in the mixture. Once the constituents of a mixture have been separated, they can be identified, for example by detecting a physical property such as light absorbance, size, a mass tag, or the presence of a color tag.
Biochemical separation techniques are used in a variety of the life sciences and related industries, including: nutritional analysis of foods for protein, fatty acid, and carbohydrate content; analysis of foods for toxins, e.g. bacterial contamination or shellfish poisons; protein analysis, including gel electrophoresis and protein purification; separation of carbohydrates; separation of oligonucleotides and individual molecules such as nucleotides, amino acids, sugars, and biogenic amines; separation and analysis of pesticides and other synthetic organic molecules; analysis of drugs and pharmaceuticals; clinical profiling such as serum protein analysis; forensic and explosive analysis; analysis of serum organic acids, cyanide and related compounds, and chemical warfare compounds; urinalysis for drug metabolites; analysis of neurotransmitters such as catecholamines and epinephrine; separation and analysis of lipoproteins; separation and analysis of vitamins; preparation or purification of monoclonal antibodies; and protein digest sequencing or structural analysis. The list is of course not exhaustive; it is intended to illustrate the large number and diversity of applications for separation methodologies.
Manual inspection of the data that result from a separation procedure is a labor-intensive and time-consuming effort. It is prone to error because the process is repetitive, and it yields inconsistent results because it relies on subjective human interpretation of the data. This is generally true of all separation data regardless of the source of the data, i.e. whether the data come from an electrophoretic separation or a chromatographic process, and regardless of the nature of the biochemical analytes being studied.
One of the most common uses of biochemical separation is in the analysis of DNA, and one of the greatest undertakings in this area is the Human Genome Initiative. Genetic analysis projects can require thousands, even millions, of DNA genotypes to be determined, analyzed, and reviewed. For example, it is estimated that up to one million genotypes will be required to map multigenic disorders such as diabetes and heart disease.
Genetic mapping plays an essential role in the process of gene discovery. Armed with new genetic markers and maps, researchers are poised to localize new genes at a dramatic pace. The genetic markers most commonly used in gene linkage analysis are highly informative simple sequence repeat (SSR) polymorphisms. Currently, these 2-, 3-, and 4-base-pair (bp) repeats are genotyped by manual inspection and scoring of electropherogram profiles generated on slab-gel electrophoresis systems. For example, Mansfield et al. disclose a method for automatically computing the lengths of DNA fragments based on an initial examination of the separation data by the researcher (David C. Mansfield et al., "Automation of Genetic Linkage Analysis Using Fluorescent Microsatellite Markers," Genomics, Vol. 24, pp. 225-233, 1994). Although the final analysis is carried out by computer, there is still an initial screening step performed by the researcher.
The purpose of the initial screening is to exclude from the automated analysis the profiles of runs that the user deems to be of unacceptable quality. A bad run can result from any of a number of sources of error. For example, impurities in the sample or undetectable levels of DNA would result in bad separation data. Even when the genetic sample is good, the conditions of the separation run may have been compromised, for example by a bad separation matrix or an improperly controlled electric field. The user must inspect the profiles and make a judgment call as to whether the data can be used for further analysis or whether the run needs to be repeated. The Mansfield et al. technique does not automate this step, relying instead on the researcher's manual review of all data. Moreover, even for a good separation run, the Mansfield et al. technique and other methods in common practice still require the researcher to extract certain information from the separation data and enter that information into the computer.
What is needed is a method for automated decision-making during certain phases in the analysis of a biochemical sample. More specifically, it is desirable to automate analysis of separation data so that a decision can be made as to a subsequent course of action in the analytical process. It is desirable to automatically ascertain the quality of a separation run before proceeding to the next step.
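As a purely illustrative sketch of what such an automated quality decision might look like, the following Python function inspects a single detector trace and returns either "accept" or "repeat". Nothing here is taken from the method described in this document or from Mansfield et al.: the function name `assess_run`, the signal-to-noise criterion, and the threshold values are all hypothetical assumptions chosen only to make the idea concrete. It addresses just one of the failure modes mentioned above, namely a run whose signal is too weak to be detected above the baseline noise.

```python
from statistics import median


def assess_run(trace, min_snr=10.0, min_peaks=1):
    """Decide whether one separation trace is usable (hypothetical criterion).

    trace     -- sequence of detector readings in run order
    min_snr   -- assumed minimum peak height, in units of baseline noise
    min_peaks -- assumed minimum number of qualifying peaks
    """
    if len(trace) < 3:
        return "repeat"  # too few points to contain a peak

    # The median is a baseline estimate that a few tall peaks barely shift.
    baseline = median(trace)

    # Median absolute deviation: a noise estimate that is robust to peaks.
    noise = median(abs(x - baseline) for x in trace)
    noise = max(noise, 1e-9)  # avoid division by zero on very quiet traces

    # A peak is a local maximum that rises far enough above the baseline.
    peaks = [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] > trace[i - 1]
        and trace[i] >= trace[i + 1]
        and (trace[i] - baseline) / noise >= min_snr
    ]

    return "accept" if len(peaks) >= min_peaks else "repeat"


# A trace with one clear peak, and a trace containing only baseline noise.
good_trace = [9, 11, 9, 11, 10, 60, 10, 9, 11, 9, 11]
noise_only = [9, 11, 9, 11, 10, 9, 11, 9, 11]
```

Under these assumptions, `assess_run(good_trace)` yields "accept" and `assess_run(noise_only)` yields "repeat". A practical system would need additional checks (for instance peak shape, resolution, or the presence of expected size standards), and its thresholds would have to be tuned to the instrument and chemistry in use.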