Microorganisms have traditionally been classified along more or less arbitrary lines based upon selected observable characteristics. While generally satisfactory and historically carried through until today, such classification becomes difficult for unicellular organisms. These and other classification systems have proved crude and ineffectual in view of the present-day need of the medical profession for accurate identification of infectious organisms, viruses and the like.
To overcome some of these problems, Webster, Jr. describes in U.S. Pat. No. 4,717,653, issued Jan. 5, 1988, a method for identifying and characterizing organisms. The method compares the chromatographic pattern of restriction endonuclease-digested DNA from the organism, which digested DNA has been hybridized or reassociated with ribosomal RNA information-containing nucleic acid from or derived from a probe organism, with equivalent chromatographic patterns of at least two different known organisms.
The DNA molecule is a polymer shaped like a spiral staircase in which the rails consist of repeating phosphate and sugar groups, and each step is made up of a pair of bases. There are only four types of bases: adenine and thymine (A and T), which are always paired with each other; and guanine and cytosine (G and C), which are also paired. A gene, which determines some characteristic of an organism, is simply a stretch of DNA several hundred or thousand bases long.
It is the sequence in which these bases appear that directs a cell's function. Most DNA sequencing procedures attempt to determine the entire sequence of bases in the DNA, while with Webster Jr.'s method the presence of some subsequences with certain conserved characteristics is sought. The first step in the DNA analysis is to segment the DNA molecule. There are several standard methods for doing this, based on restriction enzymes--proteins that "snip" the DNA at specific sites.
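The segmentation step can be illustrated with a minimal sketch that treats one DNA strand as a string and cuts it at every occurrence of a recognition site. The function name `digest` and its parameters are hypothetical and are not taken from the Webster, Jr. patent. The example uses the well-known EcoRI site GAATTC, which is cut after the first G; a real digest acts on double-stranded DNA and is simplified here to a single strand.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Split `sequence` at each occurrence of `site`.

    The cut falls `cut_offset` bases into the recognition site.
    Hypothetical helper for illustration only.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop empty fragments that arise when a cut falls at either end.
    return [f for f in fragments if f]


# EcoRI recognizes GAATTC and cuts between G and AATTC.
fragments = digest("AAGAATTCCCGGGAATTCTT", "GAATTC", 1)
print(fragments)                    # ['AAG', 'AATTCCCGGG', 'AATTCTT']
print([len(f) for f in fragments])  # [3, 10, 7]
```

The fragment lengths, rather than the fragment sequences, are what the subsequent size separation acts upon.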
The fragments are then sorted by size via gel electrophoresis. In this process, the mixtures containing the fragments are placed in a gel that separates by size, e.g., an agarose gel, and subjected to an electric field that drives the negatively charged DNA fragments towards the positive pole. As the particles move slowly through the gel, the smallest fragments move at a higher speed, and thus arrive first at the opposite end of the gel. After a predetermined time, typically several hours, the electric field is removed.
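The size sorting can be sketched as follows. The sketch assumes a semi-logarithmic relationship between fragment length and migration distance, which is a common empirical approximation and is not stated in the text above. The constants `intercept` and `slope` are arbitrary illustrative values, and both function names are hypothetical.

```python
import math


def migration_distance(length_bp: int, intercept: float = 100.0,
                       slope: float = 25.0) -> float:
    """Assumed model: distance falls linearly with log10 of fragment length."""
    return intercept - slope * math.log10(length_bp)


def run_gel(fragment_lengths: list[int]) -> list[tuple[int, float]]:
    """Return (length, distance) pairs, farthest-travelled fragment first."""
    pairs = [(n, migration_distance(n)) for n in fragment_lengths]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# The smallest fragment travels farthest and therefore appears first.
for length, distance in run_gel([1000, 100, 10000]):
    print(length, round(distance, 1))
```

Under this assumed model the fragments come out ordered from smallest to largest, which matches the ordering described above.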
Webster Jr. discovered that a properly selected labeled probe material may then be hybridized to these DNA fragments after the fragments are made single stranded. The probe material fits onto certain DNA half-fragments much like a piece fits in a jigsaw puzzle. The probe material is labeled with a suitable tag, typically with phosphorus-32. The location of the probe material is then detected through autoradiography, a process in which the radioisotope on the labeled fragments exposes a portion of photographic film.
The end result is an image in which the DNA fragments are arranged in decreasing order of size from one end of the agarose gel to the other. The pattern of radiolabeled fragments, which have been sorted by size, uniquely characterizes the microorganism. This technique has been demonstrated for bacteria and is being extended to other life forms. Thus bacteria, and other higher life forms, may be identified from their DNA rather than from externally observable characteristics.
This photographic image contains a series of bands of varying widths and intensities along parallel linear paths corresponding to the gel electrophoresis lanes. The film is scanned with a standard CCD video camera to acquire an electronic image of the radiogram. Plotted along a lane, this image has an appearance very much like a chromatogram, containing peaks and valleys that vary as a function of distance along the electrophoresis gel. This series of peaks and valleys is unique to the DNA which identifies a particular microorganism. Webster, Jr. suggests in his patent at column 15, line 24 that a user can compare the obtained band pattern either visually or with the aid of a one-dimensional, computer-assisted digital scanner program for recognition of a pattern. The computer memory contains a library or catalog of the different band patterns for a plurality of known organisms. It is then simply a matter of comparing the pattern of the unknown organism to the catalog of known patterns to identify the particular organism.
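The reduction of a scanned lane to a one-dimensional series of peaks and valleys can be sketched as below. This is an illustrative sketch only, not the scanner program referred to in the Webster, Jr. patent. It assumes the image is a list of pixel rows running along the gel, and the names `lane_profile` and `find_peaks` and the `threshold` parameter are hypothetical.

```python
def lane_profile(image: list[list[float]], col_start: int,
                 col_end: int) -> list[float]:
    """Average the pixel intensity across one lane for each row of the image."""
    width = col_end - col_start
    return [sum(row[col_start:col_end]) / width for row in image]


def find_peaks(profile: list[float], threshold: float) -> list[int]:
    """Return indices of local maxima whose intensity exceeds `threshold`."""
    peaks = []
    for i in range(1, len(profile) - 1):
        rising = profile[i] > profile[i - 1]
        not_falling_behind = profile[i] >= profile[i + 1]
        if profile[i] > threshold and rising and not_falling_behind:
            peaks.append(i)
    return peaks


# A tiny 5-row image with one lane occupying columns 1 and 2.
image = [
    [0, 0, 0, 0],
    [0, 8, 10, 0],
    [0, 1, 1, 0],
    [0, 4, 6, 0],
    [0, 0, 0, 0],
]
profile = lane_profile(image, 1, 3)
print(profile)                 # [0.0, 9.0, 1.0, 5.0, 0.0]
print(find_peaks(profile, 2))  # [1, 3]
```

The resulting profile, or the list of peak positions, is the kind of one-dimensional pattern that would be compared against a catalog of known organisms.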
A related technique, described in U.S. Pat. No. 4,753,878 issued June 28, 1988 to Silman, adds a tag to a microorganism, which tag is actively incorporated into the products of metabolism of the microorganism. The products are separated by a gel and the tags detected, typically by autoradiography. The emission pattern is detected to provide an electrical signal whose wave pattern or spectrum is indicative of the identity of the microorganism. This wave pattern is compared in a computer with patterns stored therein representing a collection of known microorganisms.
In addition to the identification of the characteristic DNA pattern of microorganisms, there are many other instances in which it is desirable or necessary to identify a particular spectrum or pattern; for example, U.S. Pat. No. 4,651,289 issued Mar. 17, 1987 to Maeda et al. describes a voice pattern recognition method. The need for pattern recognition extends to many fields including astronomy, mass spectrometry and the like. Maeda et al. describe the similarity method or pattern matching method as being widely used in the field of character recognition. They note that the problem associated with most similarity identification methods is that they require storage for numerous reference patterns and excessive computer time to perform the numerous matrix calculations needed to analyze and compare the various stored reference patterns and input unknown patterns. Due to the large memory and the large amount of computing time required, a relatively expensive computer typically is required for these operations.
Maeda et al. describe a data base organization and search method using cosines as a similarity metric. The access time using this method is proportional to a fraction of the total number of data base entries, a modest improvement over the classical method of comparing the unknown to each element of the entire data base. Unfortunately, Maeda et al. still require a relatively high-powered computer and a relatively large data base storage capability.
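For reference, the cosine similarity metric and the classical exhaustive comparison can be sketched as follows. This illustrates only the baseline in which the unknown is compared with every data base entry; it does not reproduce the data base organization of Maeda et al. The function names, organism labels and pattern values are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length pattern vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero pattern has no direction to compare.
    return dot / (norm_a * norm_b)


def best_match(unknown: list[float], library: dict[str, list[float]]) -> str:
    """Exhaustive search: score the unknown against every library entry."""
    return max(library, key=lambda name: cosine_similarity(unknown, library[name]))


# Hypothetical library of known lane profiles.
library = {
    "organism_a": [0, 9, 1, 5, 0],
    "organism_b": [5, 0, 7, 0, 2],
}
print(best_match([0, 8, 2, 4, 0], library))  # organism_a
```

The cost of this baseline grows linearly with the number of library entries, which is the burden on computing time and storage noted above.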