The present invention relates to the detection of biological related materials and, in particular, the detection of biological related materials that may present a health risk to a population that is or may be exposed to such materials.
There are many types of biological related materials that pose health risks to populations (human, animal or plant) that may be exposed to such materials. Among the biological related materials that pose health risks are bacteria, viruses, and toxins (which are chemicals that are derived from biological entities). Among the better-known bacteria that present health risks to humans are Bacillus anthracis (anthrax), Yersinia pestis (plague) and Vibrio cholerae (cholera). Examples of viruses that pose health risks to humans are the smallpox, Ebola, Marburg and Hanta viruses. Toxins that present a danger to humans include polyethers, proteins and mycotoxins.
Presently, there are a number of techniques available for detecting biological related materials (hereinafter referred to as "biological agents"). One traditional technique is to: (1) obtain a sample of the material that is suspected of containing a biological agent; (2) place the sample in an environment (such as a petri dish with agar) in which certain biological agents, if present, can grow and multiply; and (3) after sufficient time has elapsed for any such agents to grow and replicate, visually identify (typically, with a microscope) the biological agent present in the nurturing environment.
Immuno-assay is another technique that is used to detect biological agents. In the immuno-assay technique, a sample of the material that is suspected of containing a biological agent is obtained and processed so as to place any biological agent present in the sampled material into a liquid suspension. The liquid with any biological agent suspended therein is then subjected to an immuno-assay test. In the test, an antibody that reacts with a specific biological agent (for example, Bacillus anthracis) is provided. If there is a biological agent present in the liquid sample and the agent is of the genus for which the antibody is present, a reaction will occur that is detectable. In one implementation, the liquid sample is applied to a glass microscope slide (also referred to as a "ticket") that includes antibodies for at least one biological agent. If a biological agent of the genus or species that corresponds to the antibody is present in the sampled liquid, there is a reaction between the antibody and the biological agent (antigen) that results in a visible indication of the presence of the biological agent being provided.
Another technique for detecting biological agents is the polymerase chain reaction (PCR) technique, which is a DNA analysis technique. In the PCR technique, the material that is suspected of containing a biological agent is sampled and, if necessary, transferred to a liquid medium. The DNA of any biological agent that is present in the sample is then placed in an environment that, for only a particular biological agent of interest, will cause a section or vectors of its DNA to multiply or replicate. For example, if the biological agent of interest is anthrax, the environment will only promote the multiplication of a vector of anthrax DNA, if any, present in the sample. If the particular biological agent of interest is present in the sample, the increase in the number of DNA copies for the biological agent produced by the noted environment creates a signal that can be detected by, for example, electrophoresis or fluorescence.
Yet another method for detecting biological agents is known as the MIDI technique. In the MIDI technique, the material that is suspected of containing a biological agent is sampled and, if needed, transferred to a liquid suspension. Any biological agent present in the liquid is then transferred to a nurturing environment, such as an agar-filled petri dish, so that the agent can grow and replicate. After any biological agent present in the sample has had sufficient time to grow and replicate, the agent is harvested from the nurturing environment and the fatty acids present within the cells of the agent are subjected to a process in which the fatty acids are converted to fatty acid methyl esters. The fatty acid methyl esters are then analyzed with a gas chromatograph to produce a "fingerprint" of the fatty acid methyl esters. The "fingerprint" of the unidentified agent is then compared to the "fingerprints" for known biological agents contained in a database (which commonly has 10,000 or more "fingerprints") to identify the biological agent in the sample. Due to the large number of "fingerprints" against which the unidentified "fingerprint" is compared, the technique identifies a number (typically, 10) of the known biological agents as having the closest "fingerprints" to the unidentified "fingerprint."
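The comparison of an unidentified "fingerprint" against a database of known "fingerprints" can be sketched as follows. This is a minimal illustration only, assuming each fingerprint is a fixed-length vector of relative peak areas and that closeness is measured by cosine similarity; the text above does not name a similarity measure, and the function names, library entries and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_fingerprints(unknown, library, k=10):
    """Score the unknown against every library entry and keep the k best.

    Every entry is visited, so the cost grows with the library size.
    """
    scored = [(name, cosine_similarity(unknown, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical library of known fingerprints (values are made up).
library = {
    "agent_a": [0.9, 0.1, 0.0, 0.3],
    "agent_b": [0.1, 0.8, 0.4, 0.0],
    "agent_c": [0.8, 0.2, 0.1, 0.4],
}
unknown = [0.9, 0.1, 0.0, 0.3]
top = closest_fingerprints(unknown, library, k=2)
```

Note that the search returns a ranked shortlist rather than a single answer, which mirrors the behavior described above in which a number of closest known agents is reported.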
Yet a further technique for detecting biological agents is matrix assisted laser desorption ionization ("MALDI") mass spectrometry. As with the prior techniques, the material that is suspected of containing a biological agent is sampled and, if required, any biological agent present in the sample is transferred to a liquid suspension. The liquid containing any biological agent is then combined with an organic compound that is capable of absorbing infrared or ultraviolet light. The mixture is then dried so that any biological agent present in the sampled material is bound up in the crystalline matrix of the organic compound. The dried material is then bombarded with light from an infrared or ultraviolet laser that results in any protein associated with any biological agent present in the sample being ionized. The ionized proteins are then analyzed with a mass spectrometer to produce a mass spectrum that is compared to a database of mass spectra of known proteins to identify the biological agent present in the sample.
The present invention provides a method for analyzing data related to the composition of an unidentified biological agent to both identify the agent and assess the reliability of the identification. In one embodiment, the method includes receiving the data related to the composition of the unidentified biological agent. Typically, the received data is a type of mass spectral data (e.g., "full scan" MS, MS/MS, mix-Collision Induced Dissociation (CID) or full-CID) or chromatographic data. The received data is applied to a machine learning procedure that makes an initial identification of the unidentified biological agent. Depending upon the particular application of the method, the machine learning procedure may involve a single step or several steps. For instance, if the method is to be used to determine whether the unidentified biological agent is a bacterium, virus or toxin, it is feasible to implement the machine learning procedure in a single step. In contrast, if a detailed identification is required, such as a particular bacterium, the machine learning procedure is typically implemented in a multi-step fashion. The step or steps of the machine learning procedure are implemented using machine learning techniques, such as artificial neural networks and/or multi-variate statistical analysis. For example, in a multi-step machine learning procedure, each step utilizes a distinct artificial neural network.
Once the identification has been made, ion fragmentation analysis (e.g., MS/MS, mix-CID or full-CID) is used to assess the reliability of the identification provided by the machine learning procedure. In one embodiment, the ion fragmentation analysis is used to assess whether the initial identification of the unidentified biological agent is a false positive or a false negative. An example of a false positive would be if the unidentified biological agent is initially identified as a harmful virus and this identification is subsequently shown to be incorrect by the ion fragmentation analysis. In contrast, an example of a false negative would be if the unidentified biological agent is initially identified as one of the harmless forms of E. coli and is subsequently shown by the ion fragmentation analysis to be one of the harmful forms of E. coli. As the example demonstrates, in many cases, it is more important to detect a false negative (which means that a harmful biological agent is likely to be present) than a false positive (which means that a benign biological agent is likely to be present). In such a situation, the ion fragmentation analysis at least includes an assessment of whether the initial identification is a false negative. If the identification is confirmed by the ion fragmentation analysis, the identification is output.
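The reliability assessment described above can be illustrated with a short sketch. It assumes, purely for illustration, that each known agent has a set of characteristic fragment ions (given here as made-up integer m/z values) and that an identification is supported when at least `min_matches` of those ions are observed; the function name, agent labels, ion values and threshold are all hypothetical and are not taken from the text.

```python
def assess_identification(initial_id, observed, marker_ions, harmful, min_matches=2):
    """Classify an initial identification using observed fragment ions.

    Returns a (status, agent) pair, where status is one of
    'false_negative', 'confirmed', 'false_positive' or 'unconfirmed'.
    """
    def matches(agent):
        return len(marker_ions[agent] & observed)

    # False-negative check first: the initial call is benign, but the
    # fragments support a harmful agent instead.
    if initial_id not in harmful:
        for agent in sorted(harmful):
            if matches(agent) >= min_matches:
                return ("false_negative", agent)

    if matches(initial_id) >= min_matches:
        return ("confirmed", initial_id)

    # A harmful call that the fragments do not support is a false positive.
    if initial_id in harmful:
        return ("false_positive", None)
    return ("unconfirmed", None)


# Hypothetical marker ions for two agents (values are made up).
marker_ions = {
    "benign_strain": {101, 202, 303},
    "harmful_strain": {101, 404, 505},
}
harmful = {"harmful_strain"}

missed = assess_identification("benign_strain", {101, 404, 505}, marker_ions, harmful)
overcalled = assess_identification("harmful_strain", {101, 202, 303}, marker_ions, harmful)
agreed = assess_identification("benign_strain", {101, 202, 303}, marker_ions, harmful)
```

The false-negative check runs before anything else in this sketch because, as noted above, missing a harmful agent is usually the more serious error.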
A further embodiment of the invention provides a method for identifying an unidentified biological agent that makes use of a single set of data in the form of either a mix-CID or full-CID mass spectrum. The method includes receiving either the mix-CID or full-CID data on the unidentified biological agent. The mix-CID data includes: (1) a portion of the data that would be obtained by subjecting a sample of the unidentified agent to a full scan mass spectrum analysis between high and low mass limits; and (2) a portion of the data that would be obtained by subjecting a series of multiple ions from a full-scan mass spectrum analysis to MS/MS analysis. The full-CID data includes: (1) all of the data that would be obtained by subjecting a sample of the unidentified agent to a full scan mass spectrum analysis between high and low mass limits; and (2) the data that would be obtained by subjecting all of the ions from a full scan mass spectrum analysis to MS/MS analysis. In one embodiment, the portion of the mix-CID or full-CID data that would be obtained from a full-scan MS analysis is used to make an initial identification of the unknown biological agent. The portion of the mix-CID or full-CID data that would be obtained by subjecting multiple ions from a full-scan mass spectrum analysis to MS/MS analysis is subsequently used to assess the reliability of the initial identification. Alternatively, the mix-CID or full-CID data is analyzed such that the identification and the reliability assessment are performed at the same time, rather than doing the initial identification and then the assessment. In one embodiment, artificial neural networks are used to evaluate both the MS data and the multiple ion MS/MS data (mix-CID or full-CID) at the same time. Other types of machine learning techniques, such as multi-variate statistical analysis, can also be used.
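The two ways of using the combined data can be sketched as a feature-preparation step. This is an illustrative sketch under stated assumptions: the full-scan portion and the MS/MS portion are each represented as m/z-to-intensity mappings, and each portion is reduced to a fixed-length vector by equal-width binning between the low and high mass limits. The binning scheme, function names and peak values are hypothetical; the text does not specify how the data is presented to the neural network.

```python
def bin_spectrum(peaks, low, high, n_bins):
    """Sum peak intensities into n_bins equal-width m/z bins in [low, high)."""
    vec = [0.0] * n_bins
    width = (high - low) / n_bins
    for mz, intensity in peaks.items():
        if low <= mz < high:
            index = min(int((mz - low) / width), n_bins - 1)
            vec[index] += intensity
    return vec


def split_features(full_scan, msms, low, high, n_bins):
    """Sequential use: one vector for identification, one for the assessment."""
    id_features = bin_spectrum(full_scan, low, high, n_bins)
    merged = {}
    for fragments in msms.values():  # pool fragments from every precursor ion
        for mz, intensity in fragments.items():
            merged[mz] = merged.get(mz, 0.0) + intensity
    check_features = bin_spectrum(merged, low, high, n_bins)
    return id_features, check_features


def combined_features(full_scan, msms, low, high, n_bins):
    """Simultaneous use: both portions concatenated into a single input vector."""
    id_features, check_features = split_features(full_scan, msms, low, high, n_bins)
    return id_features + check_features


# Hypothetical data: two full-scan peaks, one precursor with two fragments.
full_scan = {100.0: 5.0, 250.0: 2.0}
msms = {250.0: {80.0: 1.0, 120.0: 3.0}}
id_vec, check_vec = split_features(full_scan, msms, 50.0, 300.0, 5)
joint_vec = combined_features(full_scan, msms, 50.0, 300.0, 5)
```

In the sequential case `id_vec` would feed the initial identification and `check_vec` the reliability assessment; in the simultaneous case a single model would receive `joint_vec`.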
Another embodiment of the invention provides a method for rapid identification of an unidentified biological agent. The method includes receiving data on the biological composition of the unidentified biological agent. In one embodiment, mass spectrum related data (e.g., MS data or CID data) is used. Alternatively, chromatographic data is used. The received data is subjected to an analysis step in which a decision is made as to whether the unidentified biological agent is, for example, in a first group of known agents or a second group of known agents. By determining that the unidentified agent is in either the first group or the second group, one entire group (the first or second) is eliminated from consideration. By repeating this style of analysis, a very detailed level of identification is attainable. For instance, if it is desirable to identify the unidentified biological agent at the species or strain level (i.e., a very detailed level of identification), the identification technique may have several steps. If, however, a kingdom level identification (i.e., a low level of identification) is desired, it is possible to implement the analysis such that only one pass is needed to make the kingdom level identification. In one embodiment, the analysis is implemented with one or more artificial neural networks. Alternatively, a multi-variate statistical analysis or other machine learning technique is used. In any event, the method is faster, on average, than current analysis techniques that must potentially compare the "fingerprint" of the unidentified agent to all of the "fingerprints" of known agents in a library, many of which contain 10,000 or more "fingerprints."
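The group-elimination analysis can be sketched as a walk down a tree of decisions. In this minimal sketch each node holds a decision function that stands in for a trained artificial neural network or statistical model; the simple threshold rules, feature names and group labels are hypothetical placeholders, not the classifiers of the invention.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GroupNode:
    label: str
    # Stand-in for a trained classifier: maps the data to the key of a child group.
    decide: Optional[Callable[[Dict[str, float]], str]] = None
    children: Dict[str, "GroupNode"] = field(default_factory=dict)


def identify(root: GroupNode, data: Dict[str, float]) -> List[str]:
    """Follow one decision per level; every group not chosen is eliminated."""
    path = [root.label]
    node = root
    while node.children:
        node = node.children[node.decide(data)]
        path.append(node.label)
    return path


# Hypothetical two-level tree with threshold rules in place of trained models.
tree = GroupNode(
    "agent",
    decide=lambda d: "bacterium" if d["feature_1"] > 0.5 else "virus",
    children={
        "bacterium": GroupNode(
            "bacterium",
            decide=lambda d: "species_x" if d["feature_2"] > 0.5 else "species_y",
            children={
                "species_x": GroupNode("species_x"),
                "species_y": GroupNode("species_y"),
            },
        ),
        "virus": GroupNode("virus"),
    },
)

detailed = identify(tree, {"feature_1": 0.9, "feature_2": 0.2})
coarse = identify(tree, {"feature_1": 0.1, "feature_2": 0.9})
```

Stopping after the first decision gives a coarse identification, while continuing to a leaf gives a detailed one. If every decision split the remaining candidates roughly in half, which is an assumption and not something stated above, a library of 10,000 agents could be narrowed to one in about 14 decisions rather than up to 10,000 comparisons.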