Methods for fast, error-free identification of microorganisms (or microbes) play an important role during, for example: clinical and extra-clinical infection diagnostics; hygiene monitoring in hospitals and at rivers and lakes used for swimming; food analysis; monitoring and controlling biotechnical processes; and microbiological research. Many institutes worldwide collect various strains of vacuum-dried or deep-frozen microbes for such identifications.
The term “microorganism” or “microbe” describes microscopically small organisms and some viruses. The organisms can include, for example, bacteria, unicellular fungi (e.g., yeasts), algae, and protozoa (e.g., plasmodia as malaria pathogens). Microbes are typically categorized according to the following taxonomic hierarchical scheme: domain (eukaryotes and prokaryotes), kingdom, phylum, class, order, family, genus, species and subspecies. Occasionally, additional taxonomic categories, e.g., serovars or serotypes, are used for differentiating microbes, such as bacteria, included within a subspecies. Serovars and serotypes are distinguished by their different types of attachment behavior at a cell membrane.
The genus and typically the species are determined in order to identify a microbe sample. When possible, the subspecies, the serotype and/or the strain are also determined for the microbe identification. Alternatively, a microbe sample may be identified using other distinguishing characteristics such as pathogenicity of the microorganism (i.e., the ability to bring on an illness), or resistance of the microorganism against antibiotics.
Traditionally, colonies of a sampled microorganism are cultivated in order to determine the identity thereof. “API Tests” used in laboratories, for example, include different culture media for microbe cultivation. Each culture medium can detect a specific metabolic characteristic of a microorganism, which permits an initial, approximate taxonomic classification of the microorganism. Microscopic morphology of individual organisms in the colony and the morphology of the colony itself can also be determined. Other types of identification methods can also be used, such as: (i) DNA or RNA sequence analysis after replication of specific genetic sequences by polymerase chain reaction (PCR), or (ii) mass spectrometric detection of specific molecular cell components of microorganisms. These alternative methods are generally considered superior to the aforesaid cultivation method in terms of specificity (true-negative rate), sensitivity (true-positive rate), other error rates and analytical speed.
A publication by van Baar (FEMS Microbiology Reviews, 24, 2000, 193-219: “Characterization of bacteria by matrix-assisted laser desorption/ionization and electrospray mass spectrometry”), describes one example of a mass spectrometric measurement method for bacteria identification. The identification is determined by analyzing similarities between a mass spectrum of the bacteria and reference spectra for known bacteria. During the analysis, a similarity indicator is assigned to each of the reference spectra. The similarity indicator is a measure of agreement between the reference spectrum and the mass spectrum of the sample. The bacterium is identified when, for example, the similarity indicator is significantly larger than similarity indicators for all other reference spectra, and is also larger than a specified minimum value.
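The decision rule described above can be sketched as follows. This is a minimal illustration, not the method of the cited publication: the function name `identify` and the parameters `min_score` and `min_margin` (with their default values) are assumptions chosen for the example, and "significantly larger" is interpreted here as a fixed margin over the runner-up.

```python
from typing import Dict, Optional


def identify(similarities: Dict[str, float],
             min_score: float = 2.0,
             min_margin: float = 0.3) -> Optional[str]:
    """Return the best-matching reference name, or None if no clear match.

    similarities maps a reference-spectrum name to its similarity indicator.
    min_score and min_margin are hypothetical thresholds for illustration.
    """
    if not similarities:
        return None
    # Rank reference spectra by similarity indicator, best first.
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    # The best indicator must be larger than the specified minimum value.
    if best_score <= min_score:
        return None
    # The best indicator must clearly exceed all other indicators;
    # checking the runner-up is sufficient because the list is sorted.
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return None
    return best_name
```

With these assumed thresholds, `identify({"A": 2.6, "B": 1.9})` yields `"A"`, whereas two nearly equal indicators such as `{"A": 2.6, "B": 2.5}` yield no identification.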
The reference spectra are usually collected in a library, which may include reference spectra of bacteria and other microbes, such that bacteria and other types of microorganisms may be identified. Official directives prescribe a distinct validation of medical and forensic reference spectra libraries. Validations typically require each entry be traceable and accurately documented. The reference spectra are obtained from accurately characterized and identified strains. The strains of microorganisms are collected worldwide in government, public and private institutes, usually stored in a deep-frozen or vacuum-dried state, and made available for scientific purposes. Some microbiology institutes also catalog newly discovered strains of microbes. Although the exact classification of certain microbes may be disputed, these disputes are not detrimental to the value of the strains.
The term “strain” describes a microbe population that has been multiplied from a single organism. The individual organisms of the strain are genetically identical. As set forth above, the strains are cataloged in spectral libraries and have known (albeit sometimes disputed or changed) identities and classifications. In other words, each cataloged strain is identified as belonging to a known species and, where available, a known subspecies. Since microbes are collected and stored at different locations worldwide, many libraries hold strains of the same subspecies. Although these strains are classified as belonging to the same subspecies, their mass spectra may differ slightly from library to library. This indicates that there can be individual differences (as is the case with animals or plants of the same species) or even further branches of the hierarchy scheme such as, for example, serotypes. The strains are designated by internationally agreed labels after the name of the species or subspecies.
During a mass spectra measurement process, a colony of microbes is disposed on a solid, gelatinous nutrient medium or a centrifuge sediment (pellet) from a liquid nutrient medium. A small swab is used to transfer a tiny quantity of microbes from the colony or the sediment to a mass spectrometric sample support. A strongly acidified solution of a conventional matrix substance is sprinkled onto the sample. The matrix substance is used during a subsequent ionization by matrix-assisted laser desorption (MALDI). The acid of the matrix solution attacks the cell walls, and the organic solvent penetrates the microbial cells. Osmotic pressure causes the cell walls to burst and to release soluble proteins. The burst sample is dried and the dissolved matrix material crystallizes. The soluble proteins and, to a much lesser extent, other substances are also embedded into the matrix crystals.
In some cases, the cell walls of the microbes are difficult to destroy or are not destroyed by the matrix solution. In these cases, additional strong acids may be added to the matrix solution. The solution may also be sonically or mechanically treated to destroy the microbial cell wall. This procedure generates mass spectra that are similar to the spectra generated using the usual preparation on sample supports. The libraries of reference spectra can include reference spectra for both preparation methods.
The sample preparations dried on sample supports, i.e., the matrix crystals with the embedded analyte molecules, are inserted into an ion source of a mass spectrometer and bombarded with pulsed UV laser light. The pulsed UV laser light creates ions of analyte molecules which can be separated by mass in the mass spectrometer and measured. This type of ionization by matrix-assisted laser desorption is usually referred to as Matrix-Assisted Laser Desorption and Ionization (MALDI). Several types of MALDI time-of-flight mass spectrometers are commercially available.
Today, mass spectra of microbe proteins are typically obtained using time-of-flight mass spectrometers operated in a linear mode. The mass spectra are obtained without using an energy focusing reflector because the linear mode exhibits a particularly high detection sensitivity, even though the mass resolution and mass accuracy of spectra from time-of-flight mass spectrometers operated in a reflector mode are greater. Specifically, approximately one twentieth of the ion signals appear in the reflector mode and the detection sensitivity is one to two powers of ten less than that in the linear mode. The linear mode of a time-of-flight mass spectrometer has a high sensitivity because both the stable ions and the fragments from so-called “metastable” decays of the ions are detected. Secondary electron multipliers (SEM) are used in these mass spectrometers such that the neutral particles from ion disintegrations may also be measured with the ion detector, because the neutral particles also generate secondary electrons on impact. The fragment ions and the neutral particles, which originated from one species of a parent ion, have the same speed as the parent ions and thus arrive at the ion detector at the same time. The time of flight of the fragment ions and the neutral particles is therefore a measure of the mass of the originally undecayed ions.
The disadvantages associated with linear operation of time-of-flight mass spectrometers, for example significantly lower mass resolution and reduced mass accuracy, are typically outweighed by the need for high detection sensitivities. In order to increase the ion yield during linear operation, desorbing and ionizing laser energy is increased. The increase in the desorbing and ionizing laser energy, however, can also increase ion instability. The masses of individual mass signals can be shifted slightly from spectrum to spectrum due to poor reproducibility of the desorption and the ionization processes during the generation of the ions in a MALDI time-of-flight mass spectrometer operated in a linear mode. The mass shifts in the mass scales of the repeat spectra can be readjusted before the repeat spectra are combined to a reference spectrum. Such a readjustment method is disclosed in U.S. Pat. No. 7,391,017 to M. Kostrzewa et al., which is hereby incorporated by reference. The mass scales of sample and reference spectra can also be adjusted with respect to one another. Smaller mass tolerance intervals therefore can be used to determine matching mass signals during the similarity analysis.
The mass spectrum of microbes is equivalent to a frequency profile of the mass values of the ions. The mass spectra for protein ions are usually obtained in the mass range between 2,000 daltons and 20,000 daltons. The mass spectra used for identifications are predominantly obtained in the mass range between around 3,000 daltons and 15,000 daltons. The reduced resolution means that the individual isotope signals can no longer be resolved in the aforesaid mass range; rather, each isotope group forms a single fused mass signal. Typically, the protein ions have a single charge (charge number z=1). Ions can therefore be referred to using their mass m, rather than using the more accurate “mass-to-charge ratio” m/z.
Each laser light pulse produces a single mass spectrum. This individual mass spectrum, however, merely includes signals for a few hundred to a few thousand ions. Typically, a few tens to a few hundreds of these individual mass spectra are added up to form a sum mass spectrum in order to provide greater reliability and less noise. The individual mass spectra can originate from different parts of the sample preparation or even from different sample preparations. The term “mass spectrum of a microbe” or “microbe spectrum” is used hereinafter to represent the aforesaid summation of the individual mass spectra.
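The summation of individual mass spectra can be sketched as follows. This is a simplified illustration that assumes every individual spectrum has already been sampled on the same axis (the same number of bins in the same order); the function name `sum_spectrum` is hypothetical.

```python
from typing import List, Sequence


def sum_spectrum(single_shot_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Add individual (single laser pulse) spectra bin by bin.

    Assumes all spectra share one common axis, i.e. bin k of every
    spectrum refers to the same flight time or mass value.
    """
    if not single_shot_spectra:
        return []
    n_bins = len(single_shot_spectra[0])
    if any(len(spectrum) != n_bins for spectrum in single_shot_spectra):
        raise ValueError("all individual spectra must have the same number of bins")
    # zip(*...) groups the k-th bin of every spectrum together.
    return [sum(bin_values) for bin_values in zip(*single_shot_spectra)]
```

Because random noise partly averages out while true ion signals accumulate in the same bins, the sum spectrum is more reliable than any single-pulse spectrum.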
Each genetically predetermined protein has a characteristic mass. The profile of the proteins represented by the microbe spectrum is therefore characteristic of the microbe species. The abundances of the individual proteins in the microbes, which can be measured via mass spectrometry, are typically genetically determined because their production is controlled by other proteins. Furthermore, these abundances depend only slightly on the nutrient medium or the degree of maturity of the colony, which is quite different from the abundance of fatty acids, which do not occur in the mass spectrum. The protein profiles can therefore identify microbes much like fingerprints can identify humans.
Colonies or sediments of microbes from accurately documented strains are prepared and their mass spectra acquired in order to provide reference spectra for a spectral library. Repeat spectra (i.e., multiple acquisitions of mass spectra for the same strain) are typically collected for each reference spectrum. The mass spectrum of each microbe typically includes between about 50 and 200 separated mass signals. Many of the mass signals, however, are pure noise because the ion detector is set to high sensitivity during the search for mass signals. The reference spectra are therefore usually reduced to a maximum number of 70 or 100 mass signals. The information content is relatively high for a mass spectrum with 50 mass signals in the mass range between 3,000 and 15,000 daltons, even without accounting for intensity differences, because more than 2,000 distinguishable mass signals can occur even at a reduced mass resolving power. The repeat spectra are initially combined into an average spectrum rich in signals. When a limit of, for example, 70 to 100 mass signals has been reached, the mass signals which occur only once or a few times in the repeat spectra are deleted. The mass signals with very low intensities are then deleted until the desired maximum number of mass signals remains.
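The reduction of repeat spectra to a reference spectrum can be sketched as follows. This is an illustrative sketch under stated assumptions, not a validated procedure: peaks are given as (mass, intensity) pairs, peaks from different repeat spectra are grouped when neighboring masses lie within an absolute tolerance in daltons, and the names `build_reference`, `MassSignal`, `tolerance` and `min_occurrences` are hypothetical. The field `occurrence_ratio` records the fraction of repeat spectra in which a signal was found.

```python
from dataclasses import dataclass
from typing import List, Tuple

Peak = Tuple[float, float]  # (mass in daltons, intensity)


@dataclass
class MassSignal:
    mass: float              # averaged mass
    intensity: float         # averaged intensity over the peaks found
    occurrence_ratio: float  # fraction of repeat spectra containing it


def build_reference(repeats: List[List[Peak]], max_signals: int = 70,
                    tolerance: float = 2.0,
                    min_occurrences: int = 2) -> List[MassSignal]:
    # Pool all peaks, remembering which repeat spectrum each came from.
    pooled = sorted((mass, intensity, idx)
                    for idx, peaks in enumerate(repeats)
                    for mass, intensity in peaks)
    # Group neighboring peaks whose masses differ by at most `tolerance`.
    groups: List[List[Tuple[float, float, int]]] = []
    for peak in pooled:
        if groups and peak[0] - groups[-1][-1][0] <= tolerance:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    # Average each group into one mass signal.
    signals = []
    for group in groups:
        n_repeats = len({idx for _, _, idx in group})
        signals.append((n_repeats, MassSignal(
            mass=sum(m for m, _, _ in group) / len(group),
            intensity=sum(i for _, i, _ in group) / len(group),
            occurrence_ratio=n_repeats / len(repeats))))
    # Step 1: if there are too many signals, drop the rarely occurring ones.
    if len(signals) > max_signals:
        signals = [s for s in signals if s[0] >= min_occurrences]
    # Step 2: drop the weakest signals until the maximum number remains.
    kept = sorted((s for _, s in signals),
                  key=lambda s: s.intensity, reverse=True)[:max_signals]
    return sorted(kept, key=lambda s: s.mass)
```

For example, three repeat spectra that share signals near 3,000 and 5,000 daltons but each contain one isolated noise peak are reduced, with `max_signals=2`, to the two reproducible signals.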
The mass spectra of the microbe samples (hereinafter “sample spectra”) are typically generated in a similar way from repeat spectra and limited to a predetermined number of mass signals in order to exclude noise signals as far as possible. The number of mass signals in the sample spectra is usually selected to be slightly higher than the number of mass signals in the reference spectra.
The publication by Jarman et al. (Analytical Chemistry, 72(6), 2002, 1217-1223, entitled “An Algorithm for Automated Bacterial Identification Using Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry”) discloses a computational method for the generation of reference spectra of a library and for the similarity analysis between a sample spectrum and the reference spectra of a library. The method is based on the reproducibility of the individual mass signals when generating the reference spectra. An individual weight factor is derived for each mass signal of each reference spectrum during a similarity analysis of a sample spectrum. The weight factor is determined by the agreement with the mass signal of the sample spectrum, the agreement between the intensities and/or the variation between the reference signals. For example, the smaller the variation of the intensity of the mass signal, the higher the individual weight factor. Mass signals that do not reproduce well receive a low individual weight factor. The individual weight factors of the mass signals of a reference spectrum are added to determine a similarity indicator for the agreement of that reference spectrum with the sample spectrum. The reference spectra of a library are sorted according to the magnitude of the similarity indicators. The sorted reference spectra provide a list of designations of the microorganisms assigned to the reference spectra, sorted according to the similarity indicators.
Reference spectra having weights derived from statistical data of the repeat spectra are commonly referred to as “reference spectra with intrinsic weight” or “intrinsically weighted reference spectra”. In contrast, reference spectra having weights derived from comparisons with other reference spectra of the spectral library or even from assessments by a microbiology specialist or technician are commonly referred to as “reference spectra with extrinsic weight”.
DE 100 38 694 A1 to W. Kallow et al. discloses a method for generating extrinsically weighted reference spectra. The weights of the individual mass signals are derived from the frequency with which mass signals occur in the other reference spectra of the library. This method increases the ability to distinguish between reference spectra for the similarity analyses. For example, a mass signal that occurs in a single reference spectrum receives a maximum weight because the mass signal can accurately identify the microbe. Where a mass signal occurs indiscriminately with the same intensity in each of the reference spectra, however, the mass signal receives a weight zero. This type of reference spectra is disadvantageous for the validation of spectrum libraries, particularly when further reference spectra are to be added to an already validated library. The whole library must then be weighted anew and validated, where the validation is performed for all of the reference spectra. In addition, identifying the subspecies can be difficult where the measuring signal that distinguishes between the subspecies of a species has a low weight due to a coincidental presence of the same mass signal in many other, hardly related reference spectra. As the complexity of a reference library increases, this type of weight of individual signals becomes less and less usable.
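The dependence of extrinsic weights on the rest of the library can be illustrated with a small sketch. It is not the weighting of the cited document: the linear formula (weight 1 when no other reference spectrum contains the signal, weight 0 when all others contain it), the fixed mass tolerance, the neglect of intensities, and the function name `extrinsic_weights` are all assumptions made for the example.

```python
from typing import Dict, List


def extrinsic_weights(library: Dict[str, List[float]], name: str,
                      tolerance: float = 2.0) -> List[float]:
    """Weight each mass signal of `library[name]` by how rarely it occurs
    in the other reference spectra (hypothetical linear weighting)."""
    others = [masses for other, masses in library.items() if other != name]
    if not others:
        return [1.0 for _ in library[name]]
    weights = []
    for mass in library[name]:
        # Count the other reference spectra that contain a matching signal.
        hits = sum(any(abs(mass - m) <= tolerance for m in masses)
                   for masses in others)
        weights.append(1.0 - hits / len(others))
    return weights


library = {"A": [3000.0, 5000.0],
           "B": [3000.5, 6000.0],
           "C": [3001.0, 5000.5]}
before = extrinsic_weights(library, "A")
library["D"] = [7000.0]          # add one new reference spectrum
after = extrinsic_weights(library, "A")
```

In this example the weights of reference spectrum "A" change as soon as "D" is added, even though "A" itself is unchanged. This is the reason why adding entries to an extrinsically weighted library forces the whole library to be weighted and validated anew.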
U.S. Published Application 2004/0234952 discloses a method for expanding the library with a distinguishing spectrum for each pair of similar reference spectra. The distinguishing spectrum distinguishes between the corresponding microbes in order to increase the distinguishability of reference spectra of subspecies. The distinguishing spectra have weights for the individual mass signals which emphasize differences between the intensities of a mass signal in the two reference spectra. Mass signals with markedly different intensities therefore increase the ability to differentiate between the reference spectra, while mass signals which have approximately the same intensity in both reference spectra have low weights. The permanent addition of such distinguishing spectra to the library, however, typically requires the library to be re-validated. These distinguishing spectra are also extrinsically weighted reference spectra.
Simple mass spectrometric identification methods can have a high success rate, even where weights for the mass signals of the reference spectra are not used. Typically, it is advantageous to standardize the conditions under which the colony is cultivated, the sample is prepared on the sample support, and the mass spectrum is acquired, for both the reference spectra and the sample spectra, thereby preventing variations in the technical or biological method parameters. This measure alone already leads to an improved identification. Mass value and intensity value variations and weights, for example, do not need to be stored in the reference spectra. This can decrease the size of the library and increase the speed of the similarity analysis. A method for the adjustment of the frequently slightly shifted mass scales of the repeat spectra with respect to each other has been described above. Since many mass signals occur in only some of the repeat measurements, but can nevertheless contribute to the identification, the percentage of occurrences of a mass signal above a detection threshold should be noted. This number (hereinafter “occurrence ratio”) gives the percentage of the repeat spectra in which the mass signal occurs. A mass signal therefore has three entries: averaged mass, averaged intensity, and occurrence ratio.
During the similarity analysis, each reference spectrum is examined to determine how many of its mass signals agree with those of the microbe spectrum within a specified mass tolerance. A first partial measure for the similarity is determined by dividing the number of matches by the number of mass signals in the reference spectrum. A second partial measure is determined by dividing the number of matches by the number of mass signals in the microbe spectrum. A third partial measure can be derived from the intensity similarity between the mass signals that agree. The product of the three partial measures provides the similarity indicator. A refinement can be introduced by counting each match with the occurrence ratio of the mass signal, i.e., with a number which is possibly less than one. An extremely fast running algorithm can be developed to perform this simple similarity analysis, for example, for thousands of reference spectra in a few seconds using a typical computer server. The similarity indicator can (as was proposed above in the case of weighted spectra) be scaled such that identical measured and reference spectra yield a maximum similarity indicator of, for example, 3.00. It is even possible to transform the similarity indicators in such a way that a similarity value of 2.00 can be considered to be an adequate minimum requirement for an identification. Typically, such a minimum requirement and a corresponding maximum value have a high psychological value for the acceptance of the method.
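The similarity analysis described above can be sketched as follows. Several details are assumptions made for illustration only: the absolute mass tolerance, the nearest-mass matching, the choice of the third partial measure (mean ratio of the smaller to the larger normalized intensity of each matched pair), and the logarithmic transform in `log_score`, which is merely one hypothetical transform that maps a product of 1 to 3.00. The names `similarity` and `log_score` are likewise hypothetical.

```python
import math
from typing import List, Tuple

RefSignal = Tuple[float, float, float]  # (mass, intensity, occurrence ratio)
SampleSignal = Tuple[float, float]      # (mass, intensity)


def similarity(reference: List[RefSignal], sample: List[SampleSignal],
               tolerance: float = 2.0) -> float:
    """Product of three partial measures, between 0 and 1."""
    if not reference or not sample:
        return 0.0
    # Normalize intensities to the strongest signal of each spectrum.
    ref_max = max(i for _, i, _ in reference) or 1.0
    smp_max = max(i for _, i in sample) or 1.0
    unused = list(sample)
    weighted_matches = 0.0
    intensity_terms = []
    for r_mass, r_int, occurrence in reference:
        candidates = [s for s in unused if abs(s[0] - r_mass) <= tolerance]
        if not candidates:
            continue
        best = min(candidates, key=lambda s: abs(s[0] - r_mass))
        unused.remove(best)             # each sample signal matches only once
        weighted_matches += occurrence  # refinement: count with occurrence ratio
        a, b = r_int / ref_max, best[1] / smp_max
        intensity_terms.append(min(a, b) / max(a, b) if max(a, b) > 0 else 1.0)
    if not intensity_terms:
        return 0.0
    p_ref = weighted_matches / len(reference)     # first partial measure
    p_sample = weighted_matches / len(sample)     # second partial measure
    p_intensity = sum(intensity_terms) / len(intensity_terms)  # third
    return p_ref * p_sample * p_intensity


def log_score(s: float) -> float:
    """Hypothetical transform: a product of 1.0 maps to 3.00, 0.1 to 2.00."""
    return max(0.0, math.log10(1000.0 * s)) if s > 0 else 0.0
```

Under these assumptions, identical spectra with occurrence ratios of one give a product of 1.0, and a sample that matches only one of two reference signals while containing one unmatched signal of its own gives 0.5 × 0.5 × 1.0 = 0.25.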
Today, medically and legally admissible (i.e., “validated” or “certified”) libraries with reliable microbial reference mass spectra are formed in many locations, including many institutes of microbiology and also central governmental institutions for disease monitoring and prevention. For this work as well it is much simpler to acquire the spectra only under standardized conditions without variation of all method parameters.
Closely related microbes can be distinguished using the aforesaid methods where their proteins vary uniquely between species and subspecies. One microbe spectrum, however, can have good matches with several reference spectra, exhibiting almost identical similarity indicators, although the reference spectra may look different, even to the human eye. The contributions of different mass signals can compensate one another in the computation of the similarity indicator, such that the subspecies, or even the species, cannot be identified. Reference spectra with similar indicators usually belong to closely related microbes at the genus, species or subspecies level.
A microbe can typically be identified by its genus or species. The microbe should be identified by its species or subspecies, however, when, for example, the species or the subspecies exhibit a substantially different pathogenicity, or need to be medically treated in a different way. In such a case, the species, the subspecies or even biovarieties such as serotypes need to be accurately identified.
What is needed therefore is a method for identifying microbes by their mass spectra, with which the microbes can be identified down to the level of species or subspecies even where their reference spectra exhibit almost the same indicators.