Pursuant to 37 C.F.R. 1.71(e), Applicants note that a portion of this disclosure contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
This invention relates to mass spectrometry and to the analysis of data obtained by mass spectrometry, including, e.g., the analysis of combinatorial libraries by mass spectrometry.
2. Description of the Related Art
Mass spectrometry is one of the most universal and sensitive analytical methods for compound characterization. Every chemical compound has a molecular mass and, typically, only a minute quantity of analyte is necessary to obtain a mass measurement. These features make mass spectrometry a method of choice for analyzing the products of combinatorial synthetic chemistry.
In comparison with information-rich methods, such as nuclear magnetic resonance (NMR), mass spectrometry measurements have low information content. Many compounds with different chemical structures may have the same molecular weight or the same mass-to-charge ratio. This limits the utility of mass spectrometry for structural identification, especially in combinatorial chemistry where large numbers of compounds are frequently synthesized using a common scaffold and sets of related reagents to produce synthetic products with a narrow distribution of molecular masses. Most often, mass spectrometry is used to confirm an expected chemical structure, or is used in combination with other information to help identify the structure of unknown compounds.
The information content of mass spectrometry may be increased through the use of multistage mass spectrometry (i.e., mass measurement of ions obtained by fragmenting molecules of interest) or accurate mass measurements. However, interpreting data obtained by multistage mass spectrometry requires knowledge of fragmentation pathways for the molecules being analyzed, which is usually not available a priori. Application of accurate mass measurements is limited by the existence of many compounds with the same elemental composition, in which case, increasing the accuracy of measurement does not provide additional information. The expense and sophistication of the hardware required for both multistage and accurate mass measurement as well as the limited capacity of these instruments for high throughput measurement further restrict their application in the analysis of the products of combinatorial chemistry.
Thus, there is a substantial need for new methods and related systems that provide more efficient use of mass spectrometry for structural identification, especially for combinatorial chemistry-related applications.
The present invention includes a method of identifying predicted or actual structures of two or more members of a chemical or physical library. In preferred embodiments, the method is completely or partially computer implemented. The method includes (a) providing a logical matrix that includes virtual masses of members of a complex library (e.g., a combinatorial chemical library or the like) produced by chemical or physical transformations of an initial set of chemical or physical members (e.g., an initial set of building blocks or the like) in which at least one group of the virtual masses includes complex library members having a shared chemical history. The method also includes (b) correlating molecular mass measurements (e.g., mass spectrometric measurements) of two or more chemical or physical library members (e.g., members of a combinatorial synthetic library) having a shared chemical history with two or more virtual masses in the logical matrix to identify one or more groups of virtual masses that most likely describe chemical or physical transformations undergone by the two or more chemical or physical library members. In certain embodiments, the one or more groups of virtual masses describe (i.e., definitively) the chemical or physical transformations undergone by the two or more chemical or physical library members in (b). Additionally, the correlations in (b) generally account for one or more mass defects of reaction. Finally, the method includes (c) identifying the predicted or actual structures of the two or more chemical or physical library members within the one or more identified groups based on the molecular mass measurements.
In some embodiments, (a) includes calculating individual masses for each member of the logical matrix by separately summing masses for each member of the initial set of chemical or physical members with each mass in a set of expected mass changes. In certain embodiments, the set of expected mass changes includes a set of virtual mass changes calculated by separately subtracting masses for each member of the initial set of chemical or physical members from each mass in the set of chemical or physical library members. Each calculated individual mass is assigned to one of m groups, m corresponding to a total number of individual mass changes in the set of expected mass changes. Furthermore, each of the m groups includes n members, n corresponding to a total number of members in the initial set of chemical or physical members.
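The matrix construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names `build_logical_matrix` and `virtual_mass_changes`, the example masses, and the rounding to four decimal places are all introduced here for illustration only.

```python
def build_logical_matrix(initial_masses, expected_changes):
    """Return m groups (rows), one per expected mass change, each holding
    n virtual masses, one per member of the initial set."""
    return [[round(m0 + dm, 4) for m0 in initial_masses]
            for dm in expected_changes]


def virtual_mass_changes(initial_masses, library_masses):
    """Subtract every initial mass from every measured library mass and
    return the distinct differences as candidate (virtual) mass changes."""
    return sorted({round(obs - m0, 4)
                   for obs in library_masses
                   for m0 in initial_masses})


# Hypothetical example values (not taken from the disclosure).
initial = [100.0, 150.0, 200.0]   # n = 3 initial members
changes = [14.0, 42.0]            # m = 2 expected mass changes
matrix = build_logical_matrix(initial, changes)
```

Each row of `matrix` corresponds to one of the m groups, and each column to one of the n initial members, so the matrix always has m x n entries.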
Optionally, (b) includes (i) matching a selected mass from the set of chemical or physical library members with all identical calculated masses and excluding any of the m groups lacking a member n comprising a mass identical to the selected mass from further consideration to reduce a number of m groups available for subsequent consideration. Thereafter, the method typically includes (ii) repeating (i) at least once, in which each repeated (i) includes matching a different selected mass from the set of chemical or physical library members with all the identical calculated masses that remain in the reduced number of m groups from an immediately preceding (i) and excluding any of the reduced number of m groups lacking an n member with a mass identical to the different selected mass from further consideration to further reduce the number of m groups available for subsequent consideration. This method leads to (1) identifying a single m group which indicates that matched masses from the set of chemical or physical library members have a shared chemical history, (2) identifying more than one m group for further consideration which indicates that insufficient data exists for an unambiguous determination of whether masses selected from the set of chemical or physical library members have a shared chemical history, or (3) identifying no m group for further consideration which indicates that masses selected from the set of chemical or physical library members originate from materials lacking a shared chemical history.
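The successive exclusion of m groups can be sketched in Python as follows. The names `eliminate_groups` and `interpret`, the example matrices, and the matching tolerance `tol` are assumptions made for illustration; the text above speaks of identical masses, and a small tolerance is used here only because measured masses are floating-point values.

```python
def eliminate_groups(matrix, selected_masses, tol=0.01):
    """Return the indices of the m groups that contain a match for every
    selected mass, discarding groups as soon as one mass fails to match."""
    remaining = set(range(len(matrix)))
    for mass in selected_masses:
        remaining = {g for g in remaining
                     if any(abs(v - mass) <= tol for v in matrix[g])}
        if not remaining:          # outcome (3): nothing left to consider
            break
    return remaining


def interpret(remaining):
    """Map the surviving groups to the three outcomes listed in the text."""
    if len(remaining) == 1:
        return "shared chemical history"
    if len(remaining) > 1:
        return "insufficient data"
    return "no shared chemical history"


# Hypothetical logical matrix: two groups of three virtual masses each.
matrix = [[114.0, 164.0, 214.0],
          [142.0, 192.0, 242.0]]
survivors = eliminate_groups(matrix, [142.0, 242.0])
```

In this example only the second group contains both selected masses, so `survivors` holds a single index and `interpret(survivors)` reports a shared chemical history.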
In certain embodiments, the method further includes assigning each of the m groups a P variable in which each P variable is typically initially zero. In these embodiments, (b) includes (i) matching a selected mass from the set of chemical or physical library members with identical masses in each of the m groups in which the P variable for an m group is increased by one when the selected mass matches at least one value therein. Thereafter, these embodiments include (ii) repeating (i) for each remaining value in the set of chemical or physical library members, and (iii) determining which one or more m groups have highest P variables to identify one or more mass changes from the set of expected mass changes best fitting the set of chemical or physical library members. This procedure also identifies all paired values in the initial set of chemical or physical members and the set of chemical or physical library members originating from materials with a shared chemical history.
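The P-variable scoring can be sketched in Python as follows. The names `score_groups`, `best_groups`, and `paired_values`, the example masses, and the tolerance `tol` are hypothetical and introduced only for this illustration.

```python
def score_groups(matrix, library_masses, tol=0.01):
    """Give each m group a P variable starting at zero, and add one each
    time a measured mass matches at least one value in that group."""
    p = [0] * len(matrix)
    for mass in library_masses:
        for g, group in enumerate(matrix):
            if any(abs(v - mass) <= tol for v in group):
                p[g] += 1
    return p


def best_groups(p):
    """Return the indices of the group or groups with the highest P."""
    top = max(p)
    return [g for g, score in enumerate(p) if score == top]


def paired_values(initial_masses, group, library_masses, tol=0.01):
    """Pair each initial mass with the measured mass it explains."""
    return [(m0, obs)
            for m0, v in zip(initial_masses, group)
            for obs in library_masses if abs(v - obs) <= tol]


# Hypothetical example values (not taken from the disclosure).
initial = [100.0, 150.0, 200.0]
changes = [14.0, 42.0]
matrix = [[m0 + dm for m0 in initial] for dm in changes]
measured = [142.0, 192.0, 214.0]

p = score_groups(matrix, measured)
winners = best_groups(p)
pairs = paired_values(initial, matrix[winners[0]], measured)
```

Unlike strict elimination, this scoring tolerates a measured mass that fits no member of the best group (here 214.0), which may be useful when a sample contains unrelated material.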
In certain aspects of the invention, (a) includes solving a simultaneous system of equations to provide one or more values in the logical matrix. For example, solving the simultaneous system of equations optionally includes solving for one or more masses of one or more members of the initial set of chemical or physical members. Optionally, solving the simultaneous system of equations includes solving for one or more of: at least one mass of at least one member of the set of chemical or physical library members, at least one mass of at least one of the initial set of chemical or physical members, or at least one member of a set of expected mass changes.
In one embodiment, (b) includes (i) determining the molecular mass measurements for each of x members of a set of chemical or physical library members, wherein x is at least two, and wherein each x member is derived from one member of the initial set of chemical or physical members and comprises a shared chemical history with all other x members. This embodiment also includes (ii) subtracting a cumulative total mass of all members of the initial set of chemical or physical members from a cumulative total mass of all x members of the set of chemical or physical library members to determine a cumulative total mass change for the set of chemical or physical library members and (iii) dividing the cumulative total mass change by x to thereby determine a mass change for each of the x members of the set of chemical or physical library members. In addition, this embodiment includes (iv) subtracting the mass change of (iii) from each of the molecular mass measurements of (i) to identify each member in the initial set of chemical or physical members corresponding to each individual x member of the set of chemical or physical library members.
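Steps (ii) through (iv) of this embodiment amount to simple arithmetic, sketched below in Python. The names `average_mass_change` and `assign_initial_members` and the example masses are assumptions. Matching each back-calculated mass to the nearest initial mass is a further assumption made here to absorb measurement error; the text does not specify how the correspondence is resolved.

```python
def average_mass_change(initial_masses, library_masses):
    """Steps (ii)-(iii): cumulative mass difference divided by x."""
    x = len(library_masses)
    return (sum(library_masses) - sum(initial_masses)) / x


def assign_initial_members(initial_masses, library_masses):
    """Step (iv): subtract the mass change from each measured mass and
    pick the closest initial mass (nearest-match is an assumption)."""
    change = average_mass_change(initial_masses, library_masses)
    return [min(initial_masses, key=lambda m0: abs(m0 - (obs - change)))
            for obs in library_masses]


# Hypothetical example values with small measurement errors.
initial = [100.0, 150.0, 200.0]
measured = [242.0, 142.02, 191.98]

change = average_mass_change(initial, measured)
origins = assign_initial_members(initial, measured)
```

The sketch assumes that x equals the number of initial members and that every library member underwent the same transformation, so that the average recovers the common mass change.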
The present invention also relates to a system for identifying predicted or actual structures for two or more members of a chemical or physical library. The system includes (a) at least one computer that includes a database having a logical matrix or data structure that includes virtual masses of members of a complex library produced by chemical or physical transformations of an initial set of chemical or physical members in which at least one group of the virtual masses comprises complex library members having a shared chemical history. The system also includes (b) system software that includes one or more logic instructions for (i) correlating molecular mass measurements of two or more chemical or physical library members having a shared chemical history with two or more virtual masses in the logical matrix to identify one or more groups of virtual masses that most likely describe chemical or physical transformations undergone by the two or more chemical or physical library members. The correlations in (i) generally account for one or more mass defects of reaction. The system software also includes one or more logic instructions for (ii) identifying the predicted or actual structures of the two or more chemical or physical library members within the one or more identified groups based on the molecular mass measurements. In some embodiments, the one or more groups of virtual masses describe (i.e., definitively) the chemical or physical transformations undergone by the two or more chemical or physical library members in (b).
The system typically further includes a mass spectrometer operably connected to the at least one computer which provides the molecular mass measurements to be correlated. In addition, the system generally includes a handling system (e.g., a solid support handler, such as a bead handler, a bead container handler, a reagent handler, or the like) operably connected to the at least one computer, which handling system directs translocation and synthesis of the chemical or physical library members. The handling system generally includes at least one robotic armature.
The present invention additionally provides a computer program product that includes a computer readable medium having one or more logic instructions for (a) correlating molecular mass measurements of two or more chemical or physical library members having a shared chemical history with two or more virtual masses in a logical matrix or data structure to identify one or more groups of virtual masses that most likely describe chemical or physical transformations undergone by the two or more chemical or physical library members. The computer program product also includes one or more logic instructions for (b) identifying predicted or actual structures of the two or more chemical or physical library members within the one or more identified groups based on the molecular mass measurements. The correlations optionally account for or determine one or more mass defects of reaction. The computer readable medium optionally includes one or more of: a CD-ROM, a floppy disk, a tape, a flash memory device or component, a system memory device or component, a hard drive, a data signal embodied in a carrier wave, or the like.
The invention can also be embodied in kits, e.g., including any of the system elements for performing any of the methods described herein, and optionally, further including containers for holding any of the relevant system elements, packaging materials, instructional materials for practicing the methods, or the like.