The present invention relates generally to identifying unknown compounds in chemical mixtures and, more particularly, to identifying unknown chemical structures related to one or more known structures.
Mass Spectrometry
Mass spectrometry (MS) is an analytical technique used to investigate molecular structure. It functions by first ionizing a sample under investigation and then measuring the mass-to-charge (m/z) ratio of the resulting ions. In this way, the molecular weight of a substance may be determined, which is generally an important piece of information because all molecules of the same structure have the same molecular weight. This determination can thereby be considered a “fingerprint” for a given structure.
MS is useful in analyzing complex mixtures because it can be used to obtain a molecular weight for each component of the mixture. This approach can be used when seeking to identify the presence or absence of a known material in the presence of many other materials. Common examples are the identification of drugs of abuse in blood and urine, and the identification of dioxins and other priority pollutants in soil samples.
If a complex mixture is admitted into a mass spectrometer all at once, the resulting mass spectrum is an aggregate spectrum of all the species present. This makes it difficult to identify individual species. To solve this problem, analysis of complex mixtures is usually performed by interfacing some form of chromatographic separation technique with the mass spectrometer. The purpose of this is to provide some level of separation for the mixture before the components enter the mass spectrometer. In ideal circumstances, it is then possible to obtain a mass spectrum of each individual component as it enters the mass spectrometer. In these cases, the retention time of each component on the chromatograph affords an additional level of identification.
Use of Mass Spectrometry in Drug Discovery
Drug discovery involves identifying new molecules that have potential use as drugs and other therapeutic agents. The new molecules can be discovered by examining the interaction of target proteins in solution with mixtures of organic molecules. The mixtures are known as ‘libraries’ of molecules, from which new species of interest can be identified. The libraries can be designed and synthesized according to a variety of rules and constraints that make them suitable for drug discovery. Molecules that bind to a target protein are known as ligands. Ligands have the potential to act as therapeutic agents.
Ligands can be discovered using mass spectrometry. One criterion of library design is that the masses of the component library members are calculated and known beforehand. When a ligand is discovered, its mass serves to identify the component and therefore the molecular structure responsible for binding.
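The mass lookup described above can be sketched as follows. This is a minimal illustration only: the function name identify_by_mass, the tolerance, and all member names and masses are hypothetical and are not taken from the specification.

```python
def identify_by_mass(measured_mass, library_masses, tol=0.01):
    """Return the names of library members whose precalculated mass lies
    within `tol` mass units of the measured ligand mass."""
    return [
        name
        for name, mass in library_masses.items()
        if abs(mass - measured_mass) <= tol
    ]


# Hypothetical precalculated masses for three library members.
library_masses = {"member_1": 144.0, "member_2": 158.0, "member_3": 144.0}

# A measured ligand mass is matched against the precalculated table.
candidates = identify_by_mass(144.004, library_masses)
```

In this sketch two members share the same mass, so the lookup returns both. Mass alone narrows the candidates but does not always single out one structure.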
Construction of Chemical Libraries
Libraries comprise mixtures of molecules made up of a core structure coupled with a selected number of different structural motifs known as ‘building blocks’ (BBs). FIG. 1 is a simplified schematic diagram illustrating a core structure 10 and three BBs 12, 14, 16 coupled to the core.
The number of BBs coupled to a given structural core is usually in the range from two to four, though many more, e.g., 15 different BBs, may be coupled in any given library. A library should contain every possible combination and permutation of BBs. Also, the chemistry is performed such that the library is intended to contain only molecules that are members of the library, and no other significant molecular species. In other words, a library is intended to be a set of desired compounds, designated hereafter as the “desired set” (DS), whose individual compositions and masses may be calculated beforehand.
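The precalculation of DS compositions and masses can be sketched as an enumeration over BB assignments. This is a simplified illustration under stated assumptions: the function name enumerate_desired_set is hypothetical, every attachment position is assumed to accept every BB, each BB mass is assumed to be its net contribution after coupling, and all numeric values are invented for the example.

```python
from itertools import product


def enumerate_desired_set(core_mass, bb_masses, n_positions):
    """Map every assignment of BBs to attachment positions onto its
    expected mass (core mass plus the net mass of each coupled BB)."""
    desired = {}
    for combo in product(sorted(bb_masses), repeat=n_positions):
        desired[combo] = core_mass + sum(bb_masses[name] for name in combo)
    return desired


# Hypothetical core and building-block masses (illustrative values only).
core_mass = 100.0
bb_masses = {"A": 15.0, "B": 29.0, "C": 43.0}

# Three BBs over two positions give 3 ** 2 = 9 desired-set members.
desired_set = enumerate_desired_set(core_mass, bb_masses, n_positions=2)
```

Note that permutations such as ("A", "B") and ("B", "A") are distinct members with the same expected mass, so the DS can contain several structures per mass value.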
In practice, however, empirical measurements show that the chemistry involved in the synthesis of libraries does not always proceed in an expected fashion. Sometimes, certain members of the DS are not generated. Furthermore, in addition to the DS, many libraries contain molecular species that are not intended members of the library, designated hereafter as the “undesired set” (US). A US can arise as a result of unexpected deviations in the behavior of the synthetic chemistry. Since these deviations might not be anticipated, the size and composition of this set are unpredictable. However, since all of the reagents used in the synthesis were originally designed to become library members, most, if not all, members of the US will be structurally related to some member or members of the DS.
The presence of the US has two consequences of particular importance at the practical level. First, presence of the US has the effect of reducing the number of molecules belonging to the DS. Without knowing the magnitude of this effect, it is impossible to control the quantity of library exposed to the target for screening purposes. Therefore one advantage of identifying members of the US is that it allows a measure of synthesis quality control that cannot be otherwise obtained. A further advantage is that the output it provides can be used to assign a score, or rating, to library quality and thereby automatically eliminate its use in the screening process when the library quality is deemed too low.
Second, members of the US can sometimes themselves bind to the target protein. This manifests itself as follows. Binding is observed between the target and a sample member whose mass corresponds to that of a library member (a member of the DS). However, further structural analysis reveals that the ligand in fact has a structure that does not correspond to any member of the DS having the ligand's mass. Thus, the experiment has revealed a binding molecule, which was the desired objective of the experiment, but that binding molecule does not have a structure consistent with any member of the DS. A need now exists to identify the structure of the binding molecule to allow an analyst to gain insights into possible structures of ligands that are not members of the DS, but whose structures are related to those of the DS.
Briefly, the present invention is directed to a method and apparatus for identifying unknown chemical structures that are related to one or more known structures. In accordance with the invention, measured masses of the unknown structures (US members) are compared with the expected masses of the known structures (DS members). The mass differences between US members and DS members are compared with changes in mass (ΔM) between molecules analyzed as a function dependent upon structure (ΔS). The mass difference is considered as a differential function, ΔM/ΔS, the result of which is calculated for each ion in the US when compared with each ion in the DS. These values are then correlated with values that would be expected based on assumed possible structural modifications. In addition, higher order differential functions, such as Δ²M/ΔS², can be used as needed to allow for correlation of structures in which more than one structural motif has changed.
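One possible reading of this comparison is sketched below; it is not the claimed method. The sketch assumes that a first-order correlation matches each US-minus-DS mass difference against the mass shift expected for a single assumed structural modification, and that a higher-order correlation matches against the sum of shifts for two simultaneous modifications. The function names candidate_shifts and correlate, the tolerance, and all masses and shift values are hypothetical.

```python
from itertools import combinations_with_replacement


def candidate_shifts(single_shifts, max_changes=1):
    """Build expected mass shifts for up to `max_changes` simultaneous
    structural modifications by summing single-modification shifts."""
    shifts = {}
    for k in range(1, max_changes + 1):
        for combo in combinations_with_replacement(sorted(single_shifts), k):
            shifts[" + ".join(combo)] = sum(single_shifts[name] for name in combo)
    return shifts


def correlate(measured_us, expected_ds, shifts, tol=0.01):
    """Compare every measured US mass with every expected DS mass and
    report pairs whose difference matches an expected shift within `tol`."""
    matches = []
    for us_mass in measured_us:
        for ds_name, ds_mass in expected_ds.items():
            delta_m = us_mass - ds_mass
            for change, shift in shifts.items():
                if abs(delta_m - shift) <= tol:
                    matches.append((us_mass, ds_name, change))
    return matches


# Hypothetical values for illustration only.
expected_ds = {"core+A+B": 144.0, "core+A+C": 172.0}
single_shifts = {"gain of O": 15.995, "loss of CH2": -14.016}
measured_us = [159.995, 145.979]

# First order: one assumed modification per unknown.
first_order = correlate(measured_us, expected_ds, candidate_shifts(single_shifts))
# Second order: up to two assumed modifications per unknown.
second_order = correlate(measured_us, expected_ds, candidate_shifts(single_shifts, 2))
```

In this example the first unknown is explained by a single modification of one DS member, while the second unknown is explained only when two modifications are allowed, which is the situation the higher order function is meant to address.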
These and other features of the present invention will become readily apparent from the following detailed description wherein embodiments of the invention are shown and described by way of illustration of the best mode of the invention. As will be realized, the invention is capable of other and different embodiments and its several details may be capable of modifications in various respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not in a restrictive or limiting sense with the scope of the application being indicated in the claims.