Combinatorial chemistry is increasingly used in the formation of new compounds. Numerous different compounds may be formed simultaneously, and what used to take days or weeks may now be accomplished in minutes or hours. Along with the rapid synthesis of new compounds, however, comes the task of identifying the large volume of newly synthesized compounds. For many years, X-ray powder diffraction has been a favorite analytical technique among chemists for identifying the structure of new compounds. However, the overall identification process may be time-consuming, because each X-ray powder diffraction pattern is compared to a large number of known patterns in a library. Pattern recognition or "search and match" computer programs such as Jade 5.0, available from Materials Data Inc., have made it more efficient to compare an unknown sample X-ray diffraction pattern to those in a library of known patterns, but the sheer volume of X-ray diffraction patterns generated in a combinatorial chemistry application is likely to overwhelm the standard historical procedure.
This application focuses on more efficiently managing a large number of X-ray powder diffraction patterns through the use of the statistical tool of principal component analysis. Principal component analysis allows each X-ray powder diffraction pattern to be reduced to a set of scores, which can be plotted in two or more dimensions. A great deal of information is readily apparent to a chemist versed in the analysis of X-ray powder diffraction through inspection of the resulting plot. For example, X-ray powder diffraction patterns that are highly likely to correspond to the same compound or structure can be identified by the proximity of their scores in a cluster, thereby reducing the overall number of X-ray powder diffraction patterns that must be interpreted by comparison to libraries of known X-ray powder diffraction patterns using, for example, search and match-type software programs. Inspection of the scores plot may also reveal outliers corresponding to X-ray powder diffraction patterns that exhibit unusual characteristics as compared to the overall set of samples. A chemist may then focus attention on the X-ray powder diffraction patterns most likely to represent a desired new compound, without spending resources on samples represented by clusters of scores that are likely to be multiple samples of the same structure. The plot may thus reveal that, of the many X-ray powder diffraction patterns collected, only a few should be investigated further. The time and labor savings to a chemist may be enormous.
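The reduction of patterns to scores described above can be illustrated with a minimal sketch. It is not the method of this application as claimed, only one common way to compute principal component scores (mean-centering followed by a singular value decomposition). The function names `pca_scores` and `synthetic_pattern`, the Gaussian peak shapes, the peak positions, and the noise level are all hypothetical and chosen purely for illustration; real patterns would come from a diffractometer.

```python
import numpy as np


def pca_scores(patterns, n_components=2):
    """Return principal component scores for an intensity matrix.

    patterns: array of shape (n_samples, n_points), one diffraction
    pattern per row, all measured on the same 2-theta grid.
    """
    X = np.asarray(patterns, dtype=float)
    # Center each 2-theta channel on its mean across samples.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data; the scores are U scaled by the singular values.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def synthetic_pattern(two_theta, peak_positions, rng, width=0.15, noise=0.01):
    """Hypothetical pattern: Gaussian peaks plus a little random noise."""
    pattern = np.zeros_like(two_theta)
    for pos in peak_positions:
        pattern += np.exp(-0.5 * ((two_theta - pos) / width) ** 2)
    return pattern + noise * rng.standard_normal(two_theta.size)


rng = np.random.default_rng(0)
two_theta = np.linspace(5.0, 50.0, 901)

# Two made-up structures, each "synthesized" five times.
phase_a = [10.0, 20.5, 31.0]
phase_b = [12.5, 25.0, 40.0]
patterns = np.array(
    [synthetic_pattern(two_theta, phase_a, rng) for _ in range(5)]
    + [synthetic_pattern(two_theta, phase_b, rng) for _ in range(5)]
)

# Each row of `scores` is one pattern reduced to two numbers.
scores = pca_scores(patterns)
for label, (pc1, pc2) in zip("AAAAABBBBB", scores):
    print(f"{label}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
```

In this toy data set, the five patterns of each made-up structure should fall close together on the first principal component and well apart from the other structure, so only one representative of each cluster would need to be passed to a search and match step.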
Principal component analysis has been applied to other analytical data, such as near infrared spectroscopy, for process control applications; see U.S. Pat. No. 5,862,060. Principal component analysis has also been used to determine the concentration of controlled substances such as heroin and cocaine when present in a mixture with other known compounds; see Minami, Y.; Miyazawa, T.; Nakajima, K.; Hida, H. X-sen Bunseki no Shinpo, 27 (1996) 107-115, and Mitsui, T.; Okuyama, S.; Fujimura, Y. Analytical Sciences, 7 (1991) 941-945. Haju, M. E.; Minkkinen, P.; Valkonen, J. Chemometrics and Intelligent Laboratory Systems, 23 (1994) 341-350, disclosed explaining and predicting ammonium nitrate solid phase transition paths between phases IV, III, and II on the basis of X-ray powder diffraction patterns and differential scanning calorimetry data by applying partial least squares regression and principal component analysis. The present invention, however, uses principal component analysis in conjunction with multiple X-ray powder diffraction patterns to gain a great amount of information on potentially widely varied samples. That is to say, the present invention is intended to be a discovery method applied to a very large number of samples, where any number of known and unknown materials may be present within the sample set. It therefore differs from the prior art, which was limited to the case where all the materials present in the sample set were known a priori and, moreover, the number of possible materials present was very limited.