Near-infrared (NIR) spectroscopy and Raman spectroscopy have been widely used in pharmaceutical development as quality control and process control methods. Two types of methods are generally used, namely qualitative and quantitative methods. Qualitative methods are used to identify drug substances, excipients, other raw materials and final products, whereas quantitative methods are used to determine drug concentration, moisture content and other product attributes. Both types of methods require multivariate calibration models.
To develop NIR or Raman methods for determining drug content (content uniformity, or CU) in tablets, a quantitative relationship between the NIR/Raman spectra and drug concentration is established by multivariate modeling. A common approach is partial least squares (PLS) regression. It is well known that the total variance in an NIR/Raman data set arises not only from variation in drug concentration but also from variation in excipient concentration, moisture content, tablet density and other factors. In addition, Trygg et al. have pointed out in U.S. Pat. No. 6,853,923 that “For solid samples, this systematic variation is due to, among others, light scattering, and differences in spectroscopic path length, and may often constitute the major part of the variation of the sample spectra”. Furthermore, “the variation in X (matrix of spectral data) that is unrelated to Y (matrix of drug concentration) may disturb the multivariate modeling and cause imprecise predictions for new samples and also affect the robustness of the model over time”. To address this issue, Trygg et al. proposed the so-called Orthogonal Partial Least Squares (OPLS) method, which removes the systematic variation in X that is orthogonal to Y by orthogonalizing the X matrix and removing the Y-irrelevant variance. An advantage of OPLS over other pretreatment methods is that it keeps the Y-relevant variance intact.
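The effect of removing Y-orthogonal variation before PLS regression can be illustrated with a minimal sketch. It assumes a single response (one drug concentration) and follows the single-response orthogonal PLS (OPLS) filtering steps as commonly described in the chemometrics literature; it is an illustration under that assumption, not the procedure claimed in U.S. Pat. No. 6,853,923. The synthetic spectra (a Gaussian "drug" band plus a per-sample baseline offset standing in for scattering), the function names `opls_fit`, `opls_filter`, `pls1_fit` and `simulate`, and all numeric settings are hypothetical.

```python
import numpy as np


def opls_fit(X, y, n_ortho):
    """Remove n_ortho Y-orthogonal components from mean-centered X (single y).

    Returns the filtered X and the orthogonal weights/loadings needed to
    filter new, identically centered spectra.
    """
    X = X.copy()
    W_o, P_o = [], []
    for _ in range(n_ortho):
        w = X.T @ y
        w /= np.linalg.norm(w)                # predictive weight vector
        t = X @ w
        p = X.T @ t / (t @ t)                 # predictive loading
        w_o = p - (w @ p) * w                 # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                         # Y-orthogonal score (t_o . y = 0)
        p_o = X.T @ t_o / (t_o @ t_o)
        X -= np.outer(t_o, p_o)               # remove the Y-unrelated variation
        W_o.append(w_o)
        P_o.append(p_o)
    return X, W_o, P_o


def opls_filter(X, W_o, P_o):
    """Apply stored orthogonal components to new mean-centered spectra."""
    X = X.copy()
    for w_o, p_o in zip(W_o, P_o):
        X -= np.outer(X @ w_o, p_o)
    return X


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on centered data; returns B such that y is approx. X @ B."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        c = (y @ t) / tt
        X -= np.outer(t, p)                   # deflate X
        y -= c * t                            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)    # B = W (P^T W)^-1 q


# Hypothetical synthetic spectra: one drug band plus a per-sample baseline
# offset standing in for scattering / path-length effects.
rng = np.random.default_rng(0)
wl = np.arange(100)
s_drug = np.exp(-0.5 * ((wl - 40) / 5.0) ** 2)
g_base = np.ones(wl.size)


def simulate(n):
    c = rng.uniform(0.8, 1.2, n)              # relative drug content (Y)
    b = rng.normal(0.0, 1.0, n)               # Y-unrelated baseline amplitude
    X = np.outer(c, s_drug) + np.outer(b, g_base)
    return X + rng.normal(0.0, 1e-3, X.shape), c


X_cal, y_cal = simulate(40)
X_val, y_val = simulate(20)
x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
Xc, yc = X_cal - x_mean, y_cal - y_mean

Xf, W_o, P_o = opls_fit(Xc, yc, n_ortho=1)
B_opls = pls1_fit(Xf, yc, n_comp=1)           # one predictive component after OPLS
B_pls = pls1_fit(Xc, yc, n_comp=1)            # plain one-component PLS for comparison

pred_opls = opls_filter(X_val - x_mean, W_o, P_o) @ B_opls + y_mean
pred_pls = (X_val - x_mean) @ B_pls + y_mean
rmse_opls = np.sqrt(np.mean((pred_opls - y_val) ** 2))
rmse_pls = np.sqrt(np.mean((pred_pls - y_val) ** 2))
print(f"RMSEP, OPLS + 1-component PLS: {rmse_opls:.4f}")
print(f"RMSEP, 1-component PLS:        {rmse_pls:.4f}")
```

In this synthetic setup the baseline term dominates the spectral variance, so a single PLS component fitted to the raw spectra largely tracks the baseline. After one Y-orthogonal component is removed, a single predictive component is enough, and the drug band itself is left untouched by the filter.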
Hazen et al. proposed a different variant of multivariate modeling in U.S. Pat. No. 6,871,169, called Combinative Multivariate Calibration (CMC). In their method, the analytical signals were separated into different wavelength or spectral regions, and each region was then modeled independently using its own number of factors. This approach “allows for each wavelength or spectral region to be modeled with just enough factors to fully model the analytical signal without the incorporation in the model of noise by using excess factors”. The data pretreated by CMC can then be used for partial least squares regression or principal component regression (PCR).
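The region-wise idea can be sketched as follows. The details of the patented CMC procedure are not given here, so this sketch simply assumes that each spectral region is replaced by its own rank-k principal component reconstruction, with k chosen per region, and that the reconstructed regions are concatenated before any PLS or PCR step. The region boundaries, factor counts, synthetic bands and function names (`region_reconstruct`, `combine_regions`, `band`) are hypothetical.

```python
import numpy as np


def region_reconstruct(X, n_factors):
    """Rank-n_factors PCA reconstruction of one spectral region (via SVD)."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]


def combine_regions(X, regions, factors):
    """Reconstruct each (start, stop) column region with its own factor count, then re-join."""
    parts = [region_reconstruct(X[:, a:b], k) for (a, b), k in zip(regions, factors)]
    return np.hstack(parts)


def band(wl, center, width):
    """Gaussian band used to build hypothetical synthetic spectra."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


rng = np.random.default_rng(1)
wl = np.arange(100)
# Columns 0-49 contain one band; columns 50-99 contain two well-separated bands.
S = np.vstack([band(wl, 20, 4), band(wl, 65, 4), band(wl, 85, 4)])
C = rng.uniform(0.5, 1.5, size=(30, 3))        # hypothetical component amounts
X_clean = C @ S
X_noisy = X_clean + rng.normal(0.0, 0.01, X_clean.shape)

regions = [(0, 50), (50, 100)]
factors = [1, 2]                               # "just enough" factors per region
X_cmc = combine_regions(X_noisy, regions, factors)

err_raw = np.linalg.norm(X_noisy - X_clean)    # noise left in the raw data
err_cmc = np.linalg.norm(X_cmc - X_clean)      # noise left after region-wise modeling
print(f"raw error: {err_raw:.3f}, region-wise error: {err_cmc:.3f}")
```

Keeping only as many factors as each region's signal needs discards most of the noise; using the larger factor count everywhere would re-admit noise in the simpler region, which is the trade-off the quoted passage describes.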
Principal component analysis (PCA) is widely used as an unsupervised, exploratory algorithm for multivariate data analysis. PCA does not assume an underlying causal model. It is simply a variable reduction algorithm that uses a relatively small number of latent variables (also called principal components, or PCs) to represent most of the variance in a set of observed variables. PCA does, however, assume linearity in the analysis of spectral data. It also ranks the latent variables, which are mutually orthogonal, by the amount of variance they describe. These characteristics imply that conventional PCA is not suitable for determining active drug content in pharmaceutical tablets, for the following reasons: (a) the variance related to drug concentration may be masked by noise; (b) the variance related to drug concentration may not rank high enough; (c) the latent variables, at least one of which is related to drug concentration, may not be genuinely orthogonal to one another; and (d) the ranking of the latent variable related to drug concentration may not be consistent among different data sets.
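The ranking problem in point (b) can be illustrated with a small sketch: PCA computed via the singular value decomposition of mean-centered data, applied to hypothetical synthetic spectra in which a large, drug-unrelated baseline variation coexists with a small drug-content variation. The band shape, noise levels, sample count and the `pca` helper are assumptions made for illustration only.

```python
import numpy as np


def pca(X, n_comp):
    """PCA via SVD of the mean-centered data.

    Returns scores T, loadings V (orthonormal columns) and the fraction of
    total variance explained by each component, ranked in descending order.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    T = U[:, :n_comp] * s[:n_comp]
    return T, Vt[:n_comp].T, explained[:n_comp]


rng = np.random.default_rng(2)
wl = np.arange(100)
s_drug = np.exp(-0.5 * ((wl - 40) / 5.0) ** 2)   # hypothetical drug band
n = 50
c = rng.normal(1.0, 0.05, n)      # small drug-content variation
b = rng.normal(0.0, 1.0, n)       # large, drug-unrelated baseline variation
X = (np.outer(c, s_drug) + np.outer(b, np.ones(wl.size))
     + rng.normal(0.0, 1e-3, (n, wl.size)))

T, V, explained = pca(X, n_comp=3)
# Absolute correlation of each PC's scores with the drug content.
corr = [abs(np.corrcoef(T[:, k], c)[0, 1]) for k in range(3)]
print("explained variance:", np.round(explained, 4))
print("|corr| with drug content:", np.round(corr, 3))
```

Here the first PC captures almost all of the variance but describes the baseline, while the drug-related information appears only in a lower-ranked component; with a different data set the drug-related component could land at yet another rank.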