The present invention relates to a non-invasive and non-destructive method for screening irregular or inhomogeneous samples and, in particular, for screening encapsulated drugs and tablets for contaminants and imperfections using spectral analysis and a nonparametric clustering algorithm.
The well-publicized and increasing number of cases of adulteration of non-prescription capsules has highlighted the need for rapid, non-invasive, and non-destructive methods for screening over-the-counter drugs. Near-infrared diffuse reflectance spectrometry is a fast analytical method that typically uses the reflectance of a sample at several wavelengths to determine the sample's composition. The technique is heuristic in its approach and makes extensive use of computers. Through a computational modeling process, near-infrared reflectance analysis is able to correct automatically for background and sample-matrix interferences, making ordinarily difficult analyses seem routine.
A model or calibration equation is typically a linear combination of equations of the form:
$$A = \sum_{i=1}^{d} c_i R_i$$
where A is a sample component of interest, d is the number of wavelengths at which measurements are obtained, the R_i are the sums of the sample-component signals observed at each of the d wavelengths, and the c_i are weighting coefficients often determined by multiple linear regression. It will be appreciated that although the present application is framed in terms of near-infrared reflectance measurements, any observable, such as mass, density, magnetic behavior, radioactivity, etc., or other information may be considered a "wavelength" for use in a calibration equation.
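As an illustration of how weighting coefficients of this kind might be obtained by multiple linear regression, the following minimal sketch fits a weighted sum of reflectances to reference values by ordinary least squares. The function names, the two-wavelength training data, and the noise-free reference values are all hypothetical and are assumed here only for the example; NumPy is assumed to be available.

```python
import numpy as np

def fit_calibration(R, A):
    """Least-squares fit of A ~ sum_i c_i * R_i.

    R: (n_samples, d) reflectances of the training set (hypothetical data).
    A: (n_samples,) reference values of the component of interest.
    Returns the d weighting coefficients c_i.
    """
    R = np.asarray(R, dtype=float)
    A = np.asarray(A, dtype=float)
    c, *_ = np.linalg.lstsq(R, A, rcond=None)
    return c

def predict(R, c):
    """Apply the fitted calibration to new reflectance measurements."""
    return np.asarray(R, dtype=float) @ c

# Hypothetical training set: 5 samples measured at d = 2 wavelengths.
R_train = np.array([[0.10, 0.40],
                    [0.20, 0.35],
                    [0.30, 0.30],
                    [0.40, 0.20],
                    [0.50, 0.15]])
true_c = np.array([8.0, 2.0])      # assumed coefficients used to build the example
A_train = R_train @ true_c         # noise-free reference values

c = fit_calibration(R_train, A_train)
A_new = predict([[0.25, 0.30]], c)
print(c, A_new)
```

With real spectra the reference values carry noise, so the recovered coefficients would only approximate any underlying relationship.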
The modeling process employs a "training set" of samples to "teach" the computer algorithm to recognize relationships between minute spectral features and the sample's composition. Of course, the training set must have been previously analyzed by some other reliable (reference) chemical procedure. Although assembling a training set and developing a new calibration can require considerable time, the speed of subsequent analysis has provided plenty of impetus for the growth of near-IR reflectance methods.
Quantitative analysis has been the principal application of near-IR reflectance analysis to date, but it can be used as a qualitative technique as well. Near-IR reflectance analysis can differentiate among a variety of pure compounds and mixtures of constant composition to solve the false-sample detection problem. A false sample is simply any sample that falls outside of the domain of the samples used to train the analysis algorithm. For example, a manufacturer might use near-IR reflectance analysis to monitor a liquid stream having a normal range of protein concentration of 3 to 6%. Training samples would be selected to completely cover this range. If a process change or equipment failure should cause the protein concentration to jump to 10%, a false-sample situation would exist. Analyzing this false sample requires extrapolating beyond the range of the training set used to generate the prediction equation. An operator should be signaled either to stop the stream and correct the equipment failure, or to recalibrate the near-IR reflectance analysis instrument to accept the new range of concentration values. This type of false-sample condition is easily detected, however, by a simple test to determine if the predicted value falls outside of the range of concentrations used in generating the prediction equation.
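The simple range test described above can be sketched as follows. The function name and constants are hypothetical; the 3 to 6% range and the 10% excursion are taken from the protein example in the text.

```python
def is_out_of_range(predicted, train_min, train_max):
    """Return True if a predicted value lies outside the calibrated range."""
    return predicted < train_min or predicted > train_max

# Protein example from the text: the training set covers 3 to 6 %.
TRAIN_MIN, TRAIN_MAX = 3.0, 6.0

for value in (4.5, 10.0):
    flagged = is_out_of_range(value, TRAIN_MIN, TRAIN_MAX)
    label = "false sample" if flagged else "within range"
    print(f"{value:.1f}% -> {label}")
```

A test of this kind only catches excursions in the predicted value itself; it says nothing about a sample whose spectrum is abnormal but whose predicted concentration happens to fall inside the range.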
Another type of false-sample condition, which is more difficult to detect, arises when a new component, one not present in the training set and therefore thoroughly unexpected, appears in the samples, causing erroneous composition values to be generated. The new component could be a chemical entity, as might be introduced by opening a valve at the wrong time or by contamination of the raw materials, or a noise source, such as instrument drift over time or a change in particle-size distribution.
Detecting false samples involves the analysis of multivariate data distributions, a topic which is currently being investigated in a number of ways. Quantile analysis is a useful basis for nonparametric tests of distributional assumptions because it provides easy access to both numerical statistics and readily interpreted graphs. Quantile analysis transforms the cumulative frequency distribution of a data set into a convenient linear form from which the location, scale, and skew of data sets can be estimated. Quantile analysis provides additional advantages that are particularly useful for analyzing multivariate data as set forth in co-pending U.S. patent application Ser. No. 07/359,084, filed May 30, 1989 for Method for Detecting Subclusters in Spectral Analysis.
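One common way to put a cumulative distribution into linear form is a quantile-quantile comparison, in which the quantiles of a test set are paired with those of a reference set and a straight line is fitted to the pairs. The sketch below is an assumption-laden illustration of that general idea, not the procedure of the cited application: the function name `qq_line`, the number of cut points, and the synthetic data are all hypothetical. The slope of the fitted line reflects the relative scale of the two data sets and the intercept reflects their relative location; curvature away from the line (not computed here) would suggest a difference in skew.

```python
import statistics

def qq_line(reference, test, n_cuts=19):
    """Pair the quantiles of two samples and fit a least-squares line.

    Returns (slope, intercept) of test quantiles versus reference quantiles.
    """
    # statistics.quantiles with n = n_cuts + 1 returns n_cuts cut points.
    q_ref = statistics.quantiles(reference, n=n_cuts + 1, method="inclusive")
    q_test = statistics.quantiles(test, n=n_cuts + 1, method="inclusive")
    mean_ref = statistics.fmean(q_ref)
    mean_test = statistics.fmean(q_test)
    sxx = sum((x - mean_ref) ** 2 for x in q_ref)
    sxy = sum((x - mean_ref) * (y - mean_test) for x, y in zip(q_ref, q_test))
    slope = sxy / sxx
    intercept = mean_test - slope * mean_ref
    return slope, intercept

# Hypothetical data: the test set is the reference set stretched by 2
# and shifted by 1, so the fitted line should have slope near 2 and
# intercept near 1.
reference = [((37 * i) % 101) / 10 for i in range(101)]
test_set = [2.0 * x + 1.0 for x in reference]

slope, intercept = qq_line(reference, test_set)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```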
FIG. 1 shows two thousand simulated reflectance data points at two wavelengths for two hypothetical compounds A and B. It will be understood that each wavelength in a spectrum can be represented as a spatial dimension, giving a single point in a d-dimensional space (a hyperspace) for a spectrum recorded at d wavelengths. Thus, a 2-dimensional hyperspace is shown in FIG. 1. In general, the more dimensions provided, the more discriminating the calibration equation. The hyperspatial point is translated from the origin by amounts that correspond to the magnitude of the reflectance observed at each wavelength. By representing spectra in this manner, a group of similar samples with similar spectra appears as a cluster of points at a location in hyperspace. As set forth in U.S. Pat. No. 4,893,253, a univariate distribution can be formed from the points that lie within a specified radius of a line in the hyperspace (a hyperline) such as the hyperline connecting the centers of clusters A and B, i.e., the points within the bar shown in FIG. 1.
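The formation of a univariate distribution from points near a hyperline can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited patent: the function name, the treatment of the hyperline as an infinite line through the two cluster centers, the use of Euclidean distance, and the four sample points are all hypothetical. NumPy is assumed to be available.

```python
import numpy as np

def hyperline_distribution(points, center_a, center_b, radius):
    """Project onto the hyperline the points lying within `radius` of it.

    points: (n, d) array of spectra, one point per spectrum.
    center_a, center_b: the two cluster centers defining the hyperline.
    Returns the signed positions along the line, measured from center_a,
    of the points whose perpendicular distance to the line is <= radius.
    """
    points = np.asarray(points, dtype=float)
    a = np.asarray(center_a, dtype=float)
    b = np.asarray(center_b, dtype=float)
    direction = (b - a) / np.linalg.norm(b - a)   # unit vector along the line
    offsets = points - a
    t = offsets @ direction                        # position along the line
    perpendicular = offsets - np.outer(t, direction)
    distance = np.linalg.norm(perpendicular, axis=1)
    return t[distance <= radius]

# Hypothetical 2-wavelength example with cluster centers at (0, 0) and (4, 0).
pts = [(1.0, 0.1), (2.0, 0.9), (3.0, -0.2), (5.0, 2.0)]
positions = hyperline_distribution(pts, (0.0, 0.0), (4.0, 0.0), radius=0.5)
print(positions)
```

The returned positions form a one-dimensional data set, which can then be examined with univariate tools.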
Confidence limits are an expression of the surface of a cluster in hyperspace, i.e., distances from the cluster center. Thus, a sample point is a member of a cluster with a given confidence or probability if the distance between the cluster center and the sample point is less than or equal to the confidence limit. Typical confidence limits, which express a surface that is symmetrical through the cluster center or mean, such as a spheroid, reflect an underlying assumption of symmetry of the cluster. Such limits fail to identify accurately samples from clusters which are asymmetric.
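The symmetric membership test described above can be sketched as follows, assuming a Euclidean distance and a spherical limit. The function names, the four-point cluster, and the limit of 1.5 are hypothetical values chosen only for illustration.

```python
import math

def cluster_center(points):
    """Mean of the cluster, computed coordinate by coordinate."""
    dims = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dims)]

def is_member(sample, center, limit):
    """Symmetric test: the same limit applies in every direction."""
    return math.dist(sample, center) <= limit

# Hypothetical training cluster at two wavelengths.
cluster = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
center = cluster_center(cluster)
limit = 1.5   # assumed confidence limit, expressed as a distance

print(is_member((2.5, 2.5), center, limit))
print(is_member((4.0, 4.0), center, limit))
```

Because a single distance is used in all directions, a test of this form cannot follow a cluster that extends further on one side of its center than the other, which is the limitation noted in the text.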