Herbal medicines (HMs) have been widely used for disease prevention and treatment for many centuries in Asia and, even with the advance of modern Western medicine, are becoming more and more popular throughout the world. However, because HM herbs may contain hundreds of components for which knowledge is available, it is almost impossible to identify all of these substances and to carry out a useful quantitative analysis. Therefore, chromatographic fingerprint methods have been highly recommended for quality control purposes by many official authorities [1-4].
When samples are analyzed by a hyphenated chromatographic technique such as High Performance Liquid Chromatography-Diode Array Detection (HPLC-DAD), High Performance Liquid Chromatography-Infrared Spectroscopy (HPLC-IR), Capillary Electrophoresis-Diode Array Detection (CE-DAD), or High Performance Liquid Chromatography-Nuclear Magnetic Resonance Spectroscopy (HPLC-NMR), the measured data can usually be collected as a two-dimensional data matrix X (m×n), where the m rows are spectra taken at regular time intervals and the n columns are chromatograms measured at consecutive wavelengths. This two-dimensional data matrix, referred to as a three-dimensional chromatogram in this application, can reveal qualitative and quantitative information about the samples under study and can be used to characterize and identify the HM.
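The layout of the data matrix X (m×n) described above can be illustrated with a small synthetic example. The following is a minimal sketch only: the sizes, the time and wavelength grids, the single Gaussian-shaped component, and the helper names `gaussian`, `spectrum_at`, and `chromatogram_at` are all assumptions made for illustration and are not taken from any instrument or from this application.

```python
import math


def gaussian(x, center, width):
    """Gaussian profile used here only to fabricate a smooth synthetic band."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical grid: m time points (rows) and n wavelengths (columns).
m, n = 6, 4
times = [0.5 * i for i in range(m)]             # minutes (assumed)
wavelengths = [210 + 20 * j for j in range(n)]  # nanometres (assumed)

# One synthetic component: an elution profile in time and a spectrum in wavelength.
elution = [gaussian(t, 1.5, 0.5) for t in times]
spectrum = [gaussian(w, 250, 30) for w in wavelengths]

# X[i][j] is the detector response at time i and wavelength j.
X = [[c * s for s in spectrum] for c in elution]


def spectrum_at(matrix, i):
    """Row i of X: the spectrum recorded at the i-th time point."""
    return list(matrix[i])


def chromatogram_at(matrix, j):
    """Column j of X: the chromatogram recorded at the j-th wavelength."""
    return [row[j] for row in matrix]


print(len(X), "rows (spectra) x", len(X[0]), "columns (chromatograms)")
print("chromatogram at", wavelengths[2], "nm:", chromatogram_at(X, 2))
```

Reading a row gives a full spectrum at one time point, and reading a column gives a chromatogram at one wavelength, which is why the same matrix can be viewed as a three-dimensional surface over time and wavelength.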
However, several problems may hamper the use of these three-dimensional chromatograms to establish the fingerprint of an HM. First, HM samples are very complex chemical systems and their data sets are usually very large, so more storage space and longer computation times are required for data processing. For instance, 2,862 million data points are obtained for a sample run of 90 minutes on an Agilent 1100 HPLC-DAD instrument. Moreover, baseline drift and retention time shifts are major problems when these chromatograms are used for quality control. Pretreatment of the raw data therefore appears to be an important step in extracting useful information [3].
Recently, the wavelet transform (WT) [8-11] has been applied in many diverse fields of science and is now attracting interest in analytical chemistry. The essence of the WT is that it decomposes a signal into localized contributions labeled by a scale parameter and a position parameter. For functional data such as smooth 3D chromatograms, each contribution represents the information in a different frequency band of the original signal. That is, the noise, the signal, and the drifting baseline are usually considered to reside in the high-, medium-, and low-frequency bands, respectively. WT treatment yields a set of coefficients. If only the signal information, i.e., the medium-frequency band, is retained for signal reconstruction, the total data length and the storage space are greatly reduced. Provided that an appropriate level of WT treatment is used, this set of coefficients is much smaller than the original data set yet still contains sufficient useful information.
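The decomposition into frequency bands can be sketched with a multilevel discrete wavelet transform. The example below is an illustration under stated assumptions, not the method of this application: it uses the Haar wavelet (the text does not name a wavelet), a synthetic 64-point chromatogram, four decomposition levels, and an arbitrary choice of which detail bands count as "medium". All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level by interleaving the recovered sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_decompose(signal, levels):
    """Return [approx_L, detail_L, ..., detail_1], coarsest band first."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Rebuild a signal from the band list produced by haar_decompose."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


def keep_bands(coeffs, keep):
    """Zero every band whose index is not in `keep`."""
    return [band if i in keep else [0.0] * len(band)
            for i, band in enumerate(coeffs)]


# Synthetic chromatogram (assumed): one peak, a drifting baseline, fast noise.
n_points = 64
peak = [math.exp(-0.5 * ((i - 32) / 4.0) ** 2) for i in range(n_points)]
baseline = [0.2 + 0.005 * i for i in range(n_points)]
noise = [0.05 if i % 2 == 0 else -0.05 for i in range(n_points)]
signal = [p + b + e for p, b, e in zip(peak, baseline, noise)]

# Bands: index 0 = coarsest approximation (low frequency),
# indices 1-2 = coarse details (treated here as "medium"),
# indices 3-4 = fine details (high frequency).
coeffs = haar_decompose(signal, levels=4)
medium = keep_bands(coeffs, keep={1, 2})
reconstructed = haar_reconstruct(medium)

n_kept = sum(len(coeffs[i]) for i in (1, 2))
print("coefficients kept:", n_kept, "of", n_points)
```

In this sketch only the coefficients of the retained bands would need to be stored, which is the source of the size reduction described above. Which bands actually hold the peak information depends on the wavelet, the sampling rate, and the peak widths, so the band selection here should be read as a placeholder rather than a recommendation.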
Conventionally, HM fingerprint analysis (for authentication or quality control) is based on raw data obtained directly from the measuring instrument. Using raw data directly for fingerprint authentication does not make good use of the information-rich measurement data sets. Data sets measured by, for example, HPLC-DAD are not merely univariate or multivariate observations but also functions observed continuously; in other words, they are smooth curves along the time axis. Such special characteristics of the data, if handled efficiently, can improve predictive accuracy [5-7]. There is therefore a need for a better way of performing HM fingerprint analysis.