The present invention, in some embodiments thereof, relates to quantitative assessment of hydrocarbon contamination in soil using reflectance spectroscopy and more particularly but not exclusively to the quantitative assessment of hydrocarbon contamination using near infra-red spectral assessment and a modeling approach such as artificial neural networks, fuzzy logic, partial least squares, support vector machine, and metric learning.
The term “hydrocarbon contamination” is intended to include all kinds of artificial organic pollutants in soil that can be identified by reflectance spectroscopy.
Petroleum hydrocarbons are contaminants of great significance. The commonly used analytic method for assessing Total Petroleum Hydrocarbons (TPH) in soil samples is based on extraction with 1,1,2-trichlorotrifluoroethane (Freon 113), a substance whose use is prohibited by the EPA.
During the past twenty years, a new quantitative methodology has been widely developed that utilizes the reflected radiation of solids. In this approach, the reflected radiation across the VNIR-SWIR region (400-2500 nm) is modeled against constituents determined by traditional analytic chemistry methods, and the model is then used to predict those constituents in unknown samples. This technology is environmentally friendly and permits rapid and cost-effective measurement of large numbers of samples. Thus, this method dramatically reduces chemical analytical costs and secondary pollution, enabling a new dimension of environmental monitoring.
With production rates of 13.42 million cubic meters of crude oil per day (Energy Information Administration, 2009), the potential of petroleum hydrocarbons (PHC) as soil and water contaminants is apparent and of particular significance. PHC are well known to be neurotoxic to humans and animals. PHC were found to affect brain activity and development as well as to cause nausea, disorientation, mental confusion, speech slurring and memory disorders. Exposure to higher levels can cause extreme debilitation and loss of balance, and may even lead to coma, seizures and death. Long-term exposure has been shown to cause changes in neurophysiological or psychological capacity and is further known to increase the risk of lung, skin and bladder cancer alongside other carcinogenic effects (Hutcheson et al., 1996; Boffetta et al., 1997; Ritchie et al., 2001). For both the diagnosis of suspected areas and the possibility of controlling the rehabilitation process, there is a need to develop and implement a method to rapidly detect and assess PHC in soils.
Due to the complex nature and structure of PHC ingredients, a general measurement index “Total PHC” (TPH) was defined and is the common measurement index for quantifying environmental contamination originating from PHC. The TPH level is determined by the ratio of the IR absorption measured for each sample extract to the IR absorption of the EPA standard consisting of 31.5% isooctane, 35% hexadecane and 33.5% chlorobenzene.
The common method for assessing TPH in soil samples is based on the no longer approved Environmental Protection Agency (EPA) method 418.1. The EPA withdrew this method due to its use of Freon 113, an ozone-depleting material. Nevertheless, the method is still commonly used worldwide, and in some countries (e.g. Israel) it is the only method used for site investigation. The method was developed originally to assess TPH in waste water but was later adjusted in order to assess TPH in soil samples. Not only was this method withdrawn by the EPA, but it is also problematic for various other reasons, such as the need for skilled operators, the length and cost of the process, the difficulty of using it in situ, the very limited availability of the extracting solvent, and the need to transport samples to the laboratory.
The spectral properties of hydrocarbons were identified in the late 1980's, although it was argued that these properties are only visible at concentrations of 4% wt. and above (Cloutis, 1989). In the mid 1990's a NIR reflectance sensor was developed as a proof of concept for the detection of organic matter in soil, based on the spectral properties identified by Cloutis (ibid). The sensor was designed for the detection of benzene in soil at a minimal concentration of 4.4% wt. Several configurations were tested, but only minimal information is provided (Schneider et al., 1995). Soon after, the U.S. Department of Energy contracted a private company to investigate the application of reflectance spectroscopy as a tool to determine motor oil contamination in sandy loam. A schematic design for a field instrument was suggested, but only one type of PHC contaminant and one type of soil were tested. In addition, a small number of samples were used at a very limited contamination range (Stallard et al., 1996).
A more inclusive study was conducted shortly after using three types of soil contaminated in the laboratory with diesel and gasoline. Minimum detection limits of 0.1% wt. and 0.5% wt. were achieved respectively (Zwanziger and Heidrun, 1998). The first study utilizing field-collected samples was not able to produce robust models but rather led to very low correlations (r=0.68) and large errors, probably due to the limited number of samples and to inconsistent analytic chemistry measurements produced by the laboratory (Malley et al., 1999). Attempts at mapping hydrocarbons using the Landsat and Daedalus sensors in 1994 and 1995 failed, probably due to the limited spectral resolution of the sensors (multispectral sensors, 7 and 12 bands respectively) (Kühn and Hörig, 1995; Hörig et al., 2001). Nevertheless, a later study, utilizing the higher spatial and spectral resolutions as well as the very high signal to noise ratio of the HyMap HSR airborne scanner (128 bands) (Cocks et al., 1998), yielded a successful identification of hydrocarbons and oil contaminated soils, but for high concentrations only (2.5% wt) (Hörig et al., 2001). Based on the HyMap mission, a Hydrocarbon Index was developed for mapping hydrocarbon bearing materials. This index is limited to sensors with a very high signal to noise ratio and is subject to other issues, such as problems with land cover, vegetation and high concentration detection levels (Kühn et al., 2004).
The most comprehensive work on reflectance properties of hydrocarbons was conducted by Winkelmann (2005): several types of hydrocarbons were mixed with several types of soil under laboratory conditions. They were measured spectrally and an attempt was made to separate them into hydrocarbon groups using the reflectance spectra; hyperspectral airborne remote sensing was also applied to identifying hydrocarbon contamination. No quantitative models were tested, although this was mentioned as an avenue of further study (Winkelmann, 2005). A recent study by Chakraborty et al. (2010) on the prediction accuracy of VNIR-SWIR reflectance spectroscopy of petroleum contaminated soil showed fair validation results (R²=0.64). The study included 46 field-collected samples that were preprocessed and modeled by several techniques.
Chakraborty et al. continued collecting field samples, and applied the statistical approach of the previous study. By using kriging, they produced TPH distribution maps of the contaminated site that match well with the topography of the study site. Sorak et al. started exploring the possibility of using a hand held Phazir portable spectrometer for TPH determination. They started by preparing several artificially contaminated samples in the laboratory with diesel and oil and creating Near Infrared Analysis (NIRA) models.
While the above-mentioned studies addressed concentration levels of 0.1% wt and above, current environmental regulations require precision an order of magnitude finer. Comprehensive research covering several types of PHC over a wide concentration range is needed, especially at very low concentrations.
During the past twenty years, a new quantitative methodology named NIRA (Near Infrared Analysis) or NIRS (Near Infrared Spectroscopy) has been widely developed (Williams and Norris, 1987). This approach was adopted 40 years ago from a strategy developed in the food science discipline (Ben-Gera and Norris, 1968a; b), and today it is widely utilized in many industrial and scientific applications. In the NIRA approach, the reflected radiation across the VIS-NIR-SWIR region (400-2500 nm) is modeled against constituents determined by traditional chemical analysis. The constructed model is then used to assess unknown samples. Visible light has also been used.
In order to remove any irrelevant information, which cannot be handled properly by the modeling techniques, spectral preprocessing techniques are used. The preprocessing techniques include averaging, centering, smoothing, standardization, normalization and transformations, among others.
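Three of the preprocessing techniques listed above (smoothing, standardization and centering) can be sketched as follows. This is a minimal illustration using only the Python standard library. The function names, the moving-average window, and the choice of standard normal variate (SNV) scaling as the standardization step are assumptions made for the example and are not taken from the cited works.

```python
from statistics import mean, stdev


def smooth(spectrum, window=3):
    """Moving-average smoothing; the window shrinks at the two edges."""
    half = window // 2
    return [
        mean(spectrum[max(0, i - half): i + half + 1])
        for i in range(len(spectrum))
    ]


def snv(spectrum):
    """Standardize one spectrum: subtract its mean, divide by its sample std."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def mean_center(spectra):
    """Subtract the per-band mean (computed over all samples) from each spectrum."""
    band_means = [mean(band) for band in zip(*spectra)]
    return [[v - m for v, m in zip(row, band_means)] for row in spectra]


if __name__ == "__main__":
    # Hypothetical reflectance values for two samples at five bands.
    raw = [[0.10, 0.12, 0.30, 0.14, 0.11],
           [0.20, 0.22, 0.45, 0.24, 0.21]]
    treated = mean_center([snv(smooth(s)) for s in raw])
    print(treated)
```

The order in which such steps are chained is itself a modeling choice, which is one reason the number of candidate preprocessing combinations grows quickly.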
Introduced in 1983 by Wold et al., partial least squares regression (PLS) is similar to principal component regression (PCR), but in PLS the principal components or latent variables (PCs, LVs) are constructed such that they include the chemical reference (Y variables, dependent data) in the calculation process. This technique orders the PCs according to their relevance for predicting the dependent variables, rather than according to how much of the variance of the spectral data they describe. This method excels when the independent data (X variables) express common information, as usually happens in spectral data. The required number of PCs is typically smaller than that in a PCR calibration model for similar model performance (Wold et al. 1983; Esbensen et al. 2002; Nicolaï et al. 2007). As the PLS process is based on LVs, using the optimal number of LVs (nLV) is crucial. On the one hand, including as much data as possible will improve performance; on the other hand, only the first LVs represent the relevant data, whereas the rest are noise (Esbensen et al. 1994). Keeping the model as simple as possible by using the minimum number of LVs is very important to prevent overfitting, but it is also critical to include all the LVs that contain the data relevant for the modeling process in question. In short, the optimal nLV should be selected for representing the property in question and not the noise.
Modeling of spectroscopy data refers to relating a set of spectral parameters that are derived from the spectral information (before or after the aforementioned preprocessing treatment) to the chemical or physical properties of the material in question by using a set of well-known samples. The data are divided into three groups: training, validation and test. The relationship between the property in question and the spectroscopy data is found via the training group and simultaneously cross-validated by the validation group. Finally, the model is applied to the test group, independently of the training and validation processes. Division of the data into the training, validation and test groups is done by using a well-known algorithm (Minasny and McBratney, 2006) that takes into account the distribution of the reference values in order to create the training, validation and test groups in a way that best represents the entire dataset.
Reflectance spectroscopy permits environmentally friendly, rapid and cost-effective measurements of many samples and therefore functions as a substitute for costly and time-consuming chemical analysis. However, because of the numerous possible combinations of preprocessing techniques and dataset divisions, effective tools for using reflectance spectroscopy methods in situ are lacking. Consequently, it is not possible today to provide an automated and optimized NIRA modeling system that analyzes hydrocarbon contamination in soils rapidly, accurately and cost-effectively from reflectance spectroscopy alone.
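The trade-off in choosing nLV can be illustrated with a small, dependency-free sketch of single-response PLS (PLS1) fitted by the NIPALS procedure, followed by selection of the nLV that gives the lowest error on a held-out validation set. The function names (`pls1_fit`, `pls1_predict`, `select_n_lv`), the stopping tolerance and the use of a validation-set root mean square error as the selection criterion are assumptions made for this example. It is not presented as the procedure of the cited references, and a production system would more likely rely on an established numerical library.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS. X: list of spectra, y: list of reference values."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                                # centred reference
    comps = []
    for _ in range(n_lv):
        # Weight vector: direction in X with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                        # scores
        tt = _dot(t, t)
        load = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = _dot(f, t) / tt
        # Deflate X and y so the next LV explains what is left.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict the reference value of one spectrum by sequential deflation."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = _dot(e, w)
        e = [ev - t * lv for ev, lv in zip(e, load)]
        y_hat += q * t
    return y_hat


def select_n_lv(X_train, y_train, X_val, y_val, max_lv):
    """Return (best nLV, {nLV: validation RMSE}); ties go to the smaller nLV."""
    errors = {}
    for n_lv in range(1, max_lv + 1):
        model = pls1_fit(X_train, y_train, n_lv)
        sq = [(pls1_predict(model, x) - v) ** 2 for x, v in zip(X_val, y_val)]
        errors[n_lv] = math.sqrt(sum(sq) / len(sq))
    return min(errors, key=errors.get), errors
```

Because `min` returns the first key with the lowest error, the sketch prefers the simpler model when adding further LVs brings no improvement on the validation set.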
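The idea of dividing the data so that every group reflects the distribution of the reference values can be sketched as below. This is a deliberately simplified stand-in and not the algorithm of Minasny and McBratney (2006): samples are ranked by reference value and assigned to the groups in a repeating pattern, so each group spans the full range. The function name `split_by_reference` and the 60/20/20 pattern are assumptions chosen for the example.

```python
def split_by_reference(y, pattern=("train", "train", "val", "train", "test")):
    """Assign sample indices to groups by cycling a pattern over ranked y.

    Simplified illustration only: ranking plus systematic assignment keeps
    low, medium and high reference values present in every group.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices, lowest y first
    groups = {"train": [], "val": [], "test": []}
    for rank, idx in enumerate(order):
        groups[pattern[rank % len(pattern)]].append(idx)
    return groups


if __name__ == "__main__":
    # Hypothetical reference values (e.g. TPH in % wt) for ten samples.
    tph = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
    print(split_by_reference(tph))
```

The test group produced this way is then set aside and used only after the model has been built on the training group and checked against the validation group.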