Biomass burning is a major source of greenhouse gas emissions and often has a significant impact on flora, fauna and air quality. Accurate and cost-effective fire mapping techniques are therefore necessary for understanding the frequency and distribution of forest fires. While monitoring fires in near-real time is critical for operational fire management, mapping historical fires (i.e., burned areas) is also important for a number of reasons, such as climate change studies (e.g., studying the relationship between rising temperatures and the frequency of fires) and carbon cycle studies (e.g., quantifying the CO2 emitted by fires, which is critical for emissions reduction efforts such as UN-REDD). There are two primary approaches for mapping large-scale burned areas: (1) field-based surveys combined with aerial observations, which allow extremely detailed burned area mapping but are limited in their spatial extent and temporal frequency because of their high cost, and (2) satellite remote sensing-based techniques, such as those using data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument, which offer the most cost-effective data for mapping burned areas. MODIS data sets are freely available with regular, global wall-to-wall coverage and offer opportunities to develop novel spatio-temporal data mining algorithms for classification (event detection) that produce reliable and timely burned area products.
Broadly, there are two paradigms for mining useful information from large data sets: supervised learning and unsupervised learning. Supervised learning approaches are widely used for land classification from remote sensing data. They use labeled training data samples to train classification models such as decision trees, random forests, logistic regression or support vector machines on multi-spectral or hyper-spectral remotely sensed imagery data. However, there are several data-centric challenges in using supervised learning approaches for the task of burned area mapping. Fires are a rare event, and therefore collecting sufficient labeled training data requires significant effort. Moreover, the multi-spectral data for burned locations is distributed differently in different seasons, geographical locations and land cover classes. Because of this seasonal, geographical and land cover heterogeneity, classifiers trained on samples obtained from a particular season, geography or land cover class show poor classification accuracy when used to classify pixels of a different season, geography or land cover class. Training a separate classifier for each combination of season, geography and land cover class would greatly multiply the number of training samples needed, thereby making supervised approaches infeasible for global-scale burned area mapping. In contrast to supervised learning approaches, unsupervised learning approaches do not use labeled examples; instead, they exploit prior biases about the form of the input data and the expected output.
Most prior work in land classification relies on pixel-based approaches that use the spectral features of each pixel to assign it to a surface cover class such as water, forest, grass or burnt. These approaches ignore spatial context during the classification process.
As an illustrative example, FIG. 1 shows a spatial region where each pixel is assigned a score between 1 and 5 under the prior art. A higher score implies a greater probability of being part of an event. In FIG. 1, dark shaded area 102 represents locations that are not part of an event and light shaded regions 104, 106 and 108 represent locations that are part of the event. The numbers represent individual pixel feature values, with higher pixel feature values being representative of an event. Numbers that are not surrounded by small shaded boxes, such as numbers 110 and 112, are locations that were not identified as part of the event under the prior art, while numbers surrounded by small shaded boxes, such as numbers 114 and 116, are locations that were identified as part of the event under the prior art. FIG. 1 shows that classifying each pixel independently (by using a threshold of 3) misses some pixels, such as pixel 112, that are part of the event and also identifies some spurious pixels, such as pixel 114.
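The limitation of independent per-pixel classification can be sketched in a few lines of Python. The grid below is a hypothetical example and does not reproduce the values of FIG. 1; the score grid, the ground-truth event region, and the reading of "a threshold of 3" as "score greater than or equal to 3" are all assumptions made for illustration.

```python
# Minimal sketch of independent per-pixel thresholding (no spatial context).
# All data here is hypothetical; it is NOT the grid shown in FIG. 1.

THRESHOLD = 3  # assumption: a pixel is flagged when its score >= 3

# Hypothetical per-pixel scores in the range 1..5.
scores = [
    [1, 2, 1, 1, 1],
    [1, 4, 5, 2, 1],
    [2, 5, 2, 4, 1],
    [1, 3, 4, 1, 1],
    [1, 1, 1, 1, 4],
]

# Hypothetical ground truth: the event occupies the 3x3 block
# spanning rows 1..3 and columns 1..3.
truth = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]


def classify_pixels(grid, threshold=THRESHOLD):
    """Label each pixel on its own score, ignoring its neighbors."""
    return [[score >= threshold for score in row] for row in grid]


def compare(detected, reference):
    """Return (missed, spurious) pixel coordinates as (row, col) tuples."""
    missed, spurious = [], []
    for r, (det_row, ref_row) in enumerate(zip(detected, reference)):
        for c, (det, ref) in enumerate(zip(det_row, ref_row)):
            if ref and not det:
                missed.append((r, c))      # event pixel with a low score
            elif det and not ref:
                spurious.append((r, c))    # isolated high score outside the event
    return missed, spurious


detected = classify_pixels(scores)
missed, spurious = compare(detected, truth)
print("missed event pixels:", missed)
print("spurious detections:", spurious)
```

In this hypothetical grid, low-scoring pixels inside the event region are missed and an isolated high-scoring pixel outside it is flagged, mirroring the two error types described for pixels 112 and 114.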