Information extracted from photographs of the earth's surface taken by airborne sensors has found applications in a wide range of areas, including urban planning, crop and forest management, disaster relief, and climate modeling. Relying on human experts to extract information from aerial imagery is both slow and costly, so automatic aerial image interpretation has received much attention in the remote sensing community. So far, however, only a few semi-automated systems exist, and they operate in limited domains.
In machine learning applications, aerial image interpretation is usually formulated as a pixel labelling task. The goal is to produce either a complete semantic segmentation of an aerial image into classes such as building, road, tree, grass, and water or a binary classification of the image for a single object class. In both scenarios, the availability of accurately labelled data for training tends to be the limiting factor. Hand-labelled data tends to be reasonably accurate, but the cost of hand-labelling and the lack of publicly available hand-labelled datasets strongly restrict the size of the training and test sets for aerial image labelling tasks.
At present, maps of many major cities provide not only the locations of most roads and parks, but also the locations of buildings. Thus, one alternative to using hand-labelled data is to construct the labels from maps provided by projects such as OpenStreetMap™. For object types covered by these maps, it is now possible to construct datasets that are much larger than those that have been hand-labelled. While the use of these larger datasets has improved the performance of machine learning methods on some aerial image recognition tasks, datasets constructed from maps suffer from two types of label noise: omission noise and registration noise. FIG. 1 shows an example of omission noise and registration noise in a mapping application.
Omission noise occurs when an object that appears in an aerial image does not appear in the map. This is the case for many buildings (even in major cities) due to incompleteness of the maps. It is also true for small roads and alleys, which tend to be omitted from maps, often with no clear criterion for when they should be omitted.
Registration noise occurs when the location of an object in a map is inaccurate. Such errors are quite common because maps that are not required to be accurate at the pixel level are cheaper for human experts to produce, and the loss of accuracy does not significantly reduce their usefulness for most purposes.
The presence of these kinds of errors in the training labels significantly reduces the accuracy of classifiers trained on this data.
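By way of illustration only, the two types of label noise can be sketched on a toy binary label mask. The following is a minimal sketch, not taken from any described system: the 6x6 grid, the two square "buildings", the one-pixel shift, and all function names are hypothetical and chosen purely for the example.

```python
# Toy illustration of omission noise and registration noise in map-derived labels.
# Everything here is hypothetical: 1 = building pixel, 0 = background.

def blank(height, width):
    """Return an all-background mask."""
    return [[0] * width for _ in range(height)]

def draw_rect(mask, top, left, height, width):
    """Mark a filled rectangle as object pixels (in place), clipped to the mask."""
    for r in range(top, top + height):
        for c in range(left, left + width):
            if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
                mask[r][c] = 1
    return mask

def label_errors(truth, labels):
    """Return (missed, spurious): object pixels labelled background, and vice versa."""
    missed = spurious = 0
    for truth_row, label_row in zip(truth, labels):
        for t, l in zip(truth_row, label_row):
            if t == 1 and l == 0:
                missed += 1
            elif t == 0 and l == 1:
                spurious += 1
    return missed, spurious

H, W = 6, 6

# What the aerial image actually contains: two buildings.
truth = blank(H, W)
draw_rect(truth, 0, 0, 2, 2)  # building A
draw_rect(truth, 3, 3, 2, 2)  # building B

# Omission noise: building B is absent from the map.
omission_labels = blank(H, W)
draw_rect(omission_labels, 0, 0, 2, 2)

# Registration noise: building B is mapped, but one pixel too far to the right.
registration_labels = blank(H, W)
draw_rect(registration_labels, 0, 0, 2, 2)
draw_rect(registration_labels, 3, 4, 2, 2)

omission_errors = label_errors(truth, omission_labels)
registration_errors = label_errors(truth, registration_labels)
print("omission (missed, spurious):", omission_errors)
print("registration (missed, spurious):", registration_errors)
```

In this toy setting, omission noise produces only missed object pixels, whereas registration noise produces both missed and spurious object pixels along the edges of the displaced object.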
It is an object of the present invention to mitigate or obviate at least one of the above disadvantages.