Field of the Art
The present invention is in the field of image analysis, and more particularly in the field of the use of deep learning model computer vision systems for automated object identification from geospatial imagery.
Discussion of the State of the Art
Image analysis has been an important field of technology at least since World War 2, when image analysis, photogrammetry, and related technologies were used extensively in conjunction with aerial photography for intelligence and bombing damage assessment purposes (among others). However, the use of image analysis (especially analysis of remotely-sensed images) for identifying or locating targets of interest has always been limited by the need for highly-trained, specialized image analysts or interpreters. The need for specialized (and expensive) skills has confined image analysis to a correspondingly narrow range of applications (notably military, homeland defense, and law enforcement).
The market for image analysis has also historically been limited by the high cost of obtaining images to analyze. In the military arena, the benefits were sufficiently apparent that large numbers of military reconnaissance flights have been made over regions of interest since World War 2. But the cost of such flights excluded virtually all commercial applications of image analysis. Starting in the 1970s with the Landsat satellite, this began to change as low-resolution satellite images became publicly available. A series of new satellites has opened up progressively more applications as the resolution, spectral coverage, geographic coverage, and cost per image have all continuously improved; accordingly, a significant market in commercial remote sensing imagery has emerged. But even this market has been kept from achieving its full potential by the still-present requirement for expensive, scarce image analysis talent.
One common type of geospatial image analysis task is the “search and locate” task. In this task, one or more targets of interest need to be identified and precisely located. A well-known example of “search and locate” is the discovery and pinpointing of warships, tanks, or other military targets of interest. Recently, focused geospatial image analysis of geographically specific data has been used in search and rescue efforts for downed planes or lost shipping. However, these efforts have required the work of image analysts, limiting what could be done. Development of a method to identify targets of interest rapidly, using fewer resources, would allow the pursuit of less urgent but promising applications, which include assessing the scope of a refugee crisis by, for example, counting tents in an area of interest; analyzing the change in infrastructure in developing nations; assessing numbers of endangered species; finding military hardware in areas not previously expected to contain such equipment; and identifying previously unknown airstrips or camps where crime or terrorism may be in operation. The ability to extend “search and locate”-like tasks to large geographical areas and to perform them efficiently and repeatedly over time would allow the use of geospatial imagery to map remote regions, to track deforestation and reforestation, and to detect natural disasters in remote areas of the world.
The notion of computer vision, specifically the reliable identification by a computer of particular objects, has been an active pursuit within the field of computer science since the late 1960s. Unfortunately, until recently, this pursuit has met with little success except when both the object of interest and the background against which it is presented have been tightly controlled. Barriers to advancement in computer object identification have been both technological and logical. The technological barriers have been present because, like its biological counterpart, computer visual processing requires computational power and amounts of memory storage that have been prohibitive up until the last 15 years. Advances in the ability to pack more transistors into the same volume while also reducing cost, and the development of specialized components such as the graphics processing unit, which is optimized to perform the calculations encountered during manipulation of visual data, have brought current hardware to the point where rapid, even real-time, object identification is possible. There has also been a significant maturation process in how computer scientists in the field program computers to analyze objects of interest. Some early methods broke each object of interest into a unique grouping of simple geometric shapes, or took advantage of the unique shading patterns of each object, to identify new instances of the desired object. All of these early attempts gave results that were extremely sensitive to such variables as lighting, exact object placement in the field of sample, and exact object orientation, sometimes to the degree that the object of interest was not identifiable in the original image without great care.
Currently, after great advancement in computer capabilities, advances in our understanding of biological vision, and advances in computer vision theory, a method of training computers to reliably identify specific objects of interest has emerged. This method combines a convolutional neural network with deep learning to train the system to recognize an existing object of interest both when presented against many backgrounds and when the object is in different orientations. The convolutional neural network, which consists of several layers of filters with partial, local field interconnections between layers, interspersed with pooling layers that reduce data complexity, allows the computer to learn object recognition with a minimum of presupposition on the part of the programmer, because the convolutional neural network itself determines the best filters to use to identify the target object. Deep learning consists of a period of “supervised learning,” which uses a moderate-sized set of training images in which each image contains a clearly demarcated, or “labeled,” example of the object to identify (for example, the human face), followed by a period of “unsupervised learning” on a very large number of unlabeled images, a portion of which do not contain the object to identify. The number of training images is proportional to the overall system's accuracy, specifically the precision and recall of the classification results. Accordingly, the number of training images is inversely proportional to the amount of time the convolutional neural network-deep learning model spends training and, further, searching for and accurately finding objects of interest. This convolutional neural network-deep learning model method has given rise to computer systems that have been reliably used in human facial recognition, optical character recognition, and identification of complex sets of parts during manufacturing.
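By way of illustration only, the filter and pooling stages described above, and the precision and recall measures mentioned, may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of any particular system: the function names (conv2d_valid, max_pool, precision_recall), the 6 by 6 toy image, and the hand-picked edge filter are all hypothetical, and in an actual convolutional neural network the filter values would be determined during training rather than chosen by the programmer.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one filter over the image (cross-correlation, as is usual in CNNs).

    Each output value depends only on a small local patch of the input,
    which is the "partial, local field interconnection" between layers.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Keep positive filter responses and zero out the rest."""
    return np.maximum(x, 0.0)


def max_pool(feature_map, size=2):
    """Reduce data complexity by keeping the maximum of each size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]  # drop edge rows/columns that do not fill a block
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical toy input: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical hand-picked filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

features = max_pool(relu(conv2d_valid(image, kernel)))
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=4)
```

In this toy case the 6 by 6 image is reduced to a 2 by 2 pooled feature map whose nonzero responses lie in the column covering the edge, and the hypothetical counts of 8 correct detections, 2 false detections, and 4 missed objects give a precision of 0.8 and a recall of about 0.67.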
Indeed, the convolutional neural network-deep learning model method has been found so widely useful for object identification that multiple programming libraries are now publicly available for download and use for that purpose. These include, for example, the Caffe library (Berkeley Vision and Learning Center), the Torch7 library (Nagadomi), and the cuda-convnet2 library (Alex Krizhevsky). While the convolutional neural network-deep learning model method has been widely and very successfully applied to ground-based photography and video, it has not found application in the field of geospatial image analysis.
What is needed in the art is an automated system that generates synthetic training images to augment the number of real training images needed for an automated system to both identify and determine the precise location of a number of objects of interest from geospatial imagery.