1. Field of the Invention
The invention relates to the field of data analysis, and more particularly to the field of image mining and searching.
2. Discussion of the State of the Art
In the field of data analysis, search engines are commonly used for a variety of purposes, such as text-based searching of documents, keyword-based searching for information on a network such as the Internet, or “fuzzy” searching that uses algorithms to search over non-text or mixed-media data. One example in the current art is Google™ reverse image search, which enables users to select an image and search the Internet or other data for similar images. However, existing limitations prevent full realization of the utility of search engines, particularly in the area of image search and analysis.
There exist solutions that allow for basic search of an image base, such as the Google™ reverse image search mentioned above. Such search engines typically operate by identifying similarities between images, or by identifying image content by means of “tags” or other identifiers for use in keyword or other text-based searches. There is currently no provision for searching with the finer granularity that is common in text-based searches. For example, a user might search within a selection of document text or count occurrences of a specific keyword or phrase, but no comparable operation is currently possible with image data. Current image search paradigms focus on basic image comparison and are limited in scope. Text-based image search currently accommodates only tag text and depends on humans to tag images, which is typically impractical for large image data sets such as those obtained from satellites.
Moreover, images often contain implicitly hierarchical information. For example, some image features may be relevant or meaningful at a pixel level of granularity, such as land cover classification or texture roughness; others may be relevant at a regional (i.e., region within an image) level of granularity, such as shapes, arrangements of objects, and object identification and classification; and yet others may only be relevant or meaningful at a scene (whole image or collection of images) level, such as percentage of land cover or use, object counts in an area, or high-level classification of areas such as deserts, cities, lakes, and the like. Tag-based text search does not capture this richness of information, and many such hierarchical image features would not normally be tagged by humans in any case.
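By way of illustration only, the three levels of granularity described above can be sketched as a small data structure in which scene-level values are derived from pixel-level and region-level data. This is a minimal sketch and not a description of any existing system: the names Region, land_cover_percentages, and object_counts, as well as the sample class labels and bounding boxes, are hypothetical and chosen solely for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Region:
    """Region-level feature: an identified object and where it sits in the image."""
    label: str    # hypothetical object class, e.g. "building"
    bbox: tuple   # (row_min, col_min, row_max, col_max), assumed convention


def land_cover_percentages(pixel_classes):
    """Derive a scene-level summary (percent land cover) from pixel-level labels."""
    counts = Counter(label for row in pixel_classes for label in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def object_counts(regions):
    """Derive a scene-level summary (objects per class) from region-level features."""
    return Counter(region.label for region in regions)


# Pixel level: one land cover class per pixel (illustrative 2 x 4 image).
pixel_classes = [
    ["water", "water", "forest", "forest"],
    ["water", "urban", "urban", "forest"],
]

# Region level: objects identified within the image (illustrative values).
regions = [
    Region("building", (1, 1, 1, 1)),
    Region("building", (1, 2, 1, 2)),
    Region("lake", (0, 0, 1, 1)),
]

# Scene level: values that are only meaningful for the image as a whole.
scene = {
    "land_cover_pct": land_cover_percentages(pixel_classes),
    "object_counts": object_counts(regions),
}
```

In this sketch, a question such as "what percentage of the scene is urban" or "how many buildings appear in the scene" is answered from the derived scene-level values, none of which a human tagger would ordinarily record.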
Common image search approaches in the art also fail to allow for scalable operation, whether in the scope and method of search or in system architecture. Existing solutions are limited to simple image comparison operations (“find images like this one”) or basic keyword data (“find images related to this topic”), preventing detailed or more precise identification or searching of image information. They further fail to incorporate a modular approach to search logic, in which a variety of search logic modules may be added, removed, or altered without interrupting operation or requiring manipulation of other elements of a search engine or system.
What is needed is an image search and mining system that allows for scalable operation. Preferably, such a system should be modular to allow incremental development without interfering with ongoing operations, and should allow users to search with variable granularity over a large base of image data using keyword search queries.