The management of large image databases is a topic of growing research interest. Imagery is being generated and maintained for a wide variety of applications, including remote sensing, architectural and engineering design, geographic information systems, weather forecasting, and biomedical image management. Content-based image retrieval (CBIR) is a technology that is being developed to address these application areas. CBIR refers to techniques used to index and retrieve images from databases based on their pictorial content. Pictorial content is typically defined by a set of numerical features extracted from an image that describe the color, texture, and/or shape of the entire image or of specific objects. This numerical feature description is used in CBIR to index a database through various techniques, including distance-based methods, rule-based decision-making, and fuzzy inferencing.
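The distance-based indexing mentioned above can be illustrated with a minimal sketch. It assumes a toy color feature (a coarse per-channel RGB histogram) and Euclidean distance as the similarity measure; the function names, the bin count, and the three-image database are hypothetical and chosen only for illustration, not taken from any particular CBIR system.

```python
import math


def color_histogram(pixels, bins=4):
    """Return a normalized per-channel histogram of 8-bit RGB pixels.

    This is a toy color feature; practical systems typically combine
    color, texture, and shape descriptors.
    """
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map a 0..255 intensity to one of `bins` equal-width bins.
            bin_index = min(value * bins // 256, bins - 1)
            hist[channel * bins + bin_index] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty input
    return [count / total for count in hist]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_features, index, k=3):
    """Return the names of the k indexed images closest to the query."""
    ranked = sorted(index.items(),
                    key=lambda item: euclidean(query_features, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical database: each "image" is a flat list of RGB pixels.
database = {
    "red_patch": [(250, 10, 10)] * 16,
    "dark_red_patch": [(150, 30, 30)] * 16,
    "blue_patch": [(10, 10, 250)] * 16,
}

# Indexing step: extract one feature vector per stored image.
index = {name: color_histogram(pixels) for name, pixels in database.items()}

# Query step: describe the query image the same way, then rank by distance.
query = color_histogram([(240, 20, 20)] * 16)
results = retrieve(query, index, k=2)
print(results)
```

In this sketch the query image is compared with the stored images only through their feature vectors, so the ranking reflects pictorial content rather than metadata such as dates or identifiers.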
With the availability of low-cost, high-performance computers, memory, and disk storage media, image libraries and CBIR technologies have become more common. Although large repositories can be readily assembled, the efficiency with which these systems retrieve the most relevant imagery still depends primarily on repository capacity and long-term storage.
In the semiconductor industry, image data management in the manufacturing environment is becoming more problematic as the size of silicon wafers continues to increase while the dimensions of critical features continue to shrink. Fabricators rely on a growing host of image-generating inspection tools to monitor tiny defects and other features of interest in complex device manufacturing processes. These inspection tools include optical and laser scattering microscopy, confocal microscopy, scanning electron microscopy, and atomic force microscopy. The number of images generated is on the order of 20,000 to 30,000 each week in some fabrication facilities. Manufacturers currently maintain on the order of 500,000 images in their data management systems (DMS) for extended periods of time. Gleaning the historical value from these large image repositories for yield improvement is difficult to accomplish using the standard database methods currently associated with these data sets (e.g., performing queries based on time and date, lot numbers, wafer identification numbers, etc.). CBIR techniques facilitate indexing and reuse of this data based on image content.
Another image-rich environment where data management needs are growing is the preclinical biomedical and clinical medical communities. Preclinical imagery is collected from small animal research studies using anatomic and functional modes such as micro-computed X-ray tomography (CT), positron emission tomography (PET), single photon emission computed tomography (SPECT), and magnetic resonance imaging (MRI). In the clinical environment, imagery is also generated from X-ray, CT, PET, and SPECT modes, plus optical modes such as retinal imaging with fundus cameras and optical coherence tomography (OCT). Many preclinical research and clinical medical facilities today use picture archiving and communications systems (PACS) to store these images. Accessibility, indexing, and reuse are critical to these biologists and medical personnel, but access today is typically limited to standard database queries using non-image data.
Due to the rapid growth in the size of image libraries and the high potential for data (image) redundancy, a method is needed to reduce redundancy to facilitate either (1) the long-term storage of the most information-rich image content (i.e., maintaining the same database capacity but keeping data for a longer period of time), or (2) a reduction in the required repository capacity, which results in improved performance (i.e., storage and retrieval efficiency) and reduced time for indexing and retrieval.
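To make the idea of redundancy reduction concrete, the following is a minimal sketch of one simple approach: a greedy filter that keeps an image only if its feature vector is not within a chosen distance of any image already kept. This is an illustrative assumption, not the method described in this document; the function names, the feature vectors, and the threshold value are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_redundant(features, threshold):
    """Greedy near-duplicate filter over a name -> feature-vector mapping.

    Images are visited in insertion order. An image is kept only if it
    lies farther than `threshold` from every image kept so far, so the
    result depends on the visiting order and on the threshold chosen.
    """
    kept = []
    for name, vector in features.items():
        if all(euclidean(vector, features[other]) > threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical feature vectors; "defect_a_copy" is nearly identical to "defect_a".
features = {
    "defect_a": [0.90, 0.10, 0.00],
    "defect_a_copy": [0.89, 0.11, 0.00],
    "defect_b": [0.10, 0.80, 0.10],
}

kept = prune_redundant(features, threshold=0.05)
print(kept)
```

Under these assumptions, the near-duplicate image is dropped while the two distinct images are retained, which corresponds to either keeping the same capacity for a longer period or shrinking the repository.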