1. Field of the Invention
This invention relates to the use of biologically-inspired scene descriptors for the extraction of metadata.
2. Description of the Related Art
Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information about a scene from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems.
Computer vision is closely related to the study of biological vision. The field of biological vision studies and models the physiological processes behind visual perception in humans and other animals. Computer vision, on the other hand, studies and describes the processes, implemented in software and hardware, that underlie artificial vision systems.
One application of computer vision is to extract information from the image data, interpret the information and append it as metadata to the image or sequence of images. Metadata is “data about data” and provides context and descriptions for the image data. Metadata is used to facilitate the understanding, usage, and management of data, both by humans and computers. Thus metadata can describe the data conceptually so that others can understand them; it can describe the data syntactically so others can use them; and the two types of descriptions together can facilitate decisions about how to manage the data. When structured into a hierarchical arrangement, metadata is more properly called an ontology or schema.
For digital images or image sequences, metadata may be as simple as the date and time of creation, details of the image capture, etc. Metadata may also be extracted that provides a measure of scene understanding at one or more semantic levels. Semantic information reflects the structure and meaning in the image data. Generally speaking, image understanding involves the detection and recognition of objects, the relationships between objects, and the context in which the objects exist in a scene.
Extracting metadata that provides image understanding from complex visual environments, e.g. natural imagery containing complex, evolving visual elements, is difficult. Most vision systems are modeled on how a computer sees the world rather than on the human visual system, and they are subject to one or more constraints or limitations in order to provide useful metadata. The image data is typically segmented in a supervised procedure to identify certain segments of the image for consideration, yet supervised segmentation is impractical in many applications. Such systems are typically not robust to changes in viewing conditions. The metadata may provide semantic information at only one level. The system may not scale to complex scenes or to broad classes of scenes. In many cases, the extraction of scene descriptors is application-specific rather than universal.
To address these limitations, researchers are attempting to model the extraction of metadata on biological visual systems. Because humans and primates outperform the best machine vision systems by almost any measure, building a system that emulates object recognition in the cortex and image understanding in higher-level cognitive processes has always been an attractive idea. However, for the most part, the use of visual neuroscience in computer vision has been limited to a justification of Gabor filters. More recent research efforts are investigating the extraction and use of so-called biologically-inspired visual features to further image understanding. Each feature is classified separately to provide a semantic descriptor, and these descriptors are combined to provide metadata.
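For illustration only, the Gabor filter referred to above is a Gaussian envelope multiplied by an oriented sinusoidal carrier. The following is a minimal sketch of the standard real-valued form, assuming the common parameterization by scale `sigma`, orientation `theta`, wavelength `lambd`, aspect ratio `gamma` and phase `psi`. The function name `gabor_kernel` and the four-orientation bank are hypothetical choices for this example and do not describe the invention or any particular prior-art system.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Return a size x size real-valued Gabor kernel as a list of rows.

    Hypothetical helper for illustration; size must be odd so the
    kernel has a well-defined center pixel.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate pixel coordinates into the filter's orientation theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope (elongated by aspect ratio gamma).
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            # Sinusoidal carrier varying along the rotated x axis.
            carrier = math.cos(2 * math.pi * x_r / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Hypothetical bank of four orientations (0, 45, 90, 135 degrees).
bank = [gabor_kernel(7, sigma=2.0, theta=k * math.pi / 4, lambd=4.0) for k in range(4)]
```

Convolving an image with each kernel in such a bank yields one orientation-selective response map per filter; in the biologically-inspired approaches described above, responses of this kind are the low-level input from which higher-level features are built.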