Recent developments in machine vision have demonstrated remarkable improvements in the ability of computers to correctly identify particular objects in a viewing field. However, most of these advances rely on color-texture analyses that require target objects to possess one or more highly distinctive local features that a classification algorithm can use as distinguishing characteristics. Many objects, however, consist of materials that are prevalent across a wide variety of object categories.
The human visual system is one of the archetypal models of sensor fusion. Populations of specialized cells in the retina, thalamus, and cortex act as sensors responding in parallel to complementary attributes of the objects in view, such as spatial boundaries and grayscale textures, color signatures, motion cues, and depth cues from stereopsis. Information in these complementary channels is fused in higher regions of the cortex, providing a rich representation of the world, even before information from other sense organs, such as sound or touch, is incorporated. Neuroscience has revealed many properties of individual neurons and of the functional organization of the visual system that are believed to be essential to reaching human vision performance, but that are missing in standard artificial neural networks. Among these are extensive lateral and feedback connectivity between neurons, spiking dynamics of neurons, and sparse activity patterns across populations of neurons.
Equally important may be the scale of the visual cortex and the amount of visual input it receives. The human visual cortex consists of approximately 10 billion neurons, each with approximately 10,000 synaptic connections. A simple simulation at 10 floating-point operations per synapse to process one frame of data therefore requires approximately one petaflop (10^15 floating-point operations) of computation per frame. In a year, the 6 million cones in an eye's retina and the approximately 1 million fibers in the optic nerve deliver approximately one petapixel of information to the brain.
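The back-of-envelope estimates above can be reproduced directly. The sketch below assumes 10 floating-point operations per synapse per frame and an effective retinal sampling rate of roughly 30 frames per second; both figures are illustrative assumptions rather than measured values.

```python
# Order-of-magnitude estimates for the scale of the human visual cortex.

# Cortical computation per frame.
neurons = 10e9                 # ~10 billion neurons in visual cortex
synapses_per_neuron = 1e4      # ~10,000 synaptic connections each
flops_per_synapse = 10         # assumed cost per synaptic update

flops_per_frame = neurons * synapses_per_neuron * flops_per_synapse
print(f"{flops_per_frame:.0e} flops per frame")   # ~1e15, i.e. one petaflop

# Retinal input per year.
optic_nerve_fibers = 1e6       # ~1 million fibers per optic nerve
frames_per_second = 30         # assumed effective sampling rate
seconds_per_year = 3.15e7

pixels_per_year = optic_nerve_fibers * frames_per_second * seconds_per_year
print(f"{pixels_per_year:.1e} pixels per year")   # ~1e15, i.e. one petapixel
```

Treating each optic-nerve fiber as one "pixel" per frame, a year of input lands within a factor of a few of one petapixel, consistent with the figure quoted above.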
Computing hardware to support full-scale implementations of hierarchical models for large-scale datasets now exists. The Petascale Synthetic Visual Cognition team at Los Alamos National Laboratory (LANL) has developed large-scale functional models of the visual cortex, called the Petascale Artificial Neural Network (PANN), that can operate on LANL's Roadrunner petaflop supercomputer and on graphics processing unit (GPU)-accelerated clusters. An initial run of a simple visual cortex (V1) code on Roadrunner achieved 1.144 petaflops during trials at the IBM facility in Poughkeepsie, N.Y. in June of 2008. The example PANN model also achieved real-time processing of grayscale high definition video (1080p) on a cluster of 16 computer nodes, each accelerated by a GPU (e.g., NVIDIA Fermi™). This scale of computing is often preferable for applications that require processing of regional-scale satellite imagery collections, for climate change studies, for disaster response, etc.
However, conventional approaches use large numbers of parameters in their models. For instance, some approaches use from 60 million to a billion parameters, and, in the latter case, approximately 16,000 computers are required to process a dataset of 200×200-pixel images. Accordingly, an improved approach to image processing may be beneficial.