Through exponential improvement over the last 40 years, computers have become millions of times faster and more capable. The famous proposition known as Moore's Law generally sets forth the inevitability of rapidly increasing improvement in the power of computing devices. In this time, computers have become much more effective than humans at certain tasks such as playing chess and calculating taxes. Thus, it is notable that sensory observation tasks, such as visual object recognition, which humans find effortless, have only recently become computable with a sufficient quality of results to be practically useful in the real world. Vision systems that have been implemented commercially typically have just one programmed function, such as detecting traffic stop violations or determining assembly line defects on a trained object pattern. Although classes of algorithms with improved generalization and task flexibility exist, these have not found practical use, one reason being that their extreme complexity requires supercomputing resources such as large clusters. Many industries could benefit from the availability of improved vision systems, but as long as these systems are physically large (i.e. requiring a general purpose computer/PC), power hungry, expensive and immobile, they are not appropriate for a practical, general recognition application.
FIG. 1 details a standard implementation of a general purpose (and various versions of a special purpose) computing arrangement 100. The arrangement 100 consists of a central processing unit (CPU) 110 that, as described below, can comprise one or more processing “cores.” The CPU 110 interacts with a memory 120 that stores program instructions and the data upon which the program instructions operate using the CPU 110. A bus or other connectivity structure 130 connects the CPU 110 and memory 120 to each other and also to other functional components of the overall computer 100, including one or more input/output (I/O) devices 140, which are adapted to allow data output and display, data input and operation of various peripheral devices (for example graphical user interface (GUI) devices). The organization of the components is highly variable. Multiple memories, etc., can be provided in alternate arrangements. Likewise, the various components of the computer 100 can be provided on one physical circuit chip structure, or a plurality of physical circuit chip structures.
As noted above, more recently, general and special-purpose computers have implemented processor arrangements in which a plurality of separate, parallel-processing “cores” are provided on one or more physical circuit chip structures. Advances in the miniaturization of circuit design have concurrently enabled such multi-core arrangements to be provided with a physical footprint that heretofore supported fewer (or only one) core(s). The use of multiple cores, as noted, has yielded a degree of parallelism in the processing of programs and program threads. With reference to FIG. 2, the CPU 110 is implemented with multiple processing cores (four cores in this example) 220, 222, 224 and 226. The multiple cores are connected to an on-chip memory cache arrangement 240. In the best case (e.g. highly parallel, non-serially dependent tasks), four cores can finish an overall task approximately four times faster, although typically the benefit of multiple cores is significantly less due to serial dependencies inherent in the task(s).
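By way of illustration only, the gap between the best-case speedup and the typical speedup of a multi-core arrangement can be estimated with Amdahl's law, a standard model that is assumed here and is not part of the arrangement described above. In the following sketch, the function name amdahl_speedup and the example parallel fractions are hypothetical and chosen only to show the trend.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Estimate the speedup from Amdahl's law (an assumed model, for illustration).

    parallel_fraction: share of the task (0.0 to 1.0) that can run in parallel.
    workers: number of cores (or computers) applied to the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The serial share cannot be accelerated by adding more workers.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions: fully parallel, mostly parallel, half serial.
    for fraction in (1.0, 0.9, 0.5):
        speedup = amdahl_speedup(fraction, workers=4)
        print(f"parallel fraction {fraction:.1f}: about {speedup:.2f}x on 4 cores")
```

Under this model, only a task with no serial dependencies reaches the full fourfold speedup on four cores; a task that is half serial gains well under twofold, which is consistent with the reduced benefit noted above.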
For tasks that are easily divided into multiple subtasks, which can execute in parallel, it is contemplated to employ a plurality of discrete computers (Computers 1-7) together in a computer cluster 300 (FIG. 3) connected by one or more network switches 320 or other internetworking devices. In the best case, a cluster of N computers can complete the overall task N times faster. Note that if the overall problem is not divisible into multiple subtasks, the program is typically not adapted to execute on a cluster.
General purpose computers are particularly useful for their ability to execute programs written after the computer chips have been fabricated. That is, general purpose CPUs are not typically optimized for any individual program. However, if significant constraints can be placed on the types of algorithms that are to be executed by the computing device/processor, then its architecture can be optimized to excel at those calculations that are most important to the special class of programs it will run.
Some computer/processor architectures have been designed with the specific purpose of improved performance at sensory recognition (e.g. auditory, radar, medical scanning/imaging and vision). For example, some processors are adapted particularly to improve the performance of vision algorithms. In the 1980s, processors were designed and sometimes fabricated to accelerate certain portions of vision algorithms that constituted the slowest processing bottlenecks at the time. These older architectures have become outdated because they optimized for programs that would currently execute much faster than real time on existing general purpose processors, and thus, would not be very useful today.
Two modern architectures optimized for vision processing are the Acadia II processor made by Sarnoff Corp. of Princeton, N.J. (by way of background, refer to World Wide Web address http://www10.edacafe.com/nbc/articles/view_article.php?section=ICNews&articleid=679089), and the EyeQ2 made by Mobileye N.V. of the Netherlands (by way of background, refer to World Wide Web address http://www.mobileye.com/default.asp?PageID=319). These processors focus on power-efficient acceleration of low-level vision routines such as edge detection and tracking. For higher-level routines they integrate one or more general purpose CPU cores (either on-chip or off-chip). These architectures are appropriate for problems in which the higher-level routines are relatively simple and can run effectively using the onboard general-purpose processors.
However, even such modern processors are still limited in that they tend to be directed to particular higher-level algorithms used to solve particular vision problems. That is, they may be optimized to carry out sets of algorithms particularly suited to a certain set of tasks, such as license plate recognition, but these algorithms are not useful for vehicle shape recognition, facial recognition or the like. In general, these processors do not emulate the theorized approach by which humans perceive and recognize visual objects and other sensory information. In that approach, features of a subject (for example, a person's eye shape) are initially discerned by the mind and either discarded if incorrect, or accepted and then combined with other features (for example, the person's mouth) until the mind has confidence that it has made the correct recognition. This approach requires a large number of parallel tasks that build upon each other in differing combinations, a task not necessarily suited to modern processor architectures. Rather, performing this task with modern processors would require a massive investment in general purpose cores, an extremely power-hungry approach that limits the miniaturization of such an architecture.
The ability to provide a processor capable of running general recognition algorithms that can discern a very large number of subjects is critical to constructing autonomous robots and self-propelled vehicles, as well as general identification and recognition systems used, for example, in surveillance and crime control. However, most of these systems have significant limitations in power availability and/or size. A processor that can recognize hundreds or thousands of different trained (or extrapolated) subjects, but that exhibits small size and low power consumption, is particularly desirable. This processor architecture should be easy to construct with conventional circuit fabrication techniques and should allow classes of recognition algorithms to be variously loaded and employed without the need to alter the processor architecture significantly, or at all.