Point-of-interest (POI) detectors process a supplied video signal and detect regions in the video signal which can be of interest. The output of such detectors is a signal indicating interesting regions in the supplied video signal; this output signal is then used to control an actuator, e.g., of a robot or a vehicle (automobile, plane, boat, ...).
POI detectors are employed ubiquitously in computer vision applications. Although the methods for POI detection differ greatly, it is nevertheless commonly accepted that POI detection should be among the first stages of any vision-based image understanding system. This is because a parallel, full-blown analysis of all regions of the image is infeasible and also unnecessary, since the relevant information content of images is usually concentrated in only a few regions.
It is now quite universally agreed that the presence of semantically relevant quantities such as, e.g., objects or persons, can be detected in an image based on quite simple local image properties but possibly in a situation- and task-dependent manner. Saliency map models (which are formulated according to biological vision processing principles) are capable of performing exactly these functions, yielding POIs similar to those human or animal vision might detect.
Typical operations that use the detected interest points are segmentation, object classification, region tracking and gaze or actuator control. All of these operations can be computationally expensive. It is therefore imperative that the supplied points of interest are few, yet of sufficient relevance for the targeted application domain, in order not to consume computing time unnecessarily. On the other hand, it may be important that no application-relevant quantities are missed by the point-of-interest detection. Especially in the domain of “intelligent vehicles”, a missed detection of, e.g., pedestrians can have grave consequences.
For this reason, saliency map models have recently received strong scientific attention, since they have the potential to emulate human performance, which is far superior to present-day technical approaches to point-of-interest detection with respect to the criteria just mentioned.
Virtually all of the literature on the subject agrees that the performance of saliency maps increases as the number of measurements increases; it is therefore of considerable practical interest to be able to simulate large numbers of Amari dynamics-governed systems in real-time. The requirement of real-time capability becomes even more important when taking into account that computing hardware operating in cars is and will be limited in processing power due to robustness and power consumption requirements.
There are also many implementations in technical systems aiming at similar-to-human point-of-interest detection which use the Amari dynamics (AD) technique or trivial derivations thereof, see, e.g. [Conradt, J, Simon, P, Pescatore, M and Verschure, P “Saliency Maps Operating on Stereo Images Detect Landmarks and their Distance”, Proceedings of the International Conference on Neural Networks, 2002; Fix, J, Vitay, J, Rougier, N “A Computational Model of Spatial Memory Anticipation during Visual Search”, Proceedings of the Anticipatory Behavior in Adaptive Learning Systems conference, 2006; Itti et al., op. cit.; Goerick, C. et al., “Towards Incremental Hierarchical Behavior Generation for Humanoids”, In IEEE-RAS International Conference on Humanoids, 2007]. However, these publications are based on a standard simulation of Amari dynamics, resulting in significantly lower processing speeds. This is a critical issue in point-of-interest detection since it usually has to be performed in real-time.
The simulation of Amari dynamics (AD) on a digital computer is quite expensive from a computational point of view, which is why real-time applications of larger systems of coupled Amari dynamics (AD) have not been feasible up to now.
The simulation of Amari dynamics (AD) on a digital computer requires numerically solving a nonlinear differential equation for one- or two-dimensional neural fields (see, e.g., [Erlhagen, W, Schöner, G “Dynamic field theory of movement preparation”, Psychological Review 109:545-572, 2002] for an introduction to the concept of neural fields) which is related to Amari dynamics [Amari, S “Dynamics of pattern formation in lateral-inhibition type neural fields”, Biological Cybernetics 27:77-87, 1977; Amari, S “Mathematical foundations of neurocomputing”, Proceedings of the IEEE 78: 1443-1463, 1990]. The invention solves this equation while requiring significantly fewer computational resources than previous approaches. At the same time, the invention allows the most common types of boundary conditions to be incorporated when solving the differential equation without impairing computational speed, which is of importance in many applications.
One formulation of the differential equation for Amari dynamics (AD) reads
$$\tau \dot{a}(\vec{x},t) = -a(\vec{x},t) + i(\vec{x},t) + F(\vec{x}) * f[a(\vec{x},t)] + h \qquad (1)$$
where $a(\vec{x},t)$ is the function to be found, i.e., the state of the neural field, $i(\vec{x},t)$ is a known function stating the input to the neural field, $f[\cdot]$ is a bounded monotonic, usually nonlinear function with values between 0.0 and 1.0 called “transfer function”, $F(\vec{x})$ stands for a function called the “interaction kernel”, $\tau$ specifies the time scale the neural field can change on, and $h$ is a constant specifying the global excitation or inhibition of the field. The operator “*” represents a spatial convolution operation defined on function spaces as
$$(f*g)(x) = \int f(\xi)\, g(x-\xi)\, d\xi \qquad (2)$$
In order to simulate equation (1) numerically, the variables $\vec{x}$, $t$ may be discretized using step sizes $\Delta x$, $\Delta y$, $\Delta t$. By doing this, the convolution is transformed to a discrete convolution operation. The correct choice of $\Delta x$, $\Delta y$, $\Delta t$ is nontrivial and must be performed according to the accuracy requirements on the desired solution. Especially when discretizing the time dimension, variable step sizes may be employed. Within the scope of the present invention, however, the choice of correct step sizes is not considered since it may depend on the requirements of a particular application. It is assumed that step sizes have been set to fixed values which are compatible with all application requirements. At least for the time variable, the use of variable step sizes does not invalidate any aspect of this invention. The term “neural field” will refer to the discretized version of the continuous neural field from now on.
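By way of illustration only, one iteration of a discretized, one-dimensional version of equation (1) might be sketched as follows. The explicit Euler scheme, the logistic form of the transfer function, the zero-padded boundary and all function and parameter names (transfer, convolve_same, euler_step, beta) are assumptions made for this sketch; they are not prescribed by the text above.

```python
import math

def transfer(a, beta=4.0):
    """Logistic transfer function f[.] with values in (0, 1) (assumed form)."""
    return 1.0 / (1.0 + math.exp(-beta * a))

def convolve_same(signal, kernel):
    """Direct discrete convolution with zero padding; kernel length must be odd."""
    n, m = len(signal), len(kernel)
    half = m // 2
    out = [0.0] * n
    for x in range(n):
        acc = 0.0
        for k in range(m):
            j = x - (k - half)          # signal index x - xi, with xi = k - half
            if 0 <= j < n:              # samples outside the field count as zero
                acc += kernel[k] * signal[j]
        out[x] = acc
    return out

def euler_step(a, inp, kernel, tau, h, dt):
    """One explicit Euler step of tau * da/dt = -a + i + F * f[a] + h."""
    lateral = convolve_same([transfer(v) for v in a], kernel)
    return [a_x + (dt / tau) * (-a_x + i_x + l_x + h)
            for a_x, i_x, l_x in zip(a, inp, lateral)]
```

With an all-zero kernel the lateral term vanishes and the state simply relaxes toward i + h, which provides a quick plausibility check of the update rule.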
The function $F(\vec{x})$ is usually concentrated in a small region around the origin and can thus be expressed as a discretized, finite convolution filter of a certain size. An issue here is that the discrete convolution operation is very computationally costly, since it requires at least N*M multiplications, where N is the number of discretized points of the neural field, and M the corresponding number for the interaction kernel. In case of two-dimensional neural fields, the problem becomes worse since N scales quadratically with the neural field size, and M is usually related to that size.
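To give a feeling for the N*M estimate, the following sketch counts the multiplications of one direct convolution while ignoring boundary effects. The field size of 128 points per dimension and the kernel size of 33 points per dimension are purely hypothetical values chosen for this example, as is the function name.

```python
def direct_conv_multiplications(field_side, kernel_side, dims):
    """Multiplications for one direct convolution, ignoring boundary effects."""
    n = field_side ** dims    # N: discretized points of the neural field
    m = kernel_side ** dims   # M: discretized points of the interaction kernel
    return n * m

print(direct_conv_multiplications(128, 33, 1))  # 4224
print(direct_conv_multiplications(128, 33, 2))  # 17842176
```

Under these assumed sizes, going from one to two dimensions raises the cost of a single iteration from a few thousand to roughly 18 million multiplications.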
For reasons of the stability of solutions, it is always asserted in the literature [6, 7] that the input to the neural field changes on a slower time scale than the neural field itself. When solving the differential equation for the neural field iteratively, this is usually expressed by keeping $i(\vec{x},t)$ constant for K time steps (or “iterations”) before supplying a new value, so the neural field always has sufficient time to converge to an equilibrium solution with the current input before a new input is presented.
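The scheme of holding the input constant for K iterations might be sketched as follows. For brevity, the update used here is a stand-in without the lateral interaction term (an Euler step of tau * da/dt = -a + i); the function names, the default parameters and the zero initial state are assumptions of this sketch.

```python
def relax_step(a, inp, tau=1.0, dt=0.1):
    """Stand-in update: Euler step of tau * da/dt = -a + i (no lateral term)."""
    return [a_x + (dt / tau) * (-a_x + i_x) for a_x, i_x in zip(a, inp)]

def run_with_held_input(frames, k, step=relax_step):
    """Present each input frame for k iterations; return the state after each frame."""
    a = [0.0] * len(frames[0])
    settled = []
    for frame in frames:          # a new input arrives only every k iterations
        for _ in range(k):
            a = step(a, frame)    # the input is kept constant during these k steps
        settled.append(list(a))
    return settled
```

If K is chosen too small, the state has not yet settled when the next input arrives; in this stand-in model, for example, a single iteration per frame moves the state only one tenth of the way toward the input.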
Usually, Amari dynamics (AD) is simulated in the space representation (SR) using the method of separable filters [Jaehne, W “Digital image processing”, 6th edition. Springer Verlag Berlin, Heidelberg, New York, 2005] for speeding up the convolution process, as detailed, e.g., in the appendix of the article by Erlhagen et al. [op.cit.]. However, since the difference-of-Gaussian function that is generally used for the interaction kernel is not itself separable (Gaussian functions are separable, but not their sum or difference), the speed-up that can be gained is limited.
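The separability statement can be checked numerically. A two-dimensional kernel is separable exactly when it can be written as the outer product of a column filter and a row filter, which the sketch below tests against the largest kernel entry. The kernel size, the widths and amplitudes of the two Gaussians, and all function names are hypothetical choices made for this illustration.

```python
import math

def gaussian_2d(half, sigma):
    """Unnormalized 2-D Gaussian sampled on a (2*half+1) x (2*half+1) grid."""
    return [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def difference_of_gaussians(half, sigma_exc, sigma_inh, amp_exc=1.0, amp_inh=0.5):
    """Narrow excitatory Gaussian minus broad inhibitory Gaussian (assumed parameters)."""
    g1 = gaussian_2d(half, sigma_exc)
    g2 = gaussian_2d(half, sigma_inh)
    return [[amp_exc * a - amp_inh * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(g1, g2)]

def is_separable(kernel, rel_tol=1e-9):
    """True if the kernel equals an outer product of a column and a row filter."""
    rows, cols = len(kernel), len(kernel[0])
    # Use the entry of largest magnitude as pivot (p, q).
    p, q = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: abs(kernel[ij[0]][ij[1]]))
    pivot = kernel[p][q]
    if pivot == 0.0:
        return True  # an all-zero kernel is trivially separable
    tol = rel_tol * pivot * pivot
    # Rank one holds iff K[i][j] * K[p][q] == K[i][q] * K[p][j] for all i, j.
    return all(abs(kernel[i][j] * pivot - kernel[i][q] * kernel[p][j]) <= tol
               for i in range(rows) for j in range(cols))

print(is_separable(gaussian_2d(3, 1.0)))                   # True
print(is_separable(difference_of_gaussians(3, 1.0, 2.0)))  # False
```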
German patent application DE 19 844 364 discloses using neural networks to learn the attractors of a dynamical system governed by Amari dynamics (AD) in order to avoid the computationally costly exact simulation. However, this approach requires a significant training phase prior to use, and it is not guaranteed that a qualitative behavior identical to true Amari dynamics (AD) will result due to the limited learning and generalization capabilities of neural networks.
It is therefore an object of the present invention to efficiently implement a neural network with Amari dynamics. It is a further object to provide a more efficient method and a device for detecting points of interest in a video signal, based on a neural network simulation.