Visual surveillance of dynamic scenes is an active area of research in robotics and computer vision. The research efforts are primarily directed towards object detection, recognition, and tracking from a video stream. Intelligent visual surveillance has a wide spectrum of promising governmental and commercial applications. Some important applications are in the field of security and include access control, crowd control, human detection and recognition, traffic analysis, detection of suspicious behaviors, vehicular tracking, Unmanned Aerial Vehicle (UAV) operation, and detection of military targets. Many other industrial applications also exist in the automation field, such as faulty product detection, quality assurance, and production line control.
Commercial surveillance systems are intended to report unusual patterns of motion of pedestrians and vehicles in outdoor environments. These semiautomatic systems are meant to assist, but not to replace, the end user. In addition, electronics companies provide suitable equipment for surveillance. Examples of such equipment include active smart cameras and omnidirectional cameras. All of the above is evidence of the growing interest in visual surveillance. At the same time, many image processing applications have a crucial need for high performance real-time systems. The bottleneck in these systems is primarily hardware-related, including capability, scalability, requirements, power consumption, and the ability to interface with various video formats. In fact, memory overhead prevents many systems from achieving real-time performance, especially when general purpose processors are used. In these situations, typical solutions are either to scale down the resolution of the video frames or to process only smaller regions of interest within the frame, neither of which is fully adequate.
Although Digital Signal Processors (DSPs) provide an improvement over general purpose processors due to the availability of optimized DSP libraries, DSPs still suffer from limited execution speeds. Thus, DSPs are insufficient for real-time applications. Field programmable gate array (FPGA) platforms, on the other hand, offer inherently parallel digital signal processing blocks, large amounts of embedded memory, large numbers of registers, and high speed memory and storage interfaces. These features make FPGAs an attractive solution for the hardware realization of many image detection and object recognition algorithms. As a result, computationally expensive algorithms are usually implemented on an FPGA.
State-of-the-art developments in computer vision confirm that processing algorithms will make a substantial contribution to video analysis in the near future. Processing algorithms, once commercialized, may overcome most of the issues associated with power and memory demands. However, the challenge remains to devise, implement, and deploy automatic systems that use such algorithms to detect, track, and interpret moving objects in real time. The need for real-time applications is strongly felt worldwide by private companies and by governments seeking to fight terrorism and crime and to provide efficient management of public facilities.
Intelligent computer vision systems demand novel system architectures capable of integrating and combining computer vision algorithms into configurable, scalable, and transparent systems. Such systems inherently require high performance devices. However, many areas remain unexplored. For example, only a single hardware implementation attempt has been reported for a Maximally Stable Extremal Regions (MSERs) detector, and the attempt had limited success. This is in spite of the fact that MSER detectors were introduced as a research topic more than a decade ago, have been used in numerous software applications, and have been discussed in over 3,000 published papers. The major advantage of MSER detectors is affine invariance. By contrast, traditional scale invariant feature transform (SIFT) detectors and speeded up robust features (SURF) detectors are only scale and rotation invariant.
In spite of the major advantages of MSERs, a problem remains in tracking objects that pass through scenes with dramatic changes in light intensity. For example, assume that a car is being tracked by a helicopter in a clear, bright environment using a classical MSER tracking system. Once the car enters an area with a dramatic intensity change, such as when passing from a sunny area into a shady area, the classical MSER tracking system will very likely lose track of the car. This is because the classical MSER tracking system relies on intensity images that have a relatively stable light intensity to track objects.
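The effect of an intensity change on a threshold-based region can be pictured with a toy threshold sweep. The sketch below is only an illustration and is not the classical MSER tracking system itself: the synthetic scene, the helper names `region_area_by_threshold` and `stable_threshold_range`, and the modeling of shade as a uniform 0.25 gain are all assumptions made for the example. It shows that the range of thresholds over which a dark region keeps a constant area can move entirely when the scene darkens, so any threshold-dependent state carried over from the sunny frames no longer matches.

```python
import numpy as np

def region_area_by_threshold(image, thresholds):
    """Pixel count of the set {pixel <= t} for each threshold t."""
    return np.array([(image <= t).sum() for t in thresholds])

def stable_threshold_range(image, thresholds, expected_area):
    """Smallest and largest threshold at which the dark region has the expected area."""
    areas = region_area_by_threshold(image, thresholds)
    stable = thresholds[areas == expected_area]
    return int(stable.min()), int(stable.max())

# Hypothetical 32x32 scene: a dark 8x8 "car" (intensity 60) on a bright road (200).
sunny = np.full((32, 32), 200, dtype=np.uint8)
sunny[12:20, 12:20] = 60

# Assumption: shade is modeled as a uniform 0.25 gain on every pixel.
shady = (sunny * 0.25).astype(np.uint8)

thresholds = np.arange(0, 256)
car_area = 8 * 8

sunny_range = stable_threshold_range(sunny, thresholds, car_area)
shady_range = stable_threshold_range(shady, thresholds, car_area)

print(sunny_range)  # (60, 199)
print(shady_range)  # (15, 49)
```

In this toy scene the two threshold ranges do not overlap, so a threshold chosen for the sunny frames isolates nothing resembling the car in the shady frames.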
What is needed is a hardware architecture for real-time extraction of MSERs that can track objects through scenes having relatively large light intensity changes. Further still, the architecture should be easily realized with, e.g., an FPGA, an application specific integrated circuit (ASIC), or the like.