Most surveillance cameras today are large, expensive, and fixed in location, with many mechanical parts that are prone to failure. A Pan/Tilt/Zoom (PTZ) camera can cost $1600 for a modest quality unit, and $3500 for a unit with good specifications. These systems contain relatively small processors that mainly compress raw photos or produce raw video streams to be sent out as files or stored in local flash. As large and complex devices, these systems resemble the mechanically steered radar antennas of the 1960s or the very complex computer disk storage systems of the 1980s.
Surveillance camera sensors have grown in resolution due to Moore's law, with 1-2 megapixel video surveillance cameras available on the market today. A new class of sensors is now available from the consumer electronics world. These sensors are produced in much higher volume and are therefore cheaper, and they offer much higher resolution. Low cost sensors are available at 16-24 megapixels, with 36 megapixel sensors on the way. These sensors are paired with low-cost, high-volume lenses that give much higher resolutions than conventional video cameras, and they allow very high resolution photos to be taken at 5-10 frames per second.
Network bandwidth has not kept pace with these improvements, however, and it is difficult to get the full quality video stream off the camera. Most analysis of images is done by humans at remote locations, and it is a challenge to get the increasing amount of content to the analysis points. As a result, either low quality video streams are sent and much detail is lost, or a low quality sensor is used because there is no way to deliver greater data rates off the camera.
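The scale of the bandwidth gap can be illustrated with simple arithmetic. The sketch below compares the uncompressed data rate of an HD surveillance stream with that of a 24 megapixel sensor at 10 frames per second. The figure of 12 bits per pixel and the 10 Mbit/s uplink are assumptions chosen for illustration only; they do not come from any particular camera or network.

```python
def raw_rate_mbps(megapixels, fps, bits_per_pixel=12):
    """Uncompressed sensor data rate in megabits per second.

    bits_per_pixel=12 is an assumed raw sample depth, not a measured value.
    """
    return megapixels * 1e6 * fps * bits_per_pixel / 1e6


def required_compression(rate_mbps, link_mbps):
    """Compression ratio needed to fit a raw stream into a given link."""
    return rate_mbps / link_mbps


hd_rate = raw_rate_mbps(2, 30)        # HD video: 2 megapixels at 30 frames/s
hires_rate = raw_rate_mbps(24, 10)    # consumer sensor: 24 megapixels at 10 frames/s
uplink_mbps = 10                      # hypothetical uplink, for illustration only

print(f"HD raw rate:       {hd_rate:.0f} Mbit/s")
print(f"High-res raw rate: {hires_rate:.0f} Mbit/s")
print(f"Compression needed on the uplink: "
      f"{required_compression(hires_rate, uplink_mbps):.0f}:1")
```

Under these assumptions the higher resolution sensor produces four times the raw data of the HD stream even at one third of the frame rate, which is why detail is discarded before the stream leaves the camera.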
While camera sensors, processors, and storage have become dramatically cheaper, more reliable, and less power-hungry thanks to continuous improvement in electronics (aka “Moore's Law”), no such dynamic has been at work on the mechanical parts of surveillance cameras. Until just a few years ago, the movie making industry used low-volume, high-cost dedicated cameras with special lenses, mounts and storage systems. However, the industry was dramatically changed by the transition to high-volume consumer photographic and internet technology, in particular sensors (Blackmagic, Canon C-100, Sony NEX-40) and consumer standard lens mounts (EF-Mount, E-mount), as well as the use of internet standards for data storage (H.264, Firewire).
This technology is higher volume and lower cost. Today, one can buy a $300 lens and use it on a $400 Sony NEX-3n for photos and home videos, and then mount that same lens on a $40K system using the same sensor to film a major motion picture while viewing the output of that picture on any standard MacBook.
The Internet has scaled to billions of computers and millions of servers through the use of sophisticated protocols and standards such as HTTP, HTML5, and XHTML, and it allows queries and processing to run across millions of nodes through technologies like Hadoop. These techniques are well understood by millions of developers, and development classes are taught at the high school level in the US.
Pan/Tilt/Zoom cameras have relatively narrow fields of view and meet the need for high resolution by employing optical zooms of 3-10× with a complex lens system. They solve the problem of coverage with motors that can pan the camera as much as 360 degrees around or tilt the camera up and down. These are complex mechanical systems that require maintenance and calibration.
Some camera systems will embed multiple PTZ cameras into a single housing to provide higher resolution and achieve scale economies. To provide adequate coverage, cameras are often mounted in various locations in a distributed camera network. They may also automatically scan various locations in a regular patrol path if they are PTZ cameras. When there are multiple targets, the surveillance system may direct some cameras to zoom while other cameras may continue to pan.
Because the entire area cannot be continuously scanned, these systems use algorithms such as Kalman Consensus Tracking to estimate the positions of objects that the cameras cannot currently see. These systems require careful calibration of camera placement to ensure that tracking is done properly.
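The idea of estimating the position of an object while no camera can see it can be sketched with an ordinary single-camera Kalman filter. This is not the Kalman Consensus algorithm itself, which additionally exchanges and merges estimates between cameras; it is only a minimal one-dimensional, constant-velocity illustration. The class name Track1D and the noise values q and r are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Track1D:
    """Constant-velocity Kalman filter along one axis (illustrative only)."""
    pos: float
    vel: float
    # 2x2 state covariance, stored element by element.
    p00: float = 1.0
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 1.0

    def predict(self, dt, q=0.01):
        """Advance the estimate by dt seconds with no observation.

        q is an assumed process noise added to both diagonal terms.
        """
        self.pos += self.vel * dt
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        self.p10 = self.p10 + dt * self.p11
        self.p00 = a + dt * b + q
        self.p01 = b
        self.p11 = self.p11 + q

    def update(self, z, r=0.25):
        """Correct the estimate with a measured position z (assumed variance r)."""
        s = self.p00 + r                      # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s   # Kalman gain
        y = z - self.pos                      # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00, p01 = self.p00, self.p01
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p10 -= k1 * p00
        self.p11 -= k1 * p01


track = Track1D(pos=0.0, vel=2.0)
for _ in range(3):            # object is out of view for three one-second steps
    track.predict(dt=1.0)
unseen_variance = track.p00   # uncertainty has grown while unobserved
track.update(z=6.5)           # a camera sees the object again
print(track.pos, unseen_variance, track.p00)
```

While the object is unobserved the position estimate follows the last known velocity and its variance grows; the first new observation pulls the estimate toward the measurement and shrinks the variance again.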
Current surveillance systems use a small sensor focused on video at NTSC resolutions and, increasingly, HD resolution (typically 1920×1080, or 2 megapixels, at 30 frames per second). These systems typically have relatively narrow and fixed fields-of-view and require many cameras to cover an area while getting enough resolution to identify objects. These cameras are installed and calibrated by hand to ensure their placement is correct.
Some cameras use a single video sensor with a panoramic lens to replace the need for PTZ. Pan and tilt are then unnecessary, but because the resolution is spread across 180 or 360 degrees, resolution is poor in any particular area. Some cameras incorporate “Digital Zoom”, replacing a mechanical zoom lens with digital expansion of the image. This is sometimes marketed as “Digital Pan/Tilt/Zoom”, although there is no real pan or tilt feature and there is a loss of resolution.
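Both losses of resolution described above can be put in numbers. The sketch below computes how many horizontal pixels cover each degree of the scene for a 1920-pixel-wide sensor, and how many sensor pixels actually contribute to a digitally zoomed image. The 60 degree field of view for a conventional lens and the 4× zoom factor are assumed values for illustration.

```python
def pixels_per_degree(horizontal_pixels, fov_degrees):
    """Horizontal sensor pixels available per degree of field of view."""
    return horizontal_pixels / fov_degrees


def digital_zoom_source_pixels(width, height, zoom):
    """Sensor pixels inside the crop that a digital zoom enlarges."""
    return (width / zoom) * (height / zoom)


width, height = 1920, 1080   # HD sensor, as in the text

for fov in (60, 180, 360):   # 60 degrees is an assumed conventional lens
    print(f"{fov:3d} degree lens: {pixels_per_degree(width, fov):.1f} pixels/degree")

used = digital_zoom_source_pixels(width, height, zoom=4)
print(f"4x digital zoom uses {used:.0f} of {width * height} pixels "
      f"({used / (width * height):.1%})")
```

With these numbers a 360 degree panoramic lens leaves about one sixth of the pixels per degree of the assumed 60 degree lens, and a 4× digital zoom is built from one sixteenth of the sensor's pixels.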
Some cameras may have infrared-sensitive lenses and infrared radiators to see in low light or in the dark. Some cameras have dual cameras to provide stereo vision, allowing the distance of objects to be estimated. Cameras are either connected by video coax cable or deliver their video over the Internet, as real-time video streams (RTP) or as web pages served from the cameras. Some cameras are wireless, using WiFi or proprietary methods. Certain research systems use dual-radio systems in which low bandwidth communications (e.g., Bluetooth) are handled separately from high bandwidth communications (e.g., WiFi). Some cameras allow storage of images on SD or other cards, which may be manually removed later for analysis.
Surveillance systems feed into forensic and scene analysis tools; there are many systems that take a set of images from a single camera and analyze them. However, they often cannot work on the full camera video resolution due to bandwidth constraints, and often cannot work in real time. Intelligent Video Surveillance Systems are typically single cameras designed to detect specific abnormal events, such as an object crossing a line, and to send a message over a network. The message formats are proprietary to each camera vendor and must be programmed for each vendor individually.
Some camera vendors (for example Axis) have created a developer program to allow limited placement of code directly on their cameras, permitting some limited analysis of the sensor images at image capture time using proprietary protocols and interfaces. This is an example of a smart camera: a camera that includes a system-on-chip to provide additional scene analysis and video compression for a variety of applications, and that can send proprietary notification events to what are typically proprietary systems (e.g., Axis Surveillance system).
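The line-crossing event mentioned above can be sketched in a few lines. The example assumes that some earlier stage has already reduced each frame to the centroid of a tracked object in image coordinates; the function names and the sample coordinates are hypothetical and do not correspond to any vendor's interface. For simplicity it tests against an infinite line through the two given points rather than a bounded segment.

```python
def side_of_line(a, b, p):
    """Cross product sign: > 0 if p is left of the line a->b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossing_events(line_a, line_b, centroids):
    """Return the frame indices at which the object changes side of the line."""
    events = []
    if not centroids:
        return events
    prev = side_of_line(line_a, line_b, centroids[0])
    for i, p in enumerate(centroids[1:], start=1):
        cur = side_of_line(line_a, line_b, p)
        if prev * cur < 0:        # opposite signs: the line was crossed
            events.append(i)
        if cur != 0:              # ignore frames exactly on the line
            prev = cur
    return events


# Hypothetical virtual tripwire and object path, in pixel coordinates.
tripwire = ((0, 5), (10, 5))
path = [(2, 1), (3, 3), (4, 6), (5, 8)]
print(crossing_events(tripwire[0], tripwire[1], path))
```

A camera running such a check locally only needs to send a short event message when the list is non-empty, rather than the full video stream.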
Research has also been done on distributed smart cameras, which are fully distributed camera systems where each camera is a peer that cooperates with the other cameras to hand off tracking of objects as they move. They do not have central controllers, but rather form a peer-to-peer network in which the cameras cooperate with each other using proprietary protocols (e.g., CORBA). These systems typically make heavy use of PTZ mechanisms and various path prediction algorithms to track targets in an area where coverage is very limited or spotty due to the expense of the cameras involved.