Video surveillance cameras connected to a computer network present a number of advantages over conventional analog Closed-Circuit Television (CCTV) systems. Among these advantages are the ability to remotely view a camera feed from anywhere on the network, the ability to store or replicate the digital video images without any signal degradation, the ability to send control messages back to the camera, and the opportunity to use digital image processing computers remote from the camera to automate some aspects of security video monitoring that would normally require an alert attendant. However, the use of such cameras across the Internet presents some unique challenges.
Various methods for distributing real-time or stored video are known. These can be usefully categorized as either data push methods or data pull methods.
In a data push method, the video source constantly sends (pushes) video data out into the network, a technique commonly referred to as multicasting. The video stream is associated with a unique Internet Protocol (IP) address called a multicast group. Network devices wishing to receive the video stream inform their local router of their desire to join the multicast group. Routers between the video source and the viewer are informed of the need to replicate the stream in the viewer's direction. When the multicast stream reaches the viewer's local network segment, it is simply broadcast onto the segment, and any computers wishing to use the stream recognize it by its unique multicast IP address.
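The group-join step described above can be sketched from the receiver's side. This is only an illustration using standard IPv4 socket options; the group address, port, and function names are assumptions chosen for the example and are not taken from any particular camera product.

```python
import ipaddress
import socket


def is_multicast_group(address: str) -> bool:
    """Return True if the address lies in the IP multicast range."""
    return ipaddress.ip_address(address).is_multicast


def make_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte IPv4 membership request: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Bind a UDP socket and ask the local network stack to join the group.

    Joining causes the host to signal its local router, which is what lets
    routers upstream replicate the stream toward this viewer.
    """
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not a multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        make_membership_request(group),
    )
    return sock


# Hypothetical usage (group and port are illustrative only):
# receiver = open_multicast_receiver("239.1.2.3", 5004)
# packet, sender = receiver.recvfrom(65535)
```

Note that the receiver has no say in what the source sends; it can only join or leave the group, which is the limitation the following paragraphs turn on.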
While multicasting is well suited to mass distribution of live video or audio such as sporting events, news feeds, or music, it is not well suited to the specific needs of video surveillance for a number of reasons. In video surveillance, it is often unnecessary to broadcast every frame of video. Unlike an audio stream or a movie, in which a continuous data stream must be maintained in order to achieve acceptable sound or picture fidelity at the receiving end, in video surveillance a regular frame rate is much less important than the fidelity of individual frames. For example, for basic monitoring purposes a rate of one frame per second can be quite adequate. However, in the event of an alarm or other need for better real-time data, a high frame rate may suddenly be desirable. Likewise, the frame rate demands of different users who are simultaneously accessing the camera can be quite different. For example, a security guard actively monitoring the video may desire a higher frame rate than is needed by a video recording application that is simultaneously accessing the video data at a low frame rate for long-term archival purposes.
Different users may also have different image resolution requirements. For example, for general monitoring a 320×240 pixel image is often sufficient, but a higher quality 640×480 pixel image may be required to reliably recognize a person's face seen in the image. The nature of packet-switched computer networks also results in varying data rates to different users, depending on the network bandwidth available between each user and the video source.
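To give a sense of how far apart these per-user requirements can be, the following sketch computes the raw (uncompressed) data rate for two viewers. The 24-bit colour depth and the 15 frames-per-second figure are assumptions made for the arithmetic only; real cameras compress their output, so absolute numbers would be far lower, but the ratio between the two viewers is the point.

```python
def raw_bytes_per_second(width: int, height: int, frames_per_second: int,
                         bytes_per_pixel: int = 3) -> int:
    """Uncompressed data rate for a stream (assumes 24-bit colour by default)."""
    return width * height * bytes_per_pixel * frames_per_second


# Archival recorder: low resolution, one frame per second.
archive_rate = raw_bytes_per_second(320, 240, 1)

# Guard responding to an alarm: higher resolution, assumed 15 frames per second.
guard_rate = raw_bytes_per_second(640, 480, 15)

print(archive_rate)                # 230400
print(guard_rate)                  # 13824000
print(guard_rate // archive_rate)  # 60
```

Under these assumptions the two viewers differ by a factor of 60, which a single pushed stream at one fixed rate and resolution cannot serve well.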
Another problem is that it is common for multiple security cameras to share a local network, and therefore having every camera push all its video data onto the local network is often unacceptable to other users of the network and is a poor utilization of the network resources. Also, typical multicasting protocols provide no user authentication capability. In the case of security cameras, it is normally important to the owner of the surveillance system to protect against unauthorized access to the cameras. Similarly, it is common for local area networks to be protected from the Internet by a firewall for network security reasons. While it is possible to broadcast a video stream out through a firewall, the firewall prevents any back channel to the video source to control its video feed.
As noted above, using a computer network for the video surveillance system provides the opportunity for other applications on the network to interpret the video feed and add additional high level information such as indicating motion or recognizing a face. A multicasting system provides no ready means to integrate this additional information from other sources into the camera's video stream.
As a consequence of these problems with multicasting, the common approach to network security cameras is to use a data pull method, where data is only sent from the camera to a user upon request. The most common means of implementing a data pull method is for the camera to run a web server. Users wishing to see pictures from the camera connect to it using their web browser by entering the camera's IP address or domain name. This approach solves some of the problems mentioned above with multicasting: data is not sent when it is not needed, and typical web-based user authentication can be provided. However, this approach does not solve all of the issues, and it introduces several new ones.
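A minimal sketch of such a pull-style camera server follows, assuming HTTP Basic authentication as the "typical web-based" scheme. The user table, the placeholder frame bytes, and all names are hypothetical; a real camera would serve a freshly captured image and would not store passwords in plain text.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

USERS = {"guard": "s3cret"}          # hypothetical credentials, for illustration
LATEST_FRAME = b"\xff\xd8\xff\xd9"   # placeholder bytes standing in for a JPEG


def is_authorized(header_value: Optional[str]) -> bool:
    """Check an HTTP Basic 'Authorization' header against the user table."""
    if not header_value or not header_value.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header_value[len("Basic "):]).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, _, password = decoded.partition(":")
    expected = USERS.get(user)
    return expected is not None and hmac.compare_digest(
        password.encode("utf-8"), expected.encode("utf-8"))


class FrameHandler(BaseHTTPRequestHandler):
    """Sends one frame per request; nothing is transmitted unless a user asks."""

    def do_GET(self) -> None:
        if not is_authorized(self.headers.get("Authorization")):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="camera"')
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(LATEST_FRAME)))
        self.end_headers()
        self.wfile.write(LATEST_FRAME)


def serve(port: int = 8080) -> None:
    """Run the server until interrupted (not called automatically)."""
    HTTPServer(("", port), FrameHandler).serve_forever()
```

Each viewer gets its own request and its own response, which is what allows per-user authentication but also what forces the camera to repeat the transmission for every viewer.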
First, the frame rate that can be delivered by the camera quickly degrades as more users access one camera. Without special browser plug-ins, even the frame rate to a single user is typically much lower than can be achieved using streaming methods like multicasting. Further, while multicasting ensures that only one copy of the data needs to be transmitted on any given network segment, the web server approach is much less efficient for multiple users because the server has to replicate the data transmission for each user, thus making poor use of local network bandwidth.
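The degradation can be illustrated with a simple model. It assumes that the camera's outgoing link is the only bottleneck and that it is shared equally among viewers; the link speed and frame size below are illustrative figures, not measurements.

```python
def per_user_frame_rate(link_bits_per_second: int, frame_bytes: int,
                        users: int) -> float:
    """Frames per second each viewer receives when every viewer gets its own copy.

    Simplifying assumptions: the outgoing link is the only bottleneck,
    it is shared equally, and protocol overhead is ignored.
    """
    if users < 1:
        raise ValueError("at least one user is required")
    return link_bits_per_second / (frame_bytes * 8 * users)


# Illustrative figures: a 10 Mbit/s link and 25 kB compressed frames.
for viewers in (1, 5, 10):
    print(viewers, per_user_frame_rate(10_000_000, 25_000, viewers))
```

In this model the achievable rate falls in inverse proportion to the number of viewers, whereas a single multicast copy on the segment would be unaffected by how many computers listen to it.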
Some of the problems associated with multicasting remain unresolved with the web server system. The camera's web server is not normally accessible through a firewall without specifically configuring the firewall to allow these connections. Also, the web server method provides no convenient means to augment the basic video data with higher level interpretations from video analysis applications running elsewhere on the network.
Unlike multicasting, there is no way for a web server behind a firewall to register its services with an outside server so that, as new cameras are added to or removed from the system, an outside user can readily determine what cameras are available. Similarly, the IP address of the camera must be statically allocated rather than dynamically assigned; otherwise users will not be able to reliably access the service. This greatly increases the level of setup complexity in comparison to network devices that use a service such as DHCP to automatically obtain an IP address and other crucial configuration information.
Finally, for a security camera network with many cameras there is no centralized facility for management of user authentication and for global camera configuration and camera software updates. In summary, the web server (data pull) approach fails to address the basic issues of system scalability required of any larger scale enterprise system, such as a video surveillance system.
The particular problem of accessing a network service hidden behind a firewall has been addressed by various systems and methods known in the art, typically known as tunneling. For example, U.S. Pat. No. 6,104,716 describes a method for a server, hidden by the firewall, to be contacted via the Internet by a client application on a separate local network also protected by a firewall. A server-side proxy initiates a connection out through the firewall to a trusted middle proxy located at a public Internet address. Similarly, a client-side proxy initiates a connection through its firewall to the same middle proxy. After the connection is authenticated, the three proxy agents together provide a virtual secure tunnel between the client and server. Neither the client nor the server need be explicitly aware that it is dealing through a proxy rather than interacting directly. Once the tunnel is established, the proxy chain acts only as a pass-through mechanism and does not interpret the data in any way. U.S. Pat. No. 6,349,336 describes an alternative arrangement of proxy agents for a similar purpose. While addressing the need to connect applications not originally intended for interaction through firewalls, these methods fail to resolve the other issues described above for a camera with a web server system.
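The pass-through role of the middle proxy can be sketched as follows. This is a generic illustration of the idea, not the method of either cited patent: authentication and encryption are omitted, the first inbound connection is simply assumed to be the server-side proxy, and all function names are hypothetical.

```python
import socket
import threading
from typing import List


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst without interpreting them, until src closes."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate end-of-stream downstream
        except OSError:
            pass


def splice(a: socket.socket, b: socket.socket) -> List[threading.Thread]:
    """Join two already-connected sockets into one bidirectional tunnel."""
    threads = [
        threading.Thread(target=pump, args=(a, b), daemon=True),
        threading.Thread(target=pump, args=(b, a), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return threads


def run_middle_proxy(port: int) -> None:
    """Accept one connection from each side and relay between them.

    Both connections arrive as outbound connections from behind their
    firewalls, so neither firewall needs an inbound rule.
    Authentication is deliberately left out of this sketch.
    """
    with socket.create_server(("", port)) as listener:
        server_side, _ = listener.accept()
        client_side, _ = listener.accept()
        for thread in splice(server_side, client_side):
            thread.join()
```

Because the relay only copies bytes, it carries control messages toward the server as readily as video toward the client, but for the same reason it cannot adapt the frame rate, combine streams, or add information from other services.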
It is therefore an object of the present invention to provide a means for one or more end users, who may be hidden behind firewalls, to access via the Internet one or more video cameras that may also be hidden from the Internet by a firewall, and to do so without requiring any modifications of the firewalls or any special services in the IP routers and gateways of the network.
It is a further object of the present invention that the cameras have their network configuration automatically assigned by a service such as DHCP so that no configuration is required by the person installing the camera.
It is a further object of this invention to avoid having to send duplicate copies of camera data on the camera's local network segment.
It is a further object of this invention that the system be readily scalable from a small number of cameras to a large number of cameras, and from a small number of users to a large number of users.
It is a related object of this invention that the failure of any one element of the system should have little or no impact on the operation of the rest of the system.
It is a further object of this invention to support independent data rates and independent data formats to each camera user based on their needs and the available bandwidth.
It is a further object of this invention to be able to implement services elsewhere on the network that provide for higher level interpretations of the camera's raw data, acting as both consumers of video data and producers of interpreted video data.
It is a further object of this invention to provide for centralized management of a group of cameras and a group of users, including implementation of user and camera authentication, centralized management of configuration issues such as software updates for cameras or end users, and a database of information specific to each camera such as camera parameter settings and camera geographic location.
It is yet another object of the present invention to support access to, and control of, non-real-time video sources such as digital video recorders within the same system framework.
In the event that a camera has a public IP address, it is a further object of the present invention to avoid making that IP address public, in order to reduce the likelihood of hacking attempts or denial-of-service attempts on the camera.