Many municipalities, businesses, and other institutions are implementing extensive, large-scale video surveillance networks having multiple video cameras arranged outdoors and/or indoors at various venues, both for security purposes and for remote monitoring of public and private areas such as traffic intersections, toll booths, airports, public events, banks, casinos, military installations, convenience stores, and the like. Closed-circuit television (CCTV) systems may operate continuously, or only as required, to monitor a particular venue, and typically require a dedicated coaxial cable run to each camera. The advent of stand-alone, internet protocol (IP)-based digital video cameras has removed the need to run coaxial cables. Instead, the video cameras generate video streams, which are typically digitized, compressed, and multiplexed onto a common physical medium or shared backhaul, e.g., a fiber ring, a point-to-point wireless link, an Ethernet network, and the like. For example, ten to thirty or more IP cameras may share a single backhaul of fixed, limited bandwidth.
Multiplexing is made possible, in part, by digital video compression technology. Modern video codecs, e.g., MPEG-4 Part 2, H.264, HEVC (H.265), and the like, encode and decode video streams at very high compression ratios through the use of predictive encoding. In a motion sequence, a video stream comprises a plurality of pictures or frames of different types, generated at a frame rate of, e.g., thirty frames per second. One type of frame is an intra-frame or I-frame (also known as a key frame), which is a single frame of digital content that stores all the data needed to display its image; it is a stand-alone, independent frame that does not rely on data from any other frame. Another type of frame is a predictive frame or P-frame (also known as a delta frame), which is a single frame that contains only the data that has changed from a preceding frame. A P-frame sequentially follows, and depends on, a preceding I-frame (directly, or through intervening P-frames) to fill in most of the data needed to display its image. Yet another type of predictive frame is a bidirectional frame or B-frame (also a delta frame), which is a single frame that contains data that has changed from a preceding frame and/or data that differs from the data in the next frame. A B-frame thus depends on the frames preceding and following it to fill in the data needed to display its image.
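To make these dependencies concrete, the following minimal Python sketch lists, for a group of pictures written in display order (e.g., "IBBP"), which frames each frame needs as references. It assumes a simplified, classic MPEG-style structure in which a P-frame references only the nearest preceding I- or P-frame and a B-frame references only the nearest preceding and following I- or P-frames; real codecs such as H.264 and HEVC also allow multiple reference frames and referenced B-frames. The function name `reference_frames` is hypothetical.

```python
def reference_frames(pattern: str) -> dict[int, list[int]]:
    """Map each frame index (display order) to the frame indices it depends on.

    Simplified model (assumption): I-frames are independent, P-frames use the
    nearest earlier I/P frame, and B-frames use the nearest earlier and later I/P frames.
    """
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    refs: dict[int, list[int]] = {}
    for i, frame_type in enumerate(pattern):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if frame_type == "I":
            refs[i] = []  # self-contained key frame
        elif frame_type == "P":
            refs[i] = earlier[-1:]  # nearest preceding anchor, if any
        elif frame_type == "B":
            refs[i] = earlier[-1:] + later[:1]  # nearest anchor on each side
        else:
            raise ValueError(f"unknown frame type {frame_type!r} at index {i}")
    return refs


print(reference_frames("IBBPBBP"))
# {0: [], 1: [0, 3], 2: [0, 3], 3: [0], 4: [3, 6], 5: [3, 6], 6: [3]}
```

Because a B-frame needs the following anchor frame, an encoder typically transmits that anchor before the B-frames that depend on it, so transmission order differs from display order.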
Predictive frames, e.g., P- and B-frames, typically use temporal motion compensation to displace blocks of texture-coded pixels from previous and/or future frames. While this digital process radically reduces the bandwidth required to transmit a video stream, as compared to an analog process, it also makes predictive frames susceptible to errors. An error in a single predictive frame can propagate forward in time as subsequent frames apply motion compensation to the erroneous data. This, in turn, progressively corrupts the integrity of the video stream over time.
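The forward propagation of an error can be illustrated with a small Python sketch. It assumes an I/P-only stream in which each P-frame references the immediately preceding frame (no B-frames); under that assumption a corrupted frame taints every following frame until the next I-frame, which, being independent, stops the propagation. The function name `corrupted_frames` is hypothetical.

```python
def corrupted_frames(pattern: str, error_index: int) -> list[int]:
    """Return the indices of frames that display errors after frame `error_index` is corrupted.

    Assumes an I/P-only pattern in which each P-frame references the frame just before it.
    """
    if not 0 <= error_index < len(pattern):
        raise IndexError("error_index out of range")
    affected = [error_index]
    for i in range(error_index + 1, len(pattern)):
        if pattern[i] == "I":
            break  # an independent I-frame clears the residual error
        if pattern[i] != "P":
            raise ValueError(f"only I and P frames are modelled, got {pattern[i]!r}")
        affected.append(i)  # this P-frame inherits the error from its predecessor
    return affected


print(corrupted_frames("IPPPPIPP", 2))  # [2, 3, 4]
```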
To combat such errors, the aforementioned intra-frames, being fully texture-coded, are regularly inserted into the video stream to clear any residual errors. A typical intra-frame insertion rate is one intra-frame every one or two seconds. An intra-frame is relatively large, e.g., an order of magnitude larger than each predictive frame. When an encoder in, or associated with, the video camera is programmed to provide a constant bit rate, e.g., about 6 megabits per second (Mb/s), the instantaneous bit rate spikes far above the programmed rate, e.g., to about 30 Mb/s, while an intra-frame is being generated, and then falls slightly below it, e.g., to about 5 Mb/s, while predictive frames are generated.
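The relation between the programmed average bit rate and the instantaneous rates can be worked through with a short Python sketch. It assumes one I-frame per intra-frame period, equal-sized P-frames, an I-frame that is `ratio` times the size of a P-frame, and that each frame is sent within one frame interval; these are simplifying assumptions, and the function name is hypothetical. With 6 Mb/s, 30 frames per second, a 1-second intra-frame period, and an assumed ratio of 6, the sketch yields about 30.9 Mb/s during the I-frame and about 5.1 Mb/s during P-frames, close to the example figures above; a ratio of 10 (an order of magnitude) would push the spike to about 46 Mb/s.

```python
def instantaneous_rates(avg_bps: float, fps: float, gop_seconds: float, ratio: float):
    """Split a constant average bit rate into per-frame sizes and instantaneous rates.

    Assumptions: one I-frame per intra-frame period, all P-frames equal in size,
    the I-frame is `ratio` times a P-frame, and each frame is transmitted within
    one frame interval. Returns (i_frame_bits, p_frame_bits, i_rate_bps, p_rate_bps).
    """
    frames_per_gop = round(fps * gop_seconds)
    bits_per_gop = avg_bps * gop_seconds
    # bits_per_gop = ratio * p_bits + (frames_per_gop - 1) * p_bits
    p_bits = bits_per_gop / (ratio + frames_per_gop - 1)
    i_bits = ratio * p_bits
    return i_bits, p_bits, i_bits * fps, p_bits * fps


i_bits, p_bits, i_rate, p_rate = instantaneous_rates(6e6, 30, 1.0, 6)
print(f"I-frame {i_bits / 1e6:.2f} Mbit -> {i_rate / 1e6:.1f} Mb/s; "
      f"P-frame {p_bits / 1e3:.0f} kbit -> {p_rate / 1e6:.1f} Mb/s")
```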
In a typical video surveillance network deployment, the video streams from multiple cameras are backhauled to a central location for control by a network video recorder (NVR), also known as a video server. The NVR records the video streams and makes them available for viewing at a control station typically staffed by human operators. When that backhaul comprises a point-to-point wireless link, the aforementioned bit rate spikes become particularly problematic, because such a wireless link typically operates at a fixed, constant data rate. As such, large intra-frames take appreciably longer to transmit over the wireless link than smaller predictive frames. This induces jitter into the video stream, which is an issue for real-time video delivery. Jitter, as used herein, is defined as the variation in arrival times of the packets comprising the video frames of a video stream. Also, a large intra-frame can overflow a buffer in a modem of the wireless link, thereby inducing packet loss. The NVR receiving the video stream typically waits a nominal (short) time period for the packets comprising a frame to arrive. In some instances, if all of the packets comprising a given frame do not arrive within that nominal time period, the NVR treats the data as missing, and the resulting decoded video exhibits errors. If the NVR instead waits for all of the packets comprising the frame to arrive, the resulting video stream appears “jerky” to a viewer, because the time intervals between displayed frames are inconsistent.
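The jitter mechanism can be illustrated with a minimal Python sketch of one camera's frames crossing a fixed-rate link in first-in, first-out order. It assumes frames are generated exactly every 1/fps seconds, each frame is serialized at the link rate, and propagation delay is ignored; jitter is reported here as the peak-to-peak spread of per-frame latency, which is only one of several possible measures, and a frame counts as late if its latency exceeds an assumed NVR wait time. All names and figures in the example are hypothetical.

```python
def simulate_link(frame_bits: list[float], fps: float, link_bps: float, nvr_wait_s: float):
    """Simulate FIFO transmission of frames over a fixed-rate link.

    Returns (latencies, peak_to_peak_jitter, late_frame_indices).
    """
    latencies = []
    link_free_at = 0.0  # time at which the link finishes sending the previous frame
    for k, bits in enumerate(frame_bits):
        generated = k / fps
        start = max(generated, link_free_at)  # queue behind any frame still being sent
        link_free_at = start + bits / link_bps  # serialization time at the fixed link rate
        latencies.append(link_free_at - generated)
    jitter = max(latencies) - min(latencies)
    late = [k for k, lat in enumerate(latencies) if lat > nvr_wait_s]
    return latencies, jitter, late


# One 1 Mbit I-frame followed by 29 P-frames of 150 kbit, over a 10 Mb/s share of the link.
frames = [1_000_000] + [150_000] * 29
_, jitter, late = simulate_link(frames, fps=30, link_bps=10e6, nvr_wait_s=0.05)
print(f"jitter {jitter * 1e3:.0f} ms, late frames {late}")
```

The I-frame's own 100 ms serialization time also delays the P-frames queued behind it, so frames that were generated on a regular schedule arrive at irregular intervals.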
This effect is exacerbated when multiple cameras are multiplexed over the same shared wireless link, and it becomes particularly problematic if those cameras all generate intra-frames at approximately the same instant. It is common for the NVR to control and configure all of the cameras in a near simultaneous/parallel fashion, which, in turn, sets up nearly synchronized intra-frames. At network startup, configuring each camera resets its clock. Even if the cameras start off with unsynchronized intra-frames, cameras programmed for the same intra-frame period, i.e., the time interval between successive intra-frames in a video stream, typically about 1 second, may eventually, through clock drift, align and overlap their intra-frames, at least for a certain amount of time.
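How long clock drift takes to bring two cameras' intra-frames into alignment can be estimated with a first-order Python sketch. It assumes both cameras use the same nominal intra-frame period, that each clock runs fast by a constant amount expressed in parts per million (ppm), and that `offset_s` is the time from camera A's intra-frame to camera B's next intra-frame; the model and names are illustrative assumptions, not measured behavior of any particular camera.

```python
import math


def seconds_until_aligned(offset_s: float, period_s: float, ppm_a: float, ppm_b: float) -> float:
    """First-order estimate of real time until camera B's I-frames coincide with camera A's.

    offset_s: delay from an A I-frame to the next B I-frame, 0 <= offset_s < period_s.
    ppm_a, ppm_b: how fast each camera's clock runs, in parts per million.
    A clock fast by e ppm finishes a nominal period in about period_s * (1 - e * 1e-6)
    real seconds, so B's I-frames slide relative to A's at (ppm_a - ppm_b) * 1e-6 s/s.
    """
    if offset_s == 0:
        return 0.0
    slide = (ppm_a - ppm_b) * 1e-6
    if slide > 0:
        return (period_s - offset_s) / slide  # offset grows until it wraps to zero
    if slide < 0:
        return offset_s / -slide  # offset shrinks to zero
    return math.inf  # identical clocks never drift into alignment


t = seconds_until_aligned(offset_s=0.5, period_s=1.0, ppm_a=25, ppm_b=-25)
print(f"{t:.0f} s (about {t / 3600:.1f} h)")
```

Under these assumptions, a relative drift of only 50 ppm brings intra-frames that start half a period apart into alignment in roughly three hours, which is short compared with the uptime of a surveillance network.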
Thus, in known video surveillance network architectures, the intra-frames in the video streams from multiple cameras are often generated in unison, or nearly so. However, the shared wireless link cannot readily accommodate the resulting massive instantaneous bit rate. For example, if each camera is configured for an average bit rate of 6 Mb/s, then ten such cameras (60 Mb/s in aggregate) can theoretically share a fixed 100 Mb/s link comfortably, and a 200 Mb/s wireless link even more so. However, if intra-frames having an instantaneous bit rate of about 30 Mb/s overlap, then 10 × 30 Mb/s = 300 Mb/s, which significantly overloads the 200 Mb/s wireless link. Frames are then delayed (jitter) or dropped (buffer overflow), causing unacceptable artifacts in the video streams. This problem applies not only to wireless links, but also to Ethernet switches and routers having low switching and/or throughput rates.
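The overload arithmetic can be checked with a small Python sketch that adds up the instantaneous rates of several cameras on a shared link, frame slot by frame slot, over one intra-frame period. It assumes every camera uses the same frame rate and intra-frame period, sends at 30 Mb/s during its I-frame slot and 5 Mb/s otherwise (the example figures above), and that `phases` gives each camera's I-frame slot; the function is hypothetical and ignores queuing.

```python
def peak_aggregate_mbps(phases: list[int], frames_per_period: int,
                        i_rate_mbps: float = 30.0, p_rate_mbps: float = 5.0) -> float:
    """Peak total instantaneous rate on a shared link over one intra-frame period.

    phases[k] is the frame slot (0 .. frames_per_period - 1) in which camera k
    sends its I-frame; in every other slot that camera sends a P-frame.
    """
    peak = 0.0
    for slot in range(frames_per_period):
        total = sum(i_rate_mbps if phase % frames_per_period == slot else p_rate_mbps
                    for phase in phases)
        peak = max(peak, total)
    return peak


cameras, slots = 10, 30  # ten cameras, 30 frames per 1-second intra-frame period
aligned = peak_aggregate_mbps([0] * cameras, slots)
staggered = peak_aggregate_mbps([3 * k for k in range(cameras)], slots)
print(aligned, staggered)  # 300.0 75.0
```

Against a 200 Mb/s link, the aligned case exceeds capacity while the staggered case stays well within it; how suitable offsets would actually be chosen is outside the scope of this sketch.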
To prevent such intra-frames from being generated substantially simultaneously, it is known to operate the cameras at different times. However, when thousands of cameras are involved (some sharing a common backhaul and some not), the video network would require a complex control system, and, even so, it is difficult to determine which of the thousands of cameras share a common backhaul and are generating intra-frames simultaneously. A known way to determine which cameras share a common backhaul is to refer to a detailed network map or “blueprint.” If a blueprint exists, the cameras that share a common backhaul can be identified statically, but, even so, the blueprint cannot identify which cameras are actually generating intra-frames simultaneously, nor can it identify intra-frames that overlap due to clock drift. Often, however, no blueprint exists. This is common for cameras that are set up to cover ad-hoc events, e.g., a civic festival downtown, or for networks that grew organically without architectural oversight. In such cases, it is not sufficient simply to run an Internet Control Message Protocol (ICMP)-based “traceroute” to uncover common network elements. In any event, “traceroute” identifies only layer 3 routing elements, and does not identify shared Ethernet switches or point-to-point links.
Accordingly, it would be desirable to reliably identify which of the cameras' video streams share a wireless link in a video surveillance network, and which of those video streams are generating intra-frames simultaneously, especially when no network blueprint exists, and to time-offset such intra-frames to minimize or prevent unacceptable video artifacts in the video streams transmitted over such shared links, while using cost-effective, off-the-shelf cameras and without sacrificing video quality.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and locations of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
The apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.