Video is generally composed of a series of images presented over time. An image may be represented as units of information known as picture elements. Each picture element may represent an element of an image as data having a numeric value. Picture elements may be referred to as pixels. Pixels may represent visible characteristics of an image. Visible characteristics of an image include brightness and color. Many pixels are composed of multiple components representing various visible characteristics. For example, some pixels may contain numeric values representing the intensity levels of more than one color at a specific location in an image. Users of video include individuals, organizations, computer applications, and electronic devices. Some video may be generated by computer applications. Some video may be generated by a device known as a video camera. A video camera may capture a series of optically sensed images and transform the images into a video stream. Many video streams are composed of units of video data known as video frames. Some video frames include an entire image. Many video frames include parts of an image. Video streams may contain many images. The pixels of many images may represent large amounts of data.
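To give a sense of the data volumes involved, the following minimal sketch represents pixels as tuples of numeric component values and computes the size of uncompressed video. The function name `raw_video_bytes`, the sample image, and the example resolution and frame rate are illustrative assumptions and are not taken from any particular system.

```python
def raw_video_bytes(width, height, components, bits_per_component, fps, seconds):
    """Return the size in bytes of uncompressed video with the given parameters."""
    bits_per_pixel = components * bits_per_component
    bits_per_frame = width * height * bits_per_pixel
    total_bits = bits_per_frame * fps * seconds
    return total_bits // 8


# A hypothetical 2x2 image: each pixel holds three components (red, green, blue),
# each component being an intensity level from 0 to 255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

# One second of 1920x1080 video with three 8-bit components per pixel at
# 30 frames per second (assumed example values).
one_second = raw_video_bytes(1920, 1080, 3, 8, 30, 1)
print(one_second)  # 186624000 bytes, roughly 187 MB per second
```

Under these assumed parameters, a single second of uncompressed video may occupy on the order of hundreds of megabytes.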
A video data stream may be transmitted over a network, stored in a file system, or processed in various ways. For example, some video is processed to reduce the amount of data required to store or transmit the video. Reducing the amount of data in a video may be accomplished through a process known as video compression. Video compression may operate at various levels of a video stream to reduce the amount of data required to store, transmit, or process the video stream. For example, video compression may operate to reduce the data required to represent a single image, by processing regions of an image to eliminate redundant data from the image. Regions of an image to be processed may be known as macroblocks. Each macroblock of an image may be composed of pixels in close proximity to other pixels within the macroblock. Pixels in close proximity to other pixels may share similar characteristics that may make some pixels or macroblocks redundant. Some characteristics that may be similar enough to allow redundant data to be eliminated may include values of color or intensity at the pixel level. Some characteristics that may be similar enough to allow redundant data to be eliminated may include values of quantization parameters or discrete cosine transform (DCT) coefficients at the macroblock level. Eliminating redundant pixels or macroblocks from an image may help reduce the data required to represent the image.
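The relationship between macroblocks and DCT coefficients described above can be sketched as follows. This is an illustrative example only: it assumes a single-component (grayscale) image, an 8x8 macroblock size, and a naive orthonormal DCT-II. The helper names `split_into_macroblocks`, `dct_2d`, and `count_significant` are hypothetical, and practical encoders may use fast transforms followed by quantization.

```python
import math


def split_into_macroblocks(image, size):
    """Split a 2-D list of intensities into size x size blocks, row by row."""
    blocks = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks


def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of a square block (O(N^4), for illustration)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def count_significant(coeffs, threshold=1e-6):
    """Count coefficients whose magnitude exceeds the threshold."""
    return sum(1 for row in coeffs for c in row if abs(c) > threshold)


# A 16x16 image of uniform intensity 100 yields four 8x8 macroblocks.
image = [[100] * 16 for _ in range(16)]
blocks = split_into_macroblocks(image, 8)
coeffs = dct_2d(blocks[0])
print(len(blocks), count_significant(coeffs))
```

For a macroblock whose pixels all share the same value, only the first (DC) coefficient is significant, so the 64 pixel values of the block may be represented by a single number. Blocks of similar but not identical pixels tend to concentrate most of their energy in a few low-frequency coefficients, which is the redundancy a compression process may exploit.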
Some video compression processes may reduce the data required to represent a series of images. Video may be compressed by eliminating data redundant in a series of images. For example, a video stream may be compressed by choosing a frame as a reference frame, and eliminating redundant data from a series of subsequent frames. A reference frame in a compressed video stream may be referred to as a key-frame, index frame, or an I-frame. Data redundant in a series of frames or images may be identified as the pixels in each frame that do not change relative to the reference frame. The data that is not redundant may then be identified as the pixels in each frame that change, in value, position, or other characteristic, relative to a prior frame. Pixels that change relative to a prior frame may be referred to as video frame delta information. Many video compression processes encode video frame delta information relative to each successive frame. Frames in a compressed video stream that contain video frame delta information may be referred to as delta frames. Delta frames in a compressed video stream may be B-frames, P-frames, or D-frames. Some compressed video stream delta frames encode only pixels which have changed relative to a delta frame or a key-frame. In many video streams, movement of pixels between frames may be encoded as motion vectors in the video frame delta information.
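The key-frame and delta-frame arrangement described above can be illustrated with a minimal sketch. It assumes single-component frames stored as lists of rows, and it encodes each delta relative to the immediately preceding frame. The function names are hypothetical, and motion vectors are deliberately omitted; this is not the encoding used by any particular video standard.

```python
def encode_delta(previous, current):
    """List (row, col, value) for every pixel that differs from the prior frame."""
    return [
        (r, c, value)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if previous[r][c] != value
    ]


def apply_delta(previous, delta):
    """Rebuild a frame from the prior frame plus its delta information."""
    frame = [row[:] for row in previous]
    for r, c, value in delta:
        frame[r][c] = value
    return frame


def encode_stream(frames):
    """Keep the first frame whole as the key frame; store later frames as deltas."""
    key_frame = [row[:] for row in frames[0]]
    deltas = [encode_delta(prev, cur) for prev, cur in zip(frames, frames[1:])]
    return key_frame, deltas


def decode_stream(key_frame, deltas):
    """Reconstruct every frame by applying each delta to the frame before it."""
    frames = [[row[:] for row in key_frame]]
    for delta in deltas:
        frames.append(apply_delta(frames[-1], delta))
    return frames


# Three small frames in which a single bright pixel moves one column to the right.
frames = [
    [[10, 10, 10], [10, 10, 10]],
    [[10, 99, 10], [10, 10, 10]],
    [[10, 10, 99], [10, 10, 10]],
]
key_frame, deltas = encode_stream(frames)
print(deltas)  # [[(0, 1, 99)], [(0, 1, 10), (0, 2, 99)]]
```

In this sketch, each delta frame stores one or two changed pixels instead of all six, and decoding the key frame followed by the deltas reproduces the original frames exactly.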
Video may also be processed to detect whether features or events occur in the video. Image processing techniques including filtering, edge detection, and template matching may be employed to detect or identify an object in a video stream. Image filtering techniques may be used to refine an image to discriminate regions of interest from background noise in an image. Some detection processes may use edge detection methods to refine or sharpen boundaries of a region of interest, increasing the signal-to-noise ratio to aid in identifying the region. Template matching is used in some systems to compare a template representative of the structural form of a potential object to the structural form of a region of interest. In some systems, a template matching procedure may result in a better score for an object having a structure similar to the template. When an object is identified as a video feature, it may be of interest to determine if the object is moving. Some systems may determine if and how an object may be moving based on techniques such as optical flow.
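As an illustration of template matching, the following minimal sketch slides a template over a single-component image and scores each position by the sum of absolute differences (SAD). The choice of SAD as the score, the function names `sad` and `match_template`, and the sample data are assumptions made for illustration; other systems may use different similarity measures, such as normalized cross-correlation. With SAD, a lower score indicates a closer structural match.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and one image window."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(image, template):
    """Return (score, top, left) of the best match; a lower score is more similar."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return min(
        (sad(image, template, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )


# A hypothetical 4x5 image containing the template pattern at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 1],
    [1, 9],
]
best = match_template(image, template)
print(best)  # (0, 1, 2): a perfect match at row 1, column 2
```

Every window of the image is compared against the template, so the cost grows with both the image size and the template size. This is one reason such operations may be computationally intensive when applied to every frame of a video stream.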
Some video systems may be configured to detect events, such as threats. For example, template matching may be used to identify a series of images based on comparisons to templates representative of various threats. Once an object has been identified, and a potential threat is suspected based on a template match, optical flow techniques may be employed to determine if the object is moving toward a protected region. Detecting threats in video streams may require many computationally intensive operations, including operations such as image segmentation, image filtering, edge detection, and template matching. Many video streams contain many images, with large regions of interest representing significant amounts of data to be transmitted and processed by threat detection systems. Due to the large amounts of data that need to be transmitted and processed, detecting threats in video may be slow, and may require prohibitively expensive, specialized processing hardware.
Therefore, there is a need in the art for a system and method for identifying and detecting events in video streams in an efficient and effective manner. Specifically, there is a need in the art for a system and method for identifying and detecting events in compressed video streams in order to obviate the need for processing compressed video into uncompressed formats prior to identifying and detecting such events.