With the volume of vehicles using roadways today, traffic detection and management have become ever more important. For example, control of intersections, detection of incidents, such as traffic accidents, and collection of data related to a traffic scene are all integral to maintaining and improving the state of traffic management and safety. Since the 1950s, point detection devices, such as in-ground inductive loops, have primarily been used for intersection control and traffic data collection. The in-ground inductive loops consist of wire loops placed in the pavement, which detect the presence of vehicles through magnetic induction.
Many limitations exist with point detection devices such as inductive loops. Namely, each individual loop is limited in area coverage, and the loops are expensive to install, because the roadway must be dug up for their installation, and difficult to maintain. Further, such point detectors possess substantial limitations in their ability to accurately assess a traffic scene and extract useful information relating to the scene. While point detection devices can detect the presence or absence of vehicles at a particular, fixed location, they cannot directly determine many other useful traffic parameters. Rather, they must determine such parameters through multiple detections and inference. For instance, to calculate the velocity of a vehicle, a traffic management system employing point detection devices requires at least two detection devices to determine the time between detections at two points; the known distance between the points divided by that time yields a velocity measurement. Other methods of detection, such as ultrasonic and radar detection, also possess similar limitations.
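The two-detector velocity inference described above can be illustrated with a minimal sketch. The function name, the 6 m detector spacing and the detection times are hypothetical values chosen only for illustration; they do not come from any particular system.

```python
def speed_from_two_detectors(spacing_m, t_first_s, t_second_s):
    """Infer speed (m/s) from the times at which two point detectors fire.

    spacing_m is the known distance between the two detectors; the two
    times are when the same vehicle was detected at each one.
    """
    travel_time_s = t_second_s - t_first_s
    if travel_time_s <= 0:
        raise ValueError("second detection must occur after the first")
    return spacing_m / travel_time_s


# Hypothetical example: detectors 6 m apart, fired 0.25 s apart.
speed = speed_from_two_detectors(spacing_m=6.0, t_first_s=10.00, t_second_s=10.25)
print(speed)  # metres per second
```

The sketch makes the limitation concrete: a single detector supplies only one time stamp, so no velocity can be computed from it alone.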
A traffic scene contains much more information than point detection devices can collect. While a point detection device can provide one bit of data, a video image can provide a 300,000-byte description of the scene. In addition to the wide-area coverage provided by video images, the image sequences capture the dynamic aspects of the traffic scene, for example at a rate of 30 images per second. Therefore, advanced traffic control technologies have employed machine vision to improve vehicle detection and information extraction at a traffic scene. These machine vision systems typically consist of a video camera overlooking a section of the roadway and a processor that processes the images received from the video camera. The processor then attempts to detect the presence of a vehicle and extract other traffic-related information from the video image.
An example of such a machine vision system is described in U.S. Pat. No. 4,847,772 to Michalopoulos et al., and further described in Panos G. Michalopoulos, Vehicle Detection Video Through Image Processing: The Autoscope System, IEEE Transactions on Vehicular Technology, Vol. 40, No. 1, February 1991. The Michalopoulos et al. patent discloses a video detection system including a video camera for providing a video image of the traffic scene, means for selecting a portion of the image for processing, and processor means for processing the selected portion of the image.
The Michalopoulos et al. system can detect traffic in multiple locations, as specified by the user, using interactive graphics. The user manually selects detection lines, each consisting of a column of pixels, within the image to detect vehicles as they cross the detection lines. While the manual placement of the detection lines within the image obviates the expense of placing inductive loops in the pavement and provides flexibility in detection placement, the Michalopoulos et al. system still roughly emulates the function of point detection systems. The system still detects vehicles at roughly fixed locations and derives traffic parameters by induction, using mathematical and statistical formulae. For example, the system classifies a vehicle based on its length and calculates the velocity of a vehicle as the known distance between detection locations divided by the average travel time. Further, if a vehicle crosses through an area within the image where the user has not placed a detection line, the system will not detect the vehicle. Thus, the system does not automatically detect all vehicles within the image.
Before a machine vision system can perform any traffic management function, the system must be able to detect vehicles within the video images. The Michalopoulos et al. system detects vehicles by analyzing the energy, intensity or reflectivity of every pixel in the predefined detection lines and comparing the instantaneous image value at every pixel with a threshold derived from analysis of the background scene without the presence of any vehicles.
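A minimal sketch of threshold-based detection along one detection line follows. It is not the Michalopoulos et al. method itself: the per-pixel threshold (background intensity plus or minus a fixed margin), the rule that a minimum fraction of pixels must change, and all names and values are assumptions made for illustration.

```python
def vehicle_on_line(current, background, margin=25, min_fraction=0.3):
    """Report whether a detection line appears to be occupied.

    current and background are equal-length lists of pixel intensities
    along the line. A pixel counts as changed when it differs from the
    background by more than margin (assumed thresholding rule); the line
    is occupied when at least min_fraction of its pixels have changed.
    """
    if len(current) != len(background) or not current:
        raise ValueError("lines must be non-empty and of equal length")
    changed = sum(1 for c, b in zip(current, background) if abs(c - b) > margin)
    return changed / len(current) >= min_fraction


# Hypothetical intensities along an 8-pixel detection line.
background = [100, 102, 98, 101, 99, 100, 103, 97]
empty_road = [101, 100, 99, 103, 98, 102, 101, 99]
with_vehicle = [101, 100, 180, 185, 190, 178, 101, 99]

print(vehicle_on_line(empty_road, background))    # no pixel exceeds the margin
print(vehicle_on_line(with_vehicle, background))  # half of the pixels exceed it
```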
Other systems have utilized edge detection for detecting vehicles. These systems often perform "blob analysis," a grouping of elements, on the raw image. The goal of such an analysis is to determine which pixels belong together, based on pixel location, intensity and previous grouping decisions. The basic process may be described as region growing. First, the system picks a center pixel that it determines belongs in a grouping. Then, the system looks to neighboring pixels and determines whether to include them in the grouping. This process continues for each included pixel. Blob detectors of this type have run into difficulties because all the decisions are interdependent. Once the system has made initial decisions to include or exclude pixels, subsequent decisions are based on the decisions already made. Thus, once the system makes an incorrect decision, future decisions are often also incorrect. This series of incorrect decisions may prevent proper convergence. The same is true of edge-detection-based systems that rely on sequential decision processes.
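The region-growing process described above can be sketched as follows. The inclusion rule used here (a neighbor joins the grouping when its intensity is within a tolerance of the already-included pixel next to it) is an assumption chosen to show how each decision depends on earlier ones; the function name, the image and the tolerance are hypothetical.

```python
from collections import deque


def grow_region(image, seed, tol):
    """Grow a pixel grouping outward from seed using 4-connected neighbors.

    A neighbor is included when its intensity is within tol of the
    already-included pixel it is adjacent to, so every decision rests on
    the decisions made before it.
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - image[r][c]) <= tol:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region


# Hypothetical image: the top row brightens gradually, the bottom row is distinct.
image = [
    [50, 52, 54, 56, 58, 60],
    [200, 200, 200, 200, 200, 200],
]
region = grow_region(image, seed=(0, 0), tol=2)
print(sorted(region))
```

In this example the grouping spreads across the entire top row, although the last pixel differs from the seed by 10, five times the tolerance. Each step was acceptable on its own, which illustrates how sequential, interdependent decisions can carry the grouping away from the intended object.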
A further desirable capability of machine vision systems is the capability to track the detected vehicles. Systems that track vehicles usually share some common characteristics. First, the system must identify the starting point of the track. The system may do this by comparing an input image with a background image and judging objects having an area within a predetermined range to be vehicles. Other systems perform motion detection to initiate the tracking sequence. Those systems using motion alone to initiate tracking are prone to errors because they must set some baseline amount of motion to initiate tracking. Thus, it is always possible for such systems to fail to track slow-moving or stalled vehicles.
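One way to sketch the background-comparison approach to finding track starting points is shown below. The fixed difference margin, the use of 4-connected components as "objects," and all names and values are assumptions for illustration, not a description of any particular system.

```python
def changed_mask(frame, background, margin):
    """Mark pixels that differ from the background by more than margin."""
    return [[abs(p - b) > margin for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def find_objects(mask):
    """Return (row, col, area) for each 4-connected group of marked pixels.

    (row, col) is the first pixel of the group met in row-major scan order.
    """
    rows, cols = len(mask), len(mask[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                seen.add((r, c))
                stack = [(r, c)]
                area = 0
                while stack:
                    cr, cc = stack.pop()
                    area += 1
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
                objects.append((r, c, area))
    return objects


def track_starts(frame, background, margin, min_area, max_area):
    """Keep only objects whose area lies within the predetermined range."""
    mask = changed_mask(frame, background, margin)
    return [obj for obj in find_objects(mask) if min_area <= obj[2] <= max_area]


# Hypothetical 5x8 scene: a 2x3 vehicle-sized object plus one noisy pixel.
background = [[100] * 8 for _ in range(5)]
frame = [row[:] for row in background]
for r in (1, 2):
    for c in (1, 2, 3):
        frame[r][c] = 200
frame[4][7] = 180

starts = track_starts(frame, background, margin=30, min_area=4, max_area=12)
print(starts)
```

Because this approach compares against a background image rather than requiring motion, a stalled vehicle would still be marked, in contrast to the motion-only initiation described above.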
After identifying a starting point, the systems perform a searching sequence. The systems have a current vehicle location, which is initially the starting point. They then look for potential displacement locations, compare them, and select the location with the greatest suitability. They determine suitability by extracting a subimage region surrounding the current track location. Then, they displace the entire subimage region to potential new locations on the subsequent image frame. Thus, the systems perform a displacement in both location and time. The systems perform a pixel-by-pixel correlation to determine which location's image best "matches" the previous location's image. This type of correlation runs into limitations because the system treats the background pixels the same as the pixels of the moving vehicle, thereby causing problems with matching. Further, since all pixel intensities are weighted equally in importance, large areas of uniformity, such as the hood of a vehicle, are redundant. In such areas of uniformity, the system will be able to match a majority of pixels, but still may not line up the boundaries of the vehicle. While the edges of the vehicle constitute a minority of the pixels, they are the pixels that are most important to line up.
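The search sequence described above can be sketched as an exhaustive comparison of displaced subimages. The text does not specify the correlation measure, so the sum of absolute differences is used here as a stand-in, with the lowest sum taken as the best match; the function names, frame contents and search range are likewise hypothetical.

```python
def extract(frame, top, left, size):
    """Cut a size x size subimage whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences; every pixel carries equal weight."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_displacement(prev_frame, next_frame, top, left, size, search):
    """Return (score, d_row, d_col) of the best-matching displacement.

    The subimage at the current track location in prev_frame is compared
    with every candidate location within +/- search pixels in next_frame.
    """
    template = extract(prev_frame, top, left, size)
    rows, cols = len(next_frame), len(next_frame[0])
    best = None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = top + dr, left + dc
            if r < 0 or c < 0 or r + size > rows or c + size > cols:
                continue  # candidate would fall outside the frame
            score = sad(template, extract(next_frame, r, c, size))
            if best is None or score < best[0]:
                best = (score, dr, dc)
    return best


def make_frame(rows, cols, top, left, patch, background=10):
    """Build a uniform frame with patch pasted at (top, left)."""
    frame = [[background] * cols for _ in range(rows)]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            frame[top + i][left + j] = value
    return frame


# Hypothetical vehicle patch that moves two pixels to the right between frames.
patch = [[200, 150], [120, 90]]
prev_frame = make_frame(6, 8, top=2, left=1, patch=patch)
next_frame = make_frame(6, 8, top=2, left=3, patch=patch)

best = best_displacement(prev_frame, next_frame, top=2, left=1, size=2, search=2)
print(best)
```

Note that `sad` weights every pixel equally, whether it belongs to the vehicle boundary, a uniform interior, or the background, which is the limitation the preceding paragraph describes.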
Traffic detection and monitoring, together with vehicle classification and tracking, are all used for traffic management. Traffic management is typically performed by a state Department of Transportation (DOT). A DOT control center is typically located in a central location, receiving video from numerous video cameras installed at roadway locations. The center also receives traffic information and statistics from sensors such as inductive loops or machine vision systems. Traffic management engineers typically have terminals for alternately viewing video and traffic information. They scan the numerous video feeds to try to find "interesting scenes" such as traffic accidents or traffic jams. It is often difficult for traffic management engineers to locate the particular video feed that has the most interesting scene because they must perform a search to locate the video line containing that feed. Current traffic management systems also generate alarms based on inferred trends, which tell the traffic management engineers the location of a potentially interesting scene. Because the systems infer trends at a location, the systems require time for the trend to develop. Thus, a delay is present for systems which infer trends. After such delay, the traffic management engineers can then switch to the correct video feed.
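The delay inherent in trend-based alarms can be illustrated with a small sketch. The alarm rule used here (the mean of the last few speed samples falling below a threshold) is only an assumed example of trend inference; the function name, sample values, window length and threshold are hypothetical.

```python
from collections import deque


def first_alarm_index(speeds, window, threshold):
    """Return the index of the first sample at which the alarm fires.

    The alarm fires when the mean of the most recent window samples
    drops below threshold; None is returned if it never fires.
    """
    recent = deque(maxlen=window)
    for i, speed in enumerate(speeds):
        recent.append(speed)
        if len(recent) == window and sum(recent) / window < threshold:
            return i
    return None


# Hypothetical speed samples at one location; an incident occurs at index 4.
speeds = [60, 60, 60, 60, 10, 10, 10, 10, 10]
incident_index = 4

alarm_index = first_alarm_index(speeds, window=4, threshold=30)
delay = alarm_index - incident_index
print(alarm_index, delay)
```

In this example the windowed mean still reflects earlier free-flowing samples when the incident occurs, so the alarm fires only after further samples have accumulated. A longer window would smooth out noise at the cost of a longer delay.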