The present disclosure relates to an image-based method and system for detecting vehicle occupant activities. The disclosure finds application in detecting certain activities such as electronic device use by a driver of a vehicle. However, it is to be appreciated that the present exemplary embodiments are also amenable to other like applications.
Mobile phone use (talking/texting) while driving is common, but widely considered dangerous. According to a recent government study of distracted driving, 995 of the 5,474 people (18%) who were killed by distracted drivers in 2009 were considered to have been killed by drivers distracted by mobile phones. Due to the high number of accidents that are related to mobile phone use while driving, many jurisdictions, including many U.S. states, have made the use of a mobile phone and/or other devices while driving illegal. For example, at least ten U.S. states, Washington D.C., Puerto Rico, Guam, and the U.S. Virgin Islands prohibit all drivers from using hand-held mobile phones while driving.
Many of the enacted laws are primary enforcement laws, which means that an officer may cite a driver for using a hand-held mobile phone without any other traffic offense taking place. However, to enforce the rules, current practice requires either dispatching law enforcement officers to the roadside to visually examine oncoming cars or having human operators manually examine image/video records to identify violators. Both processes are expensive, difficult, and ultimately ineffective. Therefore, there is a need for an automatic or semi-automatic solution.
A variety of approaches have been developed for detecting mobile phone use. In one approach, a sensor with an adjustable range is installed in a vehicle to detect cell phone usage within that range. Another approach uses a combination of Bluetooth signals and vehicle speakers. Both of these approaches require special sensing devices in addition to a camera.
Yet another approach uses multi-spectral images or videos of individuals and analyzes the data to identify skin pixels and cell phone pixels within the image or the video based on a set of material characteristics. This approach requires special multispectral cameras (non-silicon-based, e.g., indium gallium arsenide) and illuminators in the wavelength range of 1000 nm to 1700 nm, which are expensive (e.g., the camera can cost up to $50,000) compared to conventional silicon-based cameras of lower wavelength range (<1000 nm).
Past approaches that focus on object recognition by searching for objects (e.g., a mobile phone) based on image content assumptions have not obtained a high level of accuracy. These approaches are based on the assumption that different objects within the image, such as faces, seats, seat belts, and electronic devices, are visible to the camera. Therefore, parts of the image are analyzed to determine a location of the objects and appearance characteristics of the objects, such as color, size, texture, and shape. In one example, the appearance characteristics can include spectral features, which can be extracted for detecting pixels belonging to the skin of an occupant. The extraction of the appearance characteristics can be performed via a feature representation of the object. The objects in the image that have characteristics that match a reference object (e.g., a mobile phone) are associated as being the same as the reference object. In other words, the object is labeled as being an occupant, a seat, a mobile phone, etc.
One problem associated with conventional object detection is that variations in the captured image can result in incorrect classifications. Moreover, when the particular object is fully or partially obscured, conventional object detection will most likely fail. For example, the object recognition approach may incorrectly classify an image as having a cell phone when a driver is holding another object, such as a box of cigarettes. In this instance, the appearance characteristics that are extracted from the image match those of a mobile phone. In another variation, in which an occupant is holding a phone to their ear, the object recognition approach may incorrectly classify an image as not having a cell phone, particularly when the mobile phone is partially or wholly obscured from the camera.
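For illustration only, the reference-matching step of conventional object detection and the two failure cases noted above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular prior system: the three-value appearance descriptor (mean brightness, aspect ratio, area fraction), the reference values, the distance threshold, and the function names are all hypothetical.

```python
from math import sqrt

# Hypothetical appearance descriptor for a reference mobile phone:
# (mean brightness, aspect ratio, fraction of image area). Values are assumed.
REFERENCE_PHONE = (0.15, 2.0, 0.02)


def distance(a, b):
    """Euclidean distance between two appearance descriptors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_region(features, reference=REFERENCE_PHONE, threshold=0.25):
    """Label a region as a mobile phone when its descriptor is close to the reference.

    The threshold is an assumed value chosen only for this illustration.
    """
    return "mobile phone" if distance(features, reference) <= threshold else "other"


# A fully visible phone matches the reference.
visible_phone = (0.15, 2.0, 0.02)

# A box of cigarettes with similar appearance is wrongly labeled as a phone.
cigarette_box = (0.20, 1.8, 0.02)

# A phone held to the ear and mostly hidden by the hand yields a very
# different visible shape and area, so it is missed.
occluded_phone = (0.15, 1.0, 0.005)

for name, feats in [("visible phone", visible_phone),
                    ("cigarette box", cigarette_box),
                    ("occluded phone", occluded_phone)]:
    print(name, "->", label_region(feats))
```

In this sketch the label depends entirely on how closely the extracted appearance characteristics resemble the reference, which is why a look-alike object can be accepted and an obscured phone can be rejected.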
Accordingly, there is a need for improved and more accurate automatic or semi-automatic detection of occupant activities that does not require special equipment or sensors. A system and a method are needed that classify an entire windshield and/or cabin region instead of searching for specific objects situated inside parts of the image and using appearance and spectral features. More specifically, an approach is needed that makes no assumptions about the content of images in advance of the process.