The exemplary embodiment relates generally to the detection of goods received gestures in surveillance video and finds particular application in connection with a system and method that allow for automatic classification and/or detection of goods received gestures in surveillance video.
Technological advancement and the increased availability of surveillance technology over the past few decades have enabled companies to perform new tasks with surveillance video. Generally, companies capture and store video footage of retail settings for their own protection and for the security and protection of employees and customers. However, this video footage has uses beyond security and safety, such as data mining and estimating consumer behavior and experience. Analysis of video footage may allow for slight improvements in efficiency or customer experience, which in the aggregate can have a large financial impact. Many retailers provide services that are heavily data-driven and therefore have an interest in obtaining numerous customer and store metrics, such as queue lengths, experience time both in-store and for drive-through, specific order timing, order accuracy, and customer response.
Several corporations are patenting retail-setting applications for surveillance video beyond well-known security and safety applications. U.S. Pat. No. 5,465,115, issued Nov. 7, 1995, entitled VIDEO TRAFFIC MONITOR FOR RETAIL ESTABLISHMENTS AND THE LIKE, by Conrad et al., counts detected people and records the count according to the direction of movement of the people. U.S. Pat. No. 5,953,055, issued Sep. 14, 1999, entitled SYSTEM AND METHOD FOR DETECTING AND ANALYZING A QUEUE, by Huang et al., U.S. Pat. No. 5,581,625, issued Dec. 3, 1996, entitled STEREO VISION SYSTEM FOR COUNTING ITEMS IN A QUEUE, by Connell, and U.S. Pat. No. 6,195,121, issued Feb. 27, 2001, entitled SYSTEM AND METHOD FOR DETECTING AND ANALYZING A QUEUE, by Huang et al., each disclose examples of monitoring queues. U.S. Pat. No. 6,654,047, issued Nov. 25, 2003, entitled METHOD OF AND DEVICE FOR ACQUIRING INFORMATION ON A TRAFFIC LINE OF PERSONS, by Iizaka, monitors groups of people within queues. U.S. Pat. No. 7,688,349, issued Mar. 30, 2010, entitled METHOD OF DETECTING AND TRACKING GROUPS OF PEOPLE, by Flickner et al., monitors various behaviors within a reception setting.
While the above-mentioned patents describe data mining applications related to video monitoring, none of them discloses the detection of goods received gestures within a retail or surveillance setting. Data-driven retailers are showing increased interest in process-related data from which performance metrics can be extracted. One such performance metric is a customer's total experience time (TET), from which guidelines to improve order efficiency and customer satisfaction can be derived. While the prior art teaches how to estimate important components of the TET, such as queue length, no techniques have been disclosed for accurately detecting when goods are received, which is a key element in TET measurement. Therefore, there is a need for a system and method that automatically detects and/or classifies goods received gestures in surveillance video.
In general, gesture recognition approaches have been based on modeling human movement. Many are local image-based and video-based approaches, as disclosed in LEARNING REALISTIC HUMAN ACTIONS FROM MOVIES, by I. Laptev et al. (CVPR 2008), and RECOGNIZING HUMAN ACTIONS: A LOCAL SVM APPROACH, by C. Schuldt et al. (ICPR 2004), each of which describes modeling of the human shape during certain actions. More recent approaches have employed space-time feature detectors and descriptors, as disclosed in EVALUATION OF LOCAL SPATIO-TEMPORAL FEATURES FOR ACTION RECOGNITION, by H. Wang et al. (BMVC 2009). These gesture recognition approaches, however, have not been applied in the context of surveillance video retail applications, from which payment gestures can be detected using the technology disclosed herein.
A system and method for automatically detecting and/or classifying goods received gestures in surveillance video is desired. Successful detection of goods received gestures with a simple algorithm of low computational cost can aid recent efforts by retailers to capture a customer's experience through performance metrics. The disclosed methods and system focus on algorithmic processing of a video sequence to provide accurate detection of various goods received gestures at near real-time speeds.