Retail analytics is the study of how customers behave in shops. Understanding where people look and move, and what they buy, can help shop owners advertise more effectively and improve their shop layouts. Traditionally, retail analytics has been performed manually using surveys, focus groups, analysts, and transaction histories. More recently, video surveillance cameras have been used to automate some of these tasks.
Object detection and tracking can be used to determine the number of people in a shop and where they move within it. Events can be detected and flagged, such as people running, going the wrong way through a one-way gate, entering a closed-off or restricted area, loitering, or abandoning or removing objects. Object recognition can be used to detect the presence and identity of objects such as people and cars. It can also be used to detect attributes of people, such as their age, their sex, and where they are looking. Behaviour recognition can further be used to detect events such as fights, falls, and people sitting or lying down.
Combining the information extracted from retail analytics into a summary can be difficult, however. A user will typically wish to summarise all the activity that has occurred in an area of interest over the course of a predetermined time period, such as a day or a week.
Some systems allow the user to see timelines with events marked on the timelines. This is somewhat useful for determining changing levels of activity during the day, but is much less useful for determining the locations or types of activity in the scene of interest.
Some systems allow video captured by surveillance cameras to be played back in fast motion, sometimes in combination with timelines and events. While the user can see the activity, reviewing the captured video is time-consuming, and the user can miss details of the activity, because fast playback gives only a rough indication of where activity occurs. This review process also requires a machine capable of replaying the video captured or recorded by the surveillance cameras.
Some systems improve video playback by automatically condensing sections of a video sequence. This is usually done by identifying portions or sections of the video with different characteristics or properties and playing those sections at different speeds. The sections may include, for example, sections that contain no objects, sections that contain only stationary objects that are not causing events, sections that contain moving objects that are not causing events, and sections that contain objects causing events. These systems further help the user to see activity, but still give only a rough indication of where in the scene activity occurs.
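As a rough illustration of the condensing idea, the sketch below assigns a playback speed to each kind of section and computes how long the condensed video takes to watch. The section categories follow the list above; the speed multipliers, the `condensed_duration` helper, and the example durations are hypothetical values chosen only for illustration, not taken from any particular system.

```python
from enum import Enum


class SectionKind(Enum):
    EMPTY = "no objects"
    STATIC = "only stationary objects, no events"
    MOVING = "moving objects, no events"
    EVENT = "objects causing events"


# Hypothetical playback speed multipliers: quieter sections play faster.
PLAYBACK_SPEED = {
    SectionKind.EMPTY: 16.0,
    SectionKind.STATIC: 8.0,
    SectionKind.MOVING: 4.0,
    SectionKind.EVENT: 1.0,  # sections with events play at normal speed
}


def condensed_duration(sections):
    """Return playback time in seconds for a list of (kind, duration_seconds)."""
    return sum(duration / PLAYBACK_SPEED[kind] for kind, duration in sections)


# Hypothetical recording: 5660 s of footage split into classified sections.
day = [
    (SectionKind.EMPTY, 3600.0),
    (SectionKind.MOVING, 1200.0),
    (SectionKind.EVENT, 60.0),
    (SectionKind.STATIC, 800.0),
]
print(condensed_duration(day))  # → 685.0
```

The viewer still has to watch 685 s of video to review 5660 s of footage, which is why such condensing helps but does not by itself produce a spatial summary of activity.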
Some systems detect objects within a scene of interest and display a line traced through the centroids of the detected objects over a period of time, superimposed on the current video frame. These systems make it easier to determine where activity occurs. However, for the purpose of summarising activity, they do not give a good indication of traffic levels, and superimposed object trails can even give a misleading one: for example, many trails along the same path can look much like a single trail. In addition, the traced lines do not show the original objects that produced them. Nor do these systems show the points in the scene that were touched by the detected objects, indicate the sizes of those objects, or show where in the scene detected objects were stationary.
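The sketch below shows why a centroid trail discards information about the objects themselves. It assumes, for illustration only, that each detection is an axis-aligned bounding box given as `(x, y, width, height)` in pixels; the `centroid` and `trail` helpers are hypothetical. Two objects of very different sizes moving along the same path produce exactly the same trail.

```python
from typing import List, Tuple

# Assumed detection format: (x, y, width, height) of a bounding box, in pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def centroid(box: Box) -> Point:
    """Centre point of a bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def trail(track: List[Box]) -> List[Point]:
    """Polyline through the centroids of one tracked object's detections."""
    return [centroid(b) for b in track]


# A small object and a large object following the same path.
small = [(10, 10, 2, 2), (20, 10, 2, 2), (30, 10, 2, 2)]
large = [(5, 5, 12, 12), (15, 5, 12, 12), (25, 5, 12, 12)]

print(trail(small))                  # [(11.0, 11.0), (21.0, 11.0), (31.0, 11.0)]
print(trail(large) == trail(small))  # True: object size is lost
```

Drawing this polyline once or a hundred times on top of a video frame looks the same, which is one way superimposed trails can misrepresent traffic levels.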
Some systems create average object detection maps over a time period and display them as heat maps or opaque overlays on the scene. The limitations of these systems depend on the underlying object detection technology. Systems that rely on motion detection do not accurately show areas where people are stationary. Systems that combine object detection with naïve average object detection maps are dominated by areas where people are stationary (“burn in”), unless tracking is used to associate objects over time and compensate for stationary periods. Furthermore, these systems are inflexible: each object detection is given equal weight, so if many objects have passed through a scene, some interesting trails may be hidden by the large number of other, overlapping trails.
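The burn-in effect can be reproduced with a small sketch. Here the scene is assumed to be divided into named grid cells, and each frame is assumed to yield a list of `(track_id, cell)` detections; both the cell names and the helpers are hypothetical. One person loiters in cell `A` for all 100 frames while 20 different people each pass through cell `B` in a single frame. The naïve average map rates `A` five times hotter than `B`. Counting distinct tracked objects per cell, which is just one simple way tracking could compensate for stationary periods, reverses the picture.

```python
from collections import Counter, defaultdict


def naive_average_map(frames):
    """Fraction of frames in which each cell contains at least one detection."""
    hits = Counter()
    for detections in frames:
        for cell in {cell for _, cell in detections}:
            hits[cell] += 1
    return {cell: n / len(frames) for cell, n in hits.items()}


def tracked_visit_map(frames):
    """Number of distinct tracked objects that were detected in each cell."""
    visitors = defaultdict(set)
    for detections in frames:
        for track_id, cell in detections:
            visitors[cell].add(track_id)
    return {cell: len(ids) for cell, ids in visitors.items()}


NUM_FRAMES = 100
# One stationary person in cell "A" in every frame.
frames = [[("loiterer", "A")] for _ in range(NUM_FRAMES)]
# Twenty different people each pass through cell "B" in a single frame.
for i in range(20):
    frames[i * 5].append((f"walker{i}", "B"))

print(naive_average_map(frames))  # {'A': 1.0, 'B': 0.2}: the loiterer dominates
print(tracked_visit_map(frames))  # {'A': 1, 'B': 20}: the busy cell stands out
```

Even the tracked count still gives every object equal weight, so in a busy scene a few interesting paths can remain buried under many ordinary ones.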
Thus, a need exists to provide an improved method and system for providing a summary of activity in a scene of interest.