The present invention generally relates to optical flow as it relates to pattern recognition and, more particularly, to systems and methods for automatically using optical flow to select images of interest in order to detect objects, for example, during merchandise checkout.
In many retail store environments, such as grocery stores, department stores, office supply stores, home improvement stores, and the like, consumers use shopping carts to carry merchandise. A typical shopping cart includes a basket designed to hold the consumer's merchandise. At times, a consumer will use the lower shelf located below the shopping cart basket as additional storage space, especially for relatively large and/or bulky merchandise.
On occasion, when a consumer uses the lower shelf space to carry merchandise, the consumer can leave the store without paying for the merchandise on the lower shelf space. This may occur because the consumer inadvertently forgets to present the merchandise to the cashier during checkout, or because the consumer intends to defraud the store, steal the merchandise or collude with the cashier.
Recently, efforts have been undertaken to minimize or reduce bottom-of-the-basket (BoB) losses. Conventional systems, such as those marketed by Kart Saver, Inc. of Sacramento, Calif. and Store-Scan, Inc. of Scottsdale, Ariz., employ infrared sensors designed to detect the presence of merchandise located on the lower shelf of a shopping cart when the shopping cart enters a checkout lane. Disadvantageously, these systems can only detect the presence of an object and provide no indication of the object's identity. Consequently, these systems are relatively likely to give false positive indications. For example, they are unable to distinguish between merchandise located on the lower shelf of the shopping cart and a customer's leg or shoe. Further disadvantageously, these systems cannot be integrated with the store's existing checkout systems and instead rely on the cashier to recognize the merchandise and enter the associated information, such as the price of the merchandise, into the store's checkout system.
Video surveillance is another supplemental approach that has been used in an attempt to minimize or reduce bottom-of-the-basket losses. One example of a video surveillance device was formerly marketed by a company doing business as VerifEye, Inc. of Ontario, Canada. This system employed a video camera mounted in the lane and directed at the bottom of the basket. A small color video display was mounted adjacent the register (or point of service) to help the cashier determine whether a BoB item was present. Again, disadvantageously, this system was not integrated with the point of service (POS), forcing reliance on the cashier to scan or key in the item. Consequently, productivity issues are ignored and collusion is not addressed. One of VerifEye's systems offered an option to log the image, time and location. This configuration nonetheless does not recover the lost items.
As compared to mere object detection, object recognition requires image selection, which is the process of selecting a subset of images from a sequence of images to be sent to the object recognition process. The purpose of image selection is to take the input of, for example, 30 images per second from the camera and select a small number of images that the computer can process fully. It is acceptable to queue up a few images for processing, but after about 10 seconds the data is no longer of any interest. Image selection must therefore strike a balance between selecting too many images and selecting too few.
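The queuing constraint described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `SelectionQueue` that holds at most a few selected frames (the capacity of 5 is an illustrative choice, not a value from the text) and discards any frame older than the roughly 10-second limit before handing it to recognition.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical parameters: the ~10 s staleness limit follows the text;
# the capacity of 5 queued frames is an illustrative assumption.
MAX_AGE_S = 10.0
MAX_QUEUED = 5


@dataclass
class Frame:
    index: int        # position in the camera stream
    timestamp: float  # capture time in seconds


class SelectionQueue:
    """Holds selected frames awaiting recognition, dropping stale ones."""

    def __init__(self, max_age_s=MAX_AGE_S, max_queued=MAX_QUEUED):
        self.max_age_s = max_age_s
        # A bounded deque silently evicts the oldest frame when full.
        self.frames = deque(maxlen=max_queued)

    def push(self, frame):
        self.frames.append(frame)

    def pop_fresh(self, now):
        """Return the oldest frame still worth processing, or None."""
        while self.frames:
            frame = self.frames.popleft()
            if now - frame.timestamp <= self.max_age_s:
                return frame
            # Older than the staleness limit: discard and keep looking.
        return None
```

Under these assumptions, a recognizer running at about 1.5 Hz against a 30 images-per-second camera can consume only about one image in twenty, so the selection step, rather than the queue, must do most of the filtering.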
An algorithm that performs image selection should execute quickly and select images with a high probability of showing each item, for example, each item in the bottom of a shopping cart. Because an object recognition process may fail to recognize an item in a given image due to several factors, including lighting and noise, a single image may not suffice. On the other hand, if too many images are selected, a point is reached where images must be dropped, either because processing capacity is lacking or because the images are no longer timely.
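One way to quantify why a single image may not suffice is a simple probability model. Assuming, as an idealization not stated in the text, that each selected image independently yields a correct recognition with the same probability p, the chance of recognizing an item in at least one of k images is 1 - (1 - p)^k. The hypothetical helper below finds the smallest k that reaches a target probability.

```python
def recognition_probability(p_single, k):
    """Chance that at least one of k independent images is recognized."""
    return 1.0 - (1.0 - p_single) ** k


def images_needed(p_single, target):
    """Smallest k whose recognition_probability reaches the target.

    Assumes independent images with an identical per-image probability
    p_single (an idealization of real lighting and noise effects).
    """
    if not (0.0 < p_single <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p_single <= 1 and 0 < target < 1")
    k = 1
    while recognition_probability(p_single, k) < target:
        k += 1
    return k
```

For instance, with a hypothetical p = 0.6 and a 95% target, four images per item are needed; each extra image, however, consumes recognition capacity, which is the trade-off described above.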
One image selection method proposed in the past is described on pp. 84-90 of “An Invitation to 3-D Vision” by Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry (Springer-Verlag, New York, 2004). Some other conventional methods are summarized below.
The “blind” method simply selects images at the rate at which they can be processed. When the object recognition process finishes processing one image, the next image captured by the camera is sent to it. Under certain lighting conditions, where the object recognition process can handle about 5 images per second, this method works well: an image is processed every ⅕ of a second, so enough images containing a reasonably slow-moving item can be captured. However, at a processing rate of about 1.5 Hz, which is about what should be expected in practice, this method cannot process images quickly enough to recognize all objects in a fast-moving cart.
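The timing of the “blind” method can be sketched as a small simulation. It assumes frames arrive at a fixed rate, that recognition takes a constant 1/rate seconds per image, and that the first frame captured once the recognizer is idle is selected; the item's entry and exit times in the example are hypothetical. Exact fractions are used so that frame times and processing deadlines compare without rounding error.

```python
from fractions import Fraction


def blind_selection(fps, proc_rate_hz, duration_s):
    """Frame indices chosen by the 'blind' method.

    Assumes frame i is captured at i / fps seconds and each recognition
    takes exactly 1 / proc_rate_hz seconds; whenever the recognizer is
    idle, the frame captured at that moment is sent to it.
    """
    frame_period = Fraction(1, fps)
    proc_time = 1 / Fraction(proc_rate_hz)
    selected = []
    busy_until = Fraction(0)
    for i in range(int(duration_s * fps)):
        t = i * frame_period
        if t >= busy_until:  # recognizer is idle: take this frame
            selected.append(i)
            busy_until = t + proc_time
    return selected


def frames_showing_item(selected, fps, t_enter, t_exit):
    """Count selected frames captured while the item is in view."""
    return sum(1 for i in selected if t_enter <= Fraction(i, fps) < t_exit)


for rate in (5, 1.5):
    picks = blind_selection(fps=30, proc_rate_hz=rate, duration_s=2)
    print(rate, frames_showing_item(picks, 30, 0.7, 1.3))
```

With a hypothetical item in view from 0.7 s to 1.3 s, selection at 5 Hz yields three images of the item, while selection at 1.5 Hz yields none, illustrating how a fast-moving item can be missed entirely.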
Motion detection by image subtraction uses a simple motion detector that compares each image with the one before it by subtracting the value of every pixel from the corresponding pixel in the other image. At very little CPU cost, the computer can determine whether the contents of the image have moved since the prior image (hence the term “motion detection”). When there is no motion in the image, there is no need to run the object recognition engine. When there is motion, a sequence of images should be selected for processing based on the duration of the motion. A simple motion detector is somewhat susceptible to noise and cannot determine the velocity or composition of the object in the image. Without velocity information, it is impossible to determine how many images, or more specifically which images, should be processed. In addition, this method cannot determine the direction of motion; thus, if someone drops something in front of the camera (vertical motion), the detector would wrongly conclude that an item is passing by. More generally, background motion, such as a cashier moving around in an adjacent lane, will also falsely trigger the detector.
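The image-subtraction detector, and its blindness to direction, can be illustrated with a minimal pure-Python sketch on small synthetic grayscale frames (lists of pixel rows). The mean-absolute-difference score and the threshold of 10 are illustrative assumptions, not values from the text. Shifting a bright square one pixel sideways or one pixel down produces exactly the same score, so the detector alone cannot distinguish an item passing by from something dropped in front of the camera.

```python
def make_frame(width, height, x0, y0, size, value=200):
    """Synthetic frame: dark background with a bright size-by-size square."""
    return [[value if x0 <= x < x0 + size and y0 <= y < y0 + size else 0
             for x in range(width)]
            for y in range(height)]


def mean_abs_difference(prev, curr):
    """Average absolute per-pixel difference between two equal-size frames."""
    total = count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count


def motion_detected(prev, curr, threshold=10.0):
    """Flag motion when the difference score exceeds a hypothetical threshold."""
    return mean_abs_difference(prev, curr) > threshold


base = make_frame(8, 8, 2, 2, 3)
moved_right = make_frame(8, 8, 3, 2, 3)  # horizontal motion (item passing)
moved_down = make_frame(8, 8, 2, 3, 3)   # vertical motion (item dropped)
print(mean_abs_difference(base, moved_right),
      mean_abs_difference(base, moved_down))
```

Both shifts change the same number of pixels by the same amount, so the scores are identical; recovering direction and speed requires a motion estimate, not just a difference score.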
Using an off-the-shelf motion detector to trigger the camera has the same problem as above: it cannot correctly select an appropriate set of images for object recognition. Its advantage is that the CPU sits completely idle while waiting for images to process, using little power and generating little heat in the meantime.
Using external IR triggers, or “trip sensors,” provides an accurately timed image to process, in which the center of an item is in good view. However, velocity information cannot be obtained, because this method provides only one measurement. Without knowing the velocity of the item, it is impossible to select an appropriate set of images to process. On the other hand, this method does not use the CPU until it is needed. Using more than one set of IR triggers may provide rudimentary velocity information, but doing so becomes expensive and prone to errors from human intervention.
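The rudimentary velocity estimate that two sets of triggers could provide, and how it might drive image selection, can be sketched as follows. Everything here is a hypothetical illustration: the sensor spacing, the camera's field of view expressed as distances past the second sensor, the assumption that frame i is captured at i / fps seconds, and the simplification that the item is a point moving at constant speed. Given an estimated speed, the sketch spreads a chosen number of images evenly over the interval during which the item should be in view.

```python
def estimate_speed(spacing_m, t_first, t_second):
    """Speed (m/s) from the firing times of two trip sensors spacing_m apart."""
    dt = t_second - t_first
    if dt <= 0:
        raise ValueError("second sensor must fire after the first")
    return spacing_m / dt


def frames_to_select(speed_mps, t_second, view_start_m, view_end_m,
                     fps, n_images):
    """Frame indices spread evenly over the time the item is in view.

    view_start_m and view_end_m are hypothetical distances past the
    second sensor at which the camera's view begins and ends.
    """
    if n_images < 1:
        raise ValueError("n_images must be at least 1")
    t_in = t_second + view_start_m / speed_mps
    t_out = t_second + view_end_m / speed_mps
    if n_images == 1:
        times = [(t_in + t_out) / 2]  # single image: center of the view
    else:
        step = (t_out - t_in) / (n_images - 1)
        times = [t_in + k * step for k in range(n_images)]
    return [round(t * fps) for t in times]


speed = estimate_speed(spacing_m=0.3, t_first=1.0, t_second=1.5)
print(speed, frames_to_select(speed, 1.5, 0.0, 0.6, fps=30, n_images=3))
```

A faster item shortens the in-view interval, so the same number of images is packed more tightly; this timing information is exactly what a single trigger cannot supply.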
As can be seen, there is a need for an improved apparatus and method for selecting, from a stream of images, the images with which to recognize items, for example, items located on the lower shelf of a shopping cart in the checkout lane of a retail store, for the automated detection of merchandise.