Sponsored advertising is a large and dynamic business segment, with more than $55 billion spent in 2014. The resulting ecosystem of sponsored advertising includes measurement of the potential value of targets (teams, celebrities, retail, stadium spaces) and of actual value as measured by “earned viewership” or promotion of the advertising brand. Harvesting of user generated content for display or content marketing is another business segment enabled by logo recognition systems. Additionally, “competitive brand intelligence” across all media content, including online videos, broadcast or streaming video, social images, and outdoor display, is another use case for more accurate logo recognition systems. Other applications include measurement of product placement within stores, and detection and localization of products in retail aisles to provide a better shopping experience and information for retail management. Further applications include logistics and industrial applications.
However, current solutions for logo recognition have various limitations. One constraint is the time and cost to train a system to recognize new logos, due in part to the effort required to collect large numbers of trainable images. Another limitation is the accuracy of detecting various types of logos in the presence of significant warp, occlusion, blur and varying lighting conditions. Another limitation of current solutions is a weakness in detecting tiny and often distorted logos on cloth, such as logos located on banners and apparel. Another weakness of such systems is the limited number of logos that can be recognized, which is often constrained by the accuracy of both current feature detectors that use bag of words methods and learning methods such as neural network classifiers.
In one or more of its several aspects, the present invention addresses problems such as those described above. For example, a method for logo recognition in accordance with an aspect of the present invention may suitably use saliency analysis, segmentation techniques, and character stroke analysis as addressed further herein to segment likely logo regions. Saliency detection relies on the fact that logos have significant information content compared to the background. Multi-scale similarity comparison is performed to remove less interesting regions such as text strings within a sea of text or objects in large sets of objects, such as faces in a sea of faces.
To achieve high robustness and accuracy of detection, multiple methods are used to recognize a logo in images and videos, and the detection of a likely logo is further verified with feature matching and neural network classification. The methods for logo recognition include feature extraction, signature representation, matching, neural network classification, and optical character recognition.
One aspect of the invention presents a method for optical character recognition (OCR) with character-based segmentation and multi-character classifiers. Another method uses stroke analysis and heuristics to select one or more text classifiers for use in recognition. An alternate method for OCR performs segment level character recognition with one or more selected text classifiers and N-gram matching as addressed further herein.
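As a non-limiting illustration of the N-gram matching mentioned above, the following Python sketch scores a noisy OCR string against candidate brand names using the overlap of character trigrams (a Dice coefficient). The function names, the `#` padding character, and the candidate names are hypothetical and are chosen only for illustration; the described embodiments may score N-grams differently.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lowercased string."""
    # Padding lets the first and last characters contribute full n-grams.
    padded = "#" * (n - 1) + text.lower() + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets, in the range [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2.0 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_ngram_match(ocr_text, candidates, n=3):
    """Return the (candidate, score) pair with the highest similarity.

    Raises ValueError if `candidates` is empty.
    """
    scored = ((c, ngram_similarity(ocr_text, c, n)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])
```

For example, an OCR output of "acrne" (where "m" was misread as "rn") still shares its leading and trailing trigrams with "acme", so "acme" is preferred over unrelated candidates.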
Another aspect of the invention presents a first method to train classifiers on new objects and logos with synthetically generated images. Another aspect utilizes a second method that employs transfer learning features of neural networks. In transfer learning, a neural network is trained to learn interesting and important features for classification, and the trained network is then fine-tuned with a specific training set. The neural network is first trained with a large set of images, including images that may not be relevant to the classified categories, and the neural network configuration and weights are saved. These saved weights and the neural network configuration are then improved by further training with the specific logo categories that need classification, refining the neural network while training a new classification layer. The method using synthetic images for training and the method for transfer learning enable fast addition of new logos into a recognition system, and can be further refined with more data and feedback to improve accuracy.
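As a non-limiting illustration of training with synthetically generated images, the following Python sketch pastes a small grayscale logo onto background images at a random position with a random brightness gain, and records the bounding box as a training label. Images are represented as nested lists of 0 to 255 values. The function names, the gain range of 0.6 to 1.4, and the choice of augmentations are assumptions made for this sketch; an actual system would likely also apply warp, blur, and occlusion.

```python
import random


def synthesize_sample(logo, background, rng):
    """Paste `logo` onto a copy of `background`; return (image, box).

    `box` is (left, top, width, height) of the pasted logo.
    """
    logo_h, logo_w = len(logo), len(logo[0])
    bg_h, bg_w = len(background), len(background[0])
    if logo_h > bg_h or logo_w > bg_w:
        raise ValueError("logo must fit inside background")
    top = rng.randint(0, bg_h - logo_h)    # inclusive bounds
    left = rng.randint(0, bg_w - logo_w)
    gain = rng.uniform(0.6, 1.4)           # hypothetical lighting variation
    image = [row[:] for row in background]  # leave the background untouched
    for y in range(logo_h):
        for x in range(logo_w):
            value = round(logo[y][x] * gain)
            image[top + y][left + x] = min(255, max(0, value))
    return image, (left, top, logo_w, logo_h)


def synthesize_dataset(logo, backgrounds, count, seed=0):
    """Generate `count` labeled samples with a reproducible random stream."""
    rng = random.Random(seed)
    return [synthesize_sample(logo, rng.choice(backgrounds), rng)
            for _ in range(count)]
```

Seeding the generator keeps the synthetic set reproducible, which makes it easier to compare classifiers trained on the same generated data.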
Another aspect of the invention improves and extends the methods for feature based signature generation. One method combines neighboring detected keypoints with an affine Gaussian Hessian based detector to generate an additional keypoint having a larger feature keypoint region. The additional keypoint improves the accuracy of matching by providing more robust features that can help match the logo. Another method describes lines in the keypoint region to better represent line-based logos and objects and generates complementary and accurate signatures of the detected logo. The signatures generated with the extended feature methods may suitably be employed to detect logos in new images as part of an indexed search and correlation system.
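As a non-limiting illustration of generating an additional keypoint with a larger region from neighboring keypoints, the following Python sketch covers only the geometric step: for each pair of nearby keypoints, it emits one keypoint whose circular region is the smallest circle enclosing both originals. The keypoint representation `(x, y, radius)`, the function name, and the `max_gap` neighborhood threshold are assumptions for this sketch; it does not implement the affine Gaussian Hessian based detector itself.

```python
import math
from itertools import combinations


def merge_neighboring_keypoints(keypoints, max_gap=2.0):
    """Return additional keypoints built from neighboring pairs.

    `keypoints` is a list of (x, y, radius) tuples. Two keypoints are
    treated as neighbors when their center distance is at most
    max_gap * (r1 + r2).
    """
    merged = []
    for (x1, y1, r1), (x2, y2, r2) in combinations(keypoints, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > max_gap * (r1 + r2):
            continue  # too far apart to be neighbors
        if d + min(r1, r2) <= max(r1, r2):
            continue  # one region already contains the other
        # Smallest circle enclosing both circular regions.
        radius = (d + r1 + r2) / 2.0
        t = (radius - r1) / d  # fraction of the way from center 1 to center 2
        merged.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1), radius))
    return merged
```

For two keypoints of radius 1 centered at (0, 0) and (4, 0), the sketch yields one additional keypoint centered at (2, 0) with radius 3, which spans both original regions.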
Another aspect of the invention presents methods to verify and iterate around possible matching regions. A likely logo match is verified with a logo specific neural network classifier and a feature based matcher.
Another embodiment applies a method to detect a logo in images in video frames selected from a video stream. A saliency analysis and segmentation of selected regions are applied in a selected video frame to determine segmented likely logo regions. The segmented likely logo regions are processed with feature matching using correlation to generate a first match, neural network classification using a convolutional neural network to generate a second match, and text recognition using character segmentation and string matching to generate a third match. A most likely logo match is decided by combining results from the first match, the second match, and the third match.
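As a non-limiting illustration of combining the first, second, and third matches, the following Python sketch fuses per-logo confidences from the three recognizers with a weighted sum and accepts the top candidate only if it clears a threshold. The weights, the threshold, and the dictionary representation of a match are hypothetical; the described embodiment does not specify this particular fusion rule.

```python
from collections import defaultdict


def combine_matches(feature_match, cnn_match, text_match,
                    weights=(0.4, 0.4, 0.2), min_score=0.5):
    """Fuse three match results into one most likely logo.

    Each match is a dict mapping a logo label to a confidence in [0, 1].
    Returns (label, fused_score), or None if no label reaches `min_score`.
    """
    fused = defaultdict(float)
    for match, weight in zip((feature_match, cnn_match, text_match), weights):
        for label, confidence in match.items():
            fused[label] += weight * confidence
    if not fused:
        return None
    label, score = max(fused.items(), key=lambda item: item[1])
    return (label, score) if score >= min_score else None
```

A label supported by two recognizers can outrank a label that only one recognizer reports with high confidence, which is the intended benefit of combining independent evidence.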
Another embodiment addresses a method to detect a brand in images and video streams for broadcast video. A detected product and logos are tracked and segmented to measure and determine a brand. A location of the detected product is identified on a display. The logo is classified as wearable, banner, or fixture. The product and brand are mapped to a three dimensional (3D) map of an event where the product and logo were detected.
A further embodiment addresses a method to detect a specific brand in images and video streams. Luminance images at a scale in the x direction Sx and a different scale in the y direction Sy are accepted in a neural network. The neural network is trained with a set of training images for detected features associated with a specific brand.
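As a non-limiting illustration of preparing luminance images at a scale Sx in the x direction and a different scale Sy in the y direction, the following Python sketch converts an RGB image to luminance and resamples it with independent horizontal and vertical factors. The Rec. 601 luma weights and the nearest-neighbor resampling are assumptions for this sketch, as are the function names; the network input preparation in an actual embodiment may differ.

```python
def to_luminance(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of luma values (Rec. 601)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def rescale(image, sx, sy):
    """Nearest-neighbor resample with separate x and y scale factors.

    `sx` scales the width and `sy` scales the height; both must be > 0.
    """
    height, width = len(image), len(image[0])
    out_h = max(1, round(height * sy))
    out_w = max(1, round(width * sx))
    return [[image[min(height - 1, int(y / sy))][min(width - 1, int(x / sx))]
             for x in range(out_w)]
            for y in range(out_h)]
```

Calling `rescale(to_luminance(img), sx, sy)` with `sx != sy` produces the anisotropically scaled luminance input; a 2x2 image with `sx=2` and `sy=1` becomes 2 rows by 4 columns.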
These and other features, aspects, techniques and advantages of the present invention will be apparent to those skilled in the art from the following detailed description, taken together with the accompanying drawings and claims.