The use of automatic content recognition (ACR) to enhance the user's experience of entertainment content is growing in popularity. Certain forms of ACR, such as digital watermarking and content fingerprinting, identify entertainment content, including TV shows, movies and songs, based on an identity derived from the content signal in a format-agnostic way. Robust content recognition can even identify content captured from the user's ambient environment through microphones or image sensors. These forms of recognition are referred to as “robust” because they are able to identify specific content despite distortions incurred in the distribution channel, including channel coding such as compression and digital-to-analog conversion. This independence from signal format and distribution channel affords flexibility and freedom from the nuances and constraints of any particular means of distribution. It enables recognition-triggered services to be delivered on an untethered mobile device as it samples signals from its environment through its sensors.
In a world where users get exposed to various forms of entertainment content, ACR is particularly useful when provided in connection with personal mobile devices. When connected with an ACR computing service, the user's mobile device can enhance the user's experience of content by identifying the content and providing access to a variety of related services.
ACR services have become more common with the proliferation of mobile device software markets and associated cloud services. Mobile devices and the associated cloud infrastructure provide an excellent platform for ACR. The mobile device platform is advantageous because users carry mobile devices everywhere, and these devices are often connected to networks (e.g., via wi-fi and cell networks), have multiple sensors to sense audio and visual signals around the user, and are increasingly versatile and powerful computing devices. When coupled with additional processing power in the cloud, they can both tap the cloud's computing power to assist in identification and deliver relevant network services.
Initial ACR applications focused on providing basic song or program identification services and opportunities to buy related products and services and share information about content with friends. ACR applications have developed separately around particular content recognition functions such as music recognition, image recognition and video programming recognition. More recently, vendors of these services have sought to generalize the functionality across media types and expand the types of services triggered as a result of a recognition event.
In the television industry, for example, ACR has been used to enhance a TV viewer's experience while watching a show on a primary screen by providing associated services on the viewer's personal mobile device, dubbed the “second screen.” The ACR system, in such applications, includes a mobile application that operates on the user's device, and a computing service (e.g., in the cloud) that interacts with the mobile application to provide content recognition and/or delivery of network services associated with content once it has been recognized.
At this stage of development of ACR applications in the entertainment space, functionality is limited by a number of factors. Current recognition applications are limited to operating in discrete recognition modes in which the user prompts the application to recognize a song or show, and the application proceeds to identify it and provide associated information for that single recognition event.
A more sophisticated application for enhanced TV experiences requires that the ACR system synchronize with the show. Synchronizing means that the application keeps track of the relative time location of the user within the show during the viewing experience, so that it can provide time-relevant experiences, such as alternative story lines, time-relevant program data, and social network experiences tied to particular events in a show. The time offset relative to the program start, or to some other time reference of a signal stream, is a proxy for program events within the stream. These events might occur within a show or at its boundaries with other shows or advertising. Typical viewing habits introduce discontinuities in the signal stream that make efficient signal recognition and synchronization challenging for some ACR technologies, particularly content fingerprinting. These discontinuities include, for example, channel surfing, time-shifted viewing of previously recorded programs, and fast forwarding and rewinding through a show. User behavior can be hard to predict, and one cannot require the user to tell the application what he or she is doing. Instead, the application should preferably operate in the background in an efficient (i.e., low power consuming) passive recognition mode, effectively maintaining accurate recognition and synchronization even as discontinuities occur.
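The notion of synchronization described above can be illustrated with a minimal sketch. It assumes that some recognition step (not shown) periodically reports a program identifier and a time offset from the program start. The names `SyncTracker`, `SyncState`, and the `tolerance` parameter are hypothetical and chosen only for illustration. Between recognition events the tracker extrapolates the offset from the device clock; a new event that disagrees with the extrapolation is treated as a discontinuity such as a channel change or a fast forward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncState:
    program_id: str        # identifier reported by the recognition step
    program_offset: float  # seconds from the program start
    wall_time: float       # device clock reading at the recognition event


class SyncTracker:
    """Hypothetical tracker of the viewer's position within a program."""

    def __init__(self, tolerance: float = 2.0):
        # Allowed disagreement (seconds) before a discontinuity is declared.
        self.tolerance = tolerance
        self.state: Optional[SyncState] = None

    def current_offset(self, now: float) -> Optional[float]:
        """Extrapolate the program offset, assuming normal-speed playback."""
        if self.state is None:
            return None
        return self.state.program_offset + (now - self.state.wall_time)

    def update(self, program_id: str, program_offset: float, now: float) -> str:
        """Fold in a recognition event; return 'acquired', 'in_sync',
        or 'discontinuity'. The tracker always resynchronizes to the event."""
        expected = self.current_offset(now)
        if self.state is None:
            status = "acquired"
        elif (program_id != self.state.program_id
              or abs(program_offset - expected) > self.tolerance):
            status = "discontinuity"
        else:
            status = "in_sync"
        self.state = SyncState(program_id, program_offset, now)
        return status
```

In this sketch, a time-relevant service would query `current_offset` between recognition events, and a `discontinuity` result would signal that any time-keyed experience should be re-anchored.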
There are two primary forms of content recognition in use for enhanced TV experiences. One is digital watermarking, and the other is content fingerprinting. The digital watermark provides data in the content signal that enables identification and synchronization. Content fingerprinting identifies the content signal by computing a sequence of content fingerprints and matching them with a database. It is more challenging to maintain synchronization with content fingerprinting, particularly if the system is intended to operate across many different shows and deal with a variety of user behavior that causes loss of synchronization.
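As a rough illustration of matching a sequence of fingerprints against a database, the sketch below uses a simple offset-voting scheme. This is one common approach, offered under assumptions and not as the method of any particular ACR system. Fingerprints are modeled as plain integers computed once per time step, and the function names `build_index` and `match` are hypothetical. Each query fingerprint that appears in the database votes for a (program, time offset) pair, and the pair with the most votes gives both the identification and the time alignment.

```python
from collections import Counter, defaultdict


def build_index(programs):
    """Map each fingerprint value to the (program_id, time_step) pairs
    where it occurs. `programs` maps program_id -> list of fingerprints,
    one per time step (an assumed, simplified representation)."""
    index = defaultdict(list)
    for program_id, fingerprints in programs.items():
        for t, fp in enumerate(fingerprints):
            index[fp].append((program_id, t))
    return index


def match(index, query):
    """Return (program_id, offset, votes) for the best-supported alignment
    of the query sequence, or None if no fingerprint matches."""
    votes = Counter()
    for q_t, fp in enumerate(query):
        for program_id, db_t in index.get(fp, ()):
            # A true match yields the same (db_t - q_t) for every fingerprint.
            votes[(program_id, db_t - q_t)] += 1
    if not votes:
        return None
    (program_id, offset), count = votes.most_common(1)[0]
    return program_id, offset, count


# Usage with toy data: the query aligns with "show_a" starting at time step 2.
programs = {
    "show_a": [10, 11, 12, 13, 14, 15],
    "show_b": [20, 21, 12, 23, 24],
}
index = build_index(programs)
result = match(index, [12, 13, 14])
```

The sketch also hints at why synchronization is harder with fingerprinting: every alignment requires a fresh sequence of fingerprints and a database lookup, and the cost grows with the number of programs and the time range indexed.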
For content fingerprint based ACR, the system designer might attempt “brute force” solutions to the challenges posed above for content recognition. For example, the ACR system might be programmed to operate continuously, identifying the show and the relative time location within the show against a large database of shows, across a wide time range, accounting for time shifting and channel surfing (or, more generally, switching among signal sources, or just walking from one room to another with different devices simultaneously playing programs via cable, Internet, over the air, disk, etc.). However, even in today's world, this is impractical. Even as computing power increases on mobile devices, such as smartphones and tablets, battery life remains a significant constraint. As such, it is important to limit unnecessary processing on the device and to limit use of the device's radio to communicate with computing resources in the cloud. Brute force identification implies that the mobile device is continuously computing fingerprints and/or sending these fingerprints and associated content signals to a fingerprint database for identification. In modes where it is desired that the application operate autonomously (without requiring the user to initiate each signal identification query), the application needs a mechanism that uses processing power and radio communication sparingly, yet provides timing precise enough to enable applications beyond mere program identification.