Audio source localization uses one or more fixed sensors (microphones) to localize a moving sound source. The sound source of interest usually is a human voice or some other natural source of sound.
Reversing this scenario, sound signals transmitted from known locations can be used to determine the position of a moving sensor (e.g., a mobile device with a microphone) through the analysis of the received sounds from these sources. At any point of time, the relative positioning/orientation of the sources and sensors can be calculated using a combination of information known about the sources and derived from the signals captured in the sensor or a sensor array.
While traditional Global Positioning System (GPS) technologies are finding broad adoption in a variety of consumer devices, such technologies are not always effective or practical in some applications. Audio signal-based positioning can provide an alternative to traditional GPS because audio sources (e.g., loudspeakers) and sensors (e.g., microphones on mobile devices) are ubiquitous and relatively inexpensive, particularly in application domains where traditional GPS is ineffective or not cost effective. Applications of this technology include indoor navigation, in-store browsing, games and augmented reality.
Audio based positioning holds promise for indoor navigation because sound systems are commonly used for background sound and public address announcements, and thus provide a low cost infrastructure in which a positioning network can be implemented. Audio based positioning also presents an alternative to traditional satellite based GPS, which is not reliable indoors. Indoor navigation enabled on a mobile handset enables the user to locate items in a store or other venue. It also provides navigation guidance to the user through directions and interactive maps presented on the handset.
Audio based positioning also enables in-store browsing based on user location on mobile handsets. This provides benefits for the customer, who can learn about products at particular locations, and for the store owner, who can gather market intelligence to better serve customers and more effectively configure product offerings to maximize sales.
Audio based positioning enables location based game features. Again, since microphones are common on mobile phones and these devices are increasingly used as game platforms, the combination of audio based positioning with game applications provides a cost effective way to enable location based features for games where other location services are unreliable.
Augmented reality applications use sensors on mobile devices to determine the position and orientation of the devices. Using this information, the devices can then “augment” the user's view of the surrounding area with synthetically generated graphics that are constructed using a spatial coordinate system of the neighboring area constructed from the device's location, orientation and possibly other sensed context information. For example, computer generated graphics are superimposed on a representation of the surrounding area (e.g., based on video captured through the device's camera, or through an interactive 2D or 3D map constructed from a map database and location/orientation of the device).
Though audio positioning systems hold promise as an alternative to traditional satellite based GPS, many challenges remain in developing practical implementations. To be a viable low cost alternative, audio positioning technology should integrate easily with typical consumer audio equipment that is already in use in environments where location based services are desired. This constraint makes systems that require the integration of complex components less attractive.
Another challenge is signal interference and degradation that makes it difficult to derive location from audio signals captured in a mobile device. Signal interference can come from a variety of sources, such as echoes/reverberation from walls and other objects in the vicinity. Data signals for positioning can also encounter interference from other audio sources, ambient noise, and noise introduced in the signal generation, playback and capture equipment.
Positioning systems rely on the accuracy and reliability of the data obtained through analysis of the signals captured from sources. For sources at fixed locations, the location of each source can be treated as a known parameter stored in a table in which identification of the signal source indexes the source location. This approach, of course, requires accurate identification of the source. Positioning systems that calculate position based on time of arrival or time of flight require synchronization or calibration relative to a master clock. Signal detection must be sufficiently quick for real time calculation and yet accurate enough to provide position within desired error constraints.
Positioning systems that use signal strength as a measure of distance from a source require reliable schemes to determine the signal strength and derive a distance from the strength within error tolerances of the application.
These design challenges can be surmounted by engineering special purpose equipment to meet desired error tolerances. Yet such special purpose equipment is not always practical or cost effective for widespread deployment. When designing a positioning system for existing audio playback equipment and mobile telephone receivers, the signal generation and capture processes need to be designed for ease of integration and to overcome the errors introduced in these environments. These constraints place limits on the complexity of equipment that is used to introduce positioning signals. A typical configuration comprises conventional loudspeakers driven by conventional audio components in a space where location based services add value and other forms of GPS do not work well, such as indoor shopping facilities and other public venues.
The audio playback and microphone capture in typical mobile devices constrain the nature of the source signal. In particular, the source signal must be detectable from an ambient signal captured by such microphones. As a practical matter, these source signals must be in the human audible frequency range to be reliably captured because the frequency response of the microphones on these devices is tuned for this range, and in particular, for human speech. This gives rise to another constraint in that the source audio signals have to be tolerable to the listeners in the vicinity. Thus, while there is some flexibility in the design of the audio signal sources, they must be tolerable to listeners and they must not interfere with other purposes of the audio playback equipment, such as to provide background music, information messages to shoppers, and other public address functions.
Digital watermarking presents a viable option for conveying source signals for a positioning system because it enables integration of a data channel within the audio programming played in conventional public address systems. Digital watermarks embed data within the typical audio content of the system without perceptibly degrading the audio quality relative to its primary function of providing audio programming such as music entertainment and speech. In addition, audio digital watermarking schemes using robust encoding techniques can be accurately detected from ambient audio, even in the presence of room echoes and noise sources.
Robustness is achieved using a combination of techniques. These techniques include modulating robust features of the audio with a data signal (below desired quality level from a listener perspective) so that the data survives signal degradation. The data signal is more robustly encoded without degrading audio quality by taking the human auditory system into account to adapt the data signal to the host content. Robust data signal coding techniques like spread spectrum encoding and error correction improve data reliability. Optimizing the detector through knowledge of the host signal and data carrier enables weak data signal detection, even from degraded audio signals.
Using these advances in robust watermarking, robust detection of audio watermarks is achievable from ambient audio captured through the microphone in a mobile device, such as a cell phone or tablet PC. As a useful construct to design audio watermarking for this application, one can devise the watermarking scheme to enhance robustness at two levels within the signal communication protocol: the signal feature modulation level and the data signal encoding level. The signal feature modulation level is the level that specifies the features of the host audio signal that are modified to convey an auxiliary data signal. The data signal encoding level specifies how data symbols are encoded into a data signal. Thus, a watermarking process can be thought of as having two layers of signal generation in a communication protocol: data signal formation to convey a variable sequence of message symbols, and feature modulation to insert the data signal into the host audio signal. These protocol levels are not necessarily independent. Some schemes take advantage of feature analysis of the host signal to determine the feature modification that corresponds to a desired data symbol to be encoded in a sequence of message symbols. Another consideration is the use of synchronization and calibration signals. A portion of the data signal is allocated to the task of initial detection and synchronization.
When designing the feature modulation level of the watermarking scheme for a positioning application in mobile devices, one should select a feature modulation that is robust to degradation expected in ambient capture. Robust audio features that are modulated with an auxiliary data signal to hide the data in a host audio program in these environments include features that can be accumulated over a detection window, such as energy at frequency locations (e.g., in schemes that modulate frequency tones adapted using audio masking models to mask audibility of the modulation). The insertion of echoes can also be used to modulate robust features that can be accumulated over time, like autocorrelation. This accumulation enables energy from weak signals to be added constructively to produce a composite signal from which data can be more reliably decoded.
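As an illustration of accumulating an echo-modulated feature over a detection window, the following is a minimal sketch rather than the scheme of this disclosure. It assumes one bit is signaled by which of two candidate echo delays is present, uses Gaussian noise as a stand-in for host audio, and omits ambient noise. The names `add_echo`, `autocorr`, `decode_bit`, `make_host` and the delay and gain values are hypothetical.

```python
import random

def add_echo(frame, delay, gain=0.3):
    """Return the frame plus a delayed, attenuated copy of itself."""
    return [s + (gain * frame[i - delay] if i >= delay else 0.0)
            for i, s in enumerate(frame)]

def autocorr(frame, lag):
    """Autocorrelation of one frame at a single lag."""
    return sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))

def decode_bit(frames, delay0, delay1):
    """Accumulate autocorrelation at both candidate delays over the window."""
    acc0 = sum(autocorr(f, delay0) for f in frames)
    acc1 = sum(autocorr(f, delay1) for f in frames)
    return 1 if acc1 > acc0 else 0

def make_host(num_frames=8, frame_len=512, seed=7):
    """Stand-in host audio (assumption): frames of Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(frame_len)]
            for _ in range(num_frames)]

DELAY0, DELAY1 = 40, 60  # hypothetical echo delays, in samples
marked = [add_echo(f, DELAY1) for f in make_host()]  # embed a binary 1
print(decode_bit(marked, DELAY0, DELAY1))
```

Summing the per-frame autocorrelation is what lets a weak echo stand out: the echo term adds with the same sign in every frame, while the host's own autocorrelation at these lags fluctuates around zero.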
When designing the data signal coding level for a positioning application, one should consider techniques that can be used to overcome signal errors introduced in the context of ambient capture. Spread spectrum data signal coding (e.g., direct sequence and channel hopping), and soft decision error correction improve robustness and reliability of audio watermarks using these modulation techniques. Direct sequence spread spectrum coding spreads a message symbol over a carrier signal (typically a pseudorandom carrier) by modulating the carrier with a message symbol (e.g., multiplying a binary antipodal carrier by 1 or −1 to represent a binary 1 or 0 symbol). Alternatively, a symbol alphabet can be constructed using a set of fixed, orthogonal carriers. Within the data signal coding level, additional sub-levels of signal coding can be applied, such as repetition coding of portions of the message, and error correction coding, such as convolutional coding and block codes. One aspect of data signal coding that is directly related to feature modulation is the mapping of the data signal to features that represent candidate feature modulation locations within the feature space. Of course, if the feature itself is a quantity calculated from a group of samples, such as a time segment of an audio clip, the feature modulation location corresponds to the group of samples and the feature of that group.
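The direct sequence case can be sketched in a few lines. This is an illustrative example only: the carrier length, the seed, and the function names `make_carrier`, `spread` and `despread` are assumptions, and a practical system would add error correction and synchronization on top.

```python
import random

def make_carrier(length, seed):
    """Pseudorandom binary antipodal carrier of +1/-1 chips."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def spread(bits, carrier):
    """Multiply the carrier by +1 for a binary 1 and by -1 for a binary 0."""
    chips = []
    for bit in bits:
        sign = 1 if bit else -1
        chips.extend(sign * c for c in carrier)
    return chips

def despread(chips, carrier):
    """Correlate each carrier-length block with the carrier; the sign is the bit."""
    n = len(carrier)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        corr = sum(x * c for x, c in zip(block, carrier))
        bits.append(1 if corr > 0 else 0)
    return bits

carrier = make_carrier(64, seed=1)
message = [1, 0, 1, 1, 0]
chips = spread(message, carrier)
# Flip every fourth chip to mimic channel errors; correlation still recovers the bits.
corrupted = [-c if i % 4 == 0 else c for i, c in enumerate(chips)]
assert despread(corrupted, carrier) == message
```

Because each symbol is spread over 64 chips, flipping a quarter of them only reduces the correlation magnitude from 64 to 32 and leaves its sign unchanged.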
One approach is to format a message into an encoded data signal packet comprising a set of encoded symbols, and then multiplex packets onto corresponding groups of feature modulation locations. The multiplexing scheme can vary the mapping over time, or repeat the same mapping with each repetition of the same packet.
The designer of the data encoding scheme will recognize that there is interplay among the data encoding and mapping schemes. For example, elements (e.g., chips) of the modulated carrier in a direct sequence spread spectrum method are mapped to features in a fixed pattern or a variable scattering. Similarly, one way to implement hopping is to scatter or vary the mapping of encoded data symbols to feature modulation locations over the feature space, which may be specified in terms of discrete time or frequencies.
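One way to picture a variable scattering is a keyed pseudorandom permutation of feature modulation locations that the embedder applies and the reader inverts. The sketch below assumes a flat list of locations and a shared integer key; `scatter_map`, `multiplex` and `demultiplex` are hypothetical names, not part of any described system.

```python
import random

def scatter_map(num_locations, key):
    """Keyed pseudorandom ordering of feature modulation locations."""
    order = list(range(num_locations))
    random.Random(key).shuffle(order)
    return order

def multiplex(chips, mapping):
    """Place the k-th chip at feature location mapping[k]."""
    if len(chips) != len(mapping):
        raise ValueError("one chip per feature location is assumed")
    features = [0] * len(mapping)
    for chip, loc in zip(chips, mapping):
        features[loc] = chip
    return features

def demultiplex(features, mapping):
    """Inverse mapping: read the chips back in their original order."""
    return [features[loc] for loc in mapping]

chips = [1, -1, -1, 1, 1, -1, 1, 1]
mapping = scatter_map(len(chips), key=2024)
assert demultiplex(multiplex(chips, mapping), mapping) == chips
```

Changing the key per packet repetition would vary the mapping over time, while reusing it repeats the same mapping.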
Robust watermark readers exploit these robustness enhancements to recover the data reliably from ambient audio capture through a mobile device's microphone. The modulation of robust features minimizes the impact of signal interference on signal degradation. The reader first filters the captured audio signal to isolate the modulated features. It accumulates estimates of the modifications made to robust features at known feature modulation locations. In particular, it performs initial detection and synchronization to identify a synchronization component of the embedded data signal. This component is typically redundantly encoded over a detection window so that the embedded signal to noise ratio is increased through accumulation. Estimates are weighted based on correspondence with expected watermark data (e.g., a correlation metric or count of detected symbols matching expected symbols). Using the inverse of the mapping function, estimates of the encoded data signal representing synchronization and variable message payload are distinguished and instances of encoded data corresponding to the same encoded message symbols from various embedding locations are aggregated. For example, if a spreading sequence is used, the estimates of the chips are aggregated through demodulation with the carrier. Periodically, buffers storing the accumulated estimates of encoded data provide an encoded data sequence for error correction decoding. If valid message payload sequences are detected using error detection, the message payload is output as a successful detection.
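The aggregation and validation steps can be illustrated with a small sketch. It assumes soft estimates (positive values favor a binary 1) for three repetitions of the same packet, and it uses a single even-parity bit as a stand-in for the error detection code; a real reader would use a stronger check such as a CRC together with error correction decoding. The names `accumulate` and `decode` and the sample values are hypothetical.

```python
def accumulate(repetitions):
    """Sum soft estimates of the same encoded symbols across repetitions."""
    return [sum(column) for column in zip(*repetitions)]

def decode(soft):
    """Hard-decide each symbol, then validate with an even-parity check.

    Returns the payload bits, or None if the parity check fails.
    """
    bits = [1 if s > 0 else 0 for s in soft]
    if sum(bits) % 2 != 0:
        return None
    return bits[:-1]  # drop the trailing parity bit

# Payload 1,0,1,1 plus parity bit 1, observed three times with noise.
repetitions = [
    [0.9, -0.2, 0.4, 1.1, 0.8],
    [-0.3, -0.7, 0.6, 0.2, 0.5],   # first symbol corrupted
    [0.5, 0.1, 0.3, 0.6, 0.9],     # second symbol corrupted
]
assert decode(repetitions[1]) is None                 # fails alone
assert decode(accumulate(repetitions)) == [1, 0, 1, 1]  # succeeds after accumulation
```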
While these and other robust watermarking approaches enhance the robustness and reliability in ambient capture applications, the constraints necessary to compute positioning information present challenges. The positioning system preferably should be able to compute the positioning information quickly and accurately to provide relevant location and/or device orientation feedback to the user as he or she moves. Thus, there is a trade-off between robustness, which tends toward longer detection windows, and real time response, which tends toward a shorter detection window. In addition, some location based techniques based on relative time of arrival rely on accurate synchronization of source signal transmissions and the ability to determine the difference in arrival of signals from different sources.
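To make the relative time of arrival idea concrete, the sketch below estimates a 2D position from arrival-time differences by brute-force grid search. It is a simplified illustration under stated assumptions: sources transmit in perfect synchronization, the speed of sound is taken as roughly 343 m/s, measurements are noise-free, and the room is a hypothetical 10 m by 10 m square with a source in each corner. The function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature

def simulate_tdoas(position, sources):
    """Arrival-time differences of sources[1:] relative to sources[0]."""
    d0 = math.dist(position, sources[0])
    return [(math.dist(position, s) - d0) / SPEED_OF_SOUND for s in sources[1:]]

def residual(position, sources, tdoas):
    """Squared error between predicted and measured time differences."""
    predicted = simulate_tdoas(position, sources)
    return sum((p - t) ** 2 for p, t in zip(predicted, tdoas))

def locate(sources, tdoas, width, depth, step=0.1):
    """Return the grid point whose predicted differences best match the measurements."""
    best, best_err = None, float("inf")
    for i in range(round(width / step) + 1):
        for j in range(round(depth / step) + 1):
            candidate = (i * step, j * step)
            err = residual(candidate, sources, tdoas)
            if err < best_err:
                best, best_err = candidate, err
    return best

sources = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
tdoas = simulate_tdoas((3.0, 4.0), sources)
print(locate(sources, tdoas, width=10.0, depth=10.0))
```

The grid step bounds the achievable resolution; in practice the timing error of the watermark detector, not the search, is likely to dominate.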
Alternative approaches that rely on strength of signal metrics can also leverage watermarking techniques. For example, the strength of the watermark signal can be an indicator of distance from a source. There are several potential ways to design watermark signals such that strength measurements of these signals after ambient capture in a mobile device can be translated into distance of the mobile device from a source. In this case, the watermarks from different sources need to be differentiated so that the watermark signal from each can be analyzed.
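As one hedged illustration of translating strength into distance, the sketch below applies the free-field inverse distance law, under which sound pressure level falls by about 6 dB for each doubling of distance from a point source. This model is an assumption chosen for simplicity: reverberant indoor spaces deviate from it, and the reference level would have to be calibrated per source. `distance_from_level` and its parameters are hypothetical.

```python
def distance_from_level(level_db, ref_level_db, ref_distance=1.0):
    """Estimate distance from a measured level using the inverse distance law.

    ref_level_db is the level measured at ref_distance (meters) from the source.
    Assumes free-field propagation from a point source.
    """
    return ref_distance * 10 ** ((ref_level_db - level_db) / 20.0)

# A watermark measured 20 dB below its 1 m reference level is about 10 m away
# under this model.
print(distance_from_level(level_db=60.0, ref_level_db=80.0))
```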
The above approaches take advantage of the ability to differentiate among different sources. One proposed configuration to accomplish this is to insert a unique watermark signal into each source. This unique signal is assigned to the source and source location in a database. By identifying the unique signal, a positioning system can determine its source location by finding it in the database. This approach potentially increases the implementation cost by requiring additional circuitry or signal processing to make the signal unique from each source. For audio systems that comprise several speakers distributed throughout a building, the cost of making each signal unique yet reliably identifiable can be prohibitive for many applications. Thus, there is a need for low cost means to make a source or a group of neighboring sources unique for the purpose of determining where a mobile device is within a network of sources.
Digital watermarks can be used to differentiate streams of audio that all sound generally the same. However, some digital watermark signaling may have the disadvantage that the host audio is a source of interference to the digital watermark signal embedded in it. Some forms of digital watermarking use an informed embedding in which the detector does not treat the host as interfering noise. These approaches raise other challenges, particularly in the area of signal robustness. This may lead the signal designer to alternative signaling techniques that robustly convey source identification through the audio being played through the audio playback system.
One alternative is to use a form of pattern recognition or content fingerprinting in which unique source locations are associated with unique audio program material. This program material can be music or other un-obtrusive background sounds. To differentiate sources, the sounds played through distinct sources are selected or altered to have distinguishing characteristics that can be detected by extracting the unique characteristics from the received signal and matching them with a database of pre-registered patterns stored along with the location of the source (or a neighborhood area formed by a set of neighboring sources that transmit identical sounds). One approach is to generate unique versions of the same background sounds by creating versions from a master sound that have unique frequency or phase characteristics. These unique characteristics are extracted and detected by matching them with the unique characteristics of a finite library of known source signals.
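The matching step can be sketched as a nearest-pattern lookup. The example assumes each registered source version is summarized as a 16-bit binary fingerprint and that similarity is measured by Hamming distance; the fingerprint values, location labels, distance threshold and the names `hamming` and `match` are all hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")

def match(fingerprint, registry, max_distance=4):
    """Return the location of the closest registered pattern, or None if too far."""
    best = min(registry, key=lambda pattern: hamming(fingerprint, pattern))
    if hamming(fingerprint, best) > max_distance:
        return None
    return registry[best]

# Pre-registered patterns stored along with source (or neighborhood) locations.
REGISTRY = {
    0xF0F0: "entrance",
    0x0F0F: "aisle 3",
}

assert match(0xF0F1, REGISTRY) == "entrance"  # one bit off from 0xF0F0
assert match(0xFF00, REGISTRY) is None        # 8 bits from both patterns
```

The threshold trades false matches against missed detections and would need tuning against captured audio.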
The approaches of inserting a digital watermark or generating unique versions of similarly sounding audio share some fundamental principles in that the task is to design a signaling means in which sources sound the same, yet the detector can differentiate them and look up location parameters associated with the unique signal payload or content feature pattern. Hybrid approaches are also an option. One approach is to design synthetic signals that convey a digital payload like a watermark, yet are themselves the background sound that is played into the ambient environment of a building or venue where the audio based positioning system is implemented. For example, the data encoding layer of a watermark system can be used to generate a data signal that is then shaped or adapted into a pleasing background sound, such as the sound of a water feature, ocean waves or an innocuous background noise. Stated another way, the data signal itself is selected or altered into a form that has some pleasing qualities to the listener, or even simulates music. Unique data signals can be generated from structured audio (e.g., MIDI representations) as distinct collections of tones or melodies that sound similar, yet distinguish the sources.
One particular example of a system for producing “innocuous” background sound is a sound masking system. This type of system adds natural or artificial sound into an environment to cover up unwanted sound using auditory masking. White noise generators are a form of sound masking system that uses a white noise type audio signal to mask other sounds. One supplier of these types of systems is Cambridge Sound Management, LLC, of Cambridge, Mass. In addition to providing sound masking, these systems include auxiliary inputs for paging or music distribution. The system comprises control modules that control zones, each zone having several speakers (e.g., the module independently controls the volume, time of day masking, equalization and auto-ramping for each zone). Each control module is configurable and controllable via browser based software running on a computer that is connected to the module through a computer network or direct connection.
Another hardware configuration for generating background audio is a network of wireless speakers driven by a network controller. These systems reduce the need for wired connections between audio playback systems and speakers. Yet there is still a need for a cost effective means to integrate a signaling technology that enables the receiver to differentiate sources that otherwise would transmit the same signals.
In this disclosure, we describe methods and systems for implementing positioning systems for mobile devices. There is a particular emphasis on using existing signal generation and capture infrastructure, such as existing audio or RF signal generation in environments where traditional GPS is not practical or effective.
One method detailed in this disclosure is a method of determining position of a mobile device. In this method, the mobile device receives audio signals from two or more different audio sources via its microphone. The audio signals are integrated into the normal operation of an audio playback system that provides background sound and public address functionality. As such, the audio signals sound substantially similar to a human listener, yet have different characteristics to distinguish among the different audio sources. The audio signals are distinguished from each other based on distinguishing characteristics determined from the audio signals. Based on identifying particular audio sources, the location of the particular audio sources is determined (e.g., by finding the coordinates of the source corresponding to the identifying characteristics). The position of the mobile device is determined based on the locations of the particular audio sources.
Particular sources can be identified by introducing layers of unique signal characteristics, such as patterns of signal alterations, encoded digital data signals, etc. In particular, a first layer identifies a group of neighboring sources in a network, and a second layer identifies a particular source. Once the sources are accurately distinguished, the receiver then looks up the corresponding source coordinates, which then feed into a position calculator. Position of the mobile device is then refined based on coordinates of the source signals and other attributes derived from the source signals.
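A two-layer lookup of this kind might be sketched as follows. The table contents, identifiers and function names are hypothetical, and the fallback to a group centroid when only the first layer is decoded is an illustrative assumption rather than a described behavior.

```python
# Hypothetical table: (group identifier, source identifier) -> (x, y) in meters.
SOURCE_TABLE = {
    ("zone-A", 0): (2.0, 3.0),
    ("zone-A", 1): (6.0, 3.0),
    ("zone-B", 0): (2.0, 9.0),
}

def lookup_source(group_id, source_id):
    """Coordinates of one source identified by both layers, or None."""
    return SOURCE_TABLE.get((group_id, source_id))

def coarse_position(group_id):
    """Centroid of a group, usable when only the first layer is decoded."""
    points = [xy for (g, _), xy in SOURCE_TABLE.items() if g == group_id]
    if not points:
        return None
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

assert lookup_source("zone-A", 1) == (6.0, 3.0)
assert coarse_position("zone-A") == (4.0, 3.0)
```

The coordinates returned here are what would feed into the position calculator.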
Additional technologies detailed in this document include methods for generating the source signals and associated positioning systems.
These techniques enable a variety of positioning methods and systems. One such system determines location based on source device location and relative time of arrival of signals from the sources. Another determines location based on relative strength of signal from the sources. For example, a source with the strongest signal provides an estimate of position of the mobile device. Additional accuracy of the location can be calculated by deriving an estimate of distance from source based on signal strength metrics.
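The strength-based variant can be illustrated with a short sketch: the strongest source gives a first estimate, and a strength-weighted centroid of the detected sources is one simple way to refine it. The weighting scheme, the sample values and the names `strongest_source` and `weighted_centroid` are assumptions for illustration; strengths are taken to be non-negative linear metrics, not decibels.

```python
def strongest_source(strengths):
    """Identifier of the source with the largest strength metric."""
    return max(strengths, key=strengths.get)

def weighted_centroid(strengths, locations):
    """Average of source locations, weighted by measured strength."""
    total = sum(strengths.values())
    x = sum(s * locations[k][0] for k, s in strengths.items()) / total
    y = sum(s * locations[k][1] for k, s in strengths.items()) / total
    return (x, y)

locations = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
strengths = {"A": 3.0, "B": 1.0}

assert strongest_source(strengths) == "A"                    # coarse estimate: near A
assert weighted_centroid(strengths, locations) == (1.0, 0.0)  # refined estimate
```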
The above-summarized methods are implemented in whole or in part as instructions (e.g., software or firmware for execution on one or more programmable processors), circuits, or a combination of circuits and instructions executed on programmable processors.
One form of technology described below is a set of methods for indoor positioning of mobile devices in a venue. These methods derive positioning of a mobile device based on sounds captured by the microphone of the mobile device from the ambient environment. These techniques are particularly suited to operate on smartphones, where the sounds are captured using a microphone that captures sounds in a frequency range of human hearing (the human auditory range). Thus, while the capture range of the device may be broader, the method is designed to use existing sound capture on these devices. These methods include various processes, including determining a position of the mobile device in the venue based on identification of the audio signal, monitoring position of mobile devices, and generating position based alerts on an output device of the mobile device when the position of the mobile device is within a pre-determined position associated with the position based alert.
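The alert step can be pictured as a simple proximity test. In this sketch, each alert is assumed to be associated with a circular region given by a center and radius in venue coordinates; the alert records and the name `due_alerts` are hypothetical.

```python
import math

def due_alerts(position, alerts):
    """Messages of all alerts whose region contains the device position."""
    return [a["message"] for a in alerts
            if math.dist(position, a["center"]) <= a["radius"]]

# Hypothetical alerts keyed to positions in a store.
ALERTS = [
    {"center": (2.0, 3.0), "radius": 1.5, "message": "Coffee is on sale here"},
    {"center": (8.0, 8.0), "radius": 1.0, "message": "Item from your list is nearby"},
]

assert due_alerts((2.5, 3.5), ALERTS) == ["Coffee is on sale here"]
assert due_alerts((5.0, 5.0), ALERTS) == []
```

A monitoring process would call such a check each time the device position is updated.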
These methods can be extended with a variety of features that support mapping of navigation paths in real time, displaying alternative paths, and deriving and generating navigation feedback from a variety of forms of input. This input can be direct from the user or other users through messaging, or indirect, where the input is inferred from contextual information. Examples include navigation based on shopping lists entered by the user, product recommendations from messaging systems, product preferences inferred from user context (such as transaction history, calendar of activities, etc.), and product preferences obtained from social networks. Navigation instructions in the form of paths in a venue such as a store may be computed in advance of a navigation session and updated in real-time during a session, with changing circumstances from the user's affinity group (social network posts or product tagging), changing user context, updated reminders from friends or family members, and changing conditions in the store, such as in-store promotions based on monitored traffic.
Aspects of the invention are implemented in mobile devices and in a network (e.g., cloud computing services offered on one or more server computers). As such, the invention encompasses methods, systems and devices for navigation implemented in mobile devices, like wireless telephones, in network computing systems that provide location calculation, monitoring and navigation services, and in a combination of both. Implementations may be executed in one or more computers, including mobile devices and a network of servers in communication with the mobile devices.
For example, another aspect of the invention is a system for indoor navigation in a venue. The system comprises a configuration of audio sources, each transmitting a uniquely identifiable audio signal corresponding to a location. It also comprises one or more computers for receiving audio detection events from mobile devices in the venue. These detection events provide identifying information of audio sources in the venue. The computer (or computers) calculate mobile device location from the detection events, monitor position of the mobile devices at the venue, and send an alert to the mobile devices when the position of the mobile devices is at a position associated with the alert.
Additional aspects of the invention include methods implemented in instructions executing on mobile devices, server systems, or executing on a combination of both.
One aspect of the invention is a method for audio signaling for indoor positioning in a venue. A processor in the venue receives an audio signal program in which at least a first watermark layer is embedded. This first watermark layer conveys an identifier. For each of plural speakers in the venue, the audio signal program is altered to include distinguishing characteristics corresponding to a speaker from which the audio signal is to be played. The altered audio signal programs are transmitted to corresponding speakers in the venue for playback. From a mobile device, the following are received: a distinguishing characteristic corresponding to a first speaker and an auxiliary signal decoded from an electronic audio signal sensed by a microphone on the mobile device, the auxiliary signal comprising the identifier. Based on the distinguishing characteristic and the identifier, an alert associated with the first speaker is selected, and the alert is triggered for output on the mobile device.
Another aspect of the invention is a system comprising speakers and an audio playback system coupled to the speakers. The audio playback system comprises a signal processor configured to receive an audio signal program in which at least a first watermark layer is embedded, the first watermark layer conveying an identifier. The signal processor is configured to alter the audio signal program to include distinguishing characteristics corresponding to a speaker from which the audio signal is to be played. The system further comprises a networked computer configured to receive from a mobile device, a distinguishing characteristic corresponding to a first speaker and an auxiliary signal decoded from an electronic audio signal sensed by a microphone on the mobile device, the auxiliary signal comprising the identifier. The networked computer is programmed to select an alert associated with the first speaker based on the distinguishing characteristic and the identifier, and programmed to trigger the alert for output on the mobile device.
Further features will become apparent with reference to the following detailed description and accompanying drawings.