1. Field of the Invention
The invention relates to methods and systems for performing interpolation on head-related transfer functions (HRTFs) to generate interpolated HRTFs. More specifically, the invention relates to methods and systems for performing linear mixing on coupled HRTFs (i.e., on values which determine the coupled HRTFs) to determine interpolated HRTFs, for performing filtering with the interpolated HRTFs, and for predetermining the coupled HRTFs to have properties such that interpolation can be performed thereon in an especially desirable manner (by linear mixing).
2. Background of the Invention
Throughout this disclosure, including in the claims, the expression performing an operation “on” signals or data (e.g., filtering, scaling, or transforming the signals or data) is used in a broad sense to denote performing the operation directly on the signals or data, or on processed versions of the signals or data (e.g., on versions of the signals that have undergone preliminary filtering prior to performance of the operation thereon).
Throughout this disclosure, including in the claims, the expression “linear mixing” of values (e.g., coefficients which determine head-related transfer functions) denotes determining a linear combination of the values. Herein, performing “linear interpolation” on head-related transfer functions (HRTFs) to determine an interpolated HRTF denotes performing linear mixing of the values which determine the HRTFs (determining a linear combination of such values) to determine values which determine the interpolated HRTF.
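By way of illustration only, the definition of linear mixing given above can be sketched in Python as follows. The function name `linear_mix`, the three-tap coefficient lists, and the weights are hypothetical and chosen only for readability; the sketch shows a generic linear combination of coefficient values and is not intended to represent any particular embodiment or any particular choice of HRTFs.

```python
def linear_mix(coefficient_sets, weights):
    """Return the linear combination sum_i weights[i] * coefficient_sets[i].

    coefficient_sets: list of equal-length lists of values, each list
    determining one HRTF (e.g., impulse response taps).
    weights: one scalar mixing weight per coefficient set.
    """
    if len(coefficient_sets) != len(weights):
        raise ValueError("need exactly one weight per coefficient set")
    length = len(coefficient_sets[0])
    if any(len(c) != length for c in coefficient_sets):
        raise ValueError("all coefficient sets must have the same length")
    return [
        sum(w * c[k] for w, c in zip(weights, coefficient_sets))
        for k in range(length)
    ]


# Hypothetical toy values (assumption): two 3-tap responses mixed 75% / 25%.
hrtf_a = [1.0, 0.5, 0.25]
hrtf_b = [0.0, 1.0, 0.5]
interpolated = linear_mix([hrtf_a, hrtf_b], [0.75, 0.25])
print(interpolated)  # [0.75, 0.625, 0.3125]
```

In this sketch the values which determine the interpolated HRTF are obtained solely by scaling and summing the values which determine the input HRTFs, which is the sense in which “linear interpolation” is used herein.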
Throughout this disclosure, including in the claims, the expression “system” is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements mapping may be referred to as a mapping system (or a mapper), and a system including such a subsystem (e.g., a system that performs various types of processing on audio input, in which the subsystem determines a transfer function for use in one of the processing operations) may also be referred to as a mapping system (or a mapper).
Throughout this disclosure, including in the claims, the term “render” denotes the process of converting an audio signal (e.g., a multi-channel audio signal) into one or more speaker feeds (where each speaker feed is an audio signal to be applied directly to a loudspeaker or to an amplifier and loudspeaker in series), or the process of converting an audio signal into one or more speaker feeds and converting the speaker feed(s) to sound using one or more loudspeakers. In the latter case, the rendering is sometimes referred to herein as rendering “by” the loudspeaker(s).
Throughout this disclosure, including in the claims, the terms “speaker” and “loudspeaker” are used synonymously to denote any sound-emitting transducer. This definition includes loudspeakers implemented as multiple transducers (e.g., woofer and tweeter).
Throughout this disclosure, including in the claims, the verb “includes” is used in a broad sense to denote “is or includes,” and other forms of the verb “include” are used in the same broad sense. For example, the expression “a filter which includes a feedback filter” (or the expression “a filter including a feedback filter”) herein denotes either a filter which is a feedback filter (i.e., does not include a feedforward filter), or a filter which includes a feedback filter (and at least one other filter).
Throughout this disclosure, including in the claims, the term “virtualizer” (or “virtualizer system”) denotes a system coupled and configured to receive N input audio signals (indicative of sound from a set of source locations) and to generate M output audio signals for reproduction by a set of M physical speakers (e.g., headphones or loudspeakers) positioned at output locations different from the source locations, where each of N and M is a number greater than one. N can be equal to or different than M. A virtualizer generates (or attempts to generate) the output audio signals so that when reproduced, the listener perceives the reproduced signals as being emitted from the source locations rather than the output locations of the physical speakers (the source locations and output locations are relative to the listener). For example, in the case that M=2 and N=1, a virtualizer upmixes the input signal to generate left and right output signals for stereo playback (or playback by headphones). For another example, in the case that M=2 and N>3, a virtualizer downmixes the N input signals for stereo playback. In another example in which N=M=2, the input signals are indicative of sound from two rear source locations (behind the listener's head), and a virtualizer generates two output audio signals for reproduction by stereo loudspeakers positioned in front of the listener such that the listener perceives the reproduced signals as emitting from the source locations (behind the listener's head) rather than from the loudspeaker locations (in front of the listener's head).
Head-related Transfer Functions (“HRTFs”) are the filter characteristics (represented as impulse responses or frequency responses) that represent the way that sound in free space propagates to the two ears of a human subject. HRTFs vary from one person to another, and also vary depending on the angle of arrival of the acoustic waves. Application of a right ear HRTF filter (i.e., application of a filter having a right ear HRTF impulse response) to a sound signal, x(t), would produce an HRTF filtered signal, xR(t), indicative of the sound signal as it would be perceived by a listener after propagating in a specific arrival direction from a source to the listener's right ear. Application of a left ear HRTF filter (i.e., application of a filter having a left ear HRTF impulse response) to the sound signal, x(t), would produce an HRTF filtered signal, xL(t), indicative of the sound signal as it would be perceived by the listener after propagating in a specific arrival direction from a source to the listener's left ear.
Although HRTFs are often referred to herein as “impulse responses,” each such HRTF could alternatively be referred to by other expressions, including “transfer function,” “frequency response,” and “filter response.” One HRTF could be represented as an impulse response in the time domain or as a frequency response in the frequency domain.
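By way of illustration only, the application of left ear and right ear HRTF filters to a sound signal x(t), as described above, can be sketched in Python as a pair of time-domain convolutions. The helper `convolve`, the sample values, and the two-tap impulse responses `h_left` and `h_right` are hypothetical toy values (not measured HRTFs), and a discrete-time, finite impulse response representation is assumed.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


# Hypothetical toy data (assumption): a short signal x and two 2-tap
# impulse responses standing in for one arrival direction's HRTF pair.
x = [1.0, 0.0, -1.0]
h_left = [0.5, 0.25]
h_right = [0.0, 0.5]

x_left = convolve(x, h_left)    # HRTF filtered signal xL
x_right = convolve(x, h_right)  # HRTF filtered signal xR
print(x_left)   # [0.5, 0.25, -0.5, -0.25]
print(x_right)  # [0.0, 0.5, 0.0, -0.5]
```

Because the same HRTF can be represented as a frequency response, equivalent filtering could instead be performed by multiplication in the frequency domain; the time-domain form is used here only because it needs no additional libraries.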
We may define the direction of arrival in terms of Azimuth and Elevation angles (Az, El), or in terms of an (x,y,z) unit vector. For example, in FIG. 1, the arrival direction of sound (at listener 1's ears) may be defined in terms of an (x,y,z) unit vector, where the x and y axes are as shown, and the z axis is perpendicular to the plane of FIG. 1, and the sound's arrival direction may also be defined in terms of the Azimuth angle Az shown (e.g., with an Elevation angle, El, equal to zero).
FIG. 2 shows the arrival direction of sound (emitted from source position S) at location L (e.g., the location of a listener's ear), defined in terms of an (x,y,z) unit vector, where the x, y, and z axes are as shown, and in terms of Azimuth angle Az and Elevation angle, El.
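By way of illustration only, the relationship between the two descriptions of arrival direction can be sketched in Python as follows. The axis convention used here is an assumption made for the sketch and may differ from the convention shown in FIGS. 1 and 2: Az is taken to be measured in the x-y plane from the x axis toward the y axis, and El is taken to be measured from the x-y plane toward the z axis. The function names are hypothetical.

```python
import math


def direction_to_unit_vector(az_deg, el_deg):
    """Convert (Az, El) in degrees to an (x, y, z) unit vector.

    Assumed convention (not taken from the figures): Az is measured in
    the x-y plane from +x toward +y; El is measured up from the x-y plane.
    """
    az = math.radians(az_deg)
    el = math.radians(el_deg)
    return (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )


def unit_vector_to_direction(x, y, z):
    """Inverse of direction_to_unit_vector, under the same convention."""
    az_deg = math.degrees(math.atan2(y, x))
    # Clamp z so that rounding error cannot push asin out of its domain.
    el_deg = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return az_deg, el_deg


# Az = 30 degrees with El = 0 lies in the x-y plane (z component is zero).
vx, vy, vz = direction_to_unit_vector(30.0, 0.0)
az_back, el_back = unit_vector_to_direction(vx, vy, vz)
```

Under this assumed convention, an Elevation angle of zero yields a vector with a zero z component, consistent with the planar case described with reference to FIG. 1.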
It is common to make measurements of HRTFs for individuals by emitting sound from different directions, and capturing the response at the ears of the listener. Measurements may be made close to the listener's eardrum, or at the entrance of the blocked ear canal, or by other methods that are well known in the art. The measured HRTF responses may be modified in a number of ways (also well known in the art) to compensate for the equalization of the loudspeaker used in the measurements, as well as to compensate for the equalization of headphones that will be used later in presentation of the binaural material to the listener.
A typical use of HRTFs is as filter responses for signal processing intended to create the illusion of 3D sound for a listener wearing headphones. Other typical uses for HRTFs include the creation of improved playback of audio signals through loudspeakers. For example, it is conventional to use HRTFs to implement a virtualizer which generates output audio signals (in response to input audio signals indicative of sound from a set of source locations) such that, when the output audio signals are reproduced by speakers, they are perceived as being emitted from the source locations rather than the locations of the physical speakers (where the source locations and output locations are relative to the listener). Virtualizers can be implemented in a wide variety of multi-media devices that contain stereo loudspeakers (televisions, PCs, iPod docks), or are intended for use with stereo loudspeakers or headphones.
Virtual surround sound can help create the perception that there are more sources of sound than there are physical speakers (e.g., headphones or loudspeakers). Typically, at least two speakers are required for a normal listener to perceive reproduced sound as if it is emitting from multiple sound sources. It is conventional for virtual surround systems to use HRTFs to generate audio signals that, when reproduced by physical speakers (e.g., a pair of physical speakers) positioned in front of a listener, are perceived at the listener's eardrums as sound from loudspeakers at any of a wide variety of positions (including positions behind the listener).
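By way of illustration only, one simple way in which a conventional virtualizer of the kind described above might generate M=2 output signals from N input signals can be sketched in Python as follows. The sketch assumes headphone playback (so that no loudspeaker crosstalk cancellation is modeled), assumes equal-length input signals, and uses hypothetical toy impulse responses rather than measured HRTFs; the function names are likewise hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def virtualize_for_headphones(inputs, hrtf_pairs):
    """Mix N input signals down to a (left, right) output pair.

    inputs: list of N signals, one per source location.
    hrtf_pairs: list of N (h_left, h_right) impulse response pairs, one
    pair per source location (toy values here, an assumption).
    """
    if len(inputs) != len(hrtf_pairs):
        raise ValueError("need one HRTF pair per input signal")
    n_out = max(
        len(x) + max(len(hl), len(hr)) - 1
        for x, (hl, hr) in zip(inputs, hrtf_pairs)
    )
    left = [0.0] * n_out
    right = [0.0] * n_out
    for x, (hl, hr) in zip(inputs, hrtf_pairs):
        # Each source contributes its HRTF filtered signal to each ear.
        for i, v in enumerate(convolve(x, hl)):
            left[i] += v
        for i, v in enumerate(convolve(x, hr)):
            right[i] += v
    return left, right


# N = 2 hypothetical sources, each with a toy (left, right) HRTF pair.
source_1 = [1.0, 0.0]
source_2 = [0.0, 1.0]
pairs = [([0.5, 0.25], [0.25, 0.0]), ([0.25, 0.0], [0.5, 0.25])]
out_left, out_right = virtualize_for_headphones([source_1, source_2], pairs)
print(out_left)   # [0.5, 0.5, 0.0]
print(out_right)  # [0.25, 0.5, 0.25]
```

Reproduction by loudspeakers positioned in front of the listener would typically require additional processing beyond this sketch, since each loudspeaker is heard by both ears.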
Most or all of the conventional uses of HRTFs would benefit from embodiments of the invention.