This application, and the innovations and related subject matter disclosed herein, (collectively referred to as the “disclosure”) generally concern digital signal processing techniques and digital signal processors (DSPs) implementing such techniques. More particularly but not exclusively, disclosed approaches for arranging data (and corresponding data structures) in connection with particular examples of digital signal processing techniques and DSPs can reduce the number of computations required to convolve a given signal, and thus can improve computational efficiency, reduce processing latency and, in particular instances, improve a user's audio experience. As but one particular example of innovative approaches for managing signal processing data to improve computational efficiency, innovative data structures are disclosed in connection with an approach for simulating multi-channel audio input signals (e.g., “surround sound” audio) to be played over a different number of audio output channels (e.g., two channels). Nonetheless, following a review of this disclosure and the innovative principles disclosed herein, those of ordinary skill in the art will appreciate the wide variety of data structures that can be used to improve computational efficiency of various digital signal processing techniques (e.g., in connection with other types of audio equipment, such as, for example, a beam-forming loudspeaker array, a crosstalk canceling stereo speaker, a multichannel echo canceller, and other systems involving real-time processing using a digital signal processor, such as, for example, wireless radios, stock market calculations, biomedical engineering measurements of nerve signals, to name but a small number of particular examples).
To aid the reader's understanding of one particular context in which disclosed memory management techniques and related systems can be used, the following is a brief overview of the fundamentals of spatial hearing, surround sound reproduction in rooms, and surround sound recording and mixing.
The earliest models of spatial hearing began with a simple model of the head as a sphere. This so-called “duplex theory,” presented by Lord Rayleigh, described how the inter-aural level differences (ILD) and inter-aural time differences (ITD) of a single sound source at the two ears allow a subject to localize the source in a horizontal plane. However, this model revealed two curiosities: (1) sources in front of the subject and behind the subject were mirror images that provided the same ITD and ILD cues (the “cone of confusion”), and (2) there was no explanation of how the height of a source was determined. These curiosities led many researchers in the early 20th century to investigate and measure how sound propagated from a source in a room to the ear drum. They determined that the effects from the pinna (the outer ear), the neck and torso, hair, and other facial features dramatically affect the spatial impression of a sound source.
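The front-back ambiguity of the spherical-head model can be illustrated with a short sketch. It assumes Woodworth's far-field approximation for a rigid sphere, ITD = (a/c)(θ + sin θ), which is a later textbook formula and not part of this disclosure. The head radius of 8.75 cm, the speed of sound of 343 m/s, and the function name are likewise illustrative assumptions.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Approximate ITD in seconds for a distant source and a rigid spherical head.

    azimuth_deg is measured in the horizontal plane: 0 is straight ahead,
    90 is directly to one side, 180 is directly behind.
    All default values are assumptions chosen for illustration.
    """
    azimuth = math.radians(azimuth_deg)
    # Fold the azimuth onto the lateral angle in [-90, 90] degrees. A source
    # at 150 degrees (behind) folds onto 30 degrees (in front), which is the
    # mirror-image ambiguity described above.
    lateral = math.asin(math.sin(azimuth))
    return (head_radius_m / speed_of_sound_m_s) * (lateral + math.sin(lateral))


if __name__ == "__main__":
    for az in (0, 30, 90, 150):
        print(f"azimuth {az:3d} deg -> ITD {woodworth_itd(az) * 1e6:7.1f} microseconds")
```

In this model a source at 30 degrees and its mirror image at 150 degrees yield the same ITD, so time differences alone cannot separate front from back.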
FIG. 1 shows a sound field 100 of a common 7.1 surround sound system, where a listener 105 is standing in the center of seven sound sources or loudspeakers 110. When a loudspeaker 110 emits sound in a room, the sound received by the auditory periphery (e.g., the ear) can be completely determined by the two impulse responses measured from the loudspeaker to each of the left and right ears. These are called the head-related impulse responses (HRIRs), or, when referenced in the frequency domain, the head-related transfer functions (HRTFs). They are traditionally defined from measurements in an anechoic (reflection free) room, so the effect of a listening environment is not included in the HRIR or HRTF. When measured in a room they are referred to as the binaural room impulse response (BRIR) or the binaural room transfer function (BRTF).
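Because the HRIR pair characterizes the path from a loudspeaker to each ear, the signal at each ear can be modeled as the convolution of the source signal with the corresponding HRIR. A minimal sketch under that assumption follows. The three-tap impulse responses are made-up toy values, not measured HRIRs, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h  # one multiply-add per (sample, tap) pair
    return out


def render_binaural(source, hrir_left, hrir_right):
    """Return the (left ear, right ear) signals for one source."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


if __name__ == "__main__":
    # Toy HRIRs for a source on the listener's right: the right ear receives
    # the sound first and at full level, the left ear two samples later and
    # attenuated. These numbers are illustrative only.
    hrir_right = [1.0, 0.0, 0.0]
    hrir_left = [0.0, 0.0, 0.5]
    left, right = render_binaural([1.0, 2.0], hrir_left, hrir_right)
    print("left ear: ", left)
    print("right ear:", right)
```

The nested loop makes the cost visible: each source-to-ear path costs one multiply-add per input sample per filter tap.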
FIG. 2 illustrates exemplary head-related transfer functions (HRTFs) 215 for three sound sources 220 in an anechoic room 200, where HRTF 215a corresponds to the left ear and HRTF 215b corresponds to the right ear. For example, a first portion 225a of the left ear HRTF 215a and a first portion 225b of the right ear HRTF 215b correspond to a sound source 220a that is directly in front of the listener 205. Similarly, a second portion 227a of the left ear HRTF 215a and a second portion 227b of the right ear HRTF 215b correspond to a sound source 220b that is directly to the right of the listener 205. And a third portion 229a of the left ear HRTF 215a and a third portion 229b of the right ear HRTF 215b correspond to a sound source 220c that is directly to the left of the listener 205. As shown in FIG. 2, the HRTFs 215 are a function of the loudspeaker angle and the distance from the listener. FIG. 2 further illustrates that sound from the source 220a in front of the listener arrives at each ear simultaneously. However, a sound arriving from the left or right of center (e.g., from sound sources 220b or 220c) first reaches the nearer ear and is then attenuated by the head, causing a time delay and a frequency-dependent level difference at the farther ear. The HRTFs 215 may also vary with head orientation (because the angle between the source and the look direction changes) and from person to person (due to anatomical differences in head shape, pinna shape, ear canal shape, etc.) (not shown).
There are many complicated mechanical transduction processes and neural processes that will also affect the judgment of the location of a sound source including multimodal interactions between the visual and auditory systems and classification of the sound and environment. However, it is assumed herein that the user is provided with sufficient visual and environmental information that does not conflict with the spatial location of a sound source. Taking this standpoint, the following discussion is primarily concerned with the acoustic signal that is received by the ear drum.
When a sound source, such as a loudspeaker, is placed in a room, its radiated sound characteristics are altered by the boundaries of the room. At low frequencies (e.g., below 200-300 Hz in a living-room-sized space), the sound radiation is coupled to the room. Much like a musical instrument, a room responds efficiently to a certain set of vibration frequencies. The amplitude of these vibrational modes varies with source and listener position within the room, making them a function of both frequency and position. At higher frequencies, the walls in the room behave like acoustic mirrors, reflecting, transmitting, and absorbing incoming sound. There is still modal behavior, but due to the sheer number of reflections, many of the modes will overlap, making such analysis impractical. Thus, the sound field at high frequencies is usually described statistically and geometrically (as in optics) and not in terms of modes. In time, a sound emanating from an acoustic source travels at a speed of just over 1 foot per millisecond. Treating the walls as mirror-like reflectors, where the angle of incidence equals the angle of reflection, one can easily draw the many sound paths from a source to the listener in a room. FIG. 3 shows the first few sound paths 302 from a single source 310 to a single listener 305 in a rectangular room 300, with the direct sound illustrated in green, the first reflections illustrated in red, and the second reflections illustrated in blue. A common way of describing the reflections in a room is by its reverberation time, or the time it takes for the reverberant sound level to decay by 60 dB after the source stops (one-fourth to one-half second in small rooms and many seconds in concert halls).
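The mirror-reflection picture can be sketched with the image-source construction, in which each first reflection is modeled as a direct path from a copy of the source mirrored across one wall. The sketch below is a two-dimensional illustration only; the room size, the source and listener positions, the 343 m/s speed of sound (about 1.1 feet per millisecond), and all function names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def first_order_images(source, room):
    """Mirror the source across each wall of a rectangular room.

    source is (x, y) in meters; room is (width, depth) in meters, with walls
    at x = 0, x = width, y = 0, and y = depth.
    """
    sx, sy = source
    width, depth = room
    return [
        (-sx, sy),             # image behind the wall at x = 0
        (2 * width - sx, sy),  # image behind the wall at x = width
        (sx, -sy),             # image behind the wall at y = 0
        (sx, 2 * depth - sy),  # image behind the wall at y = depth
    ]


def arrival_times_ms(source, listener, room):
    """Arrival times in ms: the direct sound first, then four first reflections."""
    emitters = [source] + first_order_images(source, room)
    return [1000.0 * math.dist(p, listener) / SPEED_OF_SOUND_M_S for p in emitters]


if __name__ == "__main__":
    times = arrival_times_ms(source=(1.0, 2.0), listener=(4.0, 2.0), room=(5.0, 4.0))
    print("direct sound: %.2f ms" % times[0])
    for t in times[1:]:
        print("first reflection: %.2f ms" % t)
```

Second reflections can be obtained the same way by mirroring each image across the remaining walls, which is why the number of paths grows quickly with reflection order.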
While work remains to be done on defining how to build a good-sounding listening room, enough has been done to make the process more of a science than an art. It is important to note that rooms add a significantly beneficial effect to both music and speech. For example, the complex interaction between a violin, with its spatially varying sound radiation pattern, and the symphony hall creates a sense of envelopment and spaciousness that cannot be achieved if the same violin is played outdoors (where there are almost no reflections) or in an anechoic chamber. The same is true for rooms and multichannel sound reproduction.
The BS.1116-1 recommendations, published by the International Telecommunication Union Radiocommunication Sector (ITU-R) in its broadcasting service series, encapsulate a modern understanding of how to create a listening room. Firstly, the dimensions of the room need to be chosen so that the low-frequency room modes are well distributed in frequency. Secondly, the reverberation time should be in the range of 250-500 milliseconds (ms) (depending on room volume) and consistent across the frequency range. This can be done with appropriate placement of absorption and diffusion on the walls. Thirdly, there should be more lateral reflections than front-back reflections. This creates a sense of envelopment necessary to convey the feeling of being in a space. Finally, the room must be quiet enough to convey the full dynamic range of the music being played back. Due to masking effects, the level of background noise dramatically affects the hearing function.
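To illustrate why room dimensions affect how the modes are distributed, the following sketch evaluates the standard textbook expression for the mode frequencies of an idealized rigid-walled rectangular room, f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). This expression is not taken from BS.1116-1, and the room dimensions, the 343 m/s speed of sound, and the function name are assumptions for the example.

```python
import math
from itertools import product


def room_mode_frequencies(lx, ly, lz, max_order=2, speed_of_sound_m_s=343.0):
    """Sorted mode frequencies in Hz for a rigid-walled rectangular room.

    lx, ly, lz are the room dimensions in meters; mode indices run from 0 to
    max_order along each axis, and the trivial (0, 0, 0) case is skipped.
    """
    freqs = []
    for nx, ny, nz in product(range(max_order + 1), repeat=3):
        if (nx, ny, nz) == (0, 0, 0):
            continue
        f = (speed_of_sound_m_s / 2.0) * math.sqrt(
            (nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2
        )
        freqs.append(f)
    return sorted(freqs)


if __name__ == "__main__":
    cube = room_mode_frequencies(4.0, 4.0, 4.0)
    shoebox = room_mode_frequencies(5.0, 4.0, 3.0)
    print("cube, lowest three modes (Hz):   ", [round(f, 1) for f in cube[:3]])
    print("shoebox, lowest three modes (Hz):", [round(f, 1) for f in shoebox[:3]])
```

In the cubic room the three lowest modes coincide at a single frequency, whereas unequal dimensions spread them out, which is the kind of distribution the first recommendation aims for.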
As explained more fully below, convolution computations, even in the frequency domain, can involve large data sets requiring a large number of multiply-add operations, particularly when filter lengths become long. Consequently, a need exists for efficient computational methods for use with signal processing. Similarly, a need exists for eliminating unnecessary multiply-add operations from convolution computations. And, in the context of computer-implemented convolution, a need remains for reducing processing latency associated with the aforementioned large data sets and large number of multiply-add operations.
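A rough sense of scale can be obtained by counting the multiply-add operations for direct time-domain convolution in the surround-sound context described above. The sketch below is back-of-the-envelope arithmetic rather than the disclosed method; the 48 kHz sample rate, the 0.5 second filter length (chosen to match the upper end of the reverberation times mentioned above), and the function name are assumptions.

```python
def direct_macs_per_second(num_inputs, num_outputs, filter_seconds, sample_rate_hz):
    """Multiply-adds per second for direct time-domain convolution.

    Every input-to-output path needs one multiply-add per filter tap for each
    output sample, so the cost grows linearly with filter length.
    """
    taps = int(round(filter_seconds * sample_rate_hz))
    return num_inputs * num_outputs * taps * sample_rate_hz


if __name__ == "__main__":
    # Seven loudspeaker channels rendered to two ears with half-second filters.
    macs = direct_macs_per_second(
        num_inputs=7, num_outputs=2, filter_seconds=0.5, sample_rate_hz=48000
    )
    print(f"{macs:,} multiply-adds per second")
```

Under these assumptions the direct approach needs on the order of 16 billion multiply-adds per second, which motivates both frequency-domain processing and the removal of unnecessary operations.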