1. Field of the Invention
This invention relates generally to loudspeakers. More particularly, the invention relates to providing a model for predicting loudspeaker preferences by listeners based on multiple regression analysis utilizing objective measurements.
2. Related Art
Properly controlled listening tests on loudspeakers are difficult, time-consuming and expensive to perform. A more cost-effective solution is to utilize a model that accurately predicts listeners' subjective sound quality ratings based on objective measurements made on the loudspeaker. A few models have been proposed. In assessing such models, however, it becomes clear that there is little agreement about how the loudspeakers should be measured and in what types of environments they should be measured. Choices include reverberation chambers, listening rooms, anechoic chambers, or a combination of these environments. Low-resolution, ⅓-octave, steady-state measurements appear to be popular choices even though they cannot accurately distinguish medium-high Q resonances from low-Q ones, the latter being much more audible at low amplitudes. Opinions diverge widely about the relative importance of the direct, early-reflected and reverberant sounds produced by the loudspeaker in terms of their contribution to its perceived timbre and spatial attributes. These differences in opinion tend to dictate the choices of rooms and measurements employed by the models to predict loudspeaker sound quality. Most of the models have not been adequately tested or validated, which calls into question their accuracy and generalizability. Generalizability describes how well the model predicts sound quality when applied to a large population of loudspeakers and rooms.
Several sophisticated, perceptual-based objective measurements have been recently standardized for predicting the subjective quality of low-bit rate audio codecs. However, such models are optimized for characterizing forms of nonlinear distortions common to audio codecs rather than loudspeakers. Moreover, none of the current codec measurement models include the psychoacoustic effects related to the loudspeaker's complex frequency-dependent radiation properties and its interaction with the room. As these effects can significantly affect the properties of sound at the listeners' ears, they typically should be included in any model employed to predict loudspeaker sound quality.
Current predictive loudspeaker models may be categorized according to how they view the relative influence of the direct, early-reflected and reverberant sounds on listeners' overall impression of a loudspeaker. For instance, three quite different approaches have been taken in how and where the loudspeaker should be measured. One approach is to predict the sound quality utilizing sound power measurements, with the underlying assumption being that the total radiated sound power largely determines the loudspeaker's perceived quality in a room. A second approach is to model the loudspeaker's sound quality utilizing in-room loudspeaker measurements. A third approach is to predict the loudspeaker's sound quality utilizing a comprehensive set of anechoic measurements. In addition, one model utilizes a hybrid approach that combines the free-field on-axis response with an in-room or predicted in-room response.
Advocates of models based on sound power measurements believe that the loudspeaker's sound power response best characterizes what listeners hear in a listening room. One of the earliest sound power advocates was Rosenberg at the Swedish Consumer Testing organization in 1973. He reported good correlation between ⅓-octave speaker measurements performed in a reverberation chamber and listening tests performed by Gabrielsson and him. However, Rosenberg never specified an exact model to predict his data. Around the same time, another sound power advocate, Staffeldt, argued that the steady-state ⅓-octave response of the loudspeaker better correlated with listening tests if the speaker was measured in-room at the listener location. Later, in 1982, Staffeldt argued that the measurement should take into account the directional properties of the ears, since he noted that the diffuse field sensitivity of the ear is higher at higher frequencies than in the direct sound field. He claimed that the timbre of two loudspeakers in two different rooms would be identical, so long as they had identical ⅓-octave spectra measured at the entrance to the ear canal. Unfortunately, Staffeldt's listening tests were based on only one listener and the room was rather large and reverberant. Staffeldt put rather large tolerances on the rooms for which the results apply (up to 1000 m³ with reverberation times less than 1 second). Staffeldt later proposed a model for predicting the timbre of a loudspeaker based on calculating the specific loudness of the ⅓-octave data.
The flat sound power criterion had a large contingent of support in the United States. In 1968, Bose argued that when a loudspeaker is properly placed with respect to the rear reflecting wall, the frequency response measured with respect to the total radiated acoustical energy should be flat. Other supporters of this view included Consumers Union (“CU”) in 1973.
During that period, CU developed an objective model based on the loudspeaker's calculated sound power response measured at ⅓-octave resolution in an anechoic chamber. The rationale for this was CU's belief that the loudspeaker's total power response predicts to a large degree the sound pressure response taken over several seats in a typical home listening room, and that flat sound power response is the best target. CU applies several transformations to the raw sound power response to account for low frequency changes due to room boundary effects and wall absorption. The raw sound power response is also adjusted in ⅓-octave bands according to loudness using Stevens' Mark VII scheme. As the speaker deviates from equal loudness over a certain bandwidth, the error is subtracted from its overall 100-point score. There are many theoretical arguments as to why the CU model might not work, including the accuracy of the loudness model used or even the appropriateness of applying such a model. However, the ultimate test is how accurately the model predicts listeners' sound quality ratings. Tests have found no correlation between listeners' loudspeaker preference ratings and CU's predicted accuracy scores (r=0.05; p=0.81). Thus, because the CU model is based largely on a loudspeaker's ⅓-octave sound power response, measured sound power alone does not accurately predict the perceived sound quality of the loudspeaker.
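The correlation figure quoted above is the kind of statistic that can be checked with a few lines of code. The following is a minimal sketch, assuming the reported r is an ordinary Pearson correlation coefficient; the function name `pearson_r` and the sample numbers are hypothetical and are not data from any listening test.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical, made-up numbers for illustration only.
predicted_scores = [88.0, 75.0, 92.0, 81.0, 70.0]    # model's 100-point scores
preference_ratings = [5.1, 6.3, 4.8, 6.0, 5.5]       # listeners' mean ratings
r = pearson_r(predicted_scores, preference_ratings)
print(f"r = {r:.2f}")
```

A value of r near zero, as reported for the CU scores, means the predicted scores carry essentially no linear information about the measured preference ratings.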
In 1990, Klippel reported a perceptual-based loudspeaker model for predicting various sound quality dimensions and overall sound quality. The model was based on a massive study involving seven different experiments designed to examine the influence of factors on loudspeaker quality such as listener experience, room acoustics, speaker directivity, program material and method of scaling (semantic differential versus MDS). A total of forty-five different loudspeakers (both real and simulated), three different rooms, thirteen programs and forty different listeners were compared. The rooms included an anechoic chamber, an IEC listening room and a small studio. Factorial analysis revealed seven unique dimensions such as clearness, treble stressing (sharpness), general and low bass emphasis, feeling of space, clearness in bass and brightness.
The subjective magnitude of each dimension could be predicted from the ⅓-octave steady-state in-room frequency response measured at the listening position. Klippel claimed that the model could use either in-room measurements or anechoic data containing the on-axis and the calculated sound power responses. With this data and a simple model of the room, the predicted in-room curves agreed within 2-3 dB of the measured ones above 200 Hz. Below 200 Hz, room modes caused large (5-10 dB) deviations, which Klippel believed were not a problem since the deviations would be the same for all loudspeakers. It is not known how Klippel avoided these low frequency positional-related deviations in his listening tests without substituting the positions of the speakers. The final input to the model compared the measured response to an ideal reference with flat frequency response. Superimposed on the reference was the long-term average spectrum of the program to better predict listeners' impressions.
Using a modified loudness model, Klippel calculates the difference in loudness density between the reference and measured curves across each ⅓-octave center frequency using a critical bandwidth filter. The loudness differences are further transformed and weighted for each objective metric used to predict the subjective dimensions. The correlations between objective and subjective dimensions were quite high. Klippel found, however, that the feeling of space associated with loudspeaker directivity depended on the program. More directional speakers were preferred for speech compared to music.
For predicting overall sound quality (pleasantness and naturalness), multiple objective dimensions were selected and weighted on the basis of their high correlations with the overall quality ratings. Each dimension was expressed in terms of its defect or deviation from a predetermined “ideal” value. For naturalness, the three salient weighted dimensions included discoloration defects (DV), brightness defects (DH) and defects in the feeling of space (DR). For pleasantness, Klippel found DV and DH to be the most relevant parameters. The correlations here between predicted and observed values are not as consistently high as those for the individual sound-related dimensions. For pleasantness, correlation varies across tests from −0.32 to 0.94. For naturalness, correlation values range from 0.52 to 0.93. The sources of these large variations in correlation are not specified. Potential factors may have been differences in the listening rooms, programs, listeners and experimental procedure. This illustrates an important feature of developing any predictive model: it can only be as reliable and accurate as the subjective data on which it is based. The weakest link tends to be the reliability of the subjective data, not the objective data. Human beings are more prone to random errors in judgment than the computers performing the objective measurements.
In 1986, Toole published the results of a two-year study where forty-two listeners evaluated thirty-seven different loudspeakers. Good visual correlations were found between a set of comprehensive anechoic measurements and the listening test results. Toole argued that ⅓-octave in-room measurements lack the necessary frequency resolution to distinguish between low and medium-high Q resonances. This feature is important since the audibility of resonances varies significantly as a function of the resonances' frequency and Q-factor. In order to assess the audibility of resonances, Toole recommended a minimum frequency resolution of 1/20-octave.
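The resolution argument above can be illustrated numerically. The sketch below is an assumption-laden illustration, not a method from any of the cited studies: it uses a simple bell-shaped peak whose half-height bandwidth is f0/Q as a stand-in for a resonance, and a plain average of dB values inside a fractional-octave window as a stand-in for low-resolution measurement. All function names and parameter values are hypothetical.

```python
import math

def log_grid(f_lo, f_hi, points_per_octave):
    """Log-spaced frequencies starting at f_lo and not exceeding f_hi."""
    n = int(math.log2(f_hi / f_lo) * points_per_octave) + 1
    return [f_lo * 2 ** (i / points_per_octave) for i in range(n)]

def resonance_db(f, f0, q, peak_db):
    """Illustrative peak: peak_db at f0, half that height at a bandwidth of f0/q."""
    detune = q * (f / f0 - f0 / f)
    return peak_db / (1.0 + detune * detune)

def smooth_fractional_octave(freqs, levels_db, fraction):
    """Average dB levels over a window 1/fraction octave wide around each point."""
    half = 2 ** (1.0 / (2 * fraction))  # ratio from window center to window edge
    out = []
    for fc in freqs:
        window = [lvl for f, lvl in zip(freqs, levels_db)
                  if fc / half <= f <= fc * half]
        out.append(sum(window) / len(window))
    return out

freqs = log_grid(250.0, 4000.0, 100)
high_q = [resonance_db(f, 1000.0, 20.0, 6.0) for f in freqs]  # narrow 6 dB peak
low_q = [resonance_db(f, 1000.0, 1.0, 6.0) for f in freqs]    # broad 6 dB peak

peak_high_q_third = max(smooth_fractional_octave(freqs, high_q, 3))
peak_high_q_twentieth = max(smooth_fractional_octave(freqs, high_q, 20))
peak_low_q_third = max(smooth_fractional_octave(freqs, low_q, 3))

print(f"Q=20 peak after 1/3-octave smoothing:  {peak_high_q_third:.2f} dB")
print(f"Q=20 peak after 1/20-octave smoothing: {peak_high_q_twentieth:.2f} dB")
print(f"Q=1 peak after 1/3-octave smoothing:   {peak_low_q_third:.2f} dB")
```

Under these assumptions the ⅓-octave window is several times wider than the Q=20 peak, so the peak is flattened considerably, while the 1/20-octave window preserves most of its height and the broad Q=1 peak passes through ⅓-octave smoothing almost unchanged.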
Toole introduced the technique of spatially averaging several anechoic measurements to identify and separate resonances from diffraction and acoustic interference effects, which he believed to be less audible in listening rooms. By averaging certain sets of measurements made at specific angles, he was able to calculate and predict the frequency response of the direct, early-reflected and reverberant sounds in a typical room. Utilizing similar objective measurements, recent loudspeaker studies done in different rooms have shown similarly good correlations. However, to date, none have produced a model that uses the measurements to predict listeners' preference ratings. From these studies, it is clear that no one measure of loudspeaker sound output, whether direct, early-reflected or sound power (reverberant), is dominant at all frequencies. The inference is that the perception of sound quality embraces a combination of them all, weighted according to the reflectivity of the listening room.
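The idea behind spatial averaging can be sketched as follows. This is a minimal illustration that assumes the curves are averaged on an energy (squared-pressure) basis at each frequency; the source does not state the averaging rule, and the function name and the sample curves are hypothetical.

```python
import math

def spatial_average_db(curves_db):
    """Energy-average several magnitude responses (in dB), bin by bin."""
    if not curves_db:
        raise ValueError("need at least one curve")
    n_bins = len(curves_db[0])
    if any(len(curve) != n_bins for curve in curves_db):
        raise ValueError("all curves must share the same frequency bins")
    averaged = []
    for i in range(n_bins):
        # Convert dB to relative power, average across angles, convert back.
        mean_power = sum(10 ** (curve[i] / 10) for curve in curves_db) / len(curves_db)
        averaged.append(10 * math.log10(mean_power))
    return averaged

# Hypothetical curves at three angles over five frequency bins (dB).
# A resonance (+4 dB) appears in bin 2 at every angle; interference dips
# (-6 dB) fall in a different bin at each angle.
curves = [
    [0.0, -6.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, -6.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, -6.0],
]
avg = spatial_average_db(curves)
print([round(v, 2) for v in avg])
```

In this toy example the feature common to all angles survives the average at full height, while the angle-dependent dips are largely filled in, which is the property that lets spatial averaging separate resonances from interference effects.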
It seems most logical that the in-room measurements at the listeners' ears would provide the closest representation of what the listener perceives. However, there are several problems. Steady-state in-room measurements average all of the direct, reflected and reverberant sounds together even though there is evidence that the human auditory system is quite good at processing and analyzing these three components separately. By doing so, these measurements dismiss the complex perceptual processes that two ears and a brain are capable of performing. For example, the direct sound triggers the precedence effect (forward temporal masking), binaural discrimination, in which the direction and timing of later arrivals affect their perception, and various other directional and spatial effects.
Finally, there is evidence that equalizing the loudspeaker's sound power response to be flat results in lower preference ratings if the loudspeaker does not have constant (flat) directivity and the listener is not in a reverberant room. Most consumer loudspeakers do not have constant directivity. Typically, the directivity rises with increasing frequency. Equalizing the sound power of these loudspeakers to be flat will be done at the expense of the on-axis response, which will be too bright from the resulting upward spectral tilt at higher frequencies. This can lead to lower preference ratings. Moreover, typical domestic listening rooms are not reverberant. On average, they have RT60 values of around 0.4 second.
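The tilt described above follows from the definition of the directivity index (DI) as the on-axis level minus the sound power level, both in dB. The sketch below works through that arithmetic; the function name and the DI values are hypothetical and chosen only to show a directivity that rises with frequency.

```python
def on_axis_after_flat_power_eq(directivity_index_db, flat_power_db=0.0):
    """On-axis response (relative dB) once the sound power is equalized flat."""
    # DI = on-axis level - sound power level, so with the sound power held at a
    # constant level, the on-axis response takes the shape of the DI curve.
    return [flat_power_db + di for di in directivity_index_db]

# Hypothetical directivity index that rises with frequency (dB).
frequencies_hz = [100, 1000, 10000]
directivity_index_db = [0.0, 5.0, 10.0]

on_axis_db = on_axis_after_flat_power_eq(directivity_index_db)
for f, level in zip(frequencies_hz, on_axis_db):
    print(f"{f:>6} Hz: on-axis {level:+.1f} dB relative to flat sound power")
```

With these assumed values the equalized on-axis response rises by 10 dB from the lowest to the highest frequency, which is the upward spectral tilt the paragraph refers to.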
In summary, three different approaches have been taken in measuring loudspeakers based on three different views on what factors best correlate with perceived sound quality: 1) ⅓-octave sound power measurements, 2) a perceptual model based on a combination of ⅓-octave direct and reverberant sounds, and 3) comprehensive, 1/20-octave, spatially-averaged, anechoic measurements performed at many angles. Two models have been proposed based on the flat sound power criterion, while Klippel's model uses the second approach of a perceptual model based on a combination of ⅓-octave direct and reverberant sounds.
Therefore, there remains a need for providing an objective-based approach for predicting the loudspeaker preferences of listeners, which overcomes the disadvantages set forth above and others previously experienced.