The rapid spread of the Internet has brought with it a rush to develop newer and more effective means of communication beyond mere text-based applications. Two new applications that have garnered interest are audio and video broadcasting. Both of these applications have a common problem: their utility suffers when the connection to the Internet is limited in bandwidth. Because of its greater demands on bandwidth, video broadcasting is particularly problematic for the bulk of the Internet end-users (i.e., clients) who use limited bandwidth connections.
One common method of delivering audio, such as music, on the Internet is the “downloading” of audio files to the client's computer. Digital audio files are also commonly compressed into MPEG audio or other formats and copied onto a compact disc (CD), a personal player or a computer hard drive, where they may be listened to in a more favorable or portable listening environment than streaming audio allows.
Another common form of Internet-delivered audio is streaming audio. “Streaming” refers to listening while downloading. Generally, the server has a very high bandwidth connection to the Internet, relative to the client's connection. In the use of streaming audio for music, an Internet host site (i.e., the “server”) provides live music concerts, disc-jockey selected music or archived music to the listening end user (i.e., the “client”) via an Internet connection. But due to the typical limited bandwidth connections of clients, streaming or downloaded (compressed) music is far from an ideal listening experience, particularly for clients accustomed to CD quality music.
The degradation of the listening experience can be traced to two main sources: the compromises made upon compressed signals to compensate for limited bandwidth transmission requirements or reduced file size needs for storage purposes, and the poor listening environments of the client. With respect to the latter, streamed or downloaded music is frequently listened to on speakers attached to the client's computer, and, generally, little attention is paid to providing a good listening environment where the computer is situated. While recent efforts have been directed to ameliorating the limited channel bandwidth problem, the problem of the poor listening environment has yet to be satisfactorily resolved. Accordingly, it would be advantageous to provide technological solutions that enhance the environment in which a client will receive and listen to sound signals received over a limited bandwidth connection. Furthermore, it would be advantageous to provide a system that can compensate for the distortion that results from compressing audio files into a smaller file size.
Performed music is composed of an extremely complex dynamic sound field. The constantly changing listening environment of audience members and musicians along with variances in timbre, meter and unpredictable live performance dynamics combine to create a unique and moving musical experience. A live sound field is created when instruments and voices, supported by environmental acoustics, meet to form a time domain based acoustical event. Each of these elements is in constant dynamic change. Room modes and nodes vary with listener position; music dynamics change with the artists' moods; even a listener's head position varies the experience from moment to moment.
Various schemes have been used by others to clarify voice and solo instruments in digital recordings. The most common method used in traditional enhancement techniques is the addition of harmonic distortion to the upper frequency range of the sound wave (“exciter”). But artificially injecting distortion into a stereo sound field creates user fatigue and discomfort over time. Enhancement processes based on “exciter” type processing often require a bass boost circuit to compensate for thinness created by over-emphasizing high frequency harmonics.
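The “exciter” approach described above can be illustrated with a minimal sketch: isolate the upper frequency range, pass it through a nonlinear waveshaper to generate harmonic distortion, and blend the result back into the original sound wave. The one-pole high-pass filter, the `tanh` waveshaper, and the `cutoff_hz`, `drive`, and `mix` parameters are all illustrative assumptions, not details taken from any particular commercial process.

```python
import numpy as np

def exciter(signal, sample_rate, cutoff_hz=3000.0, drive=4.0, mix=0.2):
    """Crude harmonic exciter: distort the upper band and blend it back in.

    cutoff_hz, drive and mix are illustrative parameters only.
    """
    # One-pole high-pass filter isolates the upper frequency range.
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    high = np.empty_like(signal)
    prev_y, prev_x = 0.0, 0.0
    for i, x in enumerate(signal):
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        high[i] = prev_y
    # Nonlinear waveshaping generates the added harmonic distortion.
    harmonics = np.tanh(drive * high)
    # Blend the distorted band back with the original sound wave.
    return signal + mix * harmonics
```

Because the distorted band is injected unconditionally, every upper-frequency component is “brightened” whether or not it benefits from it, which is consistent with the listener fatigue noted above.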
Another approach deployed in televisions and car stereos for clarity enhancement of a stereo waveform is the addition of a time delay circuit in the low frequency range along with a time delay circuit in the mid frequency range, where both delays are set to a fixed delay point relative to the high frequency range. The purpose of this circuit is not acoustical simulation but speaker normalization; it is meant to compensate for impedance in the speaker circuit that causes frequency-dependent phase errors in an amplified and acoustically transduced sound wave. In this design, the high frequency level is adjusted by a VCA control voltage that is initially set by the user with an “adjust to taste” level control and is concurrently dynamically adjusted ratiometrically after a calculation of the RMS summed values of the delayed mid- and low-frequency bands. Banded phase-shift techniques emphasize upper-frequency harmonics and add a high frequency “edge” to the harmonic frequencies of the overall mix, but can mask and reduce the listener's ability to discern the primary fundamental frequencies that give solo instruments and voices depth and fullness, rendering them hollow sounding and not believable. Another problem with this speaker correction method is that it is not useful with all types of transducers, but only with those transducers that exhibit the type of high- and mid-frequency time delay errors that match the time correction circuits within this process.
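The banded-delay scheme just described can be sketched as follows: split the waveform into low, mid, and high bands, delay the low and mid bands by fixed amounts relative to the high band, and scale the high band by a VCA-style gain derived ratiometrically from the RMS of the summed delayed bands. The band edges (200 Hz and 2 kHz), the delay times, and the linear control law are all illustrative assumptions; the source specifies no concrete values.

```python
import numpy as np

def banded_delay_process(signal, sample_rate, low_delay_ms=8.0,
                         mid_delay_ms=4.0, user_gain=1.0, ratio=0.5):
    """Sketch of a banded time-delay clarity circuit with RMS-driven
    high-band gain. All numeric choices are illustrative assumptions."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
    # Crude three-band split at assumed edges of 200 Hz and 2 kHz.
    low = np.fft.irfft(np.where(freqs < 200, spectrum, 0), len(signal))
    mid = np.fft.irfft(np.where((freqs >= 200) & (freqs < 2000),
                                spectrum, 0), len(signal))
    high = np.fft.irfft(np.where(freqs >= 2000, spectrum, 0), len(signal))

    def delay(x, ms):
        # Fixed delay relative to the (undelayed) high frequency range.
        n = int(sample_rate * ms / 1000.0)
        return np.concatenate([np.zeros(n), x[:len(x) - n]])

    low_d = delay(low, low_delay_ms)
    mid_d = delay(mid, mid_delay_ms)
    # RMS of the summed delayed bands drives the high-band VCA gain,
    # on top of the user's "adjust to taste" level setting.
    rms = np.sqrt(np.mean((low_d + mid_d) ** 2))
    vca_gain = user_gain * (1.0 + ratio * rms)
    return low_d + mid_d + vca_gain * high
```

Note that the correction is static with respect to the transducer: it helps only when the speaker's actual phase errors happen to match the fixed delays, which is the limitation identified above.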
Another approach used for clarity enhancement of a mix is the addition of a time delay circuit in the low frequency range set to a formulaic delay point relative to the high frequency range. Banded phase-shift techniques emphasize upper-frequency harmonics and add a high frequency “edge” to the overall mix, but mask and reduce the listener's ability to discern the primary fundamental frequencies that give solo instruments and voices depth and fullness. The effect of phase-shift techniques, when combined with a compensating bass boost circuit, is the “loudness curve” effect: more bass and treble with de-emphasized solo instrument and voice fundamental frequencies.
Compressors and voltage controlled amplifiers (VCAs) have been applied to more sophisticated versions of these high frequency boosting circuits to adjust the amount of distortion or phase-shifted material applied to the original sound wave based on detected signal RMS values.
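The compressor/VCA refinement described above amounts to making the wet/dry blend signal-dependent: a sliding RMS detector measures the level of the original sound wave, and a control law scales how much distorted or phase-shifted material is added back. The windowed detector and the `depth / (1 + rms)` control law below are illustrative assumptions chosen only to show the structure.

```python
import numpy as np

def rms_controlled_blend(dry, wet, sample_rate, window_ms=50.0, depth=0.5):
    """Blend processed ("wet") material into the original ("dry") signal
    under control of a detected RMS value, VCA-style. The control law
    depth / (1 + rms) is an illustrative assumption."""
    win = max(1, int(sample_rate * window_ms / 1000.0))
    # Sliding RMS detector over the dry signal.
    padded = np.concatenate([np.zeros(win - 1), dry ** 2])
    energy = np.convolve(padded, np.ones(win) / win, mode="valid")
    rms = np.sqrt(energy)
    # Louder passages receive proportionally less added material.
    gain = depth / (1.0 + rms)
    return dry + gain * wet
```

Even with this level-dependent dosing, the added material is still distortion or phase-shifted content, so the fundamental-frequency masking problems described above remain.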
While useful as special effects on individual tracks prior to summing the track into a stereo mix, high frequency boost (“exciter”) processes are too deleterious to the fundamental frequencies of solo instruments and voice, and to the overall balance of the stereo sound field, to be used as a professional-quality stereo mastering tool. Additional compression or downsampling of the music waveform can cause very unpredictable negative effects when distortion or phase-shift signals are added prior to signal density reduction. Loudness curve schemes are effective at low listening levels, yet moderate or high listening volumes cause the mix to sound harsh and edgy, leading to listener fatigue and dissatisfaction.
It is therefore desirable to provide signal processing technology that accurately creates a live performance feeling in a user listening to a digital recording or other source of digital information, without the undesirable side-effects produced by conventional practices.