Music is normally produced and mixed for loudspeaker reproduction. As a result, the listening experience is less than optimal when the same material is played back through earphones.
The process of music production and music reproduction can together be said to consist of a sound encoding part and a sound decoding part. The encoding part entails music production and storage of the music material in a designated format, e.g. the CD format. The decoding part is the sound reproduction part, which entails the whole chain from reading the music signal from the storage format to the signal processing that presents the music to the ears of the listener. The decoding part normally ends in sound reproduction by either loudspeakers or earphones.
A stereo music signal has information encoded in it that, when played back over loudspeakers in a listening room, results in psychoacoustic cues being presented to the listener that give a certain spatial impression of the sound. Spatial impression here means aspects of the sound that have to do with, e.g., the location and size of each instrument in the sound image and the kind of acoustical space that is perceptually associated with each instrument.
These spatial psychoacoustic cues become either strongly distorted or totally missing when earphones are used in the reproduction system.
A commonly used solution for making the perceived sound field more natural in earphones when reproducing a stereo signal is to use a cross-feed network that feeds some of the left signal to the right ear and some of the right signal to the left ear. See for example references [1], [2], and [3].
FIG. 1 is a schematic block diagram illustrating an example of a cross-feed network. The cross-feed filters depicted in FIG. 1 are normally designed to give head shadowing and Interaural Time Differences (ITD) similar to those that a normal stereo speaker setup in front of the listener would give. The goal is to control the sound stage width so that it becomes more natural.
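The cross-feed principle described above can be illustrated by a minimal sketch. It is not the network of FIG. 1 or of references [1] to [3]. It assumes that each cross-feed path is a one-pole low-pass filter (a crude stand-in for head shadowing) followed by a pure delay (a stand-in for the ITD) and a fixed gain. The function names and the default values of `gain`, `cutoff_hz`, and `itd_s` are hypothetical and chosen only for illustration.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs):
    """Crude head-shadow stand-in: first-order low-pass (assumed filter shape)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state = (1.0 - a) * sample + a * state
        y.append(state)
    return y

def delay(x, n):
    """Delay a signal by n whole samples, keeping the original length."""
    return ([0.0] * n + list(x))[:len(x)]

def crossfeed(left, right, fs=44100, gain=0.5, cutoff_hz=700.0, itd_s=0.0003):
    """Add a filtered, delayed, attenuated copy of each channel to the other.

    gain, cutoff_hz and itd_s are illustrative assumptions, not values
    taken from the text.
    """
    n = round(itd_s * fs)  # ITD expressed in whole samples
    left_to_right = delay(one_pole_lowpass(left, cutoff_hz, fs), n)
    right_to_left = delay(one_pole_lowpass(right, cutoff_hz, fs), n)
    out_left = [d + gain * c for d, c in zip(left, right_to_left)]
    out_right = [d + gain * c for d, c in zip(right, left_to_right)]
    return out_left, out_right

# Usage: an impulse on the left channel only.
left_in = [1.0] + [0.0] * 63
right_in = [0.0] * 64
out_l, out_r = crossfeed(left_in, right_in)
```

In this usage example the left output equals the left input, because the right input is silent, while the right output stays at zero for the duration of the assumed ITD and then carries the low-pass filtered, attenuated copy of the left impulse.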
In some implementations only the frequency-dependent head shadowing is simulated and the ITD is kept at zero. The side effect of this is that the sound stage loses ambience and becomes too narrow. If a time delay is inserted in the cross-feed signal paths HRL and HLR, the sound stage proportions can be simulated properly, but another problem arises: center-panned sounds, which are correlated between the left and right input channels, experience a strong comb filtering effect when the direct-path and cross-feed-path sounds are added. This comb filtering effect colors the spectrum of the sound.
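The comb filtering effect can be made concrete with a simplified sketch. It assumes a center-panned signal that is identical in both input channels and a cross-feed path reduced to a frequency-independent gain `gain` and a pure delay `delay_s` (head shadowing is ignored). Under these assumptions each ear receives the sum of the signal and its delayed, attenuated copy, so the magnitude response is |1 + gain * exp(-j*2*pi*f*delay_s)|. The function names and the numeric values are hypothetical and for illustration only.

```python
import cmath
import math

def center_response(freq_hz, gain=0.5, delay_s=0.0003):
    """Magnitude response seen by a center-panned signal when the direct
    path is summed with a delayed, attenuated cross-feed path."""
    return abs(1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))

def first_notch_hz(delay_s):
    """Lowest frequency at which the delayed copy arrives in anti-phase."""
    return 1.0 / (2.0 * delay_s)

# Usage: compare the response with and without a cross-feed delay.
for f in (100.0, 1000.0, first_notch_hz(0.0003), 3000.0, 5000.0):
    with_delay = 20.0 * math.log10(center_response(f))
    without_delay = 20.0 * math.log10(center_response(f, delay_s=0.0))
    print(f"{f:8.1f} Hz  delay: {with_delay:6.2f} dB  no delay: {without_delay:6.2f} dB")
```

With zero delay the response of this simplified model is flat at 1 + gain for all frequencies. With a non-zero delay it swings between 1 + gain and 1 - gain, with notches at odd multiples of 1 / (2 * delay_s), which is the spectral coloration described above.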