In general, the audio signals accompanying a picture, such as a motion picture, are recorded on the assumption that they will be reproduced by loudspeakers arranged on both sides of the picture. As a result, the position of a sound source in the picture coincides with the position at which the sound image is actually heard, establishing a natural positional relationship between the picture and the sound.
However, when such audio is listened to through a conventional headphone device, the sound image is localized inside the listener's head. The direction of the picture then no longer coincides with the position of the sound image, and the localization becomes extremely unnatural. The same applies when listening to audio alone, such as music: unlike loudspeaker reproduction, sound that seems to emanate from inside the listener's head is an extremely unnatural phenomenon.
To overcome this inconvenience, a method is known that reproduces a sound field equivalent to that of loudspeaker reproduction: the impulse responses from a loudspeaker placed in front of the listener to each of the listener's ears are measured or calculated, convolved with the audio signal by a digital filter such as an FIR filter, and the result is heard through a headphone device. Although this can localize the sound image outside the listener's head, a sound image intended to lie in front is still localized within or to the side of the listener's head, so the unnaturalness is not eliminated. Moreover, when the sound accompanies a picture, the sound image moves together with the listener's head, producing a deviation between the picture direction and the sound direction and hence an extremely unnatural sound image localization.
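The convolution step described above can be sketched as a direct-form FIR filter applied once per ear. This is a minimal illustration, not the method of any particular product: the function names and the toy impulse responses in the usage note are hypothetical, and a real system would use measured head-related impulse responses with hundreds of taps.

```python
from typing import List, Sequence, Tuple


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k].

    The output has the same length as the input (the convolution tail is
    truncated), as a streaming filter would produce.
    """
    out: List[float] = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break  # no input samples before time 0
            acc += h * signal[n - k]
        out.append(acc)
    return out


def render_binaural(
    signal: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Convolve one source signal with a fixed impulse response per ear."""
    return fir_filter(signal, hrir_left), fir_filter(signal, hrir_right)
```

For example, `render_binaural([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5])` returns `([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0])`: the right ear receives a delayed, attenuated copy. Because the taps are fixed, the rendered direction is tied to the head and turns with it, which is the drawback noted for this method.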
There is also known a method in which the listener's head movement is detected and the coefficients of the digital filter, such as an FIR filter, in the digital signal processor are updated accordingly from moment to moment, so that the direction of the sound image remains fixed relative to the listening environment. With this method, the sound image is not localized inside the listener's head, and a sound image closely resembling that produced by a loudspeaker placed in front of the listener is obtained. However, the coefficients must then be updated every time the listener's head moves even slightly, which requires an extremely large number of sum-of-product processing units and a large amount of memory.
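The head-tracked variant can be sketched as choosing, for each new head angle, the impulse-response pair measured nearest to the source direction relative to the head. This is a minimal sketch under stated assumptions: the coefficient table, its coarse 90° grid, and the names `select_coefficients` and `HeadTrackedRenderer` are hypothetical. Practical systems use much finer grids or interpolation, so coefficient reloads occur far more often than in this toy.

```python
from typing import Dict, List, Optional, Tuple

HrirPair = Tuple[List[float], List[float]]  # (left-ear taps, right-ear taps)


def relative_azimuth(source_deg: float, head_yaw_deg: float) -> float:
    """Source direction as seen from the turned head, wrapped to [-180, 180)."""
    return (source_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def circular_distance(a: float, b: float) -> float:
    """Smallest angle between two directions, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_coefficients(
    table: Dict[int, HrirPair], source_deg: float, head_yaw_deg: float
) -> Tuple[int, HrirPair]:
    """Pick the measured direction closest to the head-relative source angle."""
    rel = relative_azimuth(source_deg, head_yaw_deg)
    key = min(table, key=lambda az: circular_distance(az, rel))
    return key, table[key]


class HeadTrackedRenderer:
    """Reloads FIR coefficients whenever the nearest measured direction changes."""

    def __init__(self, table: Dict[int, HrirPair], source_deg: float) -> None:
        self.table = table
        self.source_deg = source_deg
        self.current_key: Optional[int] = None
        self.reloads = 0

    def update(self, head_yaw_deg: float) -> HrirPair:
        key, pair = select_coefficients(self.table, self.source_deg, head_yaw_deg)
        if key != self.current_key:
            self.current_key = key
            self.reloads += 1  # in hardware: rewrite every tap of both ear filters
        return pair
```

With a frontal source (`source_deg=0.0`) and head yaws of 0°, 10°, 50°, 100° and 170°, the selected direction moves from 0° to -90° to 180°, giving three coefficient loads. Each load rewrites all taps of both filters for every channel, which is where the processing and memory cost comes from.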
There is also known a method that avoids the complexity of sequentially updating the coefficients: the digital filter coefficients are fixed at head-related transfer function data for a preset direction, and corrections for head movement are applied to all input signals by a time difference addition device and a level difference addition device. This method eliminates the need to update the coefficients sequentially and allows the circuit scale to be reduced significantly. However, the directions in which a sound image can be localized with the time difference addition device and the level difference addition device are limited to the 180° range in front of the listener, and the sound image cannot be localized behind the head.
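The fixed-coefficient approach can be sketched as post-processing the output of the fixed FIR filters with an interaural delay and per-ear gains derived from the head-relative azimuth. This sketch is illustrative only. The sine-law time-difference model, the ear spacing, the 6 dB maximum level difference, and all function names are assumptions, not taken from the method described. Because both corrections depend on azimuth only through sin(θ), a direction and its front/back mirror (for example 30° and 150°) receive identical corrections. This is one way to see why such corrections cannot place a sound image behind the head.

```python
import math
from typing import List, Sequence, Tuple

# Assumed constants, for illustration only.
EAR_SPACING_M = 0.18        # hypothetical distance between the ears
SPEED_OF_SOUND_M_S = 343.0
MAX_ILD_DB = 6.0            # hypothetical level difference at 90 degrees


def itd_seconds(azimuth_deg: float) -> float:
    """Sine-law interaural time difference (assumed model).

    Azimuth is measured clockwise from straight ahead (positive = right).
    A positive result means the sound reaches the right ear first.
    """
    return EAR_SPACING_M / SPEED_OF_SOUND_M_S * math.sin(math.radians(azimuth_deg))


def ild_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Assumed level-difference model returning (left_gain, right_gain)."""
    half_db = 0.5 * MAX_ILD_DB * math.sin(math.radians(azimuth_deg))
    return 10 ** (-half_db / 20), 10 ** (half_db / 20)


def delay(x: Sequence[float], n: int) -> List[float]:
    """Delay x by n samples, keeping the original length."""
    return ([0.0] * n + list(x))[: len(x)]


def apply_itd_ild(
    left: Sequence[float], right: Sequence[float], azimuth_deg: float, sample_rate: int
) -> Tuple[List[float], List[float]]:
    """Correct a fixed-direction binaural pair with time and level differences."""
    itd = itd_seconds(azimuth_deg)
    n = round(abs(itd) * sample_rate)
    gl, gr = ild_gains(azimuth_deg)
    if itd >= 0:  # source to the right: the far (left) ear hears it later
        left, right = delay(left, n), list(right)
    else:         # source to the left: the far (right) ear hears it later
        left, right = list(left), delay(right, n)
    return [gl * s for s in left], [gr * s for s in right]
```

Calling `apply_itd_ild` with azimuths of 30° and 150° yields the same output pair, so the rear direction collapses onto its frontal mirror under this model.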
Thus, if the turning angle of the head is detected with a rotary angle sensor as described above and used to drive the processing in the headphone device, the processing needed to update the impulse responses becomes enormous and the system becomes costly when a multi-channel audio signal is to be reproduced as sound images localized both in front of and behind the listener. If the arrangement is instead implemented as a simplified system, the direction of sound image localization is limited to the 180° range in front, and rear sound image localization cannot be realized at the same time.