In recent years, mobile phones, which are one type of mobile device, have become able to transmit and receive electronic mail and to browse websites in addition to providing conventional voice communication, and communication methods and services in the mobile environment have become increasingly diverse. In the current mobile environment, the e-mail and web-browsing functions are operated mainly through visually based methods. Although such visually based operation presents a large amount of information and is intuitively easy to understand, it can be dangerous when the user is moving, for example while walking or driving a car.
Meanwhile, voice communication based on the auditory sense, which is the primary function of mobile phones, has become established as a means of communication. In practice, however, because of the constraints of securing a stable communication path, voice communication service is limited to a quality just sufficient for the content of a call to be understood, for example by using narrow-band monaural sound.
On the other hand, methods of presenting information through the auditory sense have long been studied, and presenting information by means of sound is called an auditory display. An auditory display that incorporates stereophonic technology can present information with enhanced presence by placing each piece of information as a sound at an arbitrary position in a three-dimensional audio image space.
For example, Patent Literature 1 discloses technology in which the voice of the user's communication partner, who is the speaker, is placed in a three-dimensional audio image space in accordance with the partner's position and the direction the user is facing. This technology could be used, for example, to identify the direction of a partner who cannot be found in a crowd without having to shout.
In addition, Patent Literature 2 discloses technology for a video conference system in which the voice of a speaker is placed so that it appears to come from the position at which the speaker's image is projected. This technology is considered to make it easy to identify who is speaking in a video conference, and thus to enable natural communication.
People are surrounded by, and hear, a large number of sounds every day. The human ability to selectively recognize the content they are paying attention to among many sounds is known as the cocktail party effect. That is, people can, to some extent, selectively follow and listen to the content they are attending to even when several speakers are talking at the same time. Multichannel television sound, for example, is in practical use as a technology for presenting a plurality of speakers simultaneously.
Further, Patent Literature 3 discloses technology in which the state of conversation in a virtual space is determined dynamically, and the voice of a specific communication partner and the voices of other speakers, treated as ambient sounds, are placed in that space.
Further, Patent Literature 4 discloses technology in which a plurality of sounds are placed in a three-dimensional audio image space and the plurality of sounds are heard as stereophonic sounds generated by convolution.
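The specific method of Patent Literature 4 is not described here. As a hedged illustration of what placing sounds "by convolution" commonly means in spatial audio, the sketch below convolves each mono source with an assumed pair of left/right head-related impulse responses (HRIRs) chosen for the position at which that source is placed, and sums the results into one two-channel signal. The function names `convolve` and `spatialize` and the `(signal, hrir_left, hrir_right)` tuple layout are hypothetical; in practice the HRIRs would be taken from measured data for each direction.

```python
from typing import List, Sequence, Tuple

# Hypothetical source layout: (mono signal, left-ear HRIR, right-ear HRIR).
Source = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution of signal x with impulse response h."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def spatialize(sources: Sequence[Source]) -> Tuple[List[float], List[float]]:
    """Mix several mono sources into one (left, right) stereophonic signal.

    Each source is convolved with the HRIR pair assumed to correspond to
    its placement, and the per-ear results of all sources are summed.
    """
    left: List[float] = []
    right: List[float] = []
    for signal, h_left, h_right in sources:
        for channel, h in ((left, h_left), (right, h_right)):
            y = convolve(signal, h)
            # Grow the output channel if this source's result is longer.
            if len(y) > len(channel):
                channel.extend([0.0] * (len(y) - len(channel)))
            for i, v in enumerate(y):
                channel[i] += v
    # Pad both channels to a common length.
    n = max(len(left), len(right))
    left.extend([0.0] * (n - len(left)))
    right.extend([0.0] * (n - len(right)))
    return left, right
```

Summing the convolved outputs is what allows several placed sounds to be heard at once, each from its own apparent direction; real systems typically use FFT-based convolution for long impulse responses, but the direct form above keeps the arithmetic visible.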