The most common interface for accessing a music collection is a text-based list. Music collection navigation is used in personal music systems and also in online music stores. For example, the iTunes digital music collection allows a user to search for an explicitly chosen song name, album name or artist name. Potential matches are returned, usually as a list and often ranked by relevance. This requires a user to know in advance the details of the music they are looking for, which inhibits the user from discovering new music. The user is often given a list of several thousand songs from which to choose and, because a user is only able to listen to a single song at any one time, must invest a significant amount of time in listening to, and browsing through, the choices offered in order to decide which song to listen to.
Previous audio interfaces have focused on spatializing the sound sources and on approaches to overcoming errors introduced by this presentation of the sounds. In known interfaces, sound sources are presented in a virtual position in front of the listener to aid localization and to reduce problems introduced by interpolating the head-related transfer functions (HRTFs). The AudioStreamer interface developed in the 1990s presented a user with three simultaneously playing sound sources, primarily recordings of news radio programs. The sounds were spatially panned to static locations directly in front of the listener and at sixty degrees to either side. The virtual position of each sound source was calculated using HRTFs. Sensors positioned around the listener allowed the sound source preferred by a user to be tracked without any further user input.
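The static three-source layout described above can be illustrated with a minimal sketch. It assumes constant-power stereo panning as a simplified stand-in for HRTF processing, so it is not the method used by AudioStreamer. The function names constant_power_gains and mix_sources, the 90-degree pan range and the sign convention (negative azimuth to the listener's left) are illustrative assumptions that do not appear in the interfaces discussed.

```python
import math

# Assumed layout: one source straight ahead, one sixty degrees to each side.
AZIMUTHS_DEG = (-60.0, 0.0, 60.0)


def constant_power_gains(azimuth_deg, max_azimuth_deg=90.0):
    """Return (left_gain, right_gain) for a source at the given azimuth.

    Negative azimuths are to the listener's left (an assumed convention).
    The gains satisfy left**2 + right**2 == 1, so perceived loudness stays
    roughly constant as a source moves across the stereo field.
    """
    # Clamp the azimuth to the supported pan range.
    az = max(-max_azimuth_deg, min(max_azimuth_deg, azimuth_deg))
    # Map [-max, +max] degrees onto a pan angle in [0, pi/2].
    theta = (az / max_azimuth_deg + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


def mix_sources(sources, azimuths_deg=AZIMUTHS_DEG):
    """Mix equal-length mono sample lists into (left, right) channels."""
    length = len(sources[0])
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth in zip(sources, azimuths_deg):
        gain_left, gain_right = constant_power_gains(azimuth)
        for i, sample in enumerate(samples):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right
```

Simple panning of this kind conveys only left-right position. It does not reproduce the spectral and timing cues that HRTFs supply, which is one reason the interfaces discussed here rely on HRTFs for headphone presentation.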
Several audio-only interfaces have also been developed to assist a user in re-mixing multiple tracks of the same song, such as the Music Scope headphones interface developed by Hamanaka and Lee. Sensors on the headphones were used to track a user's movement, but the interface did not ensure accurate spatialization of the sounds because it is concerned with re-mixing rather than with navigating through multiple songs. Without accurate spatialization of the sound sources, a listener is likely to be confused, and any selection of a sound source by the user is difficult and therefore inaccurate. These existing interfaces do not allow a user to interact directly with the sound sources to select which option to play. Because they use fixed sound sources, such interfaces are unsuitable for exploring a large music collection.
It is also known to create a combined visual and audio interface wherein music is spatialized for a loudspeaker setup, such as the Islands of Music interface developed by Knees et al. However, such a system would not be suitable for headphone listening and so cannot be applied, for example, to a personal music system or to mobile phone applications.
The majority of existing audio interfaces for interaction with audio files use non-individualized HRTFs to spatialize the sound sources and are concerned with overcoming errors common to such methods. The sound sources presented to a user are limited to positions in front of the user to aid localization. The systems are kept static to decrease computational load. None of the known interfaces discloses an accurate method for presenting spatial audio with which a user is allowed to interact. The placement of the sounds in the virtual environment is a key factor in allowing a user to interact with multiple sources simultaneously.