Viewing recorded or streamed audio-video or audio content is well known. Commercial broadcasters covering an event often have more than one recording device (video-camera/microphone) and a programme director will select a ‘mix’ where an output from a recording device or combination of recording devices is selected for transmission.
Multiple ‘feeds’ may be found in sharing services for video and audio signals (such as those employed by YouTube). Such systems are known and widely used to share user generated content that is recorded and uploaded or up-streamed to a server, and then downloaded or down-streamed to a viewing/listening user. Such systems rely on users recording an event with the recording facilities available to them and uploading or up-streaming that recording. These facilities are typically the camera and microphone arrangement of a mobile device such as a mobile phone.
Often the event is attended and recorded from more than one position by different recording users at the same time. The viewing/listening end user may then select one of the up-streamed or uploaded recordings to view or listen to.
Where multiple items of user generated content exist for the same event, it can be possible to generate an improved rendering of the event by combining different recordings from different users, or to improve upon user generated content from a single source, for example by mixing different users' content to reduce background noise and so attempt to overcome local interference or uploading errors.
A problem can arise in systems with multiple user generated or recorded signals where the recording devices are in close proximity and the same audio scene is recorded multiple times. The problem can, for example, be selecting at least one audio signal, from one of the audio sources or recording devices, out of the large number of recordings available within range of the selected listening point (also known as the audio event). In other words, from the viewpoint of an audio server attempting to generate an audio signal for an end user, it can be difficult to select a relevant part of the audio scene. For example, should the audio server select audio signals from audio sources containing the most relevant sound sources, and how does the audio server determine which of the uploaded audio sources are the most relevant? Similarly, where the end user is attempting to identify some isolated detail of the audio scene, or the general ambience outside the audio scene, how can the audio server determine which audio sources contain the ‘less common’ or ‘less relevant’ audio signal segments that typically describe the ambience sound of the audio scene? Audio signal routers currently in use typically select audio sources only according to very basic criteria, such as “nearest” or “loudest” with reference to the audio scene, and can therefore miss subtle sound qualities which may be recorded by an audio source at the periphery of the audio scene and requested by the end user.
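To illustrate the limitation of such basic criteria, the following is a minimal Python sketch of a router applying the “nearest” and “loudest” rules. It is an illustration under stated assumptions, not a description of any particular router: the `AudioSource` type, its 2-D position fields, and the use of RMS level as the measure of loudness are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioSource:
    # Hypothetical representation of an uploaded recording:
    # a label, a 2-D recording position and a block of samples.
    name: str
    x: float
    y: float
    samples: list


def rms(samples):
    """Root-mean-square level, assumed here as the 'loudness' measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_nearest(sources, listening_point):
    """'Nearest' criterion: smallest Euclidean distance to the listening point."""
    lx, ly = listening_point
    return min(sources, key=lambda s: math.hypot(s.x - lx, s.y - ly))


def select_loudest(sources):
    """'Loudest' criterion: highest RMS level, regardless of position."""
    return max(sources, key=lambda s: rms(s.samples))


if __name__ == "__main__":
    sources = [
        AudioSource("stage_front", 1.0, 0.0, [0.8, -0.7, 0.9, -0.8]),
        AudioSource("crowd_middle", 5.0, 0.0, [0.5, -0.5, 0.4, -0.4]),
        # A quiet source at the edge of the scene, e.g. recording ambience.
        AudioSource("periphery", 20.0, 5.0, [0.05, -0.04, 0.06, -0.05]),
    ]
    nearest = select_nearest(sources, (0.0, 0.0))
    loudest = select_loudest(sources)
    print(nearest.name, loudest.name)  # prints "stage_front stage_front"
```

In this example both criteria pick the same close, loud recording, so the quiet peripheral source, which may carry exactly the ambience or isolated detail an end user wants, is never selected.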