Recording arrangements that enable spatial audio capture are becoming increasingly common as, for example, mass-market mobile devices are equipped with multiple microphones or microphone arrays. While such recording arrangements enable recording of a spatial audio image comprising multiple sound sources more precisely than before, they typically end up downmixing the recorded audio signal into a composite stereo or binaural audio signal, in which the multiple sound sources are not separable in a straightforward manner. Hence, a challenge lies in enabling the user of the mobile device to modify the spatial audio image of the recorded audio signal. A further challenge is the lack of intuitive processing tools and/or interfaces for a user to modify an audio image of an audio signal of any kind.