Listeners and viewers may associate a visual experience with an audio experience, or an audio experience with a visual experience. In certain settings, the association of a visual experience with an audio experience may have particular value in, for example, personal entertainment, the entertainment industry, advertising, sports, game playing, inter-personal and inter-institutional communication, and the like. At present, however, it is not possible for a user to acquire a preferred visual image, text and/or global positioning system (GPS) data and convert it to a preferred audio composition in real time, wherein the user's preferences guide a computer system to generate an audio composition that comprises, for example, the user's input relating to one or more visual image regions, the user's input relating to one or more audio parameters, the user's input relating to methods of generating the audio output, the user's input relating to compatibility of one or more audio outputs, and/or the user's input relating to methods of audio storage, reproduction and distribution. Accordingly, the present invention provides methods, systems, devices and kits for conversion of a preferred visual image to a preferred audio composition in, for example, real time and/or off-line.