The invention relates to a multi-media data processing device having visual display means, storage means, and user interface means including voice input means and manually actuatable remote control means. Multi-media implies a selected combination among character-based text, film material, graphics, graphs, photographs, audio sequences, and possibly other elements that are suitable for producing human sensory experience. Those elements are stored in storage of an appropriate type, such as electrical, optical, or magnetic storage, and may be selectively accessed for temporary or transient display.

Various fields of application have seen the need for audio annotation, and also for remote control according to what is colloquially called the zap! mechanism. A particular field is the viewing of medical pictures produced by technology such as nuclear magnetic resonance, computer tomography, infrared scanning, dynamic physiological cinema, and various others. However, many other fields of use would benefit from similar interactivity features. A relevant publication is EP 402,911, corresponding to Japanese Patent Application No. 149,628/89, priority 890614 to Hitachi, Ltd, herein incorporated by reference.

Now, the combination of remote control means and audio inputting means puts appreciable inconvenience on the human user, in particular when the two mechanisms are used in frequent alternation. One hand of the operator is needed to hold the remote control unit, while at the same time the operator must keep his head close to the microphone. If the microphone is in a fixed position, this detracts from the ergonomics of the remote control, whereas, if the microphone is hand-held, it occupies the second hand of the operator as well. By itself, it is known to have necklace-worn and buttonhole-worn microphones, but the provision of a second, physically separate peripheral device causes much confusion and inconvenience.