People assemble images, audio, video and other media into presentations for various reasons: for professional presentations, to memorialize family events or simply for entertainment. Once a presentation is assembled, audio annotations can be added to it to provide narration or to capture a viewer's response to the presentation. Adding audio annotations to a media presentation typically involves using the multimedia editing features of a camera or camcorder, or dedicated multimedia editing software executing on a computer. Such features generally allow annotations to be made in an “annotation” or “edit” mode that is separate from a “playback” mode. Multimedia editing software and features are controlled through a user's manual interactions with a computing device, such as pressing keys on a keyboard, operating a mouse or touching a touchscreen. Through this manual interaction, a user controls which media elements (images, videos, etc.) are selected for inclusion in a presentation, where in the presentation audio annotations are to be added, and how those annotations are recorded, edited and stored.