Personal computers have evolved over time to accept various kinds of input. Contemporary computing devices allow audio input via a microphone. Such audio input is supported by contemporary operating systems, such as Microsoft Windows®-based operating systems, which provide a sound recorder to record, mix, play, and edit sounds, and also to link sounds to a document, or insert sounds into a document. Application programs and integrated programs such as Microsoft® Office XP offer speech recognition, which converts speech to text.
While audio input to a computer system is thus supported by software, a certain level of skill and effort is required to use these audio features to any reasonable extent. For example, to insert an audio comment (voice comment or voice annotation) into a word processing document, such as via the Microsoft® Word word processing program, the user needs to know to display a reviewing toolbar, click an arrow next to a new comment icon, click a voice comment icon, click a record button on a sound object dialog, and then speak to record the voice comment. When finished recording, the user needs to click a stop button, and then exit the dialog to resume typing. Other programs have similar requirements for entering a voice comment. Even a skilled user still has to perform a fair amount of work and manipulate the pointing device a significant amount to enter such comments.
Other uses for audio input include speech recognition to enter text, and command and control, in which a user speaks commands to the computer system to perform operations. These tasks also require a fair amount of familiarity with the audio programs and a fair amount of skill and effort to perform.
Video input via a camera is also becoming popular. In general, video input suffers from the same drawback as audio input, namely that it is difficult to use.
What is needed is a way for users to efficiently and intuitively leverage the audio and video capabilities provided with contemporary computer systems, operating systems and applications. The method and system should be simple and fast for users to learn and operate, and configurable to some extent to meet various user scenarios and usage patterns.