Many events in the political, business, engineering, medical, journalism, education and legal domains, among others, are being recorded today, and the recording of events, both video and audio, is likely to become even more useful in the future world of collaborative work environments. It is, therefore, increasingly important and necessary for a system user to have the ability to correlate user-produced notes or information about an event to the recorded signals of the event. In addition, the increasing proliferation of hand-held processor-based machines that use a stylus (or pen-like) device for capturing a user's input makes such machines ideal candidates for use in taking notes about an event, replacing both the conventional pen and paper and keyboard modes of note-taking. As with many other processor-based systems, a well-designed user interface that both supports and enhances a person's natural style of note-taking is crucial to the ultimate utility and successful use of such a note-taking system.
Existing systems support note-taking and correlation of notes to recorded signals in a variety of ways. Some are intended for use in a real-time environment while others are structured for use after an event has been recorded, i.e., as a "post-processing" step. Some have few or no user-interface features specifically designed for the note-taking, annotation or indexing process, while other user interfaces have special purpose features tailored to a specific application, such as the correlation of a legal deposition transcript to a video recording of the deposition. The discussion of some of these systems that follows highlights their basic features and disadvantages.
European patent application publication EP 0 495 612 by Lamming discloses a computer-based note-taking system integrated with an audio or video recording system. The computer presents a document editor style user interface to the user, who either creates a new document or retrieves an existing document to which the user adds notes as a recording is made or played via the integrated audio or video recording system. As the user enters each note (mark or indicium), the indicium is added to the document, time stamped, and stored in an indicium-to-time-stamp index. The time stamps are not visible to the user; they are stored with the computer's internal representation of the indicia entered by the user. A video-frame time stamp function time stamps time code data received from the audio or video recorder and creates a time-stamp-to-time-code index. A browser function permits the user to retrieve sections of the recording by selecting the indicia directly. The browser looks up the selected indicium in the first index to retrieve its time stamp, uses that time stamp to look up the time code of the recording in the second index, and plays the section of the recording in the area indicated by the time code. EP 0 495 612 also discloses how time stamping the indicia may be applied to creating topic or key word data. The user enters a new, separate indicium spatially near a previously entered indicium that is a key word or topic whenever an idea, speaker or topic applies to the previously entered indicium. Later selection of all of the marks spatially associated with a topic then results in the replay of all sections of the recording indexed by the time stamps of the respective indicia.
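The two-index retrieval described above can be illustrated with a minimal sketch. It is not taken from EP 0 495 612; the names `indicium_to_timestamp`, `timestamp_to_timecode` and `lookup_time_code`, the sample values, and the choice of the nearest earlier time code are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical first index: each indicium mapped to the time stamp
# (seconds on the computer's clock) at which the user entered it.
indicium_to_timestamp = {"budget": 12.0, "schedule": 47.5}

# Hypothetical second index: (time stamp, recorder time code) pairs,
# sorted by time stamp, captured as time code data arrived from the recorder.
timestamp_to_timecode = [
    (0.0, "00:00:00:00"),
    (10.0, "00:00:10:00"),
    (40.0, "00:00:40:00"),
    (50.0, "00:00:50:00"),
]


def lookup_time_code(indicium):
    """Return the time code at or just before the indicium's time stamp."""
    # Step 1: first index, indicium -> time stamp.
    time_stamp = indicium_to_timestamp[indicium]
    # Step 2: second index, time stamp -> nearest earlier time code.
    stamps = [stamp for stamp, _ in timestamp_to_timecode]
    position = max(bisect_right(stamps, time_stamp) - 1, 0)
    return timestamp_to_timecode[position][1]
```

In this sketch, selecting an indicium returns the time code from which playback would begin; a real system would pass that time code to the recorder.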
U.S. Pat. No. 4,841,387, entitled "Arrangement for Recording and Indexing Information" and issued to Rindfuss, discloses a system for recording information relating to an event on a recording medium, such as an audio or video tape, and for indexing positions of handwritten notations made on a touch sensitive device and concerning the event to positions on the recorded medium in order to allow the user to identify portions of the handwritten notations for which review of the correlated material on the recorded medium is desired. In the recording mode, the device makes an audio recording of the event on a standard cassette tape. Simultaneously, the electronic touchpad senses the position of the user's handwritten notes on the writing surface, and provides this information to the microprocessor which correlates the record of the positions of the handwritten notations on each page with the position of the recorded information on the audio tape at corresponding instants in time. Real-time constrained correlation vectors representing rectangular areas of the display surface each containing a cohesive group of handwritten text are each combined with a tape position that correlates to the instant in time the handwriting within that area began.
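The pairing of a rectangular area of handwriting with a tape position can be sketched as follows. This is an illustrative assumption rather than the data layout disclosed by Rindfuss; the class `CorrelationRegion`, the function `tape_position_for`, the coordinate units and the sample counter values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CorrelationRegion:
    """A rectangular area of one page holding a cohesive group of handwriting."""
    page: int
    left: float
    top: float
    right: float
    bottom: float
    tape_position: int  # hypothetical tape counter value when writing in this area began

    def contains(self, page: int, x: float, y: float) -> bool:
        return (page == self.page
                and self.left <= x <= self.right
                and self.top <= y <= self.bottom)


def tape_position_for(regions: List[CorrelationRegion],
                      page: int, x: float, y: float) -> Optional[int]:
    """Return the tape position of the first region containing the point, if any."""
    for region in regions:
        if region.contains(page, x, y):
            return region.tape_position
    return None


# Hypothetical sample data: two groups of notes on page 1.
regions = [
    CorrelationRegion(page=1, left=0, top=0, right=100, bottom=20, tape_position=15),
    CorrelationRegion(page=1, left=0, top=30, right=100, bottom=60, tape_position=240),
]
```

Touching a point inside the second group of notes, for example `tape_position_for(regions, 1, 50, 40)`, yields the stored tape position for that group, while a point outside every region yields `None`.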
U.S. Pat. No. 4,425,586 issued to Miller discloses a system that combines a video tape recorder with a computer in such a manner that these two components each automatically record and display where related information is stored in its own mechanism as well as in its counterpart mechanism, allowing the user to determine the location of all the corresponding data stored both on video tape and on a storage medium such as a diskette, by examining only one storage medium. Notes about the recorded event or document may be entered onto the diskette along with the automatic entry of the corresponding reel number and frame number of the video record and diskette and file address number of the computer storage medium. Another feature disclosed is the capability of the system to enter and display the time and date on both the video tape and diskette recording mediums as well as on both video monitors along with the data address location information.
U.S. Pat. No. 4,924,387 issued to Jeppesen discloses a computerized court reporting system which provides for periodically annotating the stroke record made by the user of a court stenographic machine with a time stamp from a system clock while simultaneously sending a time stamp to a recording system making a video and audio recording of the testimony. The logic of a control system determines automatically when to time stamp the stroke record and permits the user to trigger a control function to annotate the video recording with automatic "on the record" and "off the record" messages with associated time stamps.
These real-time data correlation and access systems have several similar disadvantages. The user's ability to index notes to the recording is entirely controlled through the indicia, or notes, the user has entered in a document, since the time stamps or positions captured are those made at the time the notes are entered. Each system assumes, therefore, that the time of entry of a note sufficiently corresponds with the time or position of the recording to provide an adequate index into the recording. In the case of U.S. Pat. No. 4,924,387, this provides an adequate indexing structure since the stroke record made is intended to be a verbatim transcription of the verbal testimony made in a courtroom. In the case of EP 0 495 612, however, a verbatim transcription of the event may not be the note-taker's intention. There, such an assumption does not allow previously entered notes to be augmented later, while recording continues, with additional related notes, since the later notes will be time stamped with the time they were entered rather than with the time of the material to which they relate or are relevant. In the case of the system disclosed by Rindfuss, notes may be entered later, but the later-entered notes will be correlated with the position of the tape at the time the later notes were entered, rather than with the position on the recording of the material to which they relate or are relevant. In the case of the system disclosed by Miller, notes added to the diskette record of the event at a later time are entered entirely during a post-processing phase, and not during the real-time recording of the event. In some of these systems, neither time stamps nor tape positions are visible to the user, and so the temporal or spatial context of the entered indicia is not available to the user to enhance the retrieval function.
In addition, in EP 0 495 612 the function provided for creating topics or key words from the entered indicia may be practically limited to one display "page" or screen unless the user reenters the topic or key word on a second screen or scrolls between screens to add a mark to a previously entered topic or key word. As with augmentation of notes in general, there is no facility for associating a key word or topic name created at a later time with notes entered earlier. Finally, no user interface design is explicitly suggested in the note-taking systems for facilitating or enhancing a user's personal note-taking style or for accommodating the note-taking function to a variety of applications.
Existing post-processing (non-real-time) annotation systems in the field of post-production video editing provide for the creation of annotations about scenes correlated with "in" and "out" time codes identifying the scenes on a video recording. U.S. Pat. No. 5,218,672 is an example of such a system. It is disclosed there that scene descriptions may be revised after initial creation, but the correlation of the annotations is confined temporally to the identified scenes. There is no provision for grouping one scene description with other related scene descriptions.
In the post-processing system for the correlation of legal depositions with video recordings thereof disclosed in U.S. Pat. No. 5,172,281, a time code number is assigned by an operator of the system to both the computer transcript and the videotape segment where each question/answer passage begins. The location of individual words in the transcript may also be correlated with their corresponding position in the video recording. However, the system does not appear to provide for the entry of notes or annotations.
As can be seen from the discussion of the deficiencies in existing systems, these methods and systems require a user to adapt his or her natural note-taking process, which may be both temporally linear and non-linear with respect to the perception and recording of the event, to the requirements and restrictions imposed by each respective implementation. They fall short of facilitating and enhancing a person's personal note-taking process while still providing accurate access to recorded information. In the case of video and audio logging tools, exclusively post-processing systems are inadequate for generating notes about relationships between recorded segments, and are time consuming because they require review of the entire tape in order to generate an index. Automatic indexing of video and audio notes, whether by system time stamping of keystrokes or handwritten strokes or by automatic detection of speaker voice changes, either does not provide adequate context markers for the recorded signals or does not permit user control of the amount of detail to be captured. For example, an index created on the basis of speaker segmentation of the material would tell who was speaking but not the substance of the talk.