The increasing proliferation of small, laptop or handheld processor-based systems for capturing a user's input makes such systems ideal candidates for taking notes about virtually any kind of event, and makes replacing the conventional pen and paper mode of note-taking a realistic and practical goal. The rapid technological advances in the use of a stylus (or pen-like) device for input, in place of the conventional keyboard device, make such a note-taking system even more like the natural note-taking process associated with the conventional pen and paper mode. As with many other processor-based systems, a well-designed user interface that both supports and enhances a person's natural style of note-taking is crucial to the ultimate utility and successful use of such a note-taking system.
Existing systems that support functions broadly classified as note-taking have generally evolved in relation to systems concerned with the correlation of notes to recorded signals. Some of these have been intended for use in a realtime environment, while others are structured for use after an event has been recorded, as a postprocessing step. Some have few or no user-interface features specifically designed for the note-taking, annotation or indexing process, while other user interfaces have special purpose features tailored to a specific application, such as the correlation of a legal deposition transcript to a video recording of the deposition. The discussion of some of these systems that follows highlights their basic features and disadvantages.
European patent application publication EP 0 495 612 by Lamming discloses a computer-based note-taking system integrated with an audio or video recording system. The computer presents a document editor style user interface to the user, who either creates a new document or retrieves an existing document to which the user adds notes as a recording is made or played via the integrated audio or video recording system. As the user enters each note (mark or indicium), the indicium is added to the document, time stamped, and stored in an indicium-to-time-stamp index. The time stamps are not visible to the user; they are stored with the computer's internal representation of the indicia entered by the user. A video-frame time stamp function time stamps time code data received from the audio or video recorder and creates a time-stamp-to-time-code index. A browser function permits the user to retrieve sections of the recording directly by selecting the indicia. The browser looks up each selected indicium in the first index to retrieve its time stamp, then uses the time stamp to look up the time code of the recording in the second index, and plays the section of the recording in the area indicated by the time code. EP 0 495 612 also discloses how time stamping the indicia may be applied to creating topic or key word data. If the user enters new, separate indicia spatially near a previously entered indicium that is a key word or topic whenever an idea, speaker or topic applies to that previously entered indicium, later selection of all of the marks spatially associated with the topic will result in all sections of the recording indexed by the time stamps of the respective indicia being replayed.
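The two-index lookup described for EP 0 495 612 can be illustrated with a minimal sketch. The class name, method names, and the choice of a sorted list searched with `bisect` are assumptions made here for illustration; the publication is not quoted as specifying any particular data structure, and the sketch assumes that time codes arrive from the recorder in increasing time-stamp order.

```python
from bisect import bisect_right


class NoteIndex:
    """Hypothetical sketch of the two indexes; not the disclosed implementation."""

    def __init__(self):
        # First index: indicium identifier -> time stamp (seconds on the system clock).
        self.indicium_to_stamp = {}
        # Second index: parallel lists of time stamps and recorder time codes,
        # kept sorted by time stamp (assumed to arrive in increasing order).
        self.stamp_times = []
        self.stamp_codes = []

    def add_time_code(self, stamp, time_code):
        """Record that the recorder reported `time_code` at system time `stamp`."""
        self.stamp_times.append(stamp)
        self.stamp_codes.append(time_code)

    def add_indicium(self, indicium_id, stamp):
        """Time stamp a newly entered note."""
        self.indicium_to_stamp[indicium_id] = stamp

    def time_code_for(self, indicium_id):
        """Browser lookup: indicium -> time stamp -> nearest earlier time code."""
        stamp = self.indicium_to_stamp[indicium_id]
        pos = bisect_right(self.stamp_times, stamp) - 1
        if pos < 0:
            raise LookupError("no time code recorded at or before this time stamp")
        return self.stamp_codes[pos]
```

Selecting a note then resolves to the most recent time code recorded at or before the moment the note was entered, which is the point at which playback would begin in this sketch.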
U.S. Pat. No. 4,841,387, entitled "Arrangement for Recording and Indexing Information" and issued to Rindfuss, discloses a system for recording information relating to an event on a recording medium, such as an audio or video tape, and for indexing positions of handwritten notations made on a touch sensitive device and concerning the event to positions on the recorded medium in order to allow the user to identify portions of the handwritten notations for which review of the correlated material on the recorded medium is desired. In the recording mode, the device makes an audio recording of the event on a standard cassette tape. Simultaneously, the electronic touchpad senses the position of the user's handwritten notes on the writing surface, and provides this information to the microprocessor which correlates the record of the positions of the handwritten notations on each page with the position of the recorded information on the audio tape at corresponding instants in time. Realtime constrained correlation vectors representing rectangular areas of the display surface each containing a cohesive group of handwritten text are each combined with a tape position that correlates to the instant in time the handwriting within that area began.
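The correlation of rectangular writing areas with tape positions can be sketched as follows. This is an illustrative model only: the `CorrelationVector` fields, the coordinate units, and the `tape_position_at` helper are hypothetical names introduced here, not terms or structures taken from the Rindfuss patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CorrelationVector:
    """Hypothetical record: one rectangular group of handwriting on a page."""
    page: int
    left: float
    top: float
    right: float
    bottom: float
    tape_position: int  # tape counter value when writing in this area began

    def contains(self, page: int, x: float, y: float) -> bool:
        """True if the point (x, y) on `page` falls inside this rectangle."""
        return (
            page == self.page
            and self.left <= x <= self.right
            and self.top <= y <= self.bottom
        )


def tape_position_at(
    vectors: Sequence[CorrelationVector], page: int, x: float, y: float
) -> Optional[int]:
    """Return the tape position for the first area containing the point, if any."""
    for vector in vectors:
        if vector.contains(page, x, y):
            return vector.tape_position
    return None
```

In this sketch, touching a point inside a previously written area returns the tape position at which that handwriting began, and touching blank space returns `None`.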
U.S. Pat. No. 4,425,586 issued to Miller discloses a system that combines a video tape recorder with a computer in such a manner that these two components each automatically record and display where related information is stored in its own mechanism as well as in its counterpart mechanism, allowing the user to determine the location of all the corresponding data stored both on video tape and on a storage medium such as a diskette, by examining only one storage medium. Notes about the recorded event or document may be entered onto the diskette along with the automatic entry of the corresponding reel number and frame number of the video record and diskette and file address number of the computer storage medium. Another feature disclosed is the capability of the system to enter and display the time and date on both the video tape and diskette recording mediums as well as on both video monitors along with the data address location information.
U.S. Pat. No. 4,924,387 issued to Jeppeson discloses a computerized court reporting system which provides for periodically annotating the stroke record made by the user of a court stenographic machine with a time stamp from a system clock while simultaneously sending a time stamp to a recording system making a video and audio recording of the testimony. The logic of a control system determines automatically when to time-stamp the stroke record and permits the user to trigger a control function to annotate the video recording with automatic "on the record" and "off the record" messages with associated time stamps.
These realtime data correlation and access systems have several similar disadvantages that make them unsuited for note-taking in general. The user's ability to index notes to an address marker, such as time, is entirely controlled through the indicia, or notes, the user has entered in a document, since the time stamps or positions captured are those made at the time the notes are entered. Each system assumes, therefore, that the time of entry of a note provides a sufficiently useful correlation to the event as a whole. In the case of U.S. Pat. No. 4,924,387, this provides an adequate indexing structure since the stroke record made is intended to be a verbatim transcription of the verbal testimony made in a courtroom. In the case of EP 0 495 612, however, where a verbatim transcription of the event may not be the note-taker's intention, such an assumption does not allow for the later, realtime augmentation of previously entered notes with additional notes related to the previously entered notes, since the later notes will be time stamped with the time they were entered rather than with the time of the material to which they relate or are relevant. In the case of the system disclosed by Rindfuss, notes may be entered later, but the later-entered notes will be correlated with the time the later notes were entered, rather than with the time of the material to which they relate or are relevant. In the case of the system disclosed by Miller, notes added to the diskette record of the event at a later time are entered entirely during a postprocessing phase, and not during the realtime recording of the event. In some of these systems, neither time stamps nor tape positions are visible to the user, and so the temporal or spatial context of the entered indicia is not available to the user to enhance the retrieval function. 
In addition, in EP 0 495 612 the function provided for creating topics or key words from the entered indicia may be practically limited to one display "page" or screen unless the user reenters the topic or key word on a second screen or scrolls between screens to add a mark to a previously entered topic or key word. As with augmentation of notes in general, there is no facility for associating a key word or topic name created at a later time with notes entered earlier.
Existing postprocessing (non-real-time) annotation systems in the field of post-production video editing provide for the creation of annotations about scenes correlated with "in" and "out" time codes identifying the scenes on a video recording. U.S. Pat. No. 5,218,672 is an example of such a system. It is disclosed there that scene descriptions may be revised after initial creation, but the correlation of the annotations is confined temporally to the identified scenes. There is no provision for grouping one scene description with other related scene descriptions.
In the postprocessing system for the correlation of legal depositions with video recordings thereof disclosed in U.S. Pat. No. 5,172,281, a time code number is assigned by an operator of the system to both the computer transcript and the videotape segment where each question/answer passage begins. The location of individual words in the transcript may also be correlated with their corresponding position in the video recording. However, the system does not appear to provide for the entry of notes or annotations.
As can be seen from the discussion of the deficiencies in existing systems, these methods and systems require a user to adapt his or her natural note-taking process, which may be both temporally linear and nonlinear with respect to the perception of the event, to requirements and restrictions imposed by each respective implementation. They do not provide a more flexible interface for facilitating and enhancing a person's personal note-taking process in a wide variety of situations. Automatic indexing, whether by system time-stamping of keystrokes or handwritten strokes or by automatic detection of speaker voice changes, does not provide adequate context markers for the event as a whole or does not permit user control of the amount of detail to be captured. For example, an index created on the basis of speaker segmentation of the material would tell who was speaking but not the substance of the talk.