The invention relates generally to information processing devices and processes. In particular, the invention relates to information processing devices and processes that automatically link user notations on a page to time-varying data (e.g., handwritten notes linked to a corresponding audio presentation).
The present invention addresses the problem of capturing and later reviewing orally presented information (e.g., a lecture, meeting, interview, telephone call, or conversation). A listener must simultaneously attend to a talker while attempting to write notes about what is said. A tape recorder can capture exactly what is said and how it is said; however, finding information on a tape is time-consuming and often frustrating. A user must shuttle between fast forward and rewind to find the portions of interest on the tape. It is difficult to skim through a recording or correlate it with one's handwritten notes.
Systems that capture writing on paper or a document but do not record audio or video include: U.S. Pat. Nos. 5,629,499 to Flickinger et al. ("Flickinger"); 5,734,129 to Belville et al. ("Belville"); 5,627,349 to Shetye et al. ("Shetye"); and 5,243,149 to Comerford et al. ("Comerford"); and the CrossPad (described in Mossberg, Walter S. The CrossPad Sends Paper-and-Ink Notes To Your PC Screen. Wall Street Journal, Apr. 9, 1998, p. B1).
There are graphical computer playback applications that allow a user to select a single point in an audio or video recording (e.g., by positioning a cursor in a visual representation of the media) and then type in one or more keywords (U.S. Pat. Nos. 5,786,814; 5,717,879; and 5,717,869; and Cruz et al. Capturing and Playing Multimedia Events with STREAMS. In Proceedings of ACM Multimedia 1994, pages 193-200. ACM, 1994). As described in Degen et al. Working with audio: Integrating personal tape recorders and desktop computers. In Proceedings of CHI '92, pages 413-418. ACM, 1992, a user manually creates an index or "marker" during recording by pressing one of two buttons on a tape recorder, and these marks are then displayed graphically. These systems have limited utility because they rely on the user to manually index the recordings.
Similarly, U.S. Pat. Nos. 5,592,607 and 5,564,005 describe a system where a user indexes a video recording by manually creating "time zones". A time zone is created by drawing a line across the screen. There is a single time point in the video (the time the line was drawn) associated with the area of the screen below this line until the next line is drawn. Users can write notes with a stylus. Individual pen strokes (i.e., handwritten notes) do not index the video; the strokes are located inside a time zone which corresponds to the instant that the time zone was created. Additional writing can be added to a time zone at any time, but this does not create any new indices into the recording. This system has many disadvantages. Instead of leveraging the natural activity of the user, this system enforces one particular behavior: drawing a line across the screen to manually create an index into the recording. The granularity of indices is limited since each time zone only relates to a single time point in a recording, and individual pen strokes do not index the recording.
Some systems attempt to automatically generate indices into recorded media using signal processing techniques. For example, some systems have attempted to segment recordings by speaker changes (e.g., speaker A started talking at time t1, speaker B at time t2, etc.) as described in Kimber et al. Speaker Segmentation for Browsing Recorded Audio. In Proceedings of CHI '95, pages 212-213. ACM, 1995.
Some systems use handwritten notes taken during recording to index audio or video. U.S. Pat. No. 4,841,387 describes a system that indexes tape recordings with notes captured during recording. The writing surface is an electronic touchpad. All indices are created during recording only and stored in a reserved portion at the beginning of a microcassette tape. The user cannot add notes that index the recording during playback. The display surface is grouped into rectangular areas to save storage space; this has the disadvantage of making the system coarser grained than if each mark or pen stroke were indexed. In addition, a user has to put the device in a special "review mode" (by pressing a button) before being able to select a location in the notes for playback. Other systems index audio and/or video recordings with notes handwritten on a computer display screen or electronic whiteboard during recording (U.S. Pat. Nos. 5,535,063; 5,818,436; 5,786,814; 5,717,879; and 5,717,869, as described in Whittaker, Steve et al. Filochat: Handwritten Notes Provide Access to Recorded Conversations. In the Proceedings of CHI '94, pages 271-277, ACM-SIGCHI, 1994, and as described in Wilcox, Lynn et al. Dynomite: A Dynamically Organized Ink and Audio Notebook. In the Proceedings of CHI '97, pages 186-193, ACM-SIGCHI, 1997).
Systems described in Stifelman, Lisa J. Augmenting Real-World Objects: A Paper-Based Audio Notebook. In the Proceedings of CHI '96. ACM-SIGCHI, 1996 ("Stifelman 1996") and Stifelman, Lisa J. The Audio Notebook: Paper and Pen Interaction with Structured Speech. Doctoral Dissertation. Massachusetts Institute of Technology, Sep. 1997 ("Stifelman 1997") index digital audio recordings with notes written in a paper notebook during recording. Some limitations of these systems are as follows. Like the previous systems just described, Stifelman (1996) and Stifelman (1997) focused on real-time indexing; a limitation is that notes written during playback do not index the recording. A further problem is the issue of distinguishing writing activity from selections made for playback. In Stifelman (1996) and Stifelman (1997), if a user adds to their notes when in a play mode, this could falsely trigger a playback selection. Also, selections left visible marks on the pages. With systems that use a display screen as the writing surface instead of paper, sometimes a circling gesture or other gesture is used to select areas of writing for playback. This can also be error-prone because the system has to distinguish between a circle drawn as data versus a circling gesture, or else the user must put the system in a special mode before making the gesture, causing selection to be a two-step procedure.
The present invention offers several advantages over the art. The objects and advantages of the invention include the following. One object of the invention is to allow a user to index time-varying media (audio, video, etc.) while recording, while playing, or while stopped. Another object is for the indices to be created automatically from the natural activity of the user (e.g., user notations such as handwritten notes and page turns), and for these indices to be created in a continuous fashion while the recording is originally being made, while it is being played back, or when stopped. Still another object is to allow the indices to be dynamically updated with new indices added during playback, while creating additional recordings, or while stopped. Yet another object is to allow a user to create multiple indices for any part of a recording. Another object of the invention is to allow a user to add new recorded segments of audio, video, etc. for any page of data.
Another object of the invention is to reliably distinguish between user notations created to index the recording and selection actions that are intended to cue playback to a location associated with a user notation. Still another object of this invention is to allow this distinction to be made without requiring a user to explicitly instruct the device to enter a special "mode", so that only a single step or action is needed to make a selection. Another object is for the selection action to be intuitive and not require training or reading a manual to learn. Yet another object is to allow a single input device to be used both for making notations and for making selections, without creating unwanted marks.
In a multimedia notebook recording application, a real-time continuous stream of media such as audio and/or video is recorded and linked with handwritten notes, other notations, or other types of indexing information (referred to as "user notations" or simply "notations"). Such an application or device will be referred to as a "multimedia recorder". This indexing information can then be used to cue a recording to a location corresponding to the user notation.
The present invention describes a multimedia recording device that combines the best aspects of a paper notebook and a media recorder (i.e., for recording audio, video, music or other time-varying media). The device can be used to record and index interviews, lectures, telephone calls, in-person conversations, meetings, etc. In one embodiment, a user takes notes in a paper notebook, and every pen stroke made during recording, playback, or while stopped is linked with an audio and/or video recording. In other embodiments, the writing medium could be a book, flip chart, white board, stack of sheets held like a clip-board, pen computer, etc. (hereinafter referred to as a "book"). Hereinafter, the term "page" generically refers to planar surfaces such as each side of a leaf in a book, the book cover, a surface below the book, a touch sensitive surface, a sheet of paper, flip chart, image on a screen, whiteboard, etc.
For playback, users can cue a recording directly to a particular location simply by turning to the corresponding page of notes. An automatic page identification system recognizes the current page, making it fast and easy to navigate through a recording that spans a number of pages of data. Users can select any word, drawing, or mark on a page to instantly cue playback to the time around when the mark was made. A selection is made using a "stylus", where stylus is defined as a pen (either the writing end of a digitizing pen or the selecting end of a digitizing pen), finger, or other pointing mechanism. The multimedia recorder is able to reliably distinguish between user notations that index the recording and selections intended to trigger playback.
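By way of illustration only, the selection behavior described above can be sketched as a lookup from a selected location on the current page to a position in the recording. The names `LinkedNotation` and `cue_for_selection`, the nearest-notation rule, and the `max_distance` cutoff are assumptions made for this sketch and are not taken from the specification.

```python
import math
from typing import List, NamedTuple, Optional


class LinkedNotation(NamedTuple):
    """Hypothetical record: where a notation sits and what it indexes."""
    page: int            # page identifier reported by page identification
    x: float             # location on the page (arbitrary tablet units)
    y: float
    media_offset: float  # seconds into the recording (assumed unit)


def cue_for_selection(notations: List[LinkedNotation], page: int,
                      x: float, y: float,
                      max_distance: float = 10.0) -> Optional[float]:
    """Return the media offset of the notation nearest the selected point.

    Only notations on the current page are considered. None is returned
    when the page has no notations or none lies within max_distance.
    """
    on_page = [n for n in notations if n.page == page]
    if not on_page:
        return None
    nearest = min(on_page, key=lambda n: math.hypot(n.x - x, n.y - y))
    if math.hypot(nearest.x - x, nearest.y - y) > max_distance:
        return None
    return nearest.media_offset


notations = [
    LinkedNotation(page=1, x=10.0, y=10.0, media_offset=5.0),
    LinkedNotation(page=1, x=50.0, y=50.0, media_offset=30.0),
    LinkedNotation(page=2, x=10.0, y=10.0, media_offset=90.0),
]
print(cue_for_selection(notations, page=1, x=48.0, y=52.0))
```

Filtering by page first mirrors the role of page identification: the same tablet coordinates on two different pages resolve to different parts of the recording.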
More particularly, the invention links a user notation (e.g., handwritten notes) on a page to time-varying data (e.g., audio). The invention includes a user interface for capturing attribute data for each user notation made on the page during record-time and play-time, a recording device for recording the time-varying data corresponding to the attribute data for each notation, and a processor for dynamically linking the attribute data to a corresponding element of the time-varying data.
The user interface can, for example, include a stylus for making and selecting a user notation and a digitizing tablet or other sensing device for capturing the attribute data for each user notation. The attribute data can include pressure information from the stylus corresponding to the pressure applied by the writing end of the stylus when making the notation onto the page, location information corresponding to the location of the stylus when making the notation, time information for when each user notation was made, and index-type information (e.g., play-time, record-time, and stop-time). The stylus can include both a writing end for making the notation onto the page and a selection end for selecting a user notation and thereby selecting the corresponding time-varying data to reproduce.
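By way of illustration only, the attribute data listed above could be held in a record such as the following. The type and field names (`NotationAttributes`, `IndexType`, `made_at`) and the units are assumptions made for this sketch; the specification names the kinds of information captured but not their representation.

```python
from dataclasses import dataclass
from enum import Enum


class IndexType(Enum):
    """The three index types named in the text."""
    RECORD_TIME = "record-time"
    PLAY_TIME = "play-time"
    STOP_TIME = "stop-time"


@dataclass(frozen=True)
class NotationAttributes:
    """Hypothetical attribute record for one user notation."""
    x: float               # location of the stylus on the page
    y: float
    pressure: float        # pressure applied by the writing end
    made_at: float         # time the notation was made (assumed: seconds)
    index_type: IndexType  # recorder state when the notation was made


sample = NotationAttributes(x=12.0, y=34.5, pressure=0.6,
                            made_at=101.25,
                            index_type=IndexType.PLAY_TIME)
print(sample.index_type.value)
```

The record is frozen in this sketch so that captured attribute data cannot be altered after the fact; that is a design choice of the example, not a requirement stated in the text.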
The recording device can include a sensory device (e.g., a microphone, telephone output, digital camera, television output, etc.) for receiving the time-varying data and a storage device (e.g., a hard disk, removable disk, etc.) for storing the time-varying data.
The processor is coupled to the sensing device (e.g., a digitizing tablet), the recording device, and a memory (where the attribute data is stored) for dynamically linking the attribute data for a particular user notation to the corresponding time-varying media that was recorded or reproduced at the same time that the user notation was made.
During record-time, the invention records the time-varying data and the attribute data for each user notation and links the attribute data to a corresponding element of the time-varying data. When the user wants to review these notes and listen to the corresponding time-varying data, i.e., during play-time, the user selects the desired user notation with a selection end of the stylus and the system automatically plays/reproduces the recorded time-varying data (e.g., audio). The user can then add additional notations to the page while the time-varying data is being played, and the invention will automatically link these new notations to the time-varying data. The user can also stop the recorder and make notations on the page at her leisure (i.e., during stop-time). These stop-time notations will be automatically linked to the time-varying data that was playing when the playback was stopped.
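By way of illustration only, the linking behavior across record-time, play-time, and stop-time can be sketched as follows. The class `NotationLinker`, its methods, and the use of seconds for the media position are assumptions made for this sketch; the rule it encodes is the one stated above, namely that a notation is linked to the position in the recording that is being recorded, is being played, or was playing when playback stopped.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Index types named in the text; the string values are illustrative.
RECORD_TIME = "record-time"
PLAY_TIME = "play-time"
STOP_TIME = "stop-time"


@dataclass
class NotationLinker:
    """Hypothetical linker that tracks recorder state and links notations."""
    index_type: str = STOP_TIME
    media_offset: float = 0.0  # seconds into the recording (assumed unit)
    links: List[Tuple[str, float, str]] = field(default_factory=list)

    def update(self, index_type: str, media_offset: float) -> None:
        """Record the current recorder state and media position."""
        self.index_type = index_type
        self.media_offset = media_offset

    def add_notation(self, notation_id: str) -> float:
        """Link a new notation to the current media position."""
        self.links.append((notation_id, self.media_offset, self.index_type))
        return self.media_offset


linker = NotationLinker()
linker.update(RECORD_TIME, 12.5)  # recording is in progress
linker.add_notation("note-1")
linker.update(PLAY_TIME, 40.0)    # user is reviewing the recording
linker.add_notation("note-2")
linker.update(STOP_TIME, 47.0)    # playback stopped at 47.0 seconds
linker.add_notation("note-3")
for link in linker.links:
    print(link)
```

Because stopping leaves the media position unchanged in this sketch, a stop-time notation is linked to the position that was playing when playback stopped.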