Organizations that employ technical personnel, including computer programmers, natural science engineers, project managers, and information technology (IT) support staff, typically conduct regular status or review meetings throughout the lifespan of an engineering or software development project. Professional service consulting firms with personnel in similar roles typically conduct project design and review meetings with the customers who have engaged their services. For some project members, attending in person may require traveling a long distance at significant cost. Audio and video teleconferencing systems are well known to those skilled in the art and are often used to conduct project meetings, reducing the need for travel and, ideally, fostering more frequent communication and reporting among team members. Furthermore, project management software is often used to plan project tasks and schedules, assign resources to tasks, monitor development progress, and identify prerequisite tasks and other constraints on the timely completion of a project. During a project status call, the project plan often drives the meeting agenda, discussion topics, and action items.
Because conference calls are often used as an alternative to in-person meetings, and because they rarely achieve the same level of information interchange among all participants, it is desirable to be able to record a conference call for later use or review. Recording a conference call offers several benefits. It allows anyone to review what a participant stated about a specific topic. It allows participants to concentrate fully on the current discussion rather than being sidetracked by extensive note taking. It establishes a full and complete record of the call rather than a summary or subjective dictation. Finally, it allows anyone who missed the meeting to replay a complete and unfiltered recording of it.
Many U.S. patents address teleconferencing systems, and several specifically address recording audio or video conference calls as discussed above. U.S. Pat. No. 5,668,863 describes a method and apparatus for recording audio conference calls in which each participant is recorded independently. These recordings are then combined linearly, and each block is associated with the participant's voice name tag.
U.S. Pat. No. 5,710,591 describes an audio and video recording apparatus that uses a Multipoint Control Unit (MCU) capable of determining which participant should be recorded (usually the loudest) at any given time. By switching to the loudest speaker, the recording device creates a single recording that tracks one speaker at a time. This single recording can then be processed by a transcription application. The time at which each new speaker begins to be recorded can be saved and later used to navigate within the recording (e.g., to fast-forward to a specific speaker).
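The speaker-switching and navigation idea described above can be illustrated with a minimal sketch. It is not the patented method: it assumes, hypothetically, that each participant's audio has already been reduced to per-frame energy levels, selects the loudest participant in each frame, and saves a time stamp whenever the selected speaker changes. The function names `loudest_speaker_index` and `seek_to_speaker` are illustrative only.

```python
from typing import List, Optional, Tuple


def loudest_speaker_index(frames: List[List[float]], frame_seconds: float) -> List[Tuple[float, int]]:
    """Return (start_time, speaker) pairs each time the loudest participant changes.

    frames[t][p] is an assumed (hypothetical) energy level for participant p in frame t.
    """
    index: List[Tuple[float, int]] = []
    current: Optional[int] = None
    for t, levels in enumerate(frames):
        # Select the loudest participant in this frame; ties go to the lowest index.
        loudest = max(range(len(levels)), key=lambda p: levels[p])
        if loudest != current:
            # Save the time at which the newly selected speaker starts being recorded.
            index.append((t * frame_seconds, loudest))
            current = loudest
    return index


def seek_to_speaker(index: List[Tuple[float, int]], speaker: int, after: float) -> Optional[float]:
    """Fast-forward: first start time later than `after` at which `speaker` begins, or None."""
    for start, who in index:
        if who == speaker and start > after:
            return start
    return None
```

For example, with two participants and half-second frames, `loudest_speaker_index([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.6], [0.9, 0.3]], 0.5)` yields `[(0.0, 0), (1.0, 1), (2.0, 0)]`. Note that the index records only physical speaker changes and carries no information about the topic being discussed.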
U.S. Pat. No. 6,239,801 describes a method and system that uses symbolic icons (an analog clock face is illustrated) to designate the start time, stop time, and progress of an interval or portion of a multimedia recording. These icons are designated manually by the presenter during the recording, created at set intervals, or created at predefined events, such as when a new slide is presented. The icons can be used to control playback of a recorded multimedia session.
U.S. Pat. No. 6,100,882 describes a method of creating a textual representation of an audio conference call by recording each participant individually with a separate recording device located on that participant's terminal. Each participant's recording is processed by a speech-to-text application to create a textual representation. The text is propagated to every other participant on the call and joined to create a single linear transcription of the meeting.
U.S. Pat. No. 6,298,129 describes an audio teleconferencing system that allows the meeting leader to time-stamp or bookmark the call while it is in progress and being recorded. The system includes a computer server with a web-based interface that allows participants to play back portions of the recording, from one bookmark to another, over the computer system.
U.S. Pat. No. 6,334,022 describes a video recording device that allows simultaneous recording and playback of video, allowing viewers to time-shift their viewing. Embodiments of the system include automatic video indexing based on scene detection, image detection, or preset time intervals. A filtering module is also described that restricts recording to video programs selected on the basis of channel programming information.
U.S. Pat. No. 6,349,303 describes an audio/video information processing apparatus that uses a speaker segmentation method and speech-to-text recognition software to create a linear meeting summary document. The document can embed a still picture, taken from a video camera, next to the text representation of each block or time interval.
U.S. Patent Application Publication No. 2002/0002584 describes an information sharing system that includes a messaging communication module between two or more terminals. The system allows a non-participant to view a conference recording as a spectator. Participants can manually tag relevant time intervals within the recording, and multiple recordings can be searched based on these manually entered comments.
Several problems that limit the usefulness of the above conference recording and indexing systems have been identified. One class of prior-art systems and methods requires manual tagging or indexing; for example, the meeting leader must manually mark the start and stop times of each interval of interest and provide commentary for it. Another class of prior art automates interval marking by detecting physical events, such as a change of speaker. The problem with that approach is that the detected events carry no conceptual knowledge. For example, when a manager and a subordinate discuss a topic, an interval is created at each change of speaker, but no interval is created covering the entire topic being discussed. A third class of prior art uses speech recognition software to automatically time-stamp any words recognized during the conference call. That approach suffers from the inherent limitations of speech recognition software (with word accuracy rates estimated at 70 to 90%) and indexes only individual words; again, there is no notion of conceptual knowledge.
Context-based analysis of text documents, or of text derived from speech recognition of audio, is an analysis method that identifies and represents the sections of one document, or of a collection of documents, that share similar traits. These traits can be assigned to a topic, associated with a classification hierarchy, or used to identify new documents with similar traits. U.S. Pat. Nos. 5,918,223, 6,185,527, and 6,185,531 address context analysis or topic analysis of audio information. Other patents address the abstraction or gisting of documents and the subsequent visual presentation of the abstracted data.
U.S. Pat. No. 5,918,223 uses segmentation and analysis of acoustical features (loudness, bass, pitch, etc.) to create sound fingerprints of many very short intervals in an audio file. The fingerprints provide an index that can be used in a management system to search by example for similar sounds. This work operates only on the audio signal and does not attempt to recognize spoken words.
U.S. Pat. No. 6,185,527 uses segmentation and acoustical analysis to identify the type of audio (music, speech, speech over music, or silence) before passing the speech segments to a speech recognition module for automatic indexing. Points of emphasis and concluding remarks are identified based on acoustical analysis and frequency. Furthermore, a navigation method can skip over certain types of audio, for example, multiple 30-second intervals of speech or music representing commercials.
U.S. Pat. No. 6,185,531 covers topic-based indexing that uses context vectors to automate the classification of a text document, or of text derived from speech recognition of audio. This approach requires training sets to create topic models; a new document is then classified under the topic it most closely matches.
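The train-then-classify pattern described above can be sketched as follows. This is a simplified stand-in, not the method of U.S. Pat. No. 6,185,531: it substitutes plain term-frequency vectors for context vectors, builds each topic model by summing the vectors of that topic's training documents, and assigns a new document to the topic with the highest cosine similarity. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    # Plain term-frequency vector; an assumed simplification of a context vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term vectors (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_topic_models(training: Dict[str, List[str]]) -> Dict[str, Counter]:
    # Each topic model is the summed term vector of that topic's training documents.
    models: Dict[str, Counter] = {}
    for topic, docs in training.items():
        model: Counter = Counter()
        for doc in docs:
            model.update(vectorize(doc))
        models[topic] = model
    return models


def classify(text: str, models: Dict[str, Counter]) -> str:
    # Assign the new document to the topic whose model it most closely matches.
    vec = vectorize(text)
    return max(models, key=lambda topic: cosine(vec, models[topic]))
```

For example, after training on `{"schedule": ["milestone slipped two weeks", "deadline moved to next sprint"], "budget": ["cost overrun on hardware", "budget approved for travel"]}`, the text `"the deadline for the milestone"` is classified as `"schedule"`. As the paragraph notes, the quality of such a classifier depends on the availability of representative training sets for every topic.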
U.S. Pat. No. 5,794,178 covers a visualization technique in which the context vectors of sampled data are displayed with multiple attributes. Documents that are similar in topic are displayed graphically in neighboring regions of a computer display. Navigation techniques, including zooming, rotating the point of view, and selectively choosing the abstraction level, are also described.
U.S. Pat. No. 5,918,236 illustrates the automatic creation of multiple gists, or abstracts, for a single document using thematic profiling. A document browsing system is shown that allows different users with different topic interests to view the same collection of documents, with only the gists of interest to each user displayed.
U.S. Pat. No. 6,308,187 covers a method of displaying a chronologically arranged collection of documents allowing a user to visualize relationships between different documents or information elements within a document.
A need exists for better recording and indexing systems for video conferencing.