1. Field of the Invention
The present invention relates to data processing and speech signal processing, and relates more particularly to the synchronization of text or images to speech.
2. Description of the Related Art
Literary works and the like have long been reproduced in various audio formats for users who wish to review a work by listening to it rather than reading it. Audio books, for example, have been produced so that people may listen to “books on tape” while driving. Such audio books have also been produced for use by the blind and the dyslexic. These audio books have typically been provided on analog cassette or on compact disc (CD-ROM), with a human narrator reading and recording the text in a recording studio. In one common approach, a single narrator starts at the beginning of the book and records continuously in page-number sequence until the end is reached.
The DAISY Consortium has developed an open data standard, the Digital Audio-Based Information System (DAISY), for the creation of digital “talking books”. The DAISY standard provides a selectable table of contents through which a user may choose a section of material to be played back audibly.
Several software packages have been introduced that use the DAISY standard to produce an audio book. These allow the audio data forming the book to be recorded in segments, by one or more narrators, in accordance with a pre-set format. Typical programs are LP STUDIO PLUS and LP STUDIO PRO, developed by The Productivity Works and Labyrinth. These programs allow for the creation, management, editing and production of synchronized multimedia materials, including the synchronization of audio recordings to text and image data.
In a typical method of generating digital audio books using these prior art software systems, a playback control format is established, such as a navigational control center (NCC) format, based on sections of the source text to be recorded. The NCC predefines sections of synchronizable text prior to recording corresponding audio portions. Audio data for one or more of the text sections is then recorded, in any order. Once the audio portion has been recorded for one or more text sections, a Synchronized Multimedia Integration Language (SMIL) file is generated. Each SMIL file may correspond to one or more of the text sections with the associated audio data of one or more audio portions. In the LP STUDIO products, a skeleton SMIL is generated before the audio portion is recorded and is filled in as the recording is made. Once corresponding audio data portions have been recorded for all sections of the text source, the SMIL files are integrated according to the original definitions provided by the pre-established NCC and placed on a medium for audio playback by a user.
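To illustrate the kind of text-to-audio pairing a SMIL file records, the following is a minimal sketch that, for each pre-defined text section, emits a `<par>` element pairing a reference into the source text with the clip of audio recorded for it. It is loosely modeled on DAISY 2.02-style SMIL (`text` and `audio` elements with `clip-begin`/`clip-end` attributes) but is not a conforming DAISY implementation; the `Section` class, the `build_smil` function, and all file names are hypothetical, and required header metadata is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical model of one pre-defined text section and its recorded audio clip."""
    text_id: str       # anchor of the section within the source text document
    audio_file: str    # audio file holding the narrator's recording
    clip_begin: float  # start of the clip within the audio file, in seconds
    clip_end: float    # end of the clip within the audio file, in seconds


def build_smil(text_doc: str, sections: list[Section]) -> str:
    """Return a simplified SMIL-style document pairing each text section with its audio clip."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    # <seq> presents its children one after another, in the listed order.
    seq = ET.SubElement(body, "seq")
    for s in sections:
        # <par> groups a text reference and an audio clip that are presented together.
        par = ET.SubElement(seq, "par", {"endsync": "last"})
        ET.SubElement(par, "text", {"src": f"{text_doc}#{s.text_id}"})
        ET.SubElement(
            par,
            "audio",
            {
                "src": s.audio_file,
                "clip-begin": f"npt={s.clip_begin:.3f}s",
                "clip-end": f"npt={s.clip_end:.3f}s",
            },
        )
    return ET.tostring(smil, encoding="unicode")
```

For example, `build_smil("book.html", [Section("sec1", "narrator_a.mp3", 0.0, 12.5)])` yields a single `<par>` linking `book.html#sec1` to the first 12.5 seconds of `narrator_a.mp3`. Because the order of the `<par>` elements is fixed when the file is generated, reordering sections afterward means regenerating the structure, which reflects the rigidity of the prior-art approach discussed below.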
For a typical commercial talking book, a single narrator reads the entire text. In some situations, however, it is not economically feasible to have one narrator do so, and a group of volunteer narrators is used instead, each recording audio data for one or more sections. This is particularly common in charitable work, such as producing talking books for the blind or dyslexic. Such narrators typically record the audio data in a rather unstructured manner: one volunteer may have time to record only a single segment, another may work only periodically, and several may be available to record at the same time.
In the prior-art and other recording systems described above, once the NCC file has been defined, it cannot be altered without recreating an entirely new data structure. This limits the ability to change the ordering sequence and other attributes of the text files once the NCC has been established. It would therefore be desirable to provide a more narrator-friendly method and system for generating digital audio books, in which the formatting of the sections of text to be recorded can be freely changed without redefining the entire NCC structure. This is particularly useful where a number of different narrators, usually volunteers, produce a talking book by reading different sections of the same text source.