This invention relates to the field of text-to-speech conversion, especially in a voice messaging and communications setting. More particularly, this invention relates to a method of and apparatus for efficient sharing of a text-to-speech conversion resource in a unified messaging application.
Increasing numbers of users are accessing e-mail messages. When e-mail was first introduced, a user could only review an e-mail message from their desktop, either from a terminal or a personal computer (PC). Modern users require more freedom, which prompted remote e-mail access, for example via a laptop computer and modem. More recently, users' desire for more efficient access to e-mail has prompted the introduction of voice-delivered e-mail. In voice delivery, a machine or human operator reads the e-mail message directly from the caller's mailbox. The merging of text and voice messaging into a single delivery source is known in the art as Unified Messaging. This allows recipients to retrieve their e-mail messages at any time they have access to a telephone. Owing to cellular and satellite telephony technology, such a system, in essence, allows users to access their e-mail at any time and from almost any place.
The machine conversion of an e-mail message to a voice message utilizes a text-to-speech (TTS) conversion resource. Unified Messaging applications, in addition to other applications which read text over the telephone, use a TTS conversion resource. As is well known in the art, TTS can be implemented either in host-based software or using separate voice processing hardware. In either form it should be considered a 'scarce resource'. TTS is expensive in either throughput or hardware expenditures. In the host-based software implementation, the CPU cycles associated with conversion limit the number of concurrent conversions which a single system can support. Using separate voice processing hardware incurs additional cost, and consequently there is a need to operate with a limited number of resources.
Often users do not listen to long recitations of detailed e-mail messages. Rather, users will listen to a first part of the message and then skip the remainder until they return to their PC or laptop computer and review the details of the e-mail message in text format. Converting such a message in its entirety would, in essence, be a wasteful use of a scarce resource.
For at least these reasons, it is desirable to perform TTS conversions on demand. In other words, the conversion is performed when the user is on the telephone and determines that they want to hear their e-mail messages. Unless a TTS resource were dedicated to each user, a user would likely be required to wait an extended period of time for other users to complete the review of their e-mail messages before a TTS resource became available. Under certain circumstances, this delay could prevent the user from retrieving their e-mail messages until a later time.
What is needed is a more efficient method and apparatus for sharing a TTS resource.
What is further needed is an efficient just-in-time sharing of a TTS resource.
An architecture is provided for sharing text-to-speech (TTS) resources. A TTS controller manages the allocation of the TTS resources. An application issues a conversion request, which is placed in a first queue. An available TTS resource begins the conversion, working on sentence boundaries, and converts a predetermined minimum amount of text. Once a sufficient amount of text is converted, the digitized speech data is played to a user. The amount of converted data is monitored during the playback operation. When the amount of converted data remaining falls below a predetermined minimum, the TTS controller is notified. If more text remains in the message being converted, the TTS controller places a request into a second queue. The second queue has a higher priority, so that continuing conversions are completed before subsequent conversions begin.
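The two-queue arrangement summarized above can be illustrated with a minimal sketch. It is not the implementation described in this specification: the names `TTSController`, `Conversion`, `convert_chunk`, `request_more`, and the `MIN_CHARS` threshold are hypothetical, the sentence splitter is deliberately naive, and the "conversion" merely collects sentences rather than producing digitized speech. The playback monitor is represented only by the call to `request_more`, which it would make when the buffered speech runs low.

```python
import re
from collections import deque
from dataclasses import dataclass, field

MIN_CHARS = 40  # hypothetical "predetermined minimum amount of text" per pass


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


@dataclass
class Conversion:
    user: str
    sentences: deque                       # text not yet converted
    converted: list = field(default_factory=list)

    def has_more(self):
        return bool(self.sentences)


class TTSController:
    """Hands out work to TTS resources, preferring continuing conversions."""

    def __init__(self):
        self.new_requests = deque()    # first queue: conversions not yet started
        self.continuations = deque()   # second queue: higher priority

    def submit(self, user, text):
        conv = Conversion(user, deque(split_sentences(text)))
        self.new_requests.append(conv)
        return conv

    def request_more(self, conv):
        # Called by the playback monitor when buffered speech runs low.
        if conv.has_more():
            self.continuations.append(conv)

    def next_job(self):
        # Continuing conversions are always served before new ones.
        if self.continuations:
            return self.continuations.popleft()
        if self.new_requests:
            return self.new_requests.popleft()
        return None


def convert_chunk(conv, min_chars=MIN_CHARS):
    """Stand-in for a TTS resource: consume whole sentences until at
    least min_chars characters are converted or the message ends."""
    done, chunk = 0, []
    while conv.sentences and done < min_chars:
        sentence = conv.sentences.popleft()
        chunk.append(sentence)
        done += len(sentence)
    conv.converted.extend(chunk)
    return chunk
```

In this sketch a resource that becomes free calls `next_job()`, converts one chunk with `convert_chunk()`, and returns to the pool; the message re-enters the second queue only if the listener is still playing it and text remains, so a message that the user skips consumes no further conversion time.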