Natural language requests often take a compound form, in which implicit quotes surround part of the request sentence. Humans generally have no trouble understanding such combined requests. When a teacher asks a student, “Tell your mother to call me tomorrow,” the student has no trouble understanding what the teacher meant and how to carry out the request. The teacher's sentence has two parts: a communication request (“Tell your mother”), which describes a transmission act, and a message (“call me tomorrow”), which is the main content to be communicated, along with the knowledge that the message came from a certain teacher, and in what context, such as time and place. To perform a similar task in response to a user's spoken request, a computer system will analyze the utterance, recognize a communication request and separate it from the message, analyze the communication request to determine what transmission action is requested, and analyze the message to determine the content to transmit. Automated conversational agents exist today (such as Apple's Siri) that are able to perform such a task. In this case, and in other prior art systems, the effect of the user's request is to send a message as transcribed text. This is useful, especially if the transcription is correct, the recipient is equipped with a device to display text, and the meaning of the message is properly conveyed by text alone, in the absence of prosodic features (such as tone of voice and pauses) found in the original voice signal.
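By way of illustration only, the separation of a communication request from its message can be sketched on transcribed text as follows. This is a minimal sketch under stated assumptions, not the method of any particular system: the pattern, the small verb list, and the names `ParsedRequest` and `parse_compound_request` are hypothetical, and a practical system would rely on a full natural language parser and would also resolve the recipient phrase (e.g., “your mother”) to an actual destination.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for requests of the form
# "<verb> <recipient> to|that <message>". The verb list is an assumption.
_REQUEST_PATTERN = re.compile(
    r"^(?P<verb>tell|ask|remind)\s+(?P<recipient>.+?)\s+(?:to|that)\s+(?P<message>.+)$",
    re.IGNORECASE,
)


@dataclass
class ParsedRequest:
    action: str     # the transmission act, e.g. "tell"
    recipient: str  # unresolved recipient phrase, e.g. "your mother"
    message: str    # the content to be communicated


def parse_compound_request(utterance: str) -> Optional[ParsedRequest]:
    """Split a transcribed utterance into a communication request and a message.

    Returns None when the utterance does not match the assumed pattern.
    """
    match = _REQUEST_PATTERN.match(utterance.strip().rstrip("."))
    if match is None:
        return None
    return ParsedRequest(
        action=match.group("verb").lower(),
        recipient=match.group("recipient"),
        message=match.group("message"),
    )


parsed = parse_compound_request("Tell your mother to call me tomorrow")
print(parsed)
# ParsedRequest(action='tell', recipient='your mother', message='call me tomorrow')
```

The sketch operates on a transcript only; it discards the prosodic information of the original voice signal, which is the limitation noted above for text-based messaging.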
The ability to automate the analysis of a compound statement such as the one above, consisting of a communication request and a message, coupled with the ability to perform the required communication acts, will also prove advantageous in the area of voice mail. Voice mail is essential in the modern age, as people are accustomed to the idea that every phone call will result in a communicated message, whether or not the recipient was available to answer the call. With voice mail, the message to transmit is taken directly from the user's speech, as an audio signal. It is interesting to note that smartphones are able to record audio, and that they handle telephone communications, the latter possibly resulting in voice mail being left after various steps and delays; but they do not allow a user to send voice mail in one deliberate step, where a single request results in sending voice mail. The use of virtual agents that understand natural language is becoming more widespread, and in this context, a more effective way to send voice mail will be for users to do so with a single spoken request. This request will combine a communication request (which specifies the destination) and the message (i.e., the content of the voice mail); yet the art does not offer that possibility. In fact, directly embedding voice mail in a spoken utterance requires novel techniques.