In the field of telecommunication, speech recognition is sometimes employed in communication services so that a user can speak voice commands into a User Equipment, UE, to control some functionality therein or in a communication network, rather than entering written commands and pressing buttons on a keyboard or the like. In some applications, a speech recognition function in the UE or in the network translates the entered voice command into text, such as a recognizable message or just a single word. A voice command spoken into the UE may also be sent in digitally encoded form to a speech recognition entity, where the actual speech recognition is executed by analyzing the speech and translating it into corresponding text. Recently, speech recognition has been applied in smart phones, e.g. the speech-based service called “Siri” developed for Apple iPhones.
FIG. 1 illustrates an example of how conventional speech recognition can be used in a communication network for controlling some service function or apparatus, which could be any voice-controllable device or function such as a teleconference bridge, a banking service, an electronic game, functions in a telephone or computer, control of various home appliances, and so forth. Thus, when a spoken command is entered in a UE 100, shown as an action 1:1, the UE 100 provides a digitized version of the speech as signals to a speech recognition entity 102, shown as another action 1:2. The speech recognition entity 102 then translates the received speech signals into a text version of the speech, in an action 1:3. As mentioned above, the speech recognition entity 102 may be implemented in the network or in the UE 100 itself.
Possibly, the entity 102 may also utilize a function referred to as “Artificial Intelligence”, AI, 104 to make a more or less elaborate interpretation of the spoken command, as shown by a schematic action 1:4. In that case, the AI function 104 basically deduces the meaning of a spoken question or command once it has been converted to text by the speech recognition entity 102. As a result, the speech recognition entity 102 may issue a control message or command corresponding to the entered speech, as shown in an action 1:5, which somehow controls or otherwise interacts with a service function or apparatus 106. The service function or apparatus 106 may then process the control message and operate accordingly, e.g. by providing a suitable response back to the UE 100, as shown by a final action 1:6.
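The flow of actions 1:1 through 1:6 can be sketched as a simple pipeline. The sketch below is purely illustrative: the class and function names are hypothetical, and the “recognition” and “interpretation” steps are stand-ins for real speech-to-text and AI components, which the description above does not specify in detail.

```python
from dataclasses import dataclass


@dataclass
class ControlMessage:
    """A control message corresponding to the entered speech (action 1:5)."""
    target: str
    command: str


class SpeechRecognitionEntity:
    """Hypothetical entity 102: translates speech signals to text (action 1:3)."""

    def recognize(self, speech_signals: bytes) -> str:
        # Placeholder: a real recognizer decodes audio; here, for illustration,
        # the digitized "signals" are assumed to carry the transcript directly.
        return speech_signals.decode("utf-8")


class AIFunction:
    """Hypothetical AI function 104: deduces the meaning of the text (action 1:4)."""

    def interpret(self, text: str) -> ControlMessage:
        # Toy rule: treat the first word as the command and the rest as its target.
        verb, _, obj = text.partition(" ")
        return ControlMessage(target=obj, command=verb)


class ServiceFunction:
    """Hypothetical service function or apparatus 106 (actions 1:5-1:6)."""

    def handle(self, msg: ControlMessage) -> str:
        # Process the control message and produce a response for the UE.
        return f"{msg.target}: executing '{msg.command}'"


def process_voice_command(speech_signals: bytes) -> str:
    """End-to-end flow: UE speech signals in, service response back out."""
    text = SpeechRecognitionEntity().recognize(speech_signals)  # action 1:3
    msg = AIFunction().interpret(text)                          # action 1:4
    return ServiceFunction().handle(msg)                        # actions 1:5-1:6
```

For example, `process_voice_command(b"mute bridge")` yields the response `"bridge: executing 'mute'"`, mirroring how a spoken command could ultimately control a teleconference bridge.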
In general, the speech recognition services known today comprise two parts: the actual speech recognition and the interpretation thereof, e.g. by means of an AI function or the like. In typical implementations, both of these parts may reside in the UE, or partly or completely in nodes of the network. In the above-mentioned Siri service for iPhones, a simplified speech analysis and AI analysis are made by the phone, which in parallel may send the speech in text form to an AI function in the network to obtain a more advanced analysis and the creation of a suitable response or other action.
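The split described above, where a simplified on-device analysis runs while the text is sent in parallel to a more advanced network-side AI function, can be sketched as follows. Both analysis functions here are hypothetical placeholders; in particular, `network_ai_analysis` stands in for an actual network request, which is not detailed in the description.

```python
from concurrent.futures import ThreadPoolExecutor


def local_analysis(text: str) -> str:
    # Simplified on-device interpretation: only a few fixed commands are known.
    known = {"call": "dialing", "play": "playing media"}
    first_word = text.split()[0].lower()
    return known.get(first_word, "unknown")


def network_ai_analysis(text: str) -> str:
    # Stand-in for sending the text to an AI function in the network;
    # a real implementation would perform a network request here.
    return f"network interpretation of: {text!r}"


def handle_command(text: str) -> tuple[str, str]:
    """Run the simplified local analysis while the text is also sent,
    in parallel, to the more advanced network-side AI function."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        network_future = pool.submit(network_ai_analysis, text)
        local_result = local_analysis(text)
        return local_result, network_future.result()
```

The design choice is that the quick local result can drive an immediate on-device action, while the network result, arriving later, can refine or replace it.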
In some cases, a user of a UE may want to convey a message to another user without actually calling the other user and having a conversation. For example, the first user may not want to disturb the other user or take the time to talk, but may just want to convey the message in a simple and convenient manner. One option is of course to send an SMS or other written message, which may however be time-consuming and difficult to do depending on the current situation. It would be much easier to just send a spoken message to a voice-mail box or convey it to the other user's UE in real time, e.g. by means of the known “Push-to-talk over Cellular”, PoC, service. However, this would still require entering the telephone number or other address of the other user, which may not be readily available to the first user. It is thus a problem that conveying a written or spoken message to a recipient according to conventional techniques requires some additional effort by the first user. This problem is even greater if the first user wants to convey the spoken message to several recipients, which basically requires that the message be sent separately to each recipient by entering the telephone number or address of each individual recipient.
Another area associated with similar problems of conveying speech messages to particular recipients is conference calls, where voice commands can be used for controlling the call. In conference calls, the speakers may use a push-to-talk function by pressing a button on the UE when they want to speak. However, in order to direct a spoken message in a conference call to a particular recipient, the sending user must enter a number or address of that recipient. Otherwise, it is common in conference calls that spoken messages are routed to all registered participants of the conference, which may not always be desirable or suitable.