When a user is dealing with a streaming media server (e.g., a voicemail server), he may decide to initiate an audio and/or video communication (call-back) with someone who left a message (caller). Some multimedia mail servers provide a facility where the streaming is interrupted and the server initiates a communication with the caller on behalf of the user. When the user has finished the call-back, he returns to the media server and may listen to the next message. This approach has three technical problems. First, two media server ports are used while the user is having his real-time communication (call-back to the caller). Second, signalling and sometimes media flows are hair-pinned through the media server, which may burden the network. Third, as the media server initiates a call on behalf of the user, there are usually inconsistencies in authentication (the media server does not know the user's password), display (the call appears to come from the media server rather than from the user), authorisation (the media server does not know the rules associated with the user), and reporting (billing, call-log).
FIG. 1 is a block diagram of a typical call-back situation according to prior art, illustrating the aforementioned drawbacks. FIG. 1a shows a user 10 connected to a media server 11. For example, the user 10 has called the media server 11 to listen to his telephone mailbox managed by the media server 11. The connection comprises a SIP dialogue 101 and a transmission of media streams 111 from the media server 11 to the user 10 (SIP=Session Initiation Protocol). After the media server 11 has replayed to the user 10 a voice message left by a caller 12, the media server 11 announces to the user 10 the instruction 1001: “Press 1 for calling this person back.” When referring to a user 10 and a caller 12, we implicitly mean that the user 10 uses a user's terminal and the caller 12 uses a caller's terminal for establishing telecommunications connections.
FIG. 1b shows the subsequent situation after the user 10 has pressed the key “1” on his device. The media server 11 sends a SIP INVITE message 103 to the caller 12 for establishing a call-back to the caller 12. The connection between the user 10 and the media server 11, comprising a SIP dialogue 102 and a media stream 112, is maintained.
The subsequent situation has two alternatives: Either, cf. FIG. 1c1, all data exchanged between the user 10 and the caller 12 is routed through the media server 11, i.e., both signalling traffic 104, 105 and media streams 114, 115 (Full hair-pinning). Or, cf. FIG. 1c2, only signalling traffic 106, 107 between the user 10 and the caller 12 is routed through the media server 11 whereas media streams 118 are sent directly, bypassing the media server 11 (Signalling hair-pinning). FIG. 1d shows the subsequent situation after the call-back to the caller 12 has been terminated. The connection between the user 10 and the media server 11, comprising a SIP dialogue 109 and a media stream 119, is maintained. The user 10 resumes listening to the mailbox entries. For example, the media server 11 announces to the user 10 the instruction 1009: “Press 1 for deleting the message. Press 2 for listening to the next message.”
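The two alternatives of FIG. 1c1 and FIG. 1c2 can be summarised in a short sketch. The code below is only an illustration of which flows traverse the media server in each case; the names `Flow`, `callback_flows` and `hairpinned` are hypothetical and do not appear in the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    kind: str    # "signalling" or "media"
    path: tuple  # ordered hops between the two parties


def callback_flows(mode):
    """Return the user-to-caller flows for a prior-art call-back mode.

    mode is "full" (FIG. 1c1) or "signalling" (FIG. 1c2); both labels
    are assumptions made for this sketch.
    """
    via_server = ("user", "media_server", "caller")
    direct = ("user", "caller")
    if mode == "full":
        return [Flow("signalling", via_server), Flow("media", via_server)]
    if mode == "signalling":
        return [Flow("signalling", via_server), Flow("media", direct)]
    raise ValueError(f"unknown mode: {mode!r}")


def hairpinned(flows):
    """List the kinds of flow that pass through the media server."""
    return [flow.kind for flow in flows if "media_server" in flow.path]


if __name__ == "__main__":
    for mode in ("full", "signalling"):
        print(mode, hairpinned(callback_flows(mode)))
```

In both modes the signalling flow remains hair-pinned; only the media flow differs between the two alternatives.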
FIG. 2 is a message flow diagram of a similar call-back situation according to prior art. This description focuses on the role of a media server with respect to a caller. It is implied that the conversation between the caller and a user is bridged by the media server. A user 20, called Alice, is a subscriber to a corporate telecommunications network which is connected via a corporate SIP server 23 to a media server 21. The user 20 wants to listen to her telephone mailbox managed by the media server 21. The user 20 initiates a SIP session by sending a SIP INVITE message 201 addressed to her mailbox address “mymailbox@corporate.com” to the corporate SIP server 23. The corporate SIP server 23 forwards 202 the SIP INVITE message to the media server 21. The media server 21 replies to the corporate SIP server 23 with a “200 OK” message 203, which is forwarded 204 from the corporate SIP server 23 to the user 20. The “200 OK” message is acknowledged (ACK) 205 by the user 20. The ACK message is forwarded 206 from the corporate SIP server 23 to the media server 21.
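For illustration, the INVITE message 201 might look roughly like the text produced by the sketch below. Only the mailbox address “mymailbox@corporate.com” comes from the description; the sender address, the Via host, the branch and tag values, the Call-ID and the helper name `build_invite` are assumptions chosen for the example. The message carries no session description, which a real terminal would normally include.

```python
def build_invite(request_uri, from_uri, call_id, cseq=1):
    """Assemble a minimal SIP INVITE as text (illustrative only).

    All header values other than request_uri are placeholders that a
    real user agent would generate itself.
    """
    lines = [
        f"INVITE sip:{request_uri} SIP/2.0",
        "Via: SIP/2.0/UDP alice-pc.corporate.com;branch=z9hG4bKexample",
        "Max-Forwards: 70",
        f"To: <sip:{request_uri}>",
        f"From: <sip:{from_uri}>;tag=example-tag",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <sip:{from_uri}>",
        "Content-Length: 0",  # no SDP body in this sketch
    ]
    # SIP header lines end with CRLF; an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_invite("mymailbox@corporate.com",
                       "alice@corporate.com",      # assumed sender address
                       "example-call-id@alice-pc"))
```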
A connection 207 for the exchange of RTP media streams is established between the user 20 and the media server 21 (RTP=Real-time Transport Protocol). The media server 21 announces to the user 20 the instruction 208: “Bob's message: ‘Hello Alice! Blablabla.’ To call back Bob, press 1. To listen to next message, press 2.” Bob is the name of a caller 22 who has left a voicemail on the telephone mailbox of the user 20. The user 20 responds to this announcement 208 by pressing 209 the key “1” on her device. When referring to the user 20 and the caller 22, we implicitly mean that the user 20 uses a user's terminal and the caller 22 uses a caller's terminal for establishing telecommunications connections.
Triggered by the pressing of the key “1” on the user's 20 terminal, the media server 21 initiates a call-back SIP session with the caller 22 by sending a SIP INVITE message 210 addressed to the caller's 22 address “bob@corporate.com” to the corporate SIP server 23. The corporate SIP server 23 forwards 211 the SIP INVITE message to the caller 22. When the caller's 22 terminal, triggered by the INVITE message 210, 211, starts playing a ring tone, a 180 RINGING response 212, 213 is sent back, via the corporate SIP server 23, to the media server 21. On receipt of the 180 RINGING response, a UAC of the media server 21 is responsible for playing a ring-back tone 214 at the media server 21 (UAC=User Agent Client).
Once the caller 22 picks up 215 the phone, a 200 OK response 216, 217 is sent from the caller 22 via the corporate SIP server 23 back to the media server 21, indicating that the request has been processed successfully. The 200 OK message 216, 217 is acknowledged (ACK) 218 by the media server 21, and the call is connected. Now the actual conversation 219 between the caller and the media server 21 is transmitted as data via RTP. The media server 21 bridges 220 the RTP flows to the user 20, and thus acts as a man-in-the-middle between the user 20 and the caller 22. During the call-back to the caller 22, the media server 21 is a stateful media server 290. When the called party says “Good-bye” 221 and hangs up, a BYE request 222 is sent to the media server 21. The media server 21 responds with a 200 OK 223 to the caller 22.
After the call-back connection to the caller 22 has been terminated, the media stream transmission 224 from the media server 21 to the user 20 is resumed at a point where it had been interrupted for the set-up of the call-back to the caller 22. The media server 21 may resume the media stream 224 with the announcement 225: “Next message is from Charles: ‘Hello Alice!’”.
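The state held by the media server 21 during the flow of FIG. 2 can be sketched as follows. This is a simplified model, not an implementation of SIP: it treats an ACK as establishing a dialogue with a peer and a BYE as ending it, and the names `FIG2_EVENTS` and `max_concurrent_dialogues` are hypothetical.

```python
# Events as seen by the media server: (peer, message).
# The comments give the corresponding reference numerals of FIG. 2.
FIG2_EVENTS = [
    ("user", "ACK"),    # 205, 206: mailbox session established
    ("caller", "ACK"),  # 218: call-back leg established
    ("caller", "BYE"),  # 222: caller hangs up
]


def max_concurrent_dialogues(events):
    """Return the peak number of dialogues open at the media server."""
    active = set()
    peak = 0
    for peer, message in events:
        if message == "ACK":
            active.add(peer)
        elif message == "BYE":
            active.discard(peer)
        peak = max(peak, len(active))
    return peak


if __name__ == "__main__":
    print(max_concurrent_dialogues(FIG2_EVENTS))
```

Under this model the media server holds two dialogues at once for the whole duration of the call-back, which corresponds to the two media server ports mentioned as the first drawback.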
Moreover, when a user performs a call via the Internet using SIP (QSIG or H.323), such a call will usually be stateless (QSIG=Q signalling). This has significant drawbacks when the user would like to initiate a call-back to a caller after hearing a message left by that caller on the user's (audio/video) mailbox.
Rosenberg, J., Lennox, J. and Schulzrinne, H., “Programming Internet Telephony Services”, IEEE Internet Computing, Vol. 3, No. 3, 1999, pp. 63-72 (Tech-Report Number CUCS-010-99), describe a solution that lets the state of a script related to a call persist, so that the script can continue to interact with the server to process subsequent responses. This is achieved by defining a state token, called a script cookie, which is passed from the script to the server through a SIP CGI (=Common Gateway Interface) meta-header. When the script is re-executed at some later point, the server passes the cookie back to it through environment variables.
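The script cookie mechanism can be sketched as follows. This is a minimal model under stated assumptions: the script is an ordinary Python function rather than a separate CGI process, and the header name `Script-Cookie`, the environment variable name `SCRIPT_COOKIE` and the class `Server` are placeholders chosen for the example, not names taken from the cited paper.

```python
def script(env):
    """Hypothetical call script that counts its own invocations.

    It keeps no state itself; the count travels in the cookie.
    """
    count = int(env.get("SCRIPT_COOKIE", "0")) + 1
    return {"Script-Cookie": str(count)}  # placeholder meta-header


class Server:
    """Stores one opaque cookie per call and hands it back on re-execution."""

    def __init__(self):
        self.cookies = {}  # call identifier -> cookie string

    def run_script(self, call_id):
        env = {}
        if call_id in self.cookies:
            # Pass the stored cookie back through the environment.
            env["SCRIPT_COOKIE"] = self.cookies[call_id]
        headers = script(env)
        if "Script-Cookie" in headers:
            self.cookies[call_id] = headers["Script-Cookie"]
        return headers


if __name__ == "__main__":
    server = Server()
    server.run_script("call-1")
    print(server.run_script("call-1"))
```

The server treats the cookie as opaque; only the script interprets its content.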
The currently best solution is for the server to act as a man-in-the-middle and renegotiate the media flows so that media ports of the media servers are free until the real-time communication. However, the media server still keeps a signalling context open; if there is a VXML interpreter, the context is still active as well (VXML=Voice Extensible Mark-up Language). As aforementioned, there is a way to prevent media hair-pinning. However, if the devices of the users and the media server are separated by a Session Border Controller (=SBC), the SBC keeps contexts open as well. Another currently available solution is to include proprietary information in the message initiated by the media server, indicating on whose behalf the call is made. This is partly done by the OmniTouch® Unified Communication media server. However, this mechanism does not work in a multi-vendor environment and is complex in networks.