1. Technical Field
The present invention relates to packet-based communication sessions, such as Voice over Internet Protocol (VoIP) sessions, and, more particularly, to conducting surveillance with respect to such sessions.
2. Description of Related Art
Given the recent rise in popularity of the Internet and of packet-switched communications generally, it is increasingly common for people to engage in packet-based, real-time media sessions over packet-switched networks rather than, for example, in more traditional circuit-switched telephone sessions. These real-time media sessions may be VoIP sessions or any other type of real-time media session. To engage in these sessions, communication devices may use a packet-switched protocol such as the Internet Protocol (IP), relevant aspects of which are described in “Internet Protocol,” RFC 791 (September 1981), which is incorporated herein by reference.
Certain types of media sessions, such as VoIP sessions, may be set up using a protocol such as the Session Initiation Protocol (SIP), relevant aspects of which are described in Rosenberg et al., “SIP: Session Initiation Protocol,” RFC 3261 (June 2002), which is incorporated herein by reference. The SIP messages involved in setting up these sessions may include a description of one or more parameters of those sessions according to a protocol such as the Session Description Protocol (SDP), relevant aspects of which are described in Handley and Jacobson, “SDP: Session Description Protocol,” RFC 2327 (April 1998), which is incorporated herein by reference.
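For illustration, an SDP body of the kind carried in a SIP message is a series of `<type>=<value>` lines. The following Python sketch extracts a few of the session parameters referred to above: the connection address, the media port and transport, and the payload types with their encodings. The function name `parse_sdp`, the dictionary layout, and the sample body (which uses the 192.0.2.0/24 documentation address range) are hypothetical. The sketch handles only `c=`, `m=`, and `a=rtpmap:` lines, assumes an RTP/AVP profile in which media formats are numeric payload types, and is not a complete SDP parser.

```python
from typing import Any, Dict


def parse_sdp(body: str) -> Dict[str, Any]:
    """Extract a few session parameters from an SDP body (simplified sketch)."""
    session: Dict[str, Any] = {"connection": None, "media": []}
    for raw in body.splitlines():
        line = raw.strip()
        # Every SDP line has the form <type>=<value> with a one-character type.
        if len(line) < 2 or line[1] != "=":
            continue
        key, value = line[0], line[2:]
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            # Simplification: a later (media-level) c= line overwrites this one.
            session["connection"] = value.split()[2]
        elif key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media_type, port, proto, *formats = value.split()
            session["media"].append({
                "type": media_type,
                "port": int(port),
                "proto": proto,
                # Assumes RTP/AVP, where each format is a numeric payload type.
                "formats": [int(f) for f in formats],
                "rtpmap": {},
            })
        elif key == "a" and value.startswith("rtpmap:") and session["media"]:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            session["media"][-1]["rtpmap"][int(payload_type)] = encoding
    return session


# Hypothetical sample body offering PCMU and PCMA audio.
EXAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

# summary["connection"] is "192.0.2.10"; the audio stream is offered on port 49170.
summary = parse_sdp(EXAMPLE_SDP)
```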
Once the session participants have agreed upon the session parameters, the session may be conducted via one or more bearer elements, such as routers, switches, media servers, and media gateways, using a bearer protocol such as the Real-time Transport Protocol (RTP), relevant aspects of which are described in Schulzrinne et al., “RTP: A Transport Protocol for Real-Time Applications,” RFC 3550 (July 2003), which is incorporated herein by reference. Many other protocols may be used instead of or in addition to SIP, SDP, and RTP, however.
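RTP carries each media packet behind a 12-byte fixed header (version, padding and extension flags, CSRC count, marker bit, payload type, sequence number, timestamp, and SSRC), optionally followed by contributing-source (CSRC) identifiers and a header extension. The following Python sketch decodes that header following the layout given in RFC 3550. The `RtpPacket` class and `parse_rtp` function are hypothetical names; the sketch does not interpret payload formats or header-extension contents.

```python
import struct
from dataclasses import dataclass
from typing import List

RTP_FIXED_HEADER_LEN = 12


@dataclass
class RtpPacket:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int]
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Decode the RTP fixed header and locate the payload (simplified sketch)."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:RTP_FIXED_HEADER_LEN])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    padding = bool(b0 & 0x20)
    extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F

    # Optional list of 32-bit contributing-source identifiers.
    offset = RTP_FIXED_HEADER_LEN
    csrc_end = offset + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[offset:csrc_end]))
    offset = csrc_end

    if extension:
        # Extension header: a 16-bit profile-defined field, then a 16-bit length
        # counting the 32-bit words that follow this 4-byte extension header.
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words

    end = len(packet)
    if padding:
        # The last octet gives the number of padding octets, including itself.
        end -= packet[-1]
    if end < offset:
        raise ValueError("header or padding length exceeds packet size")

    return RtpPacket(version, padding, extension, bool(b1 & 0x80), b1 & 0x7F,
                     seq, ts, ssrc, csrcs, packet[offset:end])
```

For example, a packet whose first two octets are 0x80 and 0x00 is RTP version 2, carries no CSRCs and no marker, and uses payload type 0 (PCMU in the RTP/AVP profile).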
With respect to communication sessions in general, whether circuit-switched or packet-switched, law-enforcement agencies sometimes need, and are authorized, to monitor them. Along these lines, on Oct. 25, 1994, the United States government enacted the Communications Assistance for Law Enforcement Act (CALEA) to clarify the duty of telecommunications carriers to cooperate in monitoring communications for law-enforcement purposes. CALEA requires these carriers (e.g., telephone companies, wireless service providers, etc.) to make available both call content (voice signals) and call data (digits dialed, length of call, etc.) to requesting law-enforcement agencies in response to a valid court order.
Among the known techniques for conducting surveillance of communications are speaker verification and speaker identification. Speaker verification refers to comparing a voice sample against a stored digital representation of a person's voice, often known as a voiceprint, for the purpose of verifying the identity of the speaker. Verification is often most useful when accompanied by identity-corroborating data, such as a name, account number, and the like, which indicates whose voiceprint the sample should be compared against. Speaker identification involves comparing a voice sample against multiple voiceprints to determine the identity of the speaker, and is often used when no identity-corroborating data is available.
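The distinction between verification (a 1:1 comparison against the voiceprint of a claimed identity) and identification (a 1:N search across enrolled voiceprints) can be sketched as follows. Purely for illustration, the sketch assumes that each voiceprint is a fixed-length feature vector, that similarity is measured by cosine similarity, and that a hypothetical threshold of 0.8 separates a match from a non-match. Practical systems derive voiceprints from acoustic models and calibrate thresholds empirically, and none of the names below come from any particular product.

```python
import math
from typing import Dict, Optional, Sequence

# Assumption for illustration: a voiceprint is a fixed-length feature vector.
Voiceprint = Sequence[float]


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cannot compare an all-zero voiceprint")
    return dot / norms


def verify_speaker(sample: Voiceprint, claimed_id: str,
                   enrolled: Dict[str, Voiceprint], threshold: float = 0.8) -> bool:
    """Speaker verification: 1:1 comparison against the claimed identity's voiceprint."""
    reference = enrolled.get(claimed_id)
    if reference is None:
        return False  # no voiceprint on file for the claimed identity
    return cosine_similarity(sample, reference) >= threshold


def identify_speaker(sample: Voiceprint, enrolled: Dict[str, Voiceprint],
                     threshold: float = 0.8) -> Optional[str]:
    """Speaker identification: 1:N search for the best-matching enrolled voiceprint."""
    best_id: Optional[str] = None
    best_score = float("-inf")
    for speaker_id, reference in enrolled.items():
        score = cosine_similarity(sample, reference)
        if score > best_score:
            best_id, best_score = speaker_id, score
    # Reject the best candidate if even it falls below the threshold.
    return best_id if best_score >= threshold else None
```

Note that `verify_speaker` needs the identity-corroborating `claimed_id` to know which voiceprint to load, whereas `identify_speaker` needs no such data, which mirrors the distinction drawn above.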
However, a target of surveillance may not convey his or her actual voice during a communication session, perhaps in an effort to evade the voiceprint analysis mentioned above. This could occur if, for example, the target communicated using instant messaging (IM) or in a chat room. The target could also use a computer-generated voice, perhaps produced in part by text-to-speech (TTS) technology, or perhaps by a combination of speech-to-text (STT) and TTS technology. Still another example might involve a target using a device to garble or distort his or her voice. Other examples are certainly possible, and each may render voiceprint-matching strategies ineffective.