Controlled-environment communication systems are telecommunication systems designed to enable members within a controlled-environment facility to communicate with parties outside of that facility. These systems allow telecommunications activities for the populations of those facilities to be highly regulated. They are designed with security measures and apparatus that enable administrators of such facilities to set policies for allowed and disallowed activity, monitor voice calls to detect members within the facility engaging in disallowed activities, and also to bill parties on the call as appropriate. These systems are designed for many contexts in which monitoring of telecommunications activity is desirable, such as health facilities, military facilities, and correctional facilities such as prisons.
The prison application has an especially urgent need for strong security measures and apparatus. Prison inmate communication is highly circumscribed because of the potential for abuse. Inmates have been known to use inmate communication systems to engage in illicit activity outside of the prison, to threaten parties of interest such as judges, attorneys, and witnesses, and to communicate with inmates in other prison facilities about possibly illegal activity. As such, several security measures have been developed for use with these systems over the past two decades and have now become standard. Combinations of features such as personal identification number (PIN) entry, biometric validation of inmates including fingerprint and voice print identification, per-inmate allowed and disallowed contact lists, and physical phone enclosures are common in prison communication systems on offer. These features allow call requests by inmates to be validated at the onset of the call request, such that only valid requests, such as an inmate requesting a call to a family member evaluated as a non-threat, are allowed.
However, these security features have struggled to keep up with schemes to circumvent them. For example, within the facility itself, an inmate may coerce another inmate into initiating a phone call to an outside party that appears on the first inmate's block list, but not on the block list of the coerced inmate. The first inmate may then converse with the outside party, evading detection by security features by simply posing as the coerced inmate.
A common (and more subtle) class of circumvention attempt involves the assistance of a called party that is allowed by the prison system. An allowed called party can be contacted without triggering any alarms in the prison communication security apparatus, and the called party may then assist the inmate in contacting a third party for nefarious purposes using features commonly available to public telephone network customers. Three-way calling is a prime example: an allowed called party can establish a three-way call with a third party, which then allows the inmate and the third party to communicate using a call session originally established between the inmate and the allowed called party. Thus, contact between the inmate and the undesirable third party evades detection by the prison security apparatus.
In response, several schemes have been developed to detect three-way calling attempts. Several techniques fall under the umbrella of “sound detection,” in which sounds associated with three-way call activity are detected. One such method is the detection of a loud “clicking” sound called a “hookflash,” “switchhook,” or “flashhook” that is made when a called party switches to a different line to initiate a call session with a third party. To detect this sound, the energy of the call audio is monitored for a short burst that exceeds a threshold. Another common scheme infers a three-way call attempt by detecting an extended period of silence. This detection scheme is based on the observation that the called party leaves the call session with the inmate for some period of time to initiate a call session with a third party, and thus the inmate call session may be silent for some amount of time.
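The two sound-detection ideas above can be illustrated with a minimal Python sketch. It is not taken from any particular deployed system: the function names, the frame length, and both thresholds are hypothetical, and a real detector would need calibration against actual line conditions. Energy is computed per frame as mean squared amplitude; a hookflash candidate is a short run of frames above a high threshold, and a silence candidate is a long run of frames below a low threshold.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open (start_frame, end_frame)


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def _runs(values: Sequence[float], pred: Callable[[float], bool]) -> List[Span]:
    """Maximal runs of consecutive frames for which pred holds."""
    runs: List[Span] = []
    start = None
    for i, v in enumerate(values):
        if pred(v):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(values)))
    return runs


def find_bursts(energies: Sequence[float], threshold: float,
                max_frames: int) -> List[Span]:
    """Short runs above the threshold (hypothetical hookflash candidates)."""
    return [r for r in _runs(energies, lambda e: e > threshold)
            if r[1] - r[0] <= max_frames]


def find_silences(energies: Sequence[float], threshold: float,
                  min_frames: int) -> List[Span]:
    """Long runs below the threshold (hypothetical silence candidates)."""
    return [r for r in _runs(energies, lambda e: e < threshold)
            if r[1] - r[0] >= min_frames]
```

For example, given four frames of speech-level audio, one loud frame, five silent frames, and two more speech frames, `find_bursts` would flag the loud frame and `find_silences` would flag the five-frame gap, provided the thresholds sit between the respective energy levels.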
Yet another scheme compares the silence during a known period of conversation with other silence periods, based on the observation that the background noise characteristics of silence made by a central office, as is the case when a called party has left the session to contact another party, are fundamentally different from the background noise made when the called party is present. In yet another iteration of the competition between inmates and prison telecommunication system designers, several detection schemes now exist to detect inmates' attempts to mask the silence or hookflash sound associated with three-way calling by creating a loud sustained noise on the call line, for example, by blowing into the receiver. Echo characteristic detection is yet another technique to detect potential three-way calling, based on the observation that there is a “characteristic echo” produced by the natural electromagnetic reflection at the interface between common telephone switches and telephone line materials. A change in the echo characteristic may indicate that a third party has been added to the call, contributing yet another echo. Combinations of techniques also exist in the art, such as detecting the hookflash click and a silence immediately following the click.
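The combined click-then-silence rule mentioned above can be sketched as follows. This is an illustrative assumption about how such a combination might be expressed, not a description of any specific system; the function name and the `max_gap` parameter are hypothetical. It takes candidate click spans and candidate silence spans, each as half-open frame ranges, and keeps only clicks that are followed almost immediately by a silence.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open (start_frame, end_frame)


def click_then_silence(bursts: Sequence[Span], silences: Sequence[Span],
                       max_gap: int) -> List[Tuple[Span, Span]]:
    """Pair each burst with the first silence starting within max_gap frames
    after the burst ends. Bursts with no such silence are dropped."""
    hits: List[Tuple[Span, Span]] = []
    for burst in bursts:
        for silence in silences:
            gap = silence[0] - burst[1]
            if 0 <= gap <= max_gap:
                hits.append((burst, silence))
                break
    return hits
```

Requiring both cues is one plausible way to reduce false positives, since an isolated loud noise or an isolated pause is less suspicious than the two in sequence.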
All of these techniques achieve varying levels of success, and reducing false-positive detections is a key challenge for all of them. Furthermore, most of these methods are only applicable when inmates and called parties are served through legacy Public Switched Telephone Network (PSTN) technologies such as analog “plain old telephony service” (POTS) or Integrated Services Digital Network (ISDN). As voice communication shifts towards Voice over Internet Protocol (VoIP), many of these techniques have become obsolete. VoIP operates on a “packet-switched” paradigm, in which packets representing samples of encoded voice are sent between speakers on a voice call. Unlike the “circuit-switched” paradigm used in PSTN, packets do not require a dedicated line to be established for the entire path between the call parties. A VoIP call comprises two distinct streams: voice data, which carries packetized digitally encoded voice between call parties, and signaling data, which carries the message packets that enable call session initiation, routing, session parameter negotiation between call parties, and teardown of the call.
In particular, techniques designed to reduce the bandwidth usage of VoIP calls have created challenges for legacy three-way call detection techniques. Silence suppression, in which a phone terminal serving a user who is not speaking does not generate voice data to send to the other call party, poses significant problems for existing detection schemes: loud clicking sounds from a hookflash may be missed or not generated at all, and background noise without speech present often results in no voice packets being sent between users. The digitization of voice also allows the sound generated at the speaker end to be reproduced with better fidelity at the receiver end, which significantly undermines echo-based detection schemes.
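A toy model makes the effect of silence suppression on energy-based detectors concrete. The sketch below is a simplification under stated assumptions: the sender is modeled as dropping every frame whose energy falls below a hypothetical voice-activity threshold, and the names `suppress_silence` and `missing_spans` are invented for illustration. The receiver then sees only which frame indices arrived, not the background noise that a legacy detector would have analyzed.

```python
from typing import List, Sequence, Tuple


def suppress_silence(energies: Sequence[float],
                     vad_threshold: float) -> List[Tuple[int, float]]:
    """Model a sender that transmits only frames at or above the threshold.
    Returns (frame_index, energy) for each transmitted frame."""
    return [(i, e) for i, e in enumerate(energies) if e >= vad_threshold]


def missing_spans(sent: Sequence[Tuple[int, float]]) -> List[Tuple[int, int]]:
    """Half-open ranges of frame indices absent between transmitted frames."""
    spans: List[Tuple[int, int]] = []
    for (prev, _), (cur, _) in zip(sent, sent[1:]):
        if cur - prev > 1:
            spans.append((prev + 1, cur))
    return spans
```

In this model, the low-energy frames never reach the receiver, so a detector that compares background-noise characteristics has nothing to compare; only the absence of frames is observable.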