1. Field of the Invention
The present invention generally relates to passive recording of audio, voice, video, and data information transmitted over a network, and more particularly relates to Voice-over-Internet Protocol (VoIP) recording.
2. Description of the Related Art
1.0 Introduction to VoIP Recording
Since the mid-1990s, Voice over IP (VoIP) has steadily changed the telecommunications industry. The convergence of data and voice in the communications market allows for value-added services not available on traditional circuit-based networks, in addition to cost saving advantages. VoIP technology enables businesses to reduce costs, consolidate and simplify networks, and improve customer service applications. VoIP, once viewed as just a new technology, is now recognized as a reliable and cost-effective business solution.
To remain competitive, businesses that develop call-recording applications must now implement VoIP solutions. The following discussion differentiates VoIP recording from traditional circuit-based recording, starting with an overview of the IP telephony network and then examining the unique challenges of VoIP call recording. A suite of components designed to support VoIP call recording applications, available under the mark IPX/IPR from Ai-Logix, Somerset, N.J. 08873, is then discussed.
VoIP, also known as Internet telephony, IP telephony, packet-voice, packetized voice, and Voice-over-IP, transmits voice traffic in the form of packets. Since VoIP is reliable and efficient, call centers seeking to improve customer service and to reduce network costs have adopted it. Looking ahead, call-recording businesses are expected to do the same.
1.1 Hierarchical VoIP Network Structure
A typical IP network includes interconnected routers that form a packet switching fabric. VoIP is designed to take advantage of this IP infrastructure. There are many ways to add VoIP technology to a LAN. The simplest design requires the addition of a VoIP call control server, such as the Call Agent 26 shown in FIG. 4. This server 26 provides the logic and control functions required to maintain the call state. In this scenario, a phone call from the Internet 28 enters the local network via a router 30. Signaling information passes to the Call Agent 26, which then sets up and manages the call. Once a connection is established, the voice conversation passes directly from the router 30 to the IP phone via LAN switches 32. Unlike circuit-based systems, where voice traffic passes along the same cable as signaling traffic, VoIP technology may separate the two.
1.1.1 Hybrid Networks
VoIP networks can also be designed to interface with the conventional Public Switched Telephone Network (PSTN), usually through a T1 or E1 line, as shown in FIG. 5. In this situation, a Gateway 34 converts traffic between the two networks. In some scenarios, the local phone network consists entirely of IP telephones 36 and a Call Agent 38 that manages call states. In other environments, the local phone system combines VoIP and conventional PSTN phones; in this case, call control requires both a Call Agent and a conventional PBX. Alternatively, hybrid PBXs can be used so that VoIP and PSTN phones can coexist.
1.1.2 Integrating Distant Offices
VoIP technology enables businesses with distant offices to reduce operating costs by consolidating and simplifying network design, as shown in FIG. 6. Many companies, specifically those with worldwide call centers, adopt VoIP technology for this reason. As a hypothetical example, assume a call center has three offices (segments) located in California 40, New York 42, and Texas 44. With VoIP technology, a single Call Agent 46 manages call control on all three networks, while each office's existing Ethernet switches carry voice traffic to and from IP phones 48. Operational costs decrease dramatically because a separate telephone network is no longer required. FIG. 6 illustrates the efficiency of VoIP technology.
1.2 Customer Expectations from VoIP Recording
For a business that purchases a call recorder, VoIP simply allows networks to carry telephone conversations. Customers who already have a conventional recording system expect enhanced capabilities from a new VoIP recorder. Customers who are new to call recording have their own business requirements in mind and are looking to meet their business objectives. Ultimately, customers focus on the recording product's features rather than on the underlying VoIP technology.
Application developers who design call-recording systems recognize VoIP technology's ascendancy. Offering a VoIP recorder is important for remaining competitive in the call recording market, and this recorder must provide at least the same features available on PSTN recording applications.
Passive call recording systems rely on hardware components that tap into the telephone network and direct data into a recording application. Recording applications require the same data: the voice conversation (for recording purposes), the call control information (to monitor call states), and data for value added services, such as DTMF, CallerID, and the like.
1.3 Features Distinguishing a VoIP Network
VoIP's packet-based network presents a new tapping environment with a unique set of challenges. When designing a VoIP recording system, it is important to carefully research these differences and plan for them.
1.3.1 Jitter and Synchronization
One of the most significant differences introduced by VoIP is how audio data arrives. In a conventional circuit-based network, once a call is established, the physical path between the two endpoints is fixed. In analog systems, both upstream and downstream traffic are carried on the same wire and are presented as a waveform. In digital systems, upstream and downstream traffic are carried on separate wires but are synchronized to prevent interruptions within the call. In the IP world, the path between the two endpoints is not fixed, and the network is connectionless. Media RTP packets carrying voice data for a single call can be routed through different paths. As a result, packets of voice data arrive at the endpoint at different times (jitter) and out of sequence.
To compensate for jitter, IP data networks use buffers to store incoming packets, so that delayed packets can arrive before the data is sorted and passed to the end user. This approach was designed for data networks, where real-time guarantees are not required and delays in packet delivery are acceptable. On a telephone network, however, delayed packets reduce voice quality. Packet buffering, though required on a VoIP network, must therefore stay within the delay limits of a telephone network, which specify a maximum delay of 500 ms.
If an Ethernet cable is tapped for voice packets and the VoIP recorder intercepts the packets before they have been buffered, the packets pulled off the network are misaligned and, predictably, the audio quality is poor. To compensate for this, hardware components used for VoIP call recording preferably buffer incoming packets themselves and restore their timing and order.
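The reordering that a recorder must perform can be illustrated with a minimal jitter buffer. This is a sketch only, assuming that packets are released by count (a fixed `depth`) rather than by elapsed playout time, and ignoring 16-bit RTP sequence-number wraparound; the class and field names are illustrative and are not part of any product API.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class RtpPacket:
    # Packets are ordered by sequence number only; the payload is not compared.
    seq: int
    payload: bytes = field(compare=False)


class JitterBuffer:
    """Hold up to `depth` packets, then release them in sequence order.
    Assumption: no sequence-number wraparound (values stay below 65536)."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self._heap: List[RtpPacket] = []
        self._next_seq: Optional[int] = None  # first seq not yet released

    def push(self, pkt: RtpPacket) -> List[RtpPacket]:
        """Insert an arriving packet and return any packets ready for output."""
        if self._next_seq is not None and pkt.seq < self._next_seq:
            return []  # too late: its slot has already been released
        heapq.heappush(self._heap, pkt)
        ready = []
        while len(self._heap) > self.depth:
            out = heapq.heappop(self._heap)
            self._next_seq = out.seq + 1
            ready.append(out)
        return ready

    def flush(self) -> List[RtpPacket]:
        """Release everything still buffered, in order (e.g. at call end)."""
        ready = [heapq.heappop(self._heap) for _ in range(len(self._heap))]
        if ready:
            self._next_seq = ready[-1].seq + 1
        return ready
```

A deeper buffer tolerates more reordering but adds delay, which is why the buffer depth has to be kept within the telephone network's delay limits.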
1.3.2 Packet Filtering
In a conventional circuit-based telephone network, the line is used to transmit only voice data. On an IP network, many types of packets, such as data, voice, audio, and media, are present on the same Ethernet cable. Packet filtering is the selective passing or blocking of packets as they pass through a network interface. Packet filtering is used by VoIP recording systems to isolate voice related packets from the other data and media packets.
Many conventional VoIP recorders rely on host resources for packet filtering. This is a viable solution on networks with light traffic. However, it does not scale and quickly reaches its limits when system density grows beyond 100 ports. A better solution is a logging system that uses hardware components capable of packet filtering. Such a system is not limited by host resources and provides a scalable solution for both low- and high-density environments.
1.3.3 Voice CODECs
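As a rough software illustration of what a filtering component does, the sketch below keeps only IPv4/UDP packets whose payload looks like RTP and groups them by flow and RTP synchronization source (SSRC), so that one call can be separated from another. The RTP version-2 check and the assumed port range (16384 to 32767) are heuristics chosen for this example; a production recorder would normally learn the exact media ports from call signaling.

```python
import struct
from collections import defaultdict


def parse_ipv4_udp(packet: bytes):
    """Return (src_ip, src_port, dst_ip, dst_port, udp_payload) for an
    IPv4/UDP packet, or None for anything else."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ihl = (packet[0] & 0x0F) * 4          # IP header length in bytes
    if packet[9] != 17 or len(packet) < ihl + 8:  # protocol 17 = UDP
        return None
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return src_ip, src_port, dst_ip, dst_port, packet[ihl + 8:]


def looks_like_rtp(payload: bytes, dst_port: int, port_range=(16384, 32767)) -> bool:
    """Heuristic: full RTP header present, version field == 2, port in range."""
    lo, hi = port_range
    return len(payload) >= 12 and payload[0] >> 6 == 2 and lo <= dst_port <= hi


def filter_voice(packets):
    """Drop non-voice packets; group voice payloads per (flow, SSRC)."""
    calls = defaultdict(list)
    for pkt in packets:
        parsed = parse_ipv4_udp(pkt)
        if parsed is None:
            continue
        src_ip, src_port, dst_ip, dst_port, payload = parsed
        if not looks_like_rtp(payload, dst_port):
            continue
        ssrc = struct.unpack("!I", payload[8:12])[0]
        calls[(src_ip, src_port, dst_ip, dst_port, ssrc)].append(payload)
    return calls
```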
An important consideration in the design of any logging or recording system is its ability to encode and decode numerous compression schemes. As in all recording environments, the CODEC used for media transport is determined by the network. As a result, application developers selecting hardware components for call recording prefer products that support multiple CODECs. This is crucial when tapping a VoIP network: when call setup is negotiated between two Call Agents, the media format is negotiated as well, so the media format can change from call to call on the same network. Unlike circuit-based recording systems, a VoIP recorder can determine the media format on a per-call basis by decoding the packet header, which identifies the media format. Currently, the G.711, G.723.1, and G.729A formats are prevalent on most VoIP networks and are preferably supported by recording hardware.
The media format used for recording is driven by the customer's business needs. Application developers are often asked to design one system that maximizes storage capacity and another that provides web-enabled playback. The best approach is therefore a versatile hardware component capable of encoding a variety of media formats. Components that offer both low-bit-rate CODECs and .wav header support are preferred by application developers for meeting these market requirements.
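In RTP, the header field that identifies the media format is the 7-bit payload type in the fixed 12-byte header (RFC 3550). The sketch below parses that header and maps the static payload types for the formats named above, as assigned in RFC 3551. Dynamic payload types (96 to 127) have no fixed meaning and would have to be resolved from the call's signaling; header extensions are ignored for brevity, and the function names are illustrative.

```python
import struct

# Static RTP payload types from the RTP audio/video profile (RFC 3551).
STATIC_PAYLOAD_TYPES = {
    0: "G.711 u-law (PCMU)",
    4: "G.723.1 (G723)",
    8: "G.711 A-law (PCMA)",
    18: "G.729 / G.729A (G729)",
}


def parse_rtp_header(data: bytes) -> dict:
    """Decode the fixed part of an RTP header."""
    if len(data) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = b0 & 0x0F
    return {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "header_len": 12 + 4 * csrc_count,  # header extensions not handled
    }


def codec_for(data: bytes) -> str:
    """Name the media format of one RTP packet from its payload type."""
    pt = parse_rtp_header(data)["payload_type"]
    return STATIC_PAYLOAD_TYPES.get(pt, f"dynamic/unknown payload type {pt}")
```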
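To illustrate .wav header support, the sketch below decodes a G.711 µ-law payload into 16-bit linear PCM and wraps it in a .wav container using Python's standard `wave` module. The decoder follows the usual G.711 µ-law expansion (complement the byte, then expand the sign bit, 3-bit exponent, and 4-bit mantissa with a bias of 0x84); the function names are illustrative.

```python
import io
import struct
import wave


def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 u-law byte to a 16-bit linear PCM sample."""
    u = ~u & 0xFF                 # u-law code words are sent bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias
    return -magnitude if sign else magnitude


def ulaw_payload_to_wav(payload: bytes, sample_rate: int = 8000) -> bytes:
    """Decode a u-law payload and wrap it in a mono 16-bit .wav container."""
    # `wave` expects samples in native byte order and writes little-endian.
    pcm = struct.pack(f"={len(payload)}h", *(ulaw_to_linear(b) for b in payload))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Storing the decoded audio behind a standard .wav header lets ordinary browser or desktop players handle playback, while the compact compressed form can be kept where storage matters more.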
1.3.4 Signaling
Call recording applications typically rely on hardware components to interpret call control and signaling information. Applications monitor call states to observe line activity and control the recording process. Some applications are designed to monitor the caller's experience or agent behavior. These recorders rely on detailed information, such as hold states, to complete their task.
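The way an application uses call states to control recording can be sketched as a small state machine. The event names (`offered`, `connected`, `hold`, `retrieve`, `released`) and the policy of pausing during hold are hypothetical choices for this example, not a protocol defined by any particular CTI interface; the hold-time calculation shows the kind of detail a caller-experience application might derive.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical call-state model: current state -> {event: next state}.
TRANSITIONS = {
    "idle": {"offered": "ringing"},
    "ringing": {"connected": "connected", "released": "idle"},
    "connected": {"hold": "held", "released": "idle"},
    "held": {"retrieve": "connected", "released": "idle"},
}


@dataclass
class CallMonitor:
    call_id: str
    state: str = "idle"
    recording: bool = False
    history: List[Tuple[float, str, str]] = field(default_factory=list)

    def on_event(self, event: str, t: float) -> None:
        """Apply a call-state event at time t (seconds) and update recording."""
        nxt = TRANSITIONS[self.state].get(event)
        if nxt is None:
            raise ValueError(f"event {event!r} not valid in state {self.state!r}")
        self.state = nxt
        # Record only while the parties are talking (not ringing, held, or idle).
        self.recording = nxt == "connected"
        self.history.append((t, event, nxt))

    def hold_time(self) -> float:
        """Total seconds the call spent on hold."""
        total, held_since = 0.0, None
        for t, _event, state in self.history:
            if state == "held":
                held_since = t
            elif held_since is not None:
                total += t - held_since
                held_since = None
        return total
```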
Tapping into a VoIP network requires a component capable of decoding VoIP protocols. More than one type of protocol is used on VoIP networks, but the most common are H.323 and SIP. Also, many PBX manufacturers have designed proprietary protocols to manage call control between the PBX and IP phones. SCCP (Skinny), which is available from Cisco Systems, Inc. (www.cisco.com), is one example. The call logging system is preferably designed around a hardware component capable of decoding both standard and proprietary VoIP protocols. When designed properly, this single solution would be able to integrate with any VoIP network.
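A recorder that decodes SIP learns where a call's voice will flow from the Session Description Protocol (SDP, RFC 4566) body carried in the INVITE and its 200 OK response. The sketch below extracts the connection address, RTP port, and offered payload types for the first audio stream. It is a simplified parser written for this example (it ignores multicast TTLs, multiple connection addresses, and most other SDP features); H.323 and proprietary protocols such as SCCP would need their own decoders.

```python
def parse_sdp_audio(sdp: str):
    """Return address, port and codecs of the first audio stream in an SDP
    body, or None if there is no audio stream."""
    session_addr = None
    audio = None
    in_media = False  # True once the first m= line has been seen
    in_audio = False  # True while inside the first audio media section
    for raw in sdp.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue
        if key == "m":
            in_media = True
            in_audio = value.startswith("audio ") and audio is None
            if in_audio:
                parts = value.split()  # audio <port> <proto> <fmt> ...
                audio = {
                    "addr": session_addr,  # may be overridden by a media-level c=
                    "port": int(parts[1]),
                    "proto": parts[2],
                    "payload_types": [int(p) for p in parts[3:]],
                    "rtpmap": {},
                }
        elif key == "c":
            addr = value.split()[2]  # c=IN IP4 <address>
            if not in_media:
                session_addr = addr
            elif in_audio:
                audio["addr"] = addr
        elif key == "a" and in_audio and value.startswith("rtpmap:"):
            pt, encoding = value[len("rtpmap:"):].split(None, 1)
            audio["rtpmap"][int(pt)] = encoding
    return audio
```

With the address and port in hand, the packet filter only needs to watch that one UDP flow for the call, and the rtpmap resolves any dynamic payload types.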
1.3.5 Transporting DTMF
Touch-tone dialing uses DTMF signaling. When a button on a touch-tone phone is pressed, the tone is generated, compressed, transported to the other party, and then decompressed. On VoIP networks that use low-bit-rate CODECs, the tone may be distorted during compression and decompression. To address this, VoIP protocols include relay methods that allow out-of-band DTMF delivery. Relay methods vary from network to network and include the following:
1. The Real-Time Transport Protocol (RTP) can carry DTMF in specially marked RTP packets. Here the DTMF tones are sent in the same RTP channel as the voice data, but they are encoded differently from the voice samples and are identified by a different RTP payload type code.
2. When H.323 is used, either the H.245 signal or H.245 alphanumeric method is available. These methods separate DTMF digits from the RTP channel and send them through the H.245 signaling channel.
3. Named Telephone Events (NTE) provide a standardized means of transporting DTMF tones as RTP packets. With the NTE method, the endpoints perform per-call negotiation of the DTMF relay method.
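RTP-carried DTMF (items 1 and 3 above) commonly uses the telephone-event payload defined in RFC 2833, since obsoleted by RFC 4733: a 4-byte payload with an event code, an end bit, a volume, and a duration. The sketch below decodes that payload and collects dialed digits. It assumes the caller has already stripped the RTP header and supplies each packet's RTP timestamp, which all packets of one key press share; de-duplicating on it copes with the final packet of an event being sent more than once. Function names are illustrative.

```python
import struct

EVENT_CODES = "0123456789*#ABCD"  # DTMF events 0-15


def parse_telephone_event(payload: bytes) -> dict:
    """Decode the 4-byte telephone-event payload that follows the RTP header."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload is 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "digit": EVENT_CODES[event] if event < 16 else None,
        "end": bool(flags & 0x80),        # E bit: last packet of this event
        "volume_dbm0": -(flags & 0x3F),   # 6-bit volume, sign dropped on the wire
        "duration": duration,             # in RTP timestamp units
    }


def collect_digits(events) -> str:
    """events: iterable of (rtp_timestamp, payload) pairs in arrival order."""
    seen = set()
    digits = []
    for ts, payload in events:
        ev = parse_telephone_event(payload)
        if ev["end"] and ev["digit"] is not None and ts not in seen:
            seen.add(ts)
            digits.append(ev["digit"])
    return "".join(digits)
```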
At the time a VoIP network is deployed, the preferred DTMF delivery method is selected. However, calls are not processed uniformly. There are cases when the actual delivery method differs from the preferred delivery method. This underscores the importance of selecting a versatile recording component.
1.3.6 Encryption
Companies that have experienced security problems with their data networks are concerned about VoIP security. Standards exist for encrypting data on VoIP networks, and some companies use them. What this means for the call recording industry depends on the encryption method deployed.
Companies typically encrypt data passing between office locations over a Virtual Private Network (VPN). Encryption and decryption take place at the endpoints of the VPN, which are external to the local network, so the data passing along the local network is unsecured. The voice-related packets between the VPN endpoint and the IP phones are not encrypted, and a tap positioned anywhere on the local network can record them.
Alternatively, the data could be encrypted at the endpoints, that is, at the IP phones. VoIP traffic traveling along the local network is encrypted and cannot be tapped. Conventional IP phones generally lack the processing resources for this type of implementation. It is also expensive for a company to deploy. It is unlikely that a call recording company would encounter this type of environment.
1.3.7 Data Path
On traditional telephone networks, voice and call control information pass through a central location, that is, the switch or PBX. Each channel on the network is tapped individually, and a central tapping system obtains all voice and call control information on the local network. With VoIP, when an incoming or outgoing call is initiated, only the call control information is passed along the Ethernet to the Call Agent. After call setup is complete, the voice packets are passed to the endpoint, which is either a phone on the external network or a local IP phone. An IP network therefore has no central location where voice and call control information converge. FIGS. 7 and 8 illustrate this concept.
In FIG. 7, an incoming call enters the external facing Router or Gateway 50. The call control passes to a Call Agent 52, which then negotiates the call with a local IP phone 54. Once the call is connected, the voice packets pass directly to the phone.
In FIG. 8, Agent 1 56 initiates a call to Agent 2 58. Call control information passes to the Call Agent 60. Once the call is initiated, the voice packets pass directly to the other local IP phone. The two phones are connected to the same switch, so the voice packets do not leave this LAN segment.
Recording on a VoIP network may be accomplished in one of two ways: active recording or passive recording. These two methods are described in Section 2 and Section 3 below, respectively. Passive recording is the subject of the present invention.
2.0 Active Recording on a VoIP Network
The introduction of Voice-over-Internet Protocol (VoIP) telephone networks greatly changed the design of call recording systems. On a VoIP network, voice traffic is packetized and travels across the corporate data network (LAN/WAN) rather than over traditional copper twisted-pair wiring. This changed the methods available for tapping into the telephone network: the hardware components used to tap the wires on circuit-based telephone networks must be replaced with alternative methods.
Active recording is one method that can be used to implement a VoIP recording solution. A software interface is used by a call logging application to monitor call states on the VoIP network. When a call needs to be recorded, third party call control is used to actively join the recorder into the conversation through a conference bridge. The recorder is designed with a media component for terminating the active call.
Active recording provides a viable solution for integrating an existing call recording solution to a VoIP network. Third party call control and the use of a VoIP Media component for recording, which is available as part number IPM260 from Ai-Logix, Inc., Somerset, N.J. 08873, will now be discussed.
Active recording is designed so that the call recorder becomes an active participant with each call on the network. This is accomplished by creating a conference bridge between the call's endpoints and the recording device. Using a software interface, the logging application monitors all calls on the network and controls recording by initiating the conference bridge. Once the call recorder is bridged into the call, the conversation is accessible for recording purposes. In this scenario, call negotiation is required between the IP Private Branch Exchange (PBX) and the recorder. An endpoint is defined herein as a point of entry and exit of media flow. It is a service terminating point that can be either physical (a phone or T1/E1 port) or virtual (a conference server, or a media resource, or the like).
Active call recording works in the following way:
1. The logging application monitors all calls on the network via a Computer Telephony Integration (CTI) interface, which refers to a system that enables a computer to act as a call center by accepting incoming calls and routing them to an appropriate device or person.
2. To start recording, the logging application commands the PBX to initiate a conference bridge.
3. The IP PBX invites the VoIP Media component and conferences it into the call.
4. The VoIP Media component terminates the Real-Time Transport Protocol (RTP), which is an Internet protocol for transmitting real-time data, such as audio and video. RTP does not guarantee real-time delivery of data, but provides mechanisms for the sending and receiving applications to support streaming data.
5. The Media component records the voice and passes the recording to the database.
It is to be noted that the IP PBX must provide either silence observation or 3-way conference capability.
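The five steps above can be summarized as event handlers on the logging application. In this sketch, `CtiInterface`, `MediaComponent`, and all method names are hypothetical stand-ins for a vendor CTI API (such as TAPI or JTAPI) and for a media component; they are not the API of the IPM260 or of any PBX.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class CtiInterface(Protocol):
    """Hypothetical third-party call control interface to the IP PBX."""
    def conference_in(self, call_id: str, target: str) -> None: ...


class MediaComponent(Protocol):
    """Hypothetical media resource that terminates the conferenced RTP leg."""
    def open_channel(self, call_id: str) -> int: ...
    def close_channel(self, channel: int) -> bytes: ...


@dataclass
class ActiveRecorder:
    cti: CtiInterface
    media: MediaComponent
    recorder_address: str                      # where the PBX sends the third leg
    should_record: Callable[[str, str], bool]  # recording policy (caller, callee)
    channels: Dict[str, int] = field(default_factory=dict)
    database: Dict[str, bytes] = field(default_factory=dict)

    def on_call_connected(self, call_id: str, caller: str, callee: str) -> None:
        # Step 1: the CTI event stream reports every call; apply the policy.
        if self.should_record(caller, callee):
            # Step 2: ask the PBX to bridge the recorder into the call.
            self.cti.conference_in(call_id, self.recorder_address)

    def on_recorder_invited(self, call_id: str) -> None:
        # Steps 3-4: the PBX invited the media component; open a channel to
        # terminate the RTP stream, which already sums both parties.
        self.channels[call_id] = self.media.open_channel(call_id)

    def on_call_released(self, call_id: str) -> None:
        # Step 5: close the channel and pass the recording to the database.
        channel = self.channels.pop(call_id, None)
        if channel is not None:
            self.database[call_id] = self.media.close_channel(channel)
```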
2.1 Third Party Call Control
Third party call control enables an external entity to set up and manage a communications relationship between two or more other parties via a software interface. In this scenario, the logging application relies on third party call control to initiate a conference bridge that makes the recording device an active participant.
As shown in FIG. 1, most IP PBXs are designed with a Call Control Server (Call Agent) 10, which runs on a personal computer independent of the PBX. The Call Agent 10 manages all calls on the network and negotiates call setup and tear down. The Call Agent 10 is connected to an IP PBX 12 via a specialized communications protocol. Two technologies have been proposed for this interface: Computer Supported Telecommunication Applications (CSTA) and Switch to Computer Application Interface (SCAI); most PBX vendors have adopted CSTA as the industry standard. CSTA is the base on which a Telephony Server API (TSAPI) is defined, and almost every CSTA service has a one-to-one correspondence to a TSAPI function call. To open this system up for CTI application development, PBX manufacturers provide an Application Program Interface (API) (usually TAPI or JTAPI) that allows an external application to directly interface with the PBX 12. An API is a set of routines, protocols, and tools for building software applications.
The Call Agent's API enables a speech/data application to set up and tear down calls, monitor call progress, detect Calling Line Identification (CLID), perform identification, and activate features such as hold, transfer, conference, park, and pickup. It can redirect, forward, answer, and route incoming calls. It can also generate and detect Dual Tone Multi-Frequency (DTMF) signals, the system used by touch-tone telephones in which each key is assigned a unique pair of tones (one from a low-frequency group and one from a high-frequency group) so that the key can easily be identified by a microprocessor.
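Because each DTMF key is a pair of tones, one from the low group (697, 770, 852, 941 Hz) and one from the high group (1209, 1336, 1477, 1633 Hz), in-band detection can measure the energy at those eight frequencies and pick the strongest of each group. The sketch below uses the Goertzel algorithm, a common choice for this that the text above does not itself specify; the 8 kHz sample rate and 205-sample block are assumptions, and a real detector would also apply energy thresholds and twist checks before accepting a digit.

```python
import math

LOW = (697, 770, 852, 941)        # row frequencies (Hz)
HIGH = (1209, 1336, 1477, 1633)   # column frequencies (Hz)
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Signal power at `freq` computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate=8000):
    """Return the key whose row and column tones are strongest."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH[i], rate))
    return KEYS[row][col]


def dtmf_tone(key, n=205, rate=8000):
    """Synthesize n samples of the two-tone signal for `key`."""
    for r, row_keys in enumerate(KEYS):
        if key in row_keys:
            c = row_keys.index(key)
            return [math.sin(2 * math.pi * LOW[r] * t / rate)
                    + math.sin(2 * math.pi * HIGH[c] * t / rate) for t in range(n)]
    raise ValueError(f"not a DTMF key: {key!r}")
```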
To implement Third Party Call Control, a logging application 14 with a CTI interface accesses the Call Agent's API. From the CTI interface, the application 14 monitors each call. When recording is required, the logging application 14 commands the PBX 12 through the CTI interface to create a conference bridge. This client/server architecture extends telephone functionality to the logging application.
2.2 VoIP Media Component
Unlike passive recording solutions, active recording solutions participate in each call on the network. As a result, the logging application 14 is able to negotiate and terminate calls originating from the IP PBX 12. In the example shown in FIG. 2, a Media Component 15, such as the IPM260 available from Ai-Logix, Inc., is installed on a computer hosting the logging application 16. The IPM260 provides RTP termination, buffering, and synchronization capabilities, as well as recording.
When a call needs to be recorded, the call logging application 16 uses third party call control to request a conference bridge. The IP PBX 12 initiates a call to the IPM260. When the call is accepted, the IP PBX 12 creates a conference bridge with one leg terminating on the IPM260.
Call negotiation is required between the IPM260 and the IP PBX 12. Call negotiation is managed by a Call Control Interop (hosted by the logging application 16). The IPM260 supports Media Gateway Control Protocol (MEGACO) services, which are configured to point to the Call Control Interop. A gateway is defined herein as a system or device that links two dissimilar networks or domains. The interop must support the same protocol used on the local VoIP network (SIP or H.323). Once the call is accepted, a channel is opened on the IPM260 for the incoming RTP stream. Since both sides of the conversation have been summed by the conference bridge on the PBX 12, the complete conversation is passed into the IPM260 as a single stream.
A channel is defined herein as a concatenation of layers within the network to establish a path between two endpoints. A channel is generally the smallest subdivision of a transmission system. A channel may also be defined as a media-processing instance.
One of the most significant differences introduced by VoIP is how audio data arrives at an endpoint. On a conventional circuit-based network, the physical path between the two endpoints is fixed once a call is connected.
In the IP world, the path between the two endpoints is not fixed, and the network is connectionless. Media RTP packets carrying voice data for a single call can be routed through different paths. As a result, packets of voice data arrive at the endpoint at different times (jitter) and out of sequence. Designed for VoIP networks, the IPM260 supports both buffering capabilities (for removing jitter) and synchronization services. These capabilities are essential for high-quality recordings.
2.3 Architecture of an Active Call Recording System
Like all recording systems, an active recording system must have access to signaling information to monitor call states and access to voice data for recording purposes. An active call recording system is preferably capable of initiating a conference bridge and terminating an incoming call. This requires third party call control as well as a Media Component 15 capable of terminating voice data. A simple active recording solution can be built with the following components shown in FIG. 3:
1. A CTI Interface 18, which interfaces with the CTI server (Call Agent 20) for third party call control. The CTI interface 18 is also used by a logging application 22 to obtain call details, such as call state, phone number, date, agent name, and DTMF.
2. A VoIP Media Component 17, which is a hardware component installed on the logging server. The VoIP Media component terminates a third leg 19 of the conference call. It then performs recording services.
3. A Call Control Interop 24, which is required for call negotiations between the IP PBX 12 and the Media Component 17.
In a call-recording environment, most call center operators want to record the total call experience. That is, they want to collect information such as which agent their customers are talking with, how soon they are transferred, how long they were on hold, and other information that may be displayed on the agent's terminal. In conventional circuit switching environments, call center recording is accomplished by monitoring the telephone port on a PBX or a switch where:
1. The PBX or switch uses a centralized star topology, such that all telephone interfaces are distributed from the PBX or switch.
2. Each telephone port includes only one conversation.
3. Voice is synchronized in both directions and the delay difference is negligible.
4. Signaling and voice information appear on the same pair of wires.
However, recording voice in a VoIP environment is different in the following ways:
1. The IP network uses a tree topology and each IP network element includes a switching function. Therefore, a call on the VoIP network is not distributed through a central switch as it is done in the circuit-switching environment. As a result, monitoring VoIP is not as straightforward as monitoring a PBX.
2. The IP link is a shared resource, such that there are media types other than voice and there is more than one conversation on the same IP link. Therefore, the recorder must be able to differentiate voice packets from non-voice packets and be able to differentiate one call from another.
3. The VoIP packets in each direction can experience different delays, and the packet delays in one direction can be different from one packet to another. Sometimes, the voice packets can reach the destination out of sequence. As a result, the tapping apparatus must have the ability to synchronize the two voice streams of a conversation. This differs from circuit networks where the voice is delivered in order and synchronization is maintained by network design.
4. When a call agent is used, the signaling data and the voice data can be carried on different IP links.
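The synchronization requirement in item 3 can be illustrated by aligning the two directions of a call on a common timeline before summing them into a single recording. Because RFC 3550 recommends a random initial RTP timestamp for each stream, the sketch assumes each direction's start time comes from a shared clock, for example the recorder's own capture time or the wall-clock mapping carried in RTCP sender reports. It also assumes each stream has already been reordered and decoded to 16-bit PCM; the function name is illustrative.

```python
def mix_directions(a, a_start, b, b_start, rate=8000):
    """Align two one-directional PCM sample lists and sum them.

    a_start, b_start: time (seconds, shared clock) of each stream's first sample.
    """
    origin = min(a_start, b_start)
    off_a = round((a_start - origin) * rate)  # offset in samples
    off_b = round((b_start - origin) * rate)
    out = [0] * max(off_a + len(a), off_b + len(b))
    for off, stream in ((off_a, a), (off_b, b)):
        for i, s in enumerate(stream):
            out[off + i] += s
    # Clip the summed signal to the 16-bit sample range.
    return [max(-32768, min(32767, s)) for s in out]
```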
Therefore, there is a need for a method and apparatus that can record data, voice, audio, and video from a computer network, such as a VoIP network, without requiring modification of an associated telephone system or impairing normal operation of the network and telephone system.