It is well known to record voice and video telephony data for purposes such as legal compliance, record keeping and training. In businesses such as contact centers, such recordings are often made for all inbound and outbound telephony sessions, while the ever-decreasing cost of data storage and processing power means that such recording is becoming more commonplace for businesses and individuals, particularly in the context of Internet Protocol (IP) and other packet-switched telephony.
In addition to the raw audio (and video, where appropriate) data, it is usual to also record metadata to assist in indexing and later retrieval of a call recording. For example, a WAV audio file of a call received by an agent working in a contact center might be accompanied by an XML (Extensible Markup Language) file containing items of data such as:
- Date and time of call
- Identifier of the contact center site
- Agent's internal extension number/address
- Agent's name/ID
- Supervisor's name/ID
- DNIS (dialed number identification service), identifying the number dialed by the customer in an inbound call
- CLID (caller line identification), identifying the number from which the inbound call originated
- Customer ID
- Skillset allocated to the call by the interactive voice response (IVR) system
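A metadata file of this kind could be produced with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the root element `callRecording`, the field names and the sample values are all hypothetical, and no particular recording product's schema is implied.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_call_metadata(fields: Dict[str, str]) -> str:
    """Serialize call metadata as a flat XML document (hypothetical schema)."""
    root = ET.Element("callRecording")
    for name, value in fields.items():
        # One child element per metadata item, e.g. <agentId>A1234</agentId>.
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values for the items listed above.
    print(build_call_metadata({
        "recordingFile": "call-0001.wav",
        "startTime": "2024-01-15T09:30:00",
        "siteId": "SITE-01",
        "agentExtension": "4012",
        "agentId": "A1234",
        "supervisorId": "S0042",
        "dnis": "08005550100",
        "clid": "02075550123",
        "customerId": "C98765",
        "ivrSkillset": "retail-banking",
    }))
```

Keeping the metadata in a separate file alongside the audio lets a search tool filter recordings by, say, agent or DNIS without opening the audio itself.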
In addition, many contact center systems enable an agent engaged in a telephony session to assign one or more so-called “activity codes” to the session. Traditionally this was implemented by an agent pressing a numeric key to generate a DTMF (dual tone multi-frequency) audio signal into the call recording, with the different digits representing different activity codes. Thus, in a retail banking contact center, agents might employ the activity code “1” to denote “new loan”, “2” for “overdraft increase”, etc.
In the event that a call deals with several topics, the agent might insert different activity codes at different points during the call. For example, shortly after beginning a conversation in which the customer asks to check the balance of her account, the agent might press “5” to indicate “balance enquiry”. The customer might subsequently enquire about the possibility of a new loan, at which point the agent enters activity code “1”. The recording of this call would then have these tones (the DTMF tones for 5 and 1) embedded in the recording. An automated search facility can process the recording to index the call as relating to these activities, the activity codes being associated with timestamps which are typically specified as an offset from the beginning of the call.
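The indexing step can be sketched as follows. The source does not say how the tones are detected; this sketch swaps in the Goertzel algorithm, a common way of measuring signal energy at each DTMF tone frequency, and assumes mono audio sampled at 8 kHz as a list of floats, 205-sample analysis blocks and a detection threshold of 0.1, all of which are illustrative choices. The activity-code table and the function names are hypothetical, and the recording is synthesized rather than read from a real WAV file.

```python
import math

SAMPLE_RATE = 8000  # narrowband telephony sampling rate in Hz (assumed)

# DTMF keypad: each key is the sum of one low-group and one high-group tone (Hz).
# The fourth high-group column (1633 Hz, keys A to D) is omitted for brevity.
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}
LOW_GROUP = (697, 770, 852, 941)
HIGH_GROUP = (1209, 1336, 1477)
KEY_BY_PAIR = {pair: key for key, pair in DTMF_FREQS.items()}

# Hypothetical activity-code table, using the banking examples from the text.
ACTIVITY_CODES = {"1": "new loan", "2": "overdraft increase", "5": "balance enquiry"}


def synthesize(events, duration_s, tone_s=0.1, rate=SAMPLE_RATE):
    """Build a silent test recording with a DTMF tone at each (offset_s, key)."""
    samples = [0.0] * int(duration_s * rate)
    for offset_s, key in events:
        low, high = DTMF_FREQS[key]
        start = int(offset_s * rate)
        end = min(start + int(tone_s * rate), len(samples))
        for n in range(start, end):
            t = (n - start) / rate
            samples[n] += (0.5 * math.sin(2 * math.pi * low * t)
                           + 0.5 * math.sin(2 * math.pi * high * t))
    return samples


def goertzel_power(block, freq, rate=SAMPLE_RATE):
    """Squared magnitude of the block's spectrum at `freq` (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in block:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_keys(samples, rate=SAMPLE_RATE, block_len=205, min_ratio=0.1):
    """Return (offset_s, key) for each DTMF key press found in `samples`.

    A block counts as a key when the strongest low-group and high-group tones
    each reach `min_ratio` of the block energy (normalised so that a clean
    two-tone block scores about 0.25 per tone). A key held across several
    consecutive blocks is reported once, at the first of them.
    """
    found, previous = [], None
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        norm = block_len * sum(x * x for x in block)
        digit = None
        if norm > 1e-9:  # skip silent blocks
            powers = {f: goertzel_power(block, f, rate) / norm
                      for f in LOW_GROUP + HIGH_GROUP}
            low = max(LOW_GROUP, key=powers.get)
            high = max(HIGH_GROUP, key=powers.get)
            if powers[low] >= min_ratio and powers[high] >= min_ratio:
                digit = KEY_BY_PAIR[(low, high)]
        if digit is not None and digit != previous:
            found.append((start / rate, digit))
        previous = digit
    return found


def index_recording(samples, rate=SAMPLE_RATE):
    """Label each detected key with its activity, keeping its offset from call start."""
    return [(round(offset, 2), ACTIVITY_CODES.get(digit, "unmapped code " + digit))
            for offset, digit in detect_keys(samples, rate)]


if __name__ == "__main__":
    # The banking call from the text: "5" (balance enquiry) at 1 s, then "1" (new loan) at 4 s.
    recording = synthesize([(1.0, "5"), (4.0, "1")], duration_s=6.0)
    print(index_recording(recording))  # [(1.0, 'balance enquiry'), (4.0, 'new loan')]
```

Each code is reported at the start of the first analysis block in which it is heard, so the offsets are accurate only to about one block (roughly 26 ms with these settings).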
In more recent systems, DTMF codes recorded onto the conversation itself have been supplanted: the agent uses the telephone set or a workstation desktop application to send digital signals to the private branch exchange (PBX) or switch and to the contact center management application, and the activity codes are tracked in the contact center's historical database, i.e. not in the call recording itself.
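Where codes are sent digitally rather than as tones, the historical database might hold one row per code, keyed by call and offset. The following sketch uses Python's built-in `sqlite3` module purely for illustration; the table `activity_event`, its columns and the helper functions are hypothetical and do not describe any particular contact center product.

```python
import sqlite3


def open_history(path=":memory:"):
    """Create a hypothetical historical-database table for activity-code events."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS activity_event (
               call_id  TEXT NOT NULL,
               agent_id TEXT NOT NULL,
               offset_s REAL NOT NULL,  -- seconds from the start of the call
               code     TEXT NOT NULL
           )"""
    )
    return conn


def record_activity(conn, call_id, agent_id, offset_s, code):
    """Store one activity code sent digitally by the agent's desktop application."""
    with conn:  # commits the insert as a single transaction
        conn.execute(
            "INSERT INTO activity_event (call_id, agent_id, offset_s, code) "
            "VALUES (?, ?, ?, ?)",
            (call_id, agent_id, offset_s, code),
        )


def activities_for_call(conn, call_id):
    """Return (offset_s, code) pairs for one call, in the order they occurred."""
    rows = conn.execute(
        "SELECT offset_s, code FROM activity_event WHERE call_id = ? ORDER BY offset_s",
        (call_id,),
    )
    return rows.fetchall()
```

Because the codes live in a separate table rather than in the audio, they can be queried without decoding the recording, but they still carry only a single offset per code.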
A disadvantage with such activity codes is that they require the agent to remember which code relates to which activity.
A further disadvantage is that such codes are finite and hard-coded, and extending them requires time, effort and money. The addition of a new code requires development effort and staff training. Even after it is added, there can be a lag of months or even years until enough data have been amassed from which to extract useful trends.
Activity codes are very much a specific solution useful to contact centers and have little application outside the context of an agent dealing with a specific and well-defined range of issues, each of which can be assigned a code.
A further drawback is that, because an activity code simply marks a discrete point in the conversation, it can be difficult for a subsequent reviewer of the call to know exactly where to begin listening for relevant information. This is particularly so if the conversation developed gradually and the agent inserted the activity code only some time later, at the point where the agent realized that the code was relevant.
For example, consider two scenarios. The first is the banking call described above, in which a customer is informed of her account balance and then asks about getting a new loan. If the agent presses the “new loan” code as soon as this request is made, the next few minutes of the recording will probably contain the pertinent content, such as discussion of the purpose of the loan, the amount required, the term of the loan, interest rate options and so on. A supervisor or other reviewer of the conversation will then be able to use the activity codes to home in accurately on the appropriate section of the recording.
The second scenario is one in which an agent is taking a call in a contact center selling various types of insurance, life assurance and pensions. An experienced agent may speak with a customer about a range of topics, such as income levels and future income expectations, family commitments, current insurance levels, existing life and pension policies, and so forth. At some point during a discussion about pension needs, the agent may realize, based on the previous few minutes of conversation, that the customer is also a suitable prospect for (say) life assurance. If the agent enters the activity code for “new life assurance prospect” at the point this realization occurs, a reviewer who begins listening to the recording from that point onwards might not understand why the agent flagged this customer as a particularly interesting life assurance prospect. Moreover, some agents are more astute than others and may grasp a customer's needs sooner, so the same activity code may be entered at quite different points in otherwise similar conversations. It can therefore be seen that using an activity code as a timestamp or offset is a crude and inefficient way of indexing a recording of a telephony session.
Accordingly, activity codes, while useful in limited scenarios in a contact center environment, can only insert flags having a limited range of meanings at given points in a conversation, which makes them a crude and imperfect indexing method.