1. Field of the Invention
The invention relates to the field of real time digital audio processing, particularly in a telephony switch context.
2. Background of the Invention
Existing telephone systems, such as the Calltrol Object Telephony Server (OTS™), tend to require relatively expensive special purpose hardware to process hundreds of voice channels simultaneously. More information about this system can be found at www.calltrol.com/newsolutionsforoldchallenges.pdf, www.calltrol.com/crmconvergence_saleslogix.pdf, and www.calltrol.com/CalltrolSDKWhitepaper6-02.pdf, each of which is expressly incorporated herein by reference in its entirety.
In many traditional systems, a single dedicated analog or digital circuit is provided for each public switch telephone network (PSTN) line. See, e.g., Consumer Microcircuits Limited CMX673 datasheet, Clare M-985-01 datasheet. In other types of systems, the call progress tone analyzer may be statistically shared between multiple channels, imposing certain limitations and detection latencies.
Digital signal processor algorithms are also known for analyzing call progress tones (CPT). See, e.g., Manish Marwah and Sharmistha Das, “UNICA—A Unified Classification Algorithm For Call Progress Tones” (Avaya Labs, University of Colorado), expressly incorporated herein by reference.
Call progress tone signals provide information regarding the status or progress of a call to customers, operators, and connected equipment. In circuit-associated signaling, these audible tones are transmitted over the voice path within the frequency limits of the voice band. The four most common call progress tones are: Dial tone; Busy tone; Audible ringback; and Reorder tone. In addition to these, there are a number of other defined tones, including for example the 12 DTMF codes on a normal telephone keypad. There may be, for example, about 53 different tones supported by a system. A call progress tone detector, may additionally respond to cue indicating Cessation of ringback; Presence/cessation of voice; Special Information Tones (SITs); and Pager cue tones. Collectively, call progress tones and these other audible signals are referred to as call progress events. Call progress tone generation/detection in the network is generally based on a Precise Tone Plan. In the plan, four distinctive tones are used singly or in combination to produce unique progress tone signals. These tones are 350 Hz, 440 Hz, 480 Hz and 620 Hz. Each call progress tone is defined by the frequencies used and a specific on/off temporal pattern.
The ITU-T E.180 and E.182 recommendations define the technical characteristics and intended usage of some of these tones: busy tone or busy signal; call waiting tone; comfort tone; conference call tone; confirmation tone; congestion tone; dial tone; end of three-party service tone (three-way calling); executive override tone; holding tone; howler tone; intercept tone; intrusion tone; line lock-out tone; negative indication tone; notify tone; number unobtainable tone; pay tone; payphone recognition tone; permanent signal tone; preemption tone; queue tone; recall dial tone; record tone; ringback tone or ringing tone; ringtone or ringing signal; second dial tone; special dial tone; special information tone (SIT); waiting tone; warning tone; Acceptance tone; Audible ring tone; Busy override warning tone; Busy verification tone; Engaged tone; Facilities tone; Fast busy tone; Function acknowledge tone; Identification tone; Intercept tone; Permanent signal tone; Positive indication tone; Re-order tone; Refusal tone; Ringback tone; Route tone; Service activated tone; Special ringing tone; Stutter dial tone; Switching tone; Test number tone; Test tone; and Trunk offering tone. In addition, signals sent to the PSTN include Answer tone; Calling tone; Guard tone; Pulse (loop disconnect) dialing; Tone (DTMF) dialing, and other signals from the PSTN include Billing (metering) signal; DC conditions; and Ringing signal. The tones, cadence, and tone definitions, may differ between different countries, carriers, types of equipment, etc. See, e.g., Annex to ITU Operational Bulletin No. 781-1.II.2003. Various Tones Used In National Networks (According To ITU-T Recommendation E.180) (03/1998).
Characteristics for the call progress events are shown in Table 1.
TABLE 1Call Progress EventCharacteristicsFrequenciesName(Hz)Temporal PatternEvent Reported AfterDial Tone350 + 440Steady toneApproximately 0.75secondsBusy Tone480 + 6200.5 seconds on/2 cycles of precise,0.5 seconds off3 cycles of nonpreciseDetection440 + 4802 seconds on/2 cycles of precise orAudible—4 seconds offnonpreciseRingback—3 to 6.5 seconds afterCessationringback detectedReorder480 + 6200.25 seconds on/2 cycles of precise,0.25 seconds off3 cycles of nonpreciseDetection200 to 3400—Approximately 0.25Voice——to 0.50 secondsCessationApproximately 0.5to 1.0 seconds aftervoice detectedSpecialSeeSee Table 2.Approximately 0.25InformationTable 2.to 0.75 secondsTones (SITs)Pager Cue14003 to 4 tones at2 cycles of preciseTones0.1 to 0.125or any pattern ofintervals1400-Hz signals
Dial tone indicates that the CO is ready to accept digits from the subscriber. In the precise tone plan, dial tone consists of 350 Hz plus 440 Hz. The system reports the presence of precise dial tone after approximately 0.75 seconds of steady tone. Nonprecise dial tone is reported after the system detects a burst of raw energy lasting for approximately 3 seconds.
Busy tone indicates that the called line has been reached but it is engaged in another call. In the precise tone plan, busy tone consists of 480 Hz plus 620 Hz interrupted at 60 ipm (interruptions per minute) with a 0.5 seconds on/0.5 seconds off temporal pattern. The system reports the presence of precise busy tone after approximately two cycles of this pattern. Nonprecise busy tone is reported after three cycles.
Audible ringback (ring tone) is returned to the calling party to indicate that the called line has been reached and power ringing has started. In the precise tone plan, audible ringback consists of 440 Hz plus 480 Hz with a 2 seconds on/4 seconds off temporal pattern. The system reports the presence of precise audible ringback after two cycles of this pattern.
Outdated equipment in some areas may produce nonprecise, or dirty ringback. Nonprecise ringback is reported after two cycles of a 1 to 2.5 seconds on, 2.5 to 4.5 seconds off pattern of raw energy. The system may report dirty ringback as voice detection, unless voice detection is specifically ignored during this period. The system reports ringback cessation after 3 to 6.5 seconds of silence once ringback has been detected (depending at what point in the ringback cycle the CPA starts listening).
Reorder (Fast Busy) tone indicates that the local switching paths to the calling office or equipment serving the customer are busy or that a toll circuit is not available. In the precise tone plan, reorder consists of 480 Hz plus 620 Hz interrupted at 120 ipm (interruptions per minute) with a 0.25 seconds on/0.25 seconds off temporal pattern. The system reports the presence of precise reorder tone after two cycles of this pattern. Nonprecise reorder tone is reported after three cycles.
Voice detection has multiple uses, and can be used to detect voice as an answer condition, and also to detect machine-generated announcements that may indicate an error condition. Voice presence can be detected after approximately 0.25 to 0.5 seconds of continuous human speech falling within the 200-Hz to 3400-Hz voiceband (although the PSTN only guarantees voice performance between 300 Hz to 800 Hz. A voice cessation condition may be determined, for example, after approximately 0.5 to 1.0 seconds of silence once the presence of voice has been detected.
Special Information Tones (SITs) indicate network conditions encountered in both the Local Exchange Carrier (LEC) and Inter-Exchange Carrier (IXC) networks. The tones alert the caller that a machine-generated announcement follows (this announcement describes the network condition). Each SIT consists of a precise three-tone sequence: the first tone is either 913.8 Hz or 985.2 Hz, the second tone is either 1370.6 Hz or 1428.5 Hz, and the third is always 1776.7 Hz. The duration of the first and second tones can be either 274 ms or 380 ms, while the duration of the third remains a constant 380 ms. The names, descriptions and characteristics of the four most common SITs are summarized in Table 2.
TABLE 2SpecialFirst ToneSecond ToneThird ToneInformationFrequencyFrequencyFrequencyTones (SITs)DurationDurationDurationNameDescription(Hz)(ms)(Hz)(ms)(Hz)(ms)NC1No circuit found985.23801428.53801776.7380ICOperator913.82741370.62741776.7380interceptVCVacant circuit985.23801370.62741776.7380(nonregisterednumber)RO1Reorder (system913.82741428.53801776.7380busy)1Tone frequencies shown indicate conditions that are the responsibility of the BOC intra-LATA carrier. Conditions occurring on inter-LATA carriers generate SITs with different first and second tone frequencies.
Pager cue tones are used by pager terminal equipment to signal callers or connected equipment to enter the callback number (this number is then transmitted to the paged party). Most pager terminal equipment manufacturers use a 3- or 4-tone burst of 1400 Hz at 100- to 125-ms intervals. The system identifies three cycles of 1400 Hz at these approximate intervals as pager cue tones. To accommodate varying terminal equipment signals, tone bursts of 1400 Hz in a variety of patterns may also be reported as pager cue tones. Voice prompts sometimes accompany pager cue tones to provide instructions. Therefore, combinations of prompts and tones may be detected by configuring an answer supervision template to respond to both voice detection and pager cue tone detection.
A Goertzel filter algorithm may be used to detect the solid tones that begin fax or data-modem calls. If any of the following tones are detected, a “modem” (fax or data) state is indicated: 2100 Hz, 2225 Hz, 1800 Hz, 2250 Hz, 1300 Hz, 1400 Hz, 980 Hz, 1200 Hz, 600 Hz, or 3000 Hz. Fax detection relies on the 1.5 seconds of HDLC flags that precede the answering fax terminal's DIS frame. DIS is used by the answering terminal to declare its capabilities. After a solid tone is detected, a V.21 receiver is used to detect the HDLC flags (01111110) in the preamble of DIS signal on the downstream side. If the required number of flags are detected, fax is reported. Otherwise, upon expiration of a timer, the call is may be determined to be a data modem communication. See, e.g., U.S. Pat. No. 7,003,093, the entirety of which is expressly incorporated herein by reference. See also, U.S. Pat. No. 7,043,006, expressly incorporated herein by reference.
Therefore, a well developed system exists for in-band signaling over audio channels, with a modest degree of complexity and some variability between standards, which themselves may change over time.
One known digital signal processor architecture, exemplified by the nVidia Tesla™ C870 GPU device, provides a massively multi-threaded architecture, providing over 500 gigaflops peak floating point performance. This device encompasses a 128-processor computing core, and is typically provided as a coprocessor on a high speed bus for a standard personal computer platform. Similarly, the AMD/ATI Firestream 9170 also reports 500 gigaflops performance from a GPU-type device with double precision floating point capability. Likewise, newly described devices (e.g., AMD Fusion) integrate a CPU and GPU on a single die with shared external interfaces. See, for example, www.nvidia.com/object/tesla_product_literature.html, S1070 1U System Specification Document (2.03 MB PDF), NVIDIA Tesla S1070 Datasheet (258 KB PDF), NVIDIA Tesla Personal Supercomputer Datasheet (517 KB PDF), C1060 Board Specification Document (514 KB PDF), NVIDIA Tesla C1060 Datasheet (153 KB PDF), NVIDIA Tesla 8 Series Product Overview (1.69 MB PDF), C870 Board Specification Document (478 KB PDF), D870 System Specification Document (630 KB PDF), S870 1U Board Specification Document (13.3 MB PDF), NVIDIA Tesla 8 Series: GPU Computing Technical Brief (3.73 MB PDF), www.nvidia.com/object/cuda_programming_tools.html (PTX: Parallel Thread Execution ISA Version 1.2), developer.download.nvidia.com/compute/cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf, developer.download.nvidia.com/compute/cuda/2_0/docs/CudaReferenceManual_2.0.pdf, developer.download.nvidia.com/compute/cuda/2_0/docs/CUBLAS_Library_2.0.pdf, developer.download.nvidia.com/compute/cuda/2_0/docs/CUFFT_Library_2.0.pdf, each of which is expressly incorporated herein by reference in its entirety.
The nVidia Tesla™ GPU is supported by the Compute Unified Device Architecture (CUDA) software development environment, which provides C language support. Typical applications proposed for the nVidia Tesla™ GPU, supported by CUDA, are Parallel bitonic sort; Matrix multiplication; Matrix transpose; Performance profiling using timers; Parallel prefix sum (scan) of large arrays; Image convolution; 1D DWT using Haar wavelet; OpenGL and Direct3D graphics interoperation examples; Basic Linear Algebra Subroutines; Fast Fourier Transform; Binomial Option Pricing; Black-Scholes Option Pricing; Monte-Carlo Option Pricing; Parallel Mersenne Twister (random number generation); Parallel Histogram; Image Denoising; and a Sobel Edge Detection Filter. Therefore, the typical proposed applications are computer software profiling, matrix applications, image processing applications, financial applications, Seismic simulations; Computational biology; Pattern recognition; Signal processing; and Physical simulation. CUDA technology offers the ability for threads to cooperate when solving a problem. The nVidia Tesla™ GPUs featuring CUDA technology have an on-chip Parallel Data Cache that can store information directly on the GPU, allowing computing threads to instantly share information rather than wait for data from much slower, off-chip DRAMs. Likewise, the software compile aspects of CUDA are able to partition code between the GPU and a host processor, for example to effect data transfers and to execute on the host processor algorithms and code which are incompatible or unsuitable for efficient execution on the GPU itself.
GPU architectures are generally well-suited to address problems that can be expressed as data-parallel computations: the same program is executed on many data elements in parallel, with high arithmetic intensity, the ratio of arithmetic operations to memory operations. Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control; and because it is executed on many data elements and has high arithmetic intensity, the memory access latency can be hidden with calculations instead of big data caches. Thus, the GPU architecture typically provides a larger number of arithmetic logic units than independently and concurrently operable instruction decoders. Data-parallel processing maps data elements to parallel processing threads. Many applications that process large data sets such as arrays can use a data-parallel programming model to speed up the computations. In 3D rendering large sets of pixels and vertices are mapped to parallel threads. Similarly, image and media processing applications such as post-processing of rendered images, video encoding and decoding, image scaling, stereo vision, and pattern recognition can map image blocks and pixels to parallel processing threads. In fact, many algorithms outside the field of image rendering and processing are accelerated by data-parallel processing, from general signal processing or physics simulation to computational finance or computational biology.
The Tesla™ GPU device is implemented as a set of multiprocessors (e.g., 8 on the C870 device), each of which has a Single Instruction, Multiple Data architecture (SIMD): At any given clock cycle, each processor (16 per multiprocessor on the C870) of the multiprocessor executes the same instruction, but operates on different data. Each multiprocessor has on-chip memory of the four following types: One set of local 32-bit registers per processor, a parallel data cache or shared memory that is shared by all the processors and implements the shared memory space, a read-only constant cache that is shared by all the processors and speeds up reads from the constant memory space, which is implemented as a read-only region of device memory, and a read-only texture cache that is shared by all the processors and speeds up reads from the texture memory space, which is implemented as a read-only region of device memory. The local and global memory spaces are implemented as read-write regions of device memory and are not cached. Each multiprocessor accesses the texture cache via a texture unit. A grid of thread blocks is executed on the device by executing one or more blocks on each multiprocessor using time slicing: Each block is split into SIMD groups of threads called warps; each of these warps contains the same number of threads, called the warp size, and is executed by the multiprocessor in a SIMD fashion; a thread scheduler periodically switches from one warp to another to maximize the use of the multiprocessor's computational resources. A half-warp is either the first or second half of a warp. The way a block is split into warps is always the same; each warp contains threads of consecutive, increasing thread IDs with the first warp containing thread 0. A block is processed by only one multiprocessor, so that the shared memory space resides in the on-chip shared memory leading to very fast memory accesses. The multiprocessor's registers are allocated among the threads of the block. If the number of registers used per thread multiplied by the number of threads in the block is greater than the total number of registers per multiprocessor, the block cannot be executed and the corresponding kernel will fail to launch. Several blocks can be processed by the same multiprocessor concurrently by allocating the multiprocessor's registers and shared memory among the blocks. The issue order of the warps within a block is undefined, but their execution can be synchronized, to coordinate global or shared memory accesses. The issue order of the blocks within a grid of thread blocks is undefined and there is no synchronization mechanism between blocks, so threads from two different blocks of the same grid cannot safely communicate with each other through global memory during the execution of the grid.
Telephony control and switching applications have for many years employed general purpose computer operating systems, and indeed the UNIX system was originally developed by Bell Laboratories/AT&T. There are a number of available telephone switch platforms, especially private branch exchange implementations, which use an industry standard PC Server platform, typically with specialized telephony support hardware. These include, for example, Asterisk (from Digium) PBX platform, PBXtra (Fonality), Callweaver, Sangoma, etc. See also, e.g., www.voip-info.org/wiki/. Typically, these support voice over Internet protocol (VOIP) communications, in addition to switched circuit technologies.
As discussed above, typical automated telephone signaling provides in-band signaling which therefore employs acoustic signals. A switching system must respond to these signals, or it is deemed deficient. Typically, an analog or digital call progress tone detector is provided for each channel of a switched circuit system. For VOIP systems, this functionality maybe provided in a gateway (media gateway), either as in traditional switched circuit systems, or as a software process within a digital signal processor.
Because of the computational complexity of the call progress tone analysis task, the density of digital signal processing systems for simultaneously handling a large number of voice communications has been limited. For example, 8 channel call progress tone detection may be supported in a single Texas Instruments TMS320C5510™ digital signal processor (DSP). See, IP PBX Chip from Adaptive Digital Technologies, Inc. (www.adaptivedigital.com/product/solution/ip_pbx.htm). The tone detection algorithms consume, for example, over 1 MIPS per channel for a full suite of detection functions, depending on algorithm, processor architecture, etc. Scaling to hundreds of channels per system is cumbersome, and typically requires special purpose dedicated, and often costly, hardware which occupy a very limited number of expansion bus slots of a PBX system.