1. Field of the Invention
The present invention relates to telecommunications systems and methods and more specifically to a high-quality voice network architecture.
2. Introduction
There is a longstanding problem of poor quality in speech delivered over telephone networks, and specifically over the public switched telephone network (PSTN). The PSTN is the concatenation of the world's public circuit-switched telephone networks. Originally a network of fixed-line analog telephone systems, the PSTN now has many digital and wireless components. The PSTN is largely governed by technical standards and uses telephone numbers for addressing. The basic telephone system still uses basic power communication principles wherein a central office applies power to the telephone lines. For this reason, many users have noticed that they still have a telephone signal during a power outage. These lines are typically copper or a hybrid of fiber and coaxial cable. They are inherently low-bandwidth transmission lines. There has been a desire over time to increase the bandwidth of these standard transmission lines, such as by data compression.
The basic digital circuit in the PSTN is a 64-kilobit-per-second channel, originally designed by Bell Labs, called a “DS0” or Digital Signal 0. To carry a typical phone call from a calling party to a called party, the audio is digitized at an 8 kHz sample rate using 8-bit pulse code modulation. The DS0's are the basic granularity at which switching takes place in a telephone exchange. DS0's are also known as timeslots because they are multiplexed together in a time-division fashion. Multiple DS0's are multiplexed together on higher capacity circuits, such that 24 DS0's make a DS1 signal, which when carried on copper is the well-known T-carrier system, T1 (the European equivalent is an E1, containing 32 channels of 64 kbit/s each). In modern networks, this multiplexing is moved as close to the end user as possible, usually into cabinets at the roadside in residential areas, or into large business premises.
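The channel rates quoted above follow from simple arithmetic, which can be sketched as shown here. This is an illustrative sketch only: the function and constant names are hypothetical, and the 8 kbit/s T1 framing overhead is a standard figure supplied for illustration rather than taken from the text above.

```python
SAMPLE_RATE_HZ = 8_000   # 8 kHz sample rate, as stated in the text
BITS_PER_SAMPLE = 8      # 8-bit pulse code modulation, as stated in the text

# Assumption (not from the text): T1 adds one framing bit per frame,
# at 8000 frames per second, on top of the 24 DS0 payload channels.
T1_FRAMING_BPS = 8_000


def ds0_rate_bps():
    """Bit rate of a single DS0 timeslot."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def multiplexed_payload_bps(channels):
    """Aggregate rate of `channels` DS0 timeslots multiplexed together."""
    return channels * ds0_rate_bps()


if __name__ == "__main__":
    print("DS0:", ds0_rate_bps(), "bit/s")
    print("24 DS0's (DS1 payload):", multiplexed_payload_bps(24), "bit/s")
    print("T1 line rate:", multiplexed_payload_bps(24) + T1_FRAMING_BPS, "bit/s")
    print("E1 (32 channels):", multiplexed_payload_bps(32), "bit/s")
```

Under these assumptions a DS0 works out to 64 kbit/s, 24 DS0's carry 1.536 Mbit/s of payload (1.544 Mbit/s with T1 framing), and the 32 channels of an E1 total 2.048 Mbit/s.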
The timeslots are conveyed from the initial multiplexer to the exchange over a set of equipment collectively known as the access network. The access network and inter-exchange transport of the PSTN use synchronous optical transmission (SONET and SDH) technology, although some parts still use the older Plesiochronous Digital Hierarchy (PDH) technology.
In addition to the mu-law and A-law coding techniques commonly used in the PSTN to improve the dynamic range in the voice passband, various compression techniques (e.g., ADPCM, CELP) for data transmission rates under 64 kbps are also widely deployed. These efforts are attempts to improve network efficiency with minimal degradation to the quality of sound transmitted over cellular radio access networks and packet-based (e.g., IP and ATM) networks. However, a definitive technology for improving the transmission of sound over telephone-band networks has yet to be established.
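To illustrate how mu-law coding improves dynamic range, the following minimal sketch compares plain 8-bit uniform quantization with quantization applied after mu-law companding. It uses the continuous mu-law curve with mu = 255, not the segmented bit-level encoding used in deployed equipment, and all function names are hypothetical.

```python
import math

MU = 255  # assumption: the value conventionally used for 8-bit mu-law


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_8bit(v):
    """Uniformly quantize v in [-1, 1] to a code in 0..255."""
    return min(255, int((v + 1.0) / 2.0 * 256))


def dequantize_8bit(code):
    """Return the midpoint of the quantization cell for `code`."""
    return (code + 0.5) / 256 * 2.0 - 1.0


def linear_roundtrip(x):
    """8-bit uniform quantization with no companding."""
    return dequantize_8bit(quantize_8bit(x))


def companded_roundtrip(x):
    """Compress, quantize to 8 bits, dequantize, then expand."""
    return mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(x))))


if __name__ == "__main__":
    quiet = 0.01  # a low-amplitude sample, where companding helps most
    print("linear error:   ", abs(linear_roundtrip(quiet) - quiet))
    print("companded error:", abs(companded_roundtrip(quiet) - quiet))
```

For quiet samples the companded path gives a smaller reconstruction error at the same 8 bits per sample, because the logarithmic curve spends more quantization levels near zero amplitude.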
The reduction in sound quality over the telephone has many downsides. For example, in normal conversation, sounds or portions of spoken words may be dropped or lost because of the low bandwidth. These kinds of disturbances hinder the enjoyment of any conversation. In many languages, small sound nuances convey different meanings, and any degree of reduced sound quality reduces the listener's ability to hear and understand the speaker.
In addition to human-human interaction, the instances of human-computer speech interaction are also increasing. For example, people may call a help line for a business and engage in a human-computer dialog using technology available from AT&T Corp. These speech services include a speech server that includes modules for automatic speech recognition (ASR), language understanding, dialog analysis, and text-to-speech for carrying on a conversation with the user using natural language. These components are known to those of skill in the art. These systems, however, require clean speech from the user to provide accurate and acceptable ASR. With standard telephone speech, the low-bandwidth signal, with dropped portions of words and low-quality sound “heard” by the ASR module of a speech recognition system, reduces the capability of the system to engage the user in a normal conversation.
What is needed in the art is an efficient and effective technology for improving the quality of voice and other sounds transmitted over the PSTN or a similar network. These improvements will provide more enjoyable personal discussions as well as improve the use of spoken dialog systems over the PSTN.