Conventional mobile communication platforms include cellular communications, for example, Global System for Mobile Communications (GSM). Other conventional platforms that support limited mobility include WiFi, which is based on the IEEE 802.11 standards. Cellular and WiFi are both well-known and established wireless communication platforms.
Next generation platforms may be designed to permit mobile users to move between cellular and WiFi networks. Such platforms include the Unlicensed Mobile Access (UMA) standard, which may provide a switch controller that carriers use to transition users from cellular to WiFi networks and vice versa. However, the UMA standard may have disadvantages, including that carriers generally control calls and decide if and when to switch users between networks.
An advanced mobile communication platform may be needed to provide enterprise level communication and control over users and the networks such that enterprises (instead of carriers) may select networks and/or control calls based on enterprise driven criteria rather than carrier driven criteria.
Further, in mobile/wireless communication, generally there have been the following problems: (a) echo; (b) packet delay, packet delay variation (packet jitter), and packet loss which affect quality of service (QoS); (c) hardware or software platform dependency of protocols; and (d) security of enterprise resource access. The problems are described as follows:
(a) Echo
In voice communications such as the conventional PSTN, conference phones, cellular mobile phones, and voice over IP, echo cancellation (EC) technology has been widely used to improve quality of service (QoS) for end-users. Generally there are two types of echo canceller. One type is generally called a line or network echo canceller (LEC). An LEC is generally used to remove electrical echoes caused by reflections at hybrid components on a network where 2-wire/4-wire conversions take place. The other type is generally called an acoustic echo canceller (AEC). An AEC is generally used to remove acoustic echoes caused by acoustic feedback from a speaker to a microphone on a hands-free speaker phone, mobile phone, or conference phone. Compared with an LEC, implementing an AEC may be more challenging due to factors such as the following: a longer echo tail, since sound travels much more slowly than an electrical signal, and accordingly the echo canceller is required to have more processing power and more memory; more dynamic changes in the acoustic echo characteristics because of movement of the phone or talker and changes in the environment, and accordingly the echo canceller may be required to track changes in the echo characteristics more quickly; and multiple echo paths due to multiple reflections from different objects at different distances and/or orientations.
Current acoustic echo cancellation technologies generally have limitations. Acoustic echo cancellation technology may have been in use for at least 40 years. However, the basic approach to cancelling acoustic echoes may not have changed significantly. In general, a typical AEC utilizes an adaptive filter to model one or more echo path transfer functions and to produce a replica of the echoes. The AEC may then subtract this replica from the near-end input signal to form a nominally echo-free output signal for the far end.
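The adaptive-filter approach described above may be illustrated with a minimal sketch. The sketch assumes a normalized LMS (NLMS) update and a single FIR echo path; the function names, the tap count, the step size `mu`, and the simulated echo path are hypothetical choices made for illustration only and are not taken from any particular AEC implementation.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Reduce echo in `mic` using a normalized LMS adaptive FIR filter.

    far_end: samples played through the loudspeaker (reference signal)
    mic:     samples captured by the microphone (echo plus any near-end sound)
    Returns (residual, w): the echo-reduced signal and the final filter taps.
    """
    w = [0.0] * taps   # adaptive estimate of the echo path
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]
        echo_replica = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_replica                   # subtract the replica
        norm = sum(xi * xi for xi in x) + eps  # input power (avoids divide by zero)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w

def simulate(n=4000, seed=0):
    """Build a toy far-end signal and its echo through a hypothetical echo path."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo_path = [0.0, 0.6, 0.0, -0.3, 0.1]     # assumed impulse response
    mic = [
        sum(h * far_end[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(n)
    ]
    return far_end, mic, echo_path

if __name__ == "__main__":
    far_end, mic, echo_path = simulate()
    residual, w = nlms_echo_cancel(far_end, mic)
    print("estimated taps:", [round(wi, 3) for wi in w])
```

In this toy setup the microphone signal contains only echo, so the filter taps approach the assumed echo path. With near-end speech added to `mic`, the error term `e` would no longer reflect echo alone and the same update could drift away from the true path.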
Most acoustic echo cancellation advancements so far have employed different kinds of filters, such as FIR or IIR filters, single-band or multi-band filters, or time-domain or frequency-domain filters. Further, different adaptation algorithms such as LMS, RLS, and APA have been used to improve filter efficiency. Nevertheless, even with all these technology improvements, AEC design and implementation may still be a very challenging task today. Conventional filters may show many limitations in handling acoustic echoes because of the complexity and variability of the echoes. One of the limitations may be poor performance during double-talk (when both near-end and far-end speakers are talking). During double-talk, the adaptation in conventional filters may diverge, so that the replica moves away from the actual echoes instead of converging toward them.
(b) Packet delay, packet delay variation (packet jitter), and packet loss which affect quality of service (QoS)
In voice over IP and video over IP communications, voice and/or video media contents may need to be transferred from the transmitter to the receiver in real time, while the underlying IP network was originally designed for non-real-time data communications. Accordingly, providing and maintaining quality of service (QoS) for end-users may become a very challenging task. The end-to-end packet delay, packet delay variation (packet jitter), and packet loss may be considered three important QoS parameters which affect the quality and performance of voice and video communications over an IP network.
Current jitter buffer technologies tend to have limitations. A jitter buffer scheme, which may also be called a de-jitter buffer scheme, is usually employed on the receiver side to compensate for or remove the network packet jitter. Basically, the scheme may not play out a packet as soon as the packet is received. Instead, the scheme may queue up the incoming packets and play out the queued packets at even intervals. In effect, the packet queuing inserts a delay before the play-out happens. The inserted delay is usually called the play-out delay.
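The queuing behavior described above may be sketched as follows, assuming a fixed play-out delay measured from the arrival of the first packet and a fixed packet interval. The function name, the tuple layout, and the millisecond units are hypothetical and are used only for illustration.

```python
def playout_schedule(arrivals, interval_ms, playout_delay_ms):
    """Schedule packets for play-out at even intervals.

    arrivals: list of (seq, arrival_ms) tuples in arrival order
    Returns (played, late): `played` maps seq -> scheduled play-out time,
    and `late` lists packets that arrived after their play-out time.
    """
    first_seq, first_arrival = arrivals[0]
    base = first_arrival + playout_delay_ms   # play-out time of the first packet
    played, late = {}, []
    for seq, arrival in arrivals:
        deadline = base + (seq - first_seq) * interval_ms
        if arrival <= deadline:
            played[seq] = deadline            # queued until its slot
        else:
            late.append(seq)                  # too late, treated as lost
    return played, late

if __name__ == "__main__":
    # Packets nominally 20 ms apart, with jitter on packets 1 to 4.
    arrivals = [(0, 0), (1, 25), (2, 70), (3, 62), (4, 85)]
    print(playout_schedule(arrivals, 20, 20))   # packet 2 misses its slot
    print(playout_schedule(arrivals, 20, 40))   # all packets are played
```

With these sample arrivals, a 20 ms play-out delay causes packet 2 to miss its slot, while a 40 ms delay plays every packet at the cost of 20 ms more latency.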
There may be at least two issues with current jitter buffer designs and implementations. The first issue may pertain to how much play-out delay needs to be inserted. There may be a tradeoff in the amount of play-out delay. With a large delay, there may be less packet loss. On the other hand, with a small delay, there may be a better interactive experience. The first issue may have been acceptably resolved by the adaptive jitter buffer scheme. In the adaptive jitter buffer scheme, a receiver may estimate the network packet jitter based on the timestamp in the RTP header of the incoming packets and the receiver local time. The receiver may then insert the minimal delay that is just enough to compensate for the network packet jitter.
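One way a receiver may estimate jitter from RTP timestamps and local arrival times is the running interarrival jitter estimate defined in RFC 3550, sketched below. The function name and the tuple layout are hypothetical, and both values are assumed to be expressed in the same units (for example, RTP timestamp units). How the estimate is mapped to a play-out delay is a design choice that is not specified here.

```python
def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550.

    packets: iterable of (rtp_timestamp, arrival_time) in the same units
    Returns the list of jitter estimates after each packet.
    """
    jitter = 0.0
    prev = None
    estimates = []
    for ts, arrival in packets:
        if prev is not None:
            prev_ts, prev_arrival = prev
            # Difference between receiver spacing and sender spacing.
            d = (arrival - prev_arrival) - (ts - prev_ts)
            # Exponential smoothing with gain 1/16.
            jitter += (abs(d) - jitter) / 16.0
        estimates.append(jitter)
        prev = (ts, arrival)
    return estimates

if __name__ == "__main__":
    # The third packet arrives 80 units later than its sender spacing implies.
    packets = [(0, 0), (160, 160), (320, 400), (480, 480)]
    print(interarrival_jitter(packets))
```

An adaptive scheme might, for example, set the play-out delay to a multiple of the latest estimate; that multiplier would be an implementation-specific assumption.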
The second issue with jitter buffer design may pertain to when to insert the play-out delay. Ideally, the play-out delay can be inserted at the beginning of each talk spurt. Accordingly, each talk spurt may be played out at even intervals, and only the silence periods between talk spurts are expanded or compressed. For example, if the transmitter employs silence suppression technology, the packets arriving at the receiver may ideally have gaps between talk spurts such that a device may be implemented to identify the beginning of each talk spurt based on the timestamp and the sequence number in the RTP headers of the incoming packets.
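A minimal sketch of identifying talk spurt beginnings from RTP header fields is given below. It assumes that the transmitter uses silence suppression, that every packet carries the same number of samples, and that packets are examined in sequence order without sequence number or timestamp wraparound. The function name and parameters are hypothetical.

```python
def talk_spurt_starts(packets, samples_per_packet):
    """Return the sequence numbers that begin a talk spurt.

    packets: list of (seq, rtp_timestamp) tuples in sequence order
    A packet starts a new talk spurt when the timestamp advanced by more
    than the sequence gap accounts for, meaning the sender skipped silence.
    """
    starts = []
    prev = None
    for seq, ts in packets:
        if prev is None:
            starts.append(seq)                 # first packet opens a spurt
        else:
            seq_gap = seq - prev[0]
            ts_gap = ts - prev[1]
            # A lost packet raises both gaps proportionally, so it is not
            # mistaken for a silence period.
            if ts_gap > seq_gap * samples_per_packet:
                starts.append(seq)
        prev = (seq, ts)
    return starts

if __name__ == "__main__":
    # Packet 13 follows a silence gap; packet 15 is missing (lost).
    packets = [(10, 0), (11, 160), (12, 320), (13, 1920), (14, 2080), (16, 2400)]
    print(talk_spurt_starts(packets, 160))
```

When the transmitter does not suppress silence, the timestamps advance uniformly and this check finds no spurt boundaries after the first packet.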
However, in reality, inserting delays at talk spurt beginnings can be achieved only in limited situations. Most current silence suppression technologies may have limitations and may perform well only in clean situations, such as a single human speaker or low background noise. Current silence suppression technologies may not perform well in other situations, such as multiple human speakers in a conference or high background noise during mobile communications. Therefore, many applications may be executed without utilizing or activating silence suppression, in order to preserve better audio quality. As a result, the packets coming into the receiver may be continuous without any pauses. There may be no clue in the timestamp and the sequence number to tell whether the packets represent silence or a talk spurt. Having no clue for identifying silence, current jitter buffer technologies tend to perform poorly. One reason for the poor performance may be that current jitter buffer schemes generally look only at the RTP header information of the incoming packets, and not at the content of the RTP payload.
(c) Hardware or software platform dependency of protocols
Hardware or software platform dependency may cause interoperability and/or configuration problems. For interactive user sessions in communication which involve multimedia elements such as video, voice, chat, gaming, or virtual reality, there may be a need for a lightweight protocol that runs over a communication protocol such as, for example, Session Initiation Protocol (SIP) and that can efficiently transport information between a server and a client. Such a protocol should work independently of the hardware and software platforms, of the control plane protocol in use between the server and the client, and of the underlying transport layer or medium over which the server and the client communicate. There may be a need for a protocol that is fast enough to support critical real-time control messages and flexible enough for large-volume data transfer with minimal delay. However, prior-art protocols such as UMA are generally complex, and establishing interoperability with them is difficult.
(d) Security when enterprise resources are accessed from mobile devices
More and more enterprises are allowing their employees to use their cellular/mobile phones for business purposes. With the availability of high-speed networks such as WiFi, EDGE, UMTS, CDMA EVDO, etc. to mobile phones, different vendors have been implementing VoIP (Voice over IP) for mobile phones. Such implementations may require opening enterprise firewalls to allow VoIP-related protocols, such as SIP, RTP, etc., to operate.
In addition to VoIP, other enterprise data-centric applications may also be extended to mobile phones. The applications may include one or more of Presence/Instant Messaging, Intranet web resources, CRM, Support database, etc. If the clients for one or more of the above applications on the mobile phones access the enterprise resources directly, enterprise firewalls may need to be opened for multiple protocols, and opening the enterprise firewalls may cause security problems.