The present invention is related to remote control of devices, and, in particular, to remote control of devices by telephones or other communication devices over a telephony signaling path.
Many communications and information services are enabled by telephone access to computer applications. A common mode for interacting with such computer applications is through a touch-tone telephone. Services often connect multiple applications to the user. For example, it is possible for a voice messaging service to be accessed through a pre-paid service. However, this creates a modality problem. For example, the pre-paid platform might wish to know when the user presses and holds the “#” key for a relatively long time (referred to as either the “long pound” or the “long octothorpe”), while the voice messaging platform might wish to know when the user enters digits, such as for menu navigation. The modality problem is that all digits entered by the user today are sent to both applications, because the digits travel in the bearer channel and both applications listen to that channel. Each application must be prepared to receive and discard notifications of key presses in which it has no interest, which complicates the design of the application and wastes processing and communications resources during operation.
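By way of illustration only, the modality problem described above can be sketched in Python as follows. The class names, the KeyPress structure, the broadcast function and the one-second long-pound threshold are all assumptions made for this sketch; none of them is taken from an actual pre-paid or voice messaging platform.

```python
from dataclasses import dataclass

LONG_POUND_MS = 1000  # hypothetical threshold for a "long pound"


@dataclass
class KeyPress:
    key: str          # "0".."9", "*" or "#"
    duration_ms: int  # how long the key was held


class PrepaidApp:
    """Cares only about the long pound; must discard everything else."""

    def __init__(self):
        self.long_pounds = 0
        self.discarded = 0

    def on_key(self, event):
        if event.key == "#" and event.duration_ms >= LONG_POUND_MS:
            self.long_pounds += 1
        else:
            self.discarded += 1


class VoiceMessagingApp:
    """Cares only about digits for menu navigation; discards the rest."""

    def __init__(self):
        self.digits = ""
        self.discarded = 0

    def on_key(self, event):
        if event.key.isdigit():
            self.digits += event.key
        else:
            self.discarded += 1


def broadcast(events, apps):
    """Model of the shared bearer channel: every app sees every key press."""
    for event in events:
        for app in apps:
            app.on_key(event)
```

In this sketch, a call in which the user enters “1”, holds “#” and then enters “2” delivers three events to each application, and each application spends effort discarding the events intended for the other.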
Numerous applications have been deployed for use in conjunction with the traditional time-division-multiplexed (TDM)-based telephone network. In many cases the applications simply receive the TDM-based “in-band” media stream, i.e., the voice channel, and the applications are responsible for continually decoding the media and monitoring for the presence of certain user input of a signaling nature, such as tones indicating that a particular key on the telephone keypad has been pressed. Certain improvements to the TDM network have been made, such as the Advanced Intelligent Network (AIN), which have the goal of separating signaling traffic from media traffic. However, in practice most of the application logic resides in an “intelligent peripheral” that is coupled along the media path, because of the low-level, device-control nature of the associated protocols. There are too many messages with too short a latency budget for a total separation of application logic from the intelligent peripheral. The result is that application developers write their applications for deployment on the intelligent peripheral, usually in proprietary intelligent peripheral languages. Thus, AIN does not fulfill the promise of separating application logic from media processing.
The situation is more complex for packet-switched environments, such as Voice-over-IP. Existing approaches, namely H.248.1 (MEGACO), Session Initiation Protocol (SIP) and an in-band technique described in RFC 2833, are described in turn.
H.248.1 (MEGACO) has a provision for reporting key press digits detected or generated by an “endpoint”, which in H.248.1 is a media gateway (MG). The MG can be an IP phone, an access gateway, or a trunking gateway. In the case of the IP phone, the IP phone can transmit the key presses directly at the protocol level. In the case of a gateway, the gateway can detect the key presses using DTMF detectors. Media Gateway Control Protocol (MGCP) is a proprietary Cisco protocol that operates in much the same manner as H.248.1. These protocols employ a master/slave approach in which a Media Gateway Controller (MGC) commands the MG (using a device control protocol signaling link) to connect a tone detector to an incoming circuit and wait for a digit map match. When the MG detects a key press pattern of interest, it notifies the MGC over the same signaling link, returning the actual digit string detected.
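The digit map interaction described above can be sketched, in greatly simplified form, as follows. This Python sketch is illustrative only: the pattern syntax is a reduced subset in which “x” stands for any digit and every other character must match literally, whereas actual H.248.1 digit maps also support ranges, timers and repetition. The MediaGateway class and its notify callback are hypothetical stand-ins for the MG and for its signaling link to the single controlling MGC.

```python
def match_digit_map(patterns, digits):
    """Classify collected digits as a 'full', 'partial' or 'none' match."""
    partial = False
    for pattern in patterns:
        if len(digits) > len(pattern):
            continue
        prefix_ok = all(
            (p == "x" and d.isdigit()) or p == d
            for p, d in zip(pattern, digits)
        )
        if prefix_ok:
            if len(digits) == len(pattern):
                return "full"
            partial = True
    return "partial" if partial else "none"


class MediaGateway:
    """Hypothetical MG that collects keys and reports digit map matches."""

    def __init__(self, patterns, notify):
        self.patterns = patterns  # digit map supplied by the controller
        self.notify = notify      # callback standing in for the signaling link
        self.collected = ""

    def on_key(self, key):
        self.collected += key
        status = match_digit_map(self.patterns, self.collected)
        if status == "full":
            # Report the actual digit string detected, then start over.
            self.notify(self.collected)
            self.collected = ""
        elif status == "none":
            # No pattern can match any longer; discard what was collected.
            self.collected = ""
```

For example, with the digit map ["0", "9xxx", "*70"], the key sequence 9, 1, 2, 3, 0 produces two notifications, “9123” and “0”. Because the sketch has exactly one notify callback, it also reflects the single-controller relationship of these protocols.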
In H.248.1, however, one and only one MGC may control the resources in an MG. Applications that have an interest in user signaling must be a part of the MGC application—there is no provision for independent, third-party applications to receive user signaling information. In MGCP, a first MGC may “pass off” control to a second MGC, but one and only one MGC may control a resource at any given time. The limitation of one and only one controller controlling a resource is a direct result of the master/slave nature of the MGCP and H.248.1 protocols. That is, the protocol requires the MG to be in an exclusive relationship to an MGC. Although these protocols also allow for “virtual MGs” within a physical MG, in which case there may be multiple MGCs serving as masters to the set of virtual MGs in a single physical MG, the virtual MGs are simply partitions of a physical MG. There is no provision for enabling multiple independent applications to selectively obtain user signaling information from a single stream of user input.
It has been proposed that a peer-to-peer protocol such as the Session Initiation Protocol (SIP) be used to transport key press signaling, such as via the SIP INFO method. The proposed mechanism closely follows the protocol of MGCP and H.248.1, including the use of MGCP and H.248.1 messages for specifying digit maps and notifications. However, the proposals have envisioned only a single application requesting notifications, which is a result of there being no mechanism for addressing endpoints of interest.
Cisco Systems has introduced a method for transporting DTMF digits in the SIP signaling path using the SIP NOTIFY method. However, this method has a number of disadvantages. First, notifications can only go to a single egress gateway; it is not possible for a third-party application to register for notifications. Second, the egress gateway receives notifications of every DTMF digit, whether it has an interest in them or not. Third, there is no provision for selectively passing through or clamping the DTMF tones from the media stream. If the ingress gateway passes DTMF, there is the risk of network elements interpreting both the in-band DTMF and the corresponding DTMF signaling received via the NOTIFY mechanism, potentially resulting in incorrect operation.
Another proposed method of transporting key press signaling is to use in-band representations for the keys. For example, RFC 2833 describes transporting key presses as named events, rather than as digital waveforms representing the key presses. While this approach uses less bandwidth and processing resources in the media path, it has serious drawbacks that limit its usability. First, a point-to-point media relationship between the endpoint and the application is generally assumed, leaving no provision for third-party involvement with collecting digits. Although in theory RFC 2833 could be used with third-party applications, rather complicated and unrealistic setup and operation are required. Additionally, because of the particular way that RFC 2833 handles redundancy, it does not meet the reliability requirements for signaling traffic. Moreover, RFC 2833 uses more bandwidth than is necessary, by sending multiple copies of the same packet for normal, lossless operation. Finally, applications receive all key presses, whether they have an interest in the key presses or not, making for inefficient use of communication and processing resources.
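For illustration, the named-event representation of RFC 2833 can be sketched as follows. The four-octet payload layout (an 8-bit event code, an end bit, a reserved bit, a 6-bit volume and a 16-bit duration) and the DTMF event codes 0 through 15 are as defined in RFC 2833; the function names are assumptions made for this Python sketch, and the RTP header, the retransmission of end packets and all non-DTMF events are omitted.

```python
import struct

# DTMF event codes defined by RFC 2833: 0-9, then *, #, A-D.
EVENT_CODES = {str(d): d for d in range(10)}
EVENT_CODES.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})
KEYS_BY_CODE = {code: key for key, code in EVENT_CODES.items()}


def encode_telephone_event(key, end, volume, duration):
    """Pack one key press as a 4-octet telephone-event payload."""
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second octet: E bit (0x80), reserved bit left as 0, 6-bit volume.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration)


def decode_telephone_event(payload):
    """Unpack a payload into (key, end, volume, duration); DTMF events only."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return KEYS_BY_CODE[event], bool(flags & 0x80), flags & 0x3F, duration
```

A receiver of such payloads decodes every event it is sent, which is consistent with the observation above that applications receive all key presses whether or not they have an interest in them.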
Many prior systems enable users to remotely control devices, such as household appliances, office equipment or equipment at unmanned locations, via telephone. In such a system, the user presses buttons on the telephone keypad to send commands to a remotely controlled device. Most such systems require establishing a voice path to carry DTMF signals between a remote telephone and the device or a separate controller that is connected to the device. (For simplicity, the device and such a separate controller are collectively referred to in this Background as a remotely controlled device.) Such systems require DTMF detection hardware or software in the remotely controlled device. In addition, if a remote control telephone call is routed over a voice-over-IP (VoIP) or similar packet-switched network to the remotely controlled device, the device requires a Real-time Transport Protocol (RTP) stack to receive the DTMF audio signals, even though the device receives no other audio signals. Thus, expensive special-purpose equipment is required in or near the remotely controlled device.
Additional complications are introduced if local telephones, i.e., telephones located on the same premises as the remotely controlled device, are to be used to control the device. When a device is controlled via a telephone call placed to the device from a remote telephone (i.e., a telephone not located on the same premises as the remotely controlled device), the device typically employs some form of authentication, such as requiring entry of a passcode, to ensure the caller is authorized to remotely control the device. However, from the perspective of the remotely controlled device, local telephones are treated the same as remote telephones, in that the same authentication method is used for local telephones as for remote telephones. For example, if attempts to remotely control the device from a remote telephone require entry of a passcode, attempts to control the device from a local telephone also require entry of the passcode, even though the local telephone is in a secure location, such as within the same house as the remotely controlled device. Such authentication methods for local telephones make the user interfaces to such systems cumbersome and unfriendly.
In some systems, a user of a mobile telephone sends a short text message (SMS message) to a vending machine to authorize a purchase, provide billing information (such as the user's mobile telephone number) and/or send a product selection to the vending machine. In such systems, the user must compose the SMS message and send the message to the vending machine's address. However, no call is placed to the vending machine.