Speech-to-speech translation (STS) systems are usually delivered in one of two forms: online over the Internet, or offline, embedded on a user's device (e.g., a smartphone or other suitable computing device). The online version has the advantage that it can draw on the significant processing resources of a large server (the cloud), and it provides a data feed to the service provider that makes improvements and customization possible. However, online processing requires continuous network connectivity, which cannot be guaranteed in all locations and is undesirable in some instances because of roaming costs or privacy/security concerns. As an alternative deployment, speech-to-speech translators, such as JIBBIGO speech translation apps, can be delivered as software running embedded locally on the smartphone itself, so that no network connectivity is needed after the initial download of the translation application. Such offline embedded speech translation capability is the preferred deployment for many if not most practical situations where language support is needed, as networks may be unavailable, intermittent, or too expensive. Most travelers experience such intermittent or absent connectivity, for example, during airline flights, in remote geographic locations, inside buildings, or simply because data roaming is turned off to avoid the associated charges while traveling in a foreign country.
The way such speech translation services or software are delivered also has implications for the extent to which the software can or must operate in a domain-dependent or domain-independent manner, and for whether it can adapt to the user's context. STS systems that have been closely optimized and tuned to a specific domain of use will usually work rather well for that domain and not so well for others (domain dependence); alternatively, they attempt domain independence by working more or less equally well for all domains. Either solution limits performance in specific situations.
A user commonly runs an online client program on his/her computing device. This device typically digitizes and possibly encodes speech, then transmits samples or coefficients over a communication line to a server. The server then performs the computationally heavy speech recognition and/or translation and sends the result back to the user over a communication line, and the result is displayed on the user's device. Different online designs have been proposed that move different parts of the processing chain off to the server and do more or less of the computing work on the device. In speech recognition and translation systems, the user's device can be as simple as a microphone or an analog-to-digital converter, or it can provide more complex functions such as noise suppression, encoding as coefficients, one or more speech recognition passes, or one or more language processing steps. An offline design, by contrast, runs the entire application on the device itself as an embedded application. All computation is done locally on the device, and no transmission between a client and a server is needed during use.
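The division of labor described above can be illustrated with a minimal sketch. The stage names and the `plan_deployment` helper are hypothetical and serve only to show how moving stages between device and server changes the number of transmissions over the communication line; they do not represent any particular system's architecture.

```python
from typing import Dict, List

# Illustrative stage names only; they stand in for real components.
STAGES: List[str] = ["digitize", "encode", "recognize", "translate", "display"]


def plan_deployment(server_stages: List[str]) -> Dict[str, object]:
    """Assign each stage to the device or the server and count transmissions.

    A transmission is counted each time two consecutive stages run in
    different places, since data must cross the communication line there.
    """
    unknown = set(server_stages) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    placement = ["server" if s in server_stages else "device" for s in STAGES]
    transmissions = sum(1 for a, b in zip(placement, placement[1:]) if a != b)
    return {
        "device": [s for s, p in zip(STAGES, placement) if p == "device"],
        "server": [s for s, p in zip(STAGES, placement) if p == "server"],
        "transmissions": transmissions,
        "needs_network": transmissions > 0,
    }


# Online design: recognition and translation run on the server.
online = plan_deployment(["recognize", "translate"])
# Offline design: every stage runs on the device.
offline = plan_deployment([])
```

In this sketch the online plan crosses the communication line twice (speech data up, result down), while the offline plan never does, which is why only the online design depends on connectivity during use.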
Typically, an online design has the advantage that it needs only a very simple client, so the application can run on a very simple computing device or mobile phone while all the heavy computation and processing is done on a large computing server. For speech and machine translation, this can mean that more advanced but computationally intensive algorithms and up-to-date background information can be used. It also has the advantage that the developer or operator of the service can maintain or improve the service or capability on the server without requiring the user to download new system versions or upgrade.
The disadvantage of an online design is that it depends critically on network connectivity. As a user moves and travels to remote locations, connectivity can be intermittent and/or very expensive (roaming), and thus in many ways unavailable. For speech and speech translation systems this requirement is frequently unacceptable. Unlike text or email transmissions, voice cannot tolerate a temporary lapse of connectivity, because the speech stream cannot be interrupted without losing information or real-time performance. An online design must therefore ensure continuous, real-time transmission and thus continuous connectivity during use.
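The trade-off between losing information and losing real-time performance can be made concrete with a toy simulation. This is only a sketch under simplifying assumptions: the client captures one audio frame per tick, the link is either fully up or fully down, and a restored link can flush the entire backlog in a single tick. The function name and parameters are hypothetical.

```python
from collections import deque
from typing import Sequence, Tuple


def simulate_stream(link_up: Sequence[bool], buffer_frames: int) -> Tuple[int, int]:
    """Simulate sending one audio frame per tick over an unreliable link.

    link_up: one boolean per tick (True = connected).
    buffer_frames: how many unsent frames the client may hold.
    Returns (frames_dropped, max_delay_in_ticks) for the frames handled.
    """
    pending = deque()  # capture ticks of frames still waiting to be sent
    dropped = 0
    max_delay = 0
    for tick, up in enumerate(link_up):
        pending.append(tick)  # a new frame is captured on every tick
        if up:
            # Assumption: a live link flushes the whole backlog at once.
            while pending:
                max_delay = max(max_delay, tick - pending.popleft())
        elif len(pending) > buffer_frames:
            pending.popleft()  # buffer full: the oldest frame is lost
            dropped += 1
    return dropped, max_delay


# Three ticks of outage in the middle of the stream.
link = [True, True, False, False, False, True]
no_buffer = simulate_stream(link, buffer_frames=0)
large_buffer = simulate_stream(link, buffer_frames=10)
```

With no buffer, every frame captured during the outage is lost; with a large buffer, nothing is lost, but the oldest buffered frame arrives three ticks late. Either way the outage costs information or timeliness, which is the constraint the paragraph above describes.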