Depending on how the service is provided, speech synthesis technology may be divided into speech synthesis based on a cloud engine (briefly referred to as “online speech synthesis” below) and speech synthesis based on a local engine (briefly referred to as “offline speech synthesis” below). Each of the two has its own advantages and disadvantages. Online speech synthesis offers high naturalness and high real-time performance and does not occupy resources of the client device, but its disadvantages are also obvious. An application (briefly referred to as “App” below) using the speech synthesis may send a long text to a server end at one time, whereas the speech data synthesized by the server end is returned in segments to the client on which the App is installed, and the amount of speech data is large even after compression (for example, 4 kb/s). As a result, if the network environment is unstable, online speech synthesis becomes very slow and the speech is not continuous. By contrast, offline speech synthesis does not depend on a network and can ensure stability of the synthesis service, but its synthesis effect is poorer than that of online speech synthesis.
In conclusion, in the related art, products using speech synthesis technology rely on either online speech synthesis alone or offline speech synthesis alone. Online speech synthesis consumes a large amount of data traffic and, when a network error occurs, can only prompt the user that an error has occurred, whereas offline speech synthesis does not produce a natural-sounding result. User experience is therefore poor.