Unit-selection text-to-speech (TTS) synthesis can be desirable because it can produce a more natural-sounding voice than other TTS methods. Conventionally, unit-selection TTS synthesis can include three stages: front-end text analysis, unit selection, and waveform synthesis. In the unit-selection stage, a unit-selection algorithm can be implemented to select a sequence of speech units (e.g., speech segments, phones, sub-phones, etc.) from a database of speech units. The speech units in the database can be obtained by segmenting recordings in which a voice talent reads a corpus of text aloud. A sophisticated unit-selection algorithm can be desirable for selecting the most suitable speech units from the database. The most suitable speech units can be those having acoustic properties that best match the target pronunciation of the text to be converted to speech, which can enable the synthesis of high-quality, natural-sounding speech.
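For illustration only, the unit-selection stage described above can be sketched as a dynamic-programming search that trades off how well each candidate unit matches its target against how smoothly adjacent units join. This is one commonly used formulation and is not necessarily the algorithm contemplated here. The names `Unit`, `Target`, `target_cost`, `join_cost`, `select_units`, and `DATABASE`, the pitch and duration features, and the cost weights are all hypothetical and are chosen only to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded speech unit in the database (features are hypothetical)."""
    unit_id: str
    phone: str
    pitch: float      # assumed feature: mean pitch in Hz
    duration: float   # assumed feature: duration in seconds


@dataclass(frozen=True)
class Target:
    """A target specification produced by front-end text analysis."""
    phone: str
    pitch: float
    duration: float


def target_cost(target: Target, unit: Unit) -> float:
    # Assumed cost: how far the unit is from the target pronunciation.
    return (abs(target.pitch - unit.pitch) / 100.0
            + abs(target.duration - unit.duration) * 10.0)


def join_cost(prev: Unit, cur: Unit) -> float:
    # Assumed cost: pitch mismatch at the boundary between two units.
    return abs(prev.pitch - cur.pitch) / 100.0


def select_units(targets: list[Target], database: list[Unit]) -> list[Unit]:
    """Return the unit sequence with the lowest total target + join cost."""
    if not targets:
        return []

    # Candidate units for each target are those with a matching phone label.
    candidates = [[u for u in database if u.phone == t.phone] for t in targets]
    for t, cands in zip(targets, candidates):
        if not cands:
            raise ValueError(f"no unit in database for phone {t.phone!r}")

    # best[i][j] = (lowest cost of any path ending in candidates[i][j],
    #               index of the previous candidate on that path)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(targets[i], u)
            row.append(min(
                (best[i - 1][k][0] + join_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            ))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path


# Toy database, purely illustrative.
DATABASE = [
    Unit("a1", "a", 100.0, 0.10),
    Unit("a2", "a", 200.0, 0.10),
    Unit("b1", "b", 110.0, 0.10),
    Unit("b2", "b", 210.0, 0.10),
]
```

With this toy database, calling `select_units([Target("a", 140.0, 0.10), Target("b", 200.0, 0.10)], DATABASE)` selects `a2` followed by `b2`, even though `a1` is the closer match to the first target in isolation, because the join between `a2` and `b2` is much smoother. A production system would typically use richer acoustic features and prune the candidate lists before the search.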