Automatic speech recognition (ASR) systems and text-to-speech (TTS) systems may use word pronunciation data. An ASR system may use the data to determine which word is uttered in an audio signal, and a TTS system may use the data to generate an audio signal encoding a synthesized utterance of the word. Some ASR and TTS systems may use a manually curated pronunciation dictionary. The entries in the dictionary may include phoneme sequences, e.g., “foo” → /f u/ (in X-SAMPA notation).
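
As a rough illustration of the pronunciation dictionary described above, the following Python sketch maps words to phoneme sequences and supports lookup in both directions. It is a minimal sketch under stated assumptions, not the implementation of any particular ASR or TTS system: the names `PRONUNCIATIONS`, `phonemes_for`, `build_reverse_index`, and `words_for` are hypothetical, and every entry other than “foo” is an illustrative addition.

```python
from collections import defaultdict

# Hypothetical manually curated entries: word -> phoneme sequence (X-SAMPA symbols).
# Only "foo" comes from the text; the other entries are illustrative assumptions.
PRONUNCIATIONS = {
    "foo": ("f", "u"),
    "food": ("f", "u", "d"),
    "too": ("t", "u"),
    "two": ("t", "u"),
}


def phonemes_for(word):
    """TTS direction: return the phoneme sequence for a word, or None if absent."""
    return PRONUNCIATIONS.get(word.lower())


def build_reverse_index(pronunciations):
    """Group words by phoneme sequence so homophones share one key."""
    index = defaultdict(list)
    for word, phonemes in pronunciations.items():
        index[phonemes].append(word)
    return dict(index)


REVERSE_INDEX = build_reverse_index(PRONUNCIATIONS)


def words_for(phonemes):
    """ASR direction: return the sorted candidate words for a phoneme sequence."""
    return sorted(REVERSE_INDEX.get(tuple(phonemes), []))


if __name__ == "__main__":
    print(phonemes_for("foo"))
    print(words_for(["t", "u"]))
```

In this sketch, a word that is absent from the dictionary yields `None`, and a phoneme sequence shared by several words yields all of them as candidates.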