Collecting high-quality voice data from many different individuals may be desirable for a variety of applications. In one example, it may be desired to create a text-to-speech (TTS) voice for a person, such as a person who has only limited speaking ability or has lost the ability to speak. For such a person, it may be desirable to have a voice that sounds like him or her and/or matches his or her characteristics, such as gender, age, and regional accent. Collecting voice data from a large number of individuals may make it easier to create a TTS voice that sounds like that person.
The people from whom voice data is collected may be referred to as voice donors, and a person who receives a TTS voice may be referred to as a voice recipient. A collection of voice data from many different voice donors may be referred to as a voice bank. When collecting voice data for a voice bank, it may be desirable to collect voice data from a wide variety of voice donors (e.g., donors who vary in age, gender, and location), to collect enough data to adequately represent all the sounds in speech (e.g., phonemes), and to ensure that the collected data is of high quality.