Speech recognition and voice-assistant systems are typically configured to receive audible input from one or more users, perform speech recognition operations on the received input to identify one or more spoken words, and perform one or more operations based on the identified words. For example, a voice-assistant system may receive audible input from a user, perform speech recognition on the received input to determine that the user has asked a question, and perform one or more operations to provide the user with an answer (e.g., a visual or audible answer) to the question. In some cases, if the user has spoken a command, the system may perform the requested command or send the command to another system for handling. Such systems are typically implemented on dedicated devices or on general-purpose computing devices such as smartphones, tablet computers, or personal computers.
These systems typically use acoustic models during the speech recognition process. Training applications are often configured to train these acoustic models before the models are used for real-time speech recognition. To train an acoustic model under a variety of environmental conditions (e.g., background noise, room size, room shape), these training applications often use a room simulator application to generate simulated audible sounds. The room simulator application is typically a software system capable of generating such simulated audio under various environmental conditions.
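As an illustration only, the kind of audio a room simulator produces can be approximated by convolving clean speech with a room impulse response and then mixing in background noise at a chosen signal-to-noise ratio (SNR). The sketch below assumes a deliberately simple model in which the room is summarized by a single reverberation time (RT60) and its impulse response is exponentially decaying random noise; it is not the room simulator application described here, and all function names and parameters (`synthetic_impulse_response`, `simulate`, `rt60_s`, `snr_db`) are hypothetical.

```python
import math
import random


def synthetic_impulse_response(rt60_s, sample_rate, length_s=None, seed=0):
    """Hypothetical room impulse response: Gaussian noise with an exponential
    envelope whose amplitude falls by 60 dB over rt60_s seconds.
    Larger or more reflective rooms would correspond to a longer rt60_s."""
    rng = random.Random(seed)
    if length_s is None:
        length_s = rt60_s
    n = max(1, int(length_s * sample_rate))
    # A 60 dB amplitude drop is a factor of 1000 over rt60_s * sample_rate samples.
    decay = math.log(1000.0) / (rt60_s * sample_rate)
    ir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i) for i in range(n)]
    ir[0] = 1.0  # direct (unreflected) path from source to microphone
    return ir


def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, noise, snr_db):
    """Mix noise into signal so that the result has the requested SNR (dB)."""
    # Repeat or truncate the noise so it matches the signal length.
    noise = [noise[i % len(noise)] for i in range(len(signal))]
    scale = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def simulate(clean, sample_rate, rt60_s, noise, snr_db, seed=0):
    """Simulated 'far-field' audio: reverberation followed by background noise."""
    ir = synthetic_impulse_response(rt60_s, sample_rate, seed=seed)
    reverberant = convolve(clean, ir)
    return add_noise(reverberant, noise, snr_db)


if __name__ == "__main__":
    sr = 8000
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(400)]
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    # Sweep a few hypothetical environmental conditions.
    for rt60, snr in [(0.1, 20.0), (0.3, 10.0), (0.6, 5.0)]:
        out = simulate(clean, sr, rt60, noise, snr)
        print(f"rt60={rt60}s snr={snr}dB -> {len(out)} samples")
```

Sweeping `rt60_s` and `snr_db` over a grid is one simple way a training pipeline might expose a single clean utterance to many simulated acoustic environments; a real simulator would typically model room geometry, source and microphone positions, and recorded noise rather than these synthetic stand-ins.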