In some instances, it may be useful to sequence-train a speech model to recognize a spoken phrase that includes words uttered in a sequence. Sequence training requires computationally intensive processing compared to frame-level training. Asynchronous optimization may allow sequence training of multiple speech models to update model parameters in parallel and enable scalable sequence training.