Speech recognition model development involves a large number (e.g., 100) of processing steps. Each step is implemented by a processor that consumes inputs, which are the outputs of other processor steps, and creates one or more outputs. One problem is deciding which step to perform next. The process can be divided into several phases: planning, in which human experts define the high-level tasks of the model development; modeling, in which software programs perform data processing and build the models; and model tuning, in which a number of modeling parameters are tested in order to optimize speech recognition performance.
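The relationship between steps, inputs, and outputs described above can be sketched as a small data structure. This is an illustrative sketch only, not the implementation discussed in this document; the `Step` class, the `runnable_steps` helper, and all step and file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One processing step: consumes named inputs, produces named outputs."""
    name: str
    inputs: frozenset
    outputs: frozenset


def runnable_steps(steps, available, done):
    """Return the names of steps not yet run whose inputs are all available."""
    return [s.name for s in steps
            if s.name not in done and s.inputs <= available]


# Hypothetical three-step fragment of a model development process.
steps = [
    Step("extract_features", frozenset({"audio"}), frozenset({"features"})),
    Step("train_acoustic_model", frozenset({"features", "transcripts"}),
         frozenset({"acoustic_model"})),
    Step("evaluate", frozenset({"acoustic_model", "test_set"}),
         frozenset({"wer_report"})),
]

# Data that exists before any step has run (assumed for the example).
available = {"audio", "transcripts", "test_set"}
ready = runnable_steps(steps, available, done=set())
```

With only the raw data available, `extract_features` is the sole step that can run; the other two become runnable only once their upstream outputs exist. With around 100 steps, answering this question by hand is where the difficulty arises.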
The planning phase currently requires manual specification of the “predecessors” for each task. The author has to order the tasks manually to ensure that all inputs are prepared before a task is executed. When there are a large number of tasks, creating and managing the plan becomes unreliable.
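To illustrate why a hand-ordered plan is fragile, the following sketch checks whether every task's inputs have been prepared by the time the task runs. It is a hypothetical helper written for this explanation, assuming each task is described by a name, a list of inputs, and a list of outputs; none of these names come from an existing tool.

```python
def find_ordering_errors(plan, initial_inputs):
    """Check a manually ordered plan.

    plan: list of (task_name, inputs, outputs) tuples in the author's order.
    initial_inputs: names of data available before any task runs.
    Returns a list of (task_name, missing_inputs) for tasks run too early.
    """
    prepared = set(initial_inputs)
    errors = []
    for name, inputs, outputs in plan:
        missing = sorted(set(inputs) - prepared)
        if missing:
            errors.append((name, missing))
        # Record the outputs even for a failing task, so that one misplaced
        # task does not produce a cascade of follow-on errors in the report.
        prepared |= set(outputs)
    return errors


# Hypothetical plan in which training was placed before feature extraction.
plan = [
    ("train_model", ["features", "transcripts"], ["model"]),
    ("extract_features", ["audio"], ["features"]),
    ("evaluate", ["model", "test_set"], ["report"]),
]
errors = find_ordering_errors(plan, ["audio", "transcripts", "test_set"])
```

Here `train_model` is reported because `features` has not yet been produced when it runs. A purely manual process has no such check, so an ordering mistake of this kind surfaces only at execution time.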
The modeling phase is currently implemented as rigid (e.g., hard-coded) procedural software. The work-flow is controlled by predesigned switches (e.g., in configuration files) that determine the sequence of model building actions.
The tuning phase is currently also implemented as rigid procedural software. In this phase, depending on which intermediate data files or parameters are tuned, certain processes (and only those processes) must be activated. Since the work-flow is controlled by predesigned switches, it is difficult to satisfy this requirement by changing the code.
This approach has several limitations. By design, the developer controls which steps are performed by turning switches on or off in a control file. Consequently, the subsequent steps are not codified in the process, and nothing enforces completion of all the steps. Moreover, to add new functionality or to update the tool, the whole training process has to be regression tested. This is expensive and is a principal reason why deploying new technology can be costly. Additionally, if the modeling process is interrupted, there is no mechanism to guarantee that the restart will perform only the needed steps, without duplicating or missing any. The tuning stage is more complex because once an input file or a configuration is changed, it is difficult to determine which components are affected and should be rebuilt. Finally, there is no automatic way to know how many subcomponents will be affected if one input is missing.
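The question of which components are affected by a changed or missing input can be made concrete with a short sketch. It assumes, hypothetically, that each step's inputs and outputs are declared explicitly, which the switch-based approach described above does not provide. The function and all step names are illustrative, not part of any existing tool.

```python
from collections import deque


def affected_steps(steps, changed):
    """Return the names of all steps downstream of the changed items.

    steps: dict mapping step name -> (set of inputs, set of outputs).
    changed: iterable of input or intermediate file names that changed.
    """
    dirty = set(changed)      # items known to be out of date
    affected = set()          # steps that must be rerun
    queue = deque(dirty)
    while queue:
        item = queue.popleft()
        for name, (inputs, outputs) in steps.items():
            if name not in affected and item in inputs:
                affected.add(name)
                # Everything this step produces is now out of date too.
                for out in outputs:
                    if out not in dirty:
                        dirty.add(out)
                        queue.append(out)
    return affected


# Hypothetical four-step process.
steps = {
    "extract_features": ({"audio"}, {"features"}),
    "train_acoustic_model": ({"features", "transcripts"}, {"acoustic_model"}),
    "build_language_model": ({"text_corpus"}, {"language_model"}),
    "evaluate": ({"acoustic_model", "language_model", "test_set"},
                 {"wer_report"}),
}
rebuild = affected_steps(steps, ["text_corpus"])
```

In this example a change to `text_corpus` affects only `build_language_model` and `evaluate`; feature extraction and acoustic model training need not be repeated. Without declared dependencies, this set has to be worked out by hand each time an input or configuration changes.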