Training people to stable levels of high performance in specialized skills requires a great deal of investment in both time and capital. This is particularly true in highly complex domains like military operations, where a warfighter may receive years of training before achieving mission-ready status. Given the length, complexity, and cost of warfighter training, it is ironic and disappointing that warfighter readiness status is often defined by the simple number of training hours completed or by subjective, checklist-based ratings of performance, rather than through objective means. Thus, two individual warfighters may appear identical on paper in terms of training history, yet function, quantitatively and qualitatively, at very different levels of operational effectiveness.
Intelligent tutoring systems are intended to optimize learning by adapting training experiences on the basis of proficiency. These systems continuously estimate the trainee's current knowledge and skill levels based on performance history and build a representation of the student, termed the student model. They dynamically update the estimates of the knowledge state in the student model as the learner accumulates more experience and expertise, and then adapt training to improve the efficiency and effectiveness of learning opportunities.
Among the demonstrably successful intelligent tutoring systems are the COGNITIVE TUTORS® systems, which originated at Carnegie Mellon as test beds for the ACT® theory of skill acquisition. Their implementation was inspired by ACT-problem solving, with skills decomposed into production rules. The tutors proved so effective that a successful spinoff company, Carnegie Learning, eventually formed to mature and distribute the technology to school districts around the country. The tutors are now being used by more than 800 schools.
The student modeling capability in the COGNITIVE TUTORS® is a Bayesian estimate of the probability of having mastered each of the knowledge units (production rules) that are targets of current instruction. Their Bayesian equation is used in a process called “knowledge tracing” to keep this mastery estimate current and provide a basis on which to determine the course of instruction. This approach has been quite successful in classroom applications.
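The knowledge tracing update can be illustrated with a short sketch. The text above does not give the equation, so the code below assumes the commonly published form of Bayesian knowledge tracing with learn, guess, and slip probabilities; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
def bkt_update(p_mastery, correct, p_learn=0.2, p_guess=0.2, p_slip=0.1):
    """One knowledge-tracing step for a single production rule.

    Assumed (illustrative) parameters:
      p_learn: chance of moving from unmastered to mastered after a practice step
      p_guess: chance of a correct response without mastery
      p_slip:  chance of an incorrect response despite mastery
    """
    if correct:
        num = p_mastery * (1 - p_slip)
        den = num + (1 - p_mastery) * p_guess
    else:
        num = p_mastery * p_slip
        den = num + (1 - p_mastery) * (1 - p_guess)
    posterior = num / den  # Bayes' rule: P(mastered | observed response)
    # Allow for learning that happens during the practice opportunity itself.
    return posterior + (1 - posterior) * p_learn


def trace(responses, p_init=0.3, **params):
    """Return the running mastery estimate after each observed response."""
    estimates = []
    p = p_init
    for response in responses:
        p = bkt_update(p, response, **params)
        estimates.append(p)
    return estimates


estimates = trace([True, True, False, True])
```

In this form the update takes only the sequence of responses as input. Elapsed time between practice opportunities does not enter the calculation.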
Notwithstanding the documented utility of the knowledge tracing approach, it has a critical limitation, as does every other contemporary student modeling approach: intelligent tutors have no underlying mechanism for memory decay represented in the model. Thus, even over significant periods of non-practice, when some forgetting would inevitably occur, the student model assumes that the learner's knowledge state remains stable, leaving all prior learning completely intact. This restricts the utility of traditional student modeling approaches to estimates of current readiness, proficiency, or mastery. They have no capacity to predict what future readiness will be at specific points in time.
Furthermore, traditional student modeling approaches are unable to make predictions regarding knowledge and skill changes under various future training schedules or to prescribe how much training will be required to achieve specific levels of readiness at a specific future time. They operate only on the learner's last computed knowledge state, and provide training only for the benchmark task currently being learned.
One of the more consistent findings from past research in human memory is that performance is generally enhanced when learning repetitions are spaced farther apart in time. This phenomenon, often termed the spacing effect, is extremely robust and has been observed not only in artificial laboratory settings, but in real-life training situations as well. Given its ubiquity, it may be inferred that basic principles of learning and retrieval are involved.
This phenomenon is not captured by most existing models of human memory, which generally assume that memory traces strengthen additively with each learning opportunity and decay continually with the passage of time. As a result, these models make predictions that run contrary to empirical human data, showing improved performance under massed compared to distributed conditions.
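A minimal sketch shows why such additive models favor massed practice. It assumes one simple form of the idea, in which each learning opportunity contributes a trace that decays as a power function of its age and total strength is the sum of the traces. This specific form, the decay value, and the function names are illustrative assumptions, not a model taken from the text.

```python
def trace_strength(practice_times, test_time, decay=0.5):
    """Sum of power-law-decaying traces, one per earlier practice event.

    Times are in seconds. Each trace contributes age ** -decay, so traces
    add with every learning opportunity and shrink as time passes.
    """
    return sum((test_time - t) ** -decay for t in practice_times if t < test_time)


def schedule(n_events, gap, last_time):
    """Practice times for n_events separated by `gap`, ending at last_time."""
    return [last_time - k * gap for k in range(n_events)]


DAY = 86400.0
last_practice = 10 * DAY
test_time = last_practice + 7 * DAY  # same retention interval for both schedules

massed = schedule(4, 60.0, last_practice)   # four events one minute apart
spaced = schedule(4, DAY, last_practice)    # four events one day apart

massed_strength = trace_strength(massed, test_time)
spaced_strength = trace_strength(spaced, test_time)
```

Because every trace in the spaced schedule is at least as old at test as its counterpart in the massed schedule, this kind of model assigns the massed schedule the higher strength, which is the opposite of the spacing effect.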
As a common practice in the field of cognitive modeling, most modelers judge the explanatory power and descriptive adequacy of their models on the basis of goodness-of-fit measures that compare model predictions to human empirical data in the highly specialized task environment for which each model was developed. It is far less typical to assess the generalizability or predictive power of a single model across multiple sets of data, tasks, or domains. It is also atypical for modelers to investigate substantive variations in the implementation of a single model, where multiple mechanisms could potentially achieve equivalent goodness-of-fit values. Thus, the common practice of judging model performance on the goodness-of-fit criterion alone may lead a modeler to erroneously conclude that true underlying process regularities have been captured, which could in turn lead to faulty theoretical claims.
To minimize this probability and to effectively evolve cognitive theory, the modeling community must conduct more thorough investigations of model instantiations, with selection based on formal comparison criteria. The most widely used means of model comparison is quantitative in nature, and is referred to as goodness-of-fit or descriptive adequacy. Assessment under this criterion involves first optimizing model parameters to find the best fit, and then choosing the model that accounts for the most variance in the data (typically calculated as root mean square deviation {RMSD} or sample correlation {R²}). This practice is a critical component of model selection, but simply selecting the model that achieves the best fit to a particular set of data is insufficient for determining which model truly captures underlying processes in the human system. In fact, basing model selection on this criterion alone will generally favor the most complex model, meaning that overfitting the data and generalizing poorly could be very real problems, and interpreting how implementation ties to underlying processes may be all but impossible.
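The two goodness-of-fit statistics named above can be computed as in the following sketch. The data values and function names are hypothetical, and R² is taken here to be the squared Pearson correlation between observed and predicted values.

```python
import math


def rmsd(observed, predicted):
    """Root mean square deviation between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical proportion-correct scores and two sets of model predictions.
observed = [0.50, 0.62, 0.71, 0.80]
exact = list(observed)                  # predictions that match the data
shifted = [o + 0.10 for o in observed]  # predictions with a constant offset

fit_exact = (rmsd(observed, exact), r_squared(observed, exact))
fit_shifted = (rmsd(observed, shifted), r_squared(observed, shifted))
```

The shifted predictions correlate perfectly with the data even though every prediction is off by 0.10, which is one reason the two statistics are usually reported together.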
The inclusion of additional qualitative model selection criteria (i.e., weighing the necessity of added parameters) helps overcome these pitfalls and improves the chances of selecting models that offer more insight into how human memory functions. Because complex models are more likely to capture a particular set of data well, including the noise in that data, it is necessary to embody the principle of Occam's Razor in model selection tools by balancing parsimony with goodness-of-fit. This translates into accounting for both the number of parameters included in a model and the model's functional form, defined as the interplay between the model factors and their effect on model fit.
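One familiar way to balance parsimony against fit is an information criterion. The sketch below uses the least-squares forms of AIC and BIC as an illustration only; the text does not name a specific criterion, and these two penalize parameter count but not functional form. The residual sums of squares and parameter counts are hypothetical.

```python
import math


def aic_least_squares(rss, n_obs, n_params):
    """Akaike information criterion for a least-squares fit (lower is better)."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def bic_least_squares(rss, n_obs, n_params):
    """Bayesian information criterion for a least-squares fit (lower is better)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Hypothetical comparison: the complex model fits slightly better (smaller
# residual sum of squares) but spends three extra parameters to do so.
n_obs = 20
simple = {"rss": 0.50, "n_params": 2}
complex_model = {"rss": 0.48, "n_params": 5}

aic_simple = aic_least_squares(simple["rss"], n_obs, simple["n_params"])
aic_complex = aic_least_squares(complex_model["rss"], n_obs, complex_model["n_params"])
bic_simple = bic_least_squares(simple["rss"], n_obs, simple["n_params"])
bic_complex = bic_least_squares(complex_model["rss"], n_obs, complex_model["n_params"])
```

With these numbers both criteria prefer the simpler model, even though the complex model has the better raw fit.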
The contemporary method of attaining capabilities such as effectively tracking a trainee's unique learning dynamics, specifying future training regimens and obtaining performance predictions for those regimens, making visual and graphical examinations and comparisons, and extrapolating to generate precise, quantitative predictions of performance for each specified future training time is to handcraft a mathematical model for every data set to be examined (e.g., each learner must be modeled separately for each variable of interest) using contemporary tools such as a spreadsheet. All training history and training specifications must therefore be entered by hand, meaning that the precise timing (in seconds) associated with training event length, time between training events, and overall time in training must be calculated to equip the mathematical model with the information necessary to track, predict, and prescribe human performance. This is an error-prone, time-consuming process that requires a high degree of knowledge and skill to do correctly.
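The timing quantities described above can be derived from session timestamps rather than calculated by hand, as in the following sketch. The record layout and names are hypothetical. The sketch also assumes that time between events is measured from the end of one session to the start of the next, and that overall time in training means the summed session lengths.

```python
from datetime import datetime


def session_timings(sessions):
    """Derive timing inputs, in seconds, from (start, end) datetime pairs.

    `sessions` must be in chronological order.
    """
    durations = [(end - start).total_seconds() for start, end in sessions]
    gaps = [
        (sessions[i + 1][0] - sessions[i][1]).total_seconds()
        for i in range(len(sessions) - 1)
    ]
    return {
        "event_lengths_s": durations,
        "time_between_events_s": gaps,
        "time_in_training_s": sum(durations),
    }


# Hypothetical training history: two sessions on consecutive days.
history = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 45)),
]
timings = session_timings(history)
```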
The spreadsheet's solver may be used to identify the model parameters that best fit data from a learner's history (using maximum likelihood estimation). These values are then integrated into the model to generate predictions and extrapolate them beyond the learner's history to specified future dates. The spreadsheet's graphing capability may then be used to visualize the trainee's historical performance and the model predictions. Statistics (including correlations and root mean squared deviations) may also be calculated within the spreadsheet to examine the model's goodness-of-fit to the human data, as well as its predictive validity and cognitive plausibility.
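The fit-then-extrapolate workflow described above can be sketched in a few lines. The sketch is not the model used in the spreadsheet: it assumes a hypothetical two-parameter power-law retention curve, and it substitutes a coarse grid search over squared error for the solver. Under an added assumption of Gaussian noise with constant variance, minimizing squared error gives the same parameters as maximum likelihood estimation.

```python
def predict(t_days, a, d):
    """Hypothetical retention curve: performance after t_days without practice."""
    return a * (1.0 + t_days) ** -d


def sse(params, times, observed):
    """Sum of squared errors between model predictions and observed scores."""
    a, d = params
    return sum((predict(t, a, d) - o) ** 2 for t, o in zip(times, observed))


def grid_fit(times, observed, steps=101):
    """Search a steps x steps grid over a and d in [0, 1] for the smallest error."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(
        ((a, d) for a in candidates for d in candidates),
        key=lambda params: sse(params, times, observed),
    )


# Hypothetical learner history: days since training and proportion correct.
times = [0, 1, 2, 4, 8]
observed = [predict(t, 0.9, 0.3) for t in times]  # synthetic, noise-free data

a_hat, d_hat = grid_fit(times, observed)
forecast_day_30 = predict(30, a_hat, d_hat)  # extrapolation beyond the history
```

A grid search is used only to keep the sketch dependency-free. A numerical optimizer would normally replace it.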
If the modeler wishes to examine multiple future training regimens, each regimen must be entered by hand into a separate spreadsheet, requiring the modeler to correctly enter the number of seconds associated with the training length, the time between training events, and the total amount of time in training, as described above. The modeler must also ensure that the model implementation for the new predictions integrates the correct, optimized parameters calculated using the solver function. Additional graphs must then be produced, either separately or integrated across spreadsheets, so that performance effectiveness across regimens can be adequately compared. This is a laborious, slow, and inefficient way to examine the learning and retention tradespace. It is very easy to make an error in handcrafting these spreadsheets, and modeling expertise is required to ensure that the model implementation and optimized parameters are correctly set for each predicted point.
Further, if more optimal training schedules are to be generated in the spreadsheet, the ideal timing for each event can be found only through trial and error: incrementing or decrementing the amount of time that passes and checking the model prediction to see how it compares to the desired threshold performance level. Thus, the modeler must input specific future times (in seconds) one at a time to find the tipping point at which performance effectiveness no longer meets the desired goal. Once that point is found, the same procedure must be repeated to identify when the next training event should occur, and so on.
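The manual search for the tipping point is the kind of step that can be automated. The sketch below assumes that predicted performance never increases as time without practice grows, and uses bisection to find the latest time at which a prediction still meets the threshold. The example curve, its parameters, and all names are hypothetical, and the sketch covers only a single search, not the repeated scheduling of later events.

```python
DAY = 86400.0


def example_curve(t_seconds, a=0.9, d=0.3):
    """Hypothetical predicted performance after t_seconds without practice."""
    return a * (1.0 + t_seconds / DAY) ** -d


def latest_time_above(predict, threshold, lo=0.0, hi=3650 * DAY, tol=1.0):
    """Latest elapsed time (within tol seconds) at which predict() >= threshold.

    Assumes predict() is non-increasing in elapsed time. Returns None if the
    prediction is already below threshold at `lo`, and `hi` if it never drops
    below threshold within the search horizon.
    """
    if predict(lo) < threshold:
        return None
    if predict(hi) >= threshold:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if predict(mid) >= threshold:
            lo = mid  # still meets the goal, so look later
        else:
            hi = mid  # below the goal, so look earlier
    return lo


tipping_point_s = latest_time_above(example_curve, threshold=0.6)
```

Bisection replaces the one-at-a-time entry of candidate times with a search that halves the remaining interval on each step.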
What is needed in the art, therefore, is an automated, cognitively-principled tool that can underpin decisions for “just in time” training and would assist in optimizing performance obtained and resources expended. Additionally, a tool that assists managers in overall resource allocation and in optimizing training programs such that individuals are provided adequate training opportunities to achieve needed performance without waste would be of substantial benefit.