Many cognitive systems are driven by machine learning models. Such machine learning models can include a variety of algorithms (e.g., supervised learning, unsupervised learning, reinforcement learning, knowledge-based learning, natural-language-based learning such as natural language generation and natural language processing, deep learning, etc.) and can access execution engines comprising software packages that enable implementation of those algorithms. These machine learning models are trained using data. This training data is used to modify and fine-tune the weights associated with the machine learning models, as well as to record ground truth indicating where correct answers can be found within the data. As such, the better the training data is, the more accurate and effective the machine learning model will generally be.
However, there are many challenges associated with the management of training data. For example, the quality of the data may be inconsistent, the data may be stale (i.e., old), the data may no longer be relevant, or the data may no longer be accessible because of ownership restrictions. Similar challenges, e.g., ownership, versioning, and/or freshness, can apply to the machine learning model itself.
Further, current data management protocols continue to be tightly controlled by human moderators, particularly in risk-averse and tightly regulated industries such as audit. These human-based data management policies are appropriate for older, less data-intensive processes, but do not scale well as organizations undergo digital transformation. In particular, as the processes become more data-intensive, processing overload becomes a greater risk, which increases the chance of failure due to human error and, in turn, diminishes accuracy. Further, if certain training data is removed because of the ownership, versioning, and/or freshness challenges stated above, the current human-based data management systems are unequipped to (i) determine when such data is actually removed or (ii) update the machine learning model based on the removal.
It would be desirable, therefore, to have systems and methods that could overcome these and other deficiencies of known systems.