Field of the Invention
The disclosure relates to the field of machine learning, and more particularly to improving models using biases contained in data distributed across multiple devices.
Discussion of the State of the Art
In traditional machine learning, data is generally gathered and processed at a central location, and the gathered data may then be used to train models. However, data that can be gathered through means such as web scraping or news aggregation may be relatively narrow in scope compared to data stored on devices, such as the contents of a person's mobile device or crime records held at a local police station. Such data may rarely leave the devices on which it is stored, both because it may contain sensitive information and because transferring it may require extensive bandwidth.
Generator models trained on data sets (whether anonymized or not) may be subject to transfer restrictions (for example, under the GDPR). Existing tools such as SNORKEL™ provide mechanisms that can substantially accelerate the generation of realistic, labelled training data from small seeds. The same concept can support the transfer of valuable models without moving the restricted information itself. It also applies more broadly to sharing models, data sets, visualizations, pipelines, and other data, such as in multi-tenant deployments, inter-organizational sharing, or hybrid cloud-edge use cases.
What is needed is a system that uses transfer learning to adapt data models trained in particular environments or applications to target data or applications whose feature spaces or distributions may not be identical to those of the original training data.