Field of the Invention
The invention described herein relates generally to the use of learning machines to identify relevant patterns in disparate datasets and, in particular, to a method and system for aggregating models generated from a plurality of data sources whose data are disparate, large, segregated, isolated, or lacking a fully overlapping feature set.
Description of the Related Art
Current development of underwriting and risk assessment models for financial products and insurance is limited to each individual institution's own underwriting and risk assessment standards. For example, each insurance company has its own criteria for modeling risk. In limited areas, cross-industry cooperation exists in which parties within an industry are able to share data and query the shared data. Specific examples include consumer credit reporting agencies and insurance outcome reporting groups. However, a lack of cooperation on sharing data across company servers and firewalls inhibits the ability to model risk based on different features and criteria.
The lack of cooperation is due in part to concerns over proprietary data and security. In addition, modeling data across members of a common industry can be difficult because each party has unique underwriting requirements, which results in non-overlapping feature sets. Not all data and outcomes maintained by the parties are stored in a common format; this includes fraud or distress data derived from public information (e.g., news articles about plant closings or social media posts about criminal activities). The size of the data also poses a computational challenge to efficient modeling, although models based on more data can be more accurate. Current monolithic modeling procedures produce one-dimensional predictions and do not account for the additional predictive power that other institutions may provide.
There is thus a need for a computerized system that creates models over a diverse group of data that cannot be aggregated or commingled, protects the data with a computer security infrastructure, and transmits the models to a prediction server without transmitting the actual protected data, while maintaining anonymity and data confidentiality.