In various types of computer systems, there may be a need to collect, maintain, and utilize confidential data. In some instances, users may be reluctant to share this confidential information over privacy concerns. These concerns extend not only to pure security concerns, such as concerns over whether third parties such as hackers may gain access to the confidential data, but also to how the computer system itself may utilize the confidential data. With certain types of data, users providing the data may be somewhat comfortable with uses of the data that maintain anonymity, such as the confidential data merely being used to provide broad statistical analysis to other users.
One example of such confidential data is salary/compensation information. It may be desirable for a service such as a social networking service to entice its members to provide information about their salary or other work-related compensation in order to provide members with insights as to various metrics regarding salary/compensation, such as an average salary for a particular job type in a particular city. There are technical challenges encountered, however, in ensuring that such confidential information remains confidential and is only used for specific purposes, and it can be difficult to convince members to provide such confidential information due to their concerns that these technical challenges may not be met. Additionally, it can be difficult to ensure accuracy and reliability of the confidential data.
In other examples, public data is available for integration into the online social networking service to accentuate information about the members. However, it is not readily known which entities represented in the public data are members of the online social networking service or which entities in the public data map to members of the online social networking service. Furthermore, on occasion, members provide inaccurate information for a variety of reasons. Determining an amount of inaccurate information that the mapped dataset can tolerate before being unacceptably skewed is challenging.