The mobile era brings with it exciting possibilities to provide applications that are customized to meet the needs and desires of specific users. Notable examples include location-based services, contextual recommendation and advertising systems, and social media features. Along with these opportunities, however, various threats to a user's integrity and privacy may arise. Mobile applications frequently demand access to private information. This information may include a phone number that identifies a specific user, an International Mobile Station Equipment Identity (IMEI) number that identifies a specific physical mobile device, social networking data, contact lists, a current geographic location for the mobile device, audio data gathered by a microphone on the mobile device, and visual information gathered by a camera on the mobile device.
While private information is often used to implement one or more core functionalities of a mobile application, this information can also be used to serve other purposes, such as advertising, analytics, and cross-application profiling. At the same time, a typical mobile phone user is unable to distinguish legitimate, intended usage of their private information from illegitimate and unintended scenarios. An illustrative example of an illegitimate and unintended scenario would be an application causing the mobile device to transmit its IMEI number to a remote advertising website to create a persistent profile of the user.
Existing platforms provide limited protection against privacy threats. Both the Android™ and the iOS™ platforms mediate access to private information via a permission model. Each of a plurality of respective permissions is mapped to a corresponding designated resource, and each permission holds for all application behaviors and attempted resource accesses. In Android™, permissions are granted or denied at installation time. In iOS™, permissions are granted or revoked upon first access to the corresponding designated resource. Hence, neither of these platforms is able to disambiguate legitimate from illegitimate or unintended usage of a resource once an application is granted the corresponding permission.
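The coarseness of this permission model can be sketched in a few lines of Java. The class and names below are illustrative stand-ins, not the actual Android™ or iOS™ APIs: once the single Boolean permission is granted, the check returns the same answer for every caller and every purpose.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a coarse, per-resource permission model.
// Names (PermissionModel, READ_PHONE_STATE) are illustrative only.
public class PermissionModel {
    private final Set<String> granted = new HashSet<>();

    public void grant(String permission) {
        granted.add(permission);
    }

    // The check is one Boolean per resource: it carries no information
    // about *why* the resource is read or *where* the data will flow.
    public boolean check(String permission) {
        return granted.contains(permission);
    }

    public String readImei(String requestingComponent) {
        if (!check("READ_PHONE_STATE")) {
            throw new SecurityException("permission denied");
        }
        // Same answer whether the caller is a licensing check
        // or a third-party advertising library.
        return "356938035643809"; // placeholder device identifier
    }
}
```

Because `readImei` ignores the requesting component, a licensing module and an embedded ad library receive identical treatment, which is precisely the disambiguation failure described above.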
The existing shortcomings of mobile platforms in ensuring user privacy have led to a surge of research in connection with real-time privacy monitoring. One foundational technique in this research is information flow tracking, which may be provided in the form of taint analysis. Private data, obtained via privacy sources, is labeled with a taint tag denoting its source. The tag is then propagated along dataflow paths within the code of the application. Any such path that ends in a data release point, or privacy sink, leads to the triggering of a leakage alarm. For example, consider a first statement such as TelephonyManager.getSubscriberId(), which reads an International Mobile Subscriber Identity (IMSI) number of the mobile device. A second statement, WebView.loadUrl(...), which sends out an HTTP request, would trigger the leakage alarm when the tainted IMSI number flows into the request.
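A minimal sketch of this source-to-sink scheme follows, under simplifying assumptions: taint tags are carried in an explicit wrapper class rather than propagated by the runtime, and the method names merely mirror the Android™ calls mentioned above rather than reproducing any real system.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative taint tracking: a source labels data, operations
// propagate labels, and a sink raises an alarm on any labeled input.
public class TaintTracking {
    public static class TaintedValue {
        public final String data;
        public final Set<String> tags; // e.g. "IMSI", "LOCATION"

        public TaintedValue(String data, Set<String> tags) {
            this.data = data;
            this.tags = new HashSet<>(tags);
        }

        // Propagation rule: a value derived from tainted inputs
        // carries the union of the input tags.
        public TaintedValue concat(TaintedValue other) {
            Set<String> union = new HashSet<>(tags);
            union.addAll(other.tags);
            return new TaintedValue(data + other.data, union);
        }
    }

    // Privacy source: label the value with its origin.
    public static TaintedValue getSubscriberId() {
        return new TaintedValue("310150123456789", Set.of("IMSI"));
    }

    // Privacy sink: a Boolean reachability judgment. The alarm fires
    // whenever any private tag reaches the sink, regardless of how
    // much information the release actually conveys.
    public static boolean loadUrl(TaintedValue url) {
        return !url.tags.isEmpty();
    }
}
```

The Boolean nature of the sink check in `loadUrl` is the point of the sketch: it fires identically for a full identifier and for a harmless fragment, which motivates the false-alarm discussion below.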
The taint analysis approach performs information leakage alarm judgments using Boolean reachability queries. Use of these queries can lead to false alarm reporting. Consider the flowchart of FIG. 1, which sets forth an illustrative code fragment from an internal Android™ library. The operational sequence commences at block 101, where an IMSI number of a mobile device is read, for example, using an instruction "String mImsi= . . . ". Next, at block 103, a test is performed to ascertain whether or not the IMSI number is valid based upon the number of digits that the IMSI number includes. A valid IMSI number should be at least six digits but no more than fifteen digits in length. This test may be performed using an instruction "if (mImsi != null && (mImsi.length() < 6 || mImsi.length() > 15))". When the number is invalid, the operational sequence progresses to block 105, where the IMSI number read at block 101 is written to an error log, for example, using an instruction "{ loge("invalid IMSI" + mImsi); mImsi = null; }". The affirmative branch from block 103 leads to block 107, where the first six digits of the IMSI number read at block 101 are written to a standard log while the nine-digit suffix of the IMSI number is masked away as "x" characters. Block 107 may be performed using an instruction "log("IMSI: " + mImsi.substring(0, 6) + "xxxxxxxxx")". Thus, the step of block 107 may be regarded as a data sink step. However, data flow into the standard log is not a privacy problem, because the first six digits of the IMSI number merely convey model and origin information. But existing taint analysis procedures are unable to exercise the necessary discrimination to determine whether or not the step of block 107 constitutes a security risk.
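The fragment of FIG. 1 may be reconstructed as runnable Java roughly as follows. The `loge` and `log` helpers are stand-ins for the Android™ logging calls, and the string-buffer logs exist only to make the sketch self-contained.

```java
// Reconstruction of the FIG. 1 fragment (blocks 101-107).
public class ImsiLogging {
    static String errorLog = "";
    static String standardLog = "";

    static void loge(String msg) { errorLog += msg; }    // error log stand-in
    static void log(String msg)  { standardLog += msg; } // standard log stand-in

    public static void handleImsi(String mImsi) {        // block 101: IMSI read
        // Block 103: validity test -- a valid IMSI has 6 to 15 digits.
        if (mImsi != null && (mImsi.length() < 6 || mImsi.length() > 15)) {
            // Block 105: invalid number written to the error log.
            loge("invalid IMSI " + mImsi);
            mImsi = null;
        }
        if (mImsi != null) {
            // Block 107: release the 6-digit prefix, mask the 9-digit suffix.
            log("IMSI: " + mImsi.substring(0, 6) + "xxxxxxxxx");
        }
    }
}
```

Note that block 107 writes only the model-and-origin prefix, yet a Boolean taint query sees a tainted value reaching a log sink and raises the same alarm as it would for the full identifier.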
Quantitative extensions of the taint analysis procedure have been proposed to address the foregoing limitation. One example is a quantitative information-flow tracking system developed by McCamant and Ernst which quantifies a flow of secret or private information by dynamically tracking taint labels at the bit level. See, for example, “Quantitative Information-Flow Tracking for C and Related Languages” by Stephen McCamant and Michael D. Ernst, MIT Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2006-076, Cambridge, Mass., Nov. 17, 2006. Other approaches are based upon distinguishing between secrets, determining a rate of data transmission, or assessing influences of input values on output values. However, these approaches are tailored for offline analysis and are not adaptable to meet the performance requirements of real-time monitoring solutions due to the high complexity of their underlying algorithms. For example, the flow-tracking system of McCamant and Ernst needs to analyze a workload for over an hour before a report on the workload can be generated.
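The quantitative idea can be conveyed with a deliberately simplified calculation: treat each released decimal digit of a secret identifier as revealing about log2(10) ≈ 3.32 bits, and compare the total against a leakage budget. This digit-counting model is an assumption made for illustration only; systems such as McCamant and Ernst's track flows through the program at the bit level rather than counting digits.

```java
// Toy quantitative-leakage estimate: bits revealed by releasing a
// given number of decimal digits of a secret numeric identifier.
public class QuantitativeFlow {
    // Upper bound on revealed information: digits * log2(10).
    public static double bitsRevealed(int releasedDigits) {
        return releasedDigits * (Math.log(10) / Math.log(2));
    }

    // Alarm only when the estimated leakage exceeds a budget,
    // rather than on any source-to-sink flow at all.
    public static boolean exceedsBudget(int releasedDigits, double budgetBits) {
        return bitsRevealed(releasedDigits) > budgetBits;
    }
}
```

Under this toy model, the six-digit prefix of FIG. 1 stays under a 20-bit budget while a ten-digit release does not, which is the kind of graded judgment a Boolean reachability query cannot express.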
Yet another approach is to cast privacy judgments into a Bayesian reasoning framework. Bayesian reasoning is based upon statistical methods that assign probabilities or distributions to events (such as rain tomorrow) or parameters (such as a population mean) based on experience or best guesses before experimentation and data collection. These probabilities and distributions are then revised after obtaining experimental data. Pursuant to this approach, data leakage is formulated as a classification problem. This formulation generalizes the source/sink reachability judgment enforced by standard information flow analysis, permitting richer and more relaxed judgments in the form of statistical classification. One may observe that reasoning about information release is fuzzy in nature. While there are clear examples of legitimate versus illegitimate information release, there are also a number of less obvious cases. Consider, for example, a variation on the IMSI logging example of FIG. 1 in which a ten-digit rather than a six-digit prefix is released. A statistical approach, accounting for multiple factors and based on rich data sets, may be better equipped to deal with such subtleties.
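As an illustration of the Bayesian formulation, the toy classifier below judges a release from a single invented feature, the number of released IMSI digits. The prior and the per-class likelihood tables are made up for this sketch; a real system would learn such quantities from labeled data and would combine many features.

```java
// Toy Bayesian judgment: posterior probability that a release is
// illegitimate, given how many digits of the IMSI it discloses.
public class BayesianJudgment {
    // Assumed prior probability that an arbitrary release is illegitimate.
    static final double PRIOR_ILLEGITIMATE = 0.3;

    // Invented likelihoods: long prefixes look like leaks,
    // short prefixes look like routine diagnostic logging.
    static double likelihoodIllegitimate(int digits) {
        return digits >= 10 ? 0.8 : 0.1;
    }

    static double likelihoodLegitimate(int digits) {
        return digits >= 10 ? 0.05 : 0.6;
    }

    // Posterior P(illegitimate | digits) via Bayes' rule.
    public static double posterior(int digits) {
        double pi = likelihoodIllegitimate(digits) * PRIOR_ILLEGITIMATE;
        double pl = likelihoodLegitimate(digits) * (1 - PRIOR_ILLEGITIMATE);
        return pi / (pi + pl);
    }
}
```

With these toy numbers, the six-digit release of FIG. 1 yields a low posterior while a ten-digit release yields a high one, so the borderline ten-digit variation receives a graded score rather than a hard yes/no verdict.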
Even though statistical approaches provide some advantageous features, these approaches still lack the ability to customize or specialize reports in accordance with the specific needs of users or groups of users. Different users may have different preferences with regard to privacy. As an example, some users may prefer to disclose their exact addresses and profile information in exchange for high-quality, highly relevant contextual ad content, perhaps because they are fond of shopping. However, other users may prefer to sacrifice ad quality and relevance in return for more privacy. There is no general recipe for enforcing privacy, and so accounting for fuzziness statistically is necessary yet insufficient. Thus, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.