Smartphones have become ubiquitous, and as mobile users continue to rely on apps for personalized services or business mobility, apps are increasingly entrusted with private and sensitive information. Meanwhile, a large number of apps without functional dependency on user data also use (monetize) user privacy to varying degrees, ranging from typically benign cases such as targeted advertising to ill-intended ones such as identity theft. As a result, mobile users on one hand largely favor personalized services, but on the other hand are increasingly concerned about apps abusing their data. This issue is worsened by the current lack of tools or methods that can inform users of potentially harmful privacy leaks in their apps without distracting or confusing them with the apps' legitimate privacy disclosures.
Mainstream smartphone OSs, such as Android and iOS, provide basic protection for sensitive user information, such as the permission system, which enforces coarse-grained, static access control on sensitive resources according to users' explicit consent. However, the success of such systems largely relies on users' general awareness of, and sometimes deep understanding of, the privacy impacts of apps' advertised features, which often turns out to be an unrealistic assumption in practice. Moreover, these systems offer few clues and little help to users when soliciting their consent.
Serious concerns have been raised about stealthy disclosures of private user data in smartphone apps, and recent research efforts in mobile security have studied various ways of detecting privacy disclosures. Existing approaches, however, are not effective in informing users and security analysts about potential privacy leakage threats. This is because these methods largely fail to: 1) provide highly accurate and inclusive detection of privacy disclosures; 2) filter out the legitimate privacy disclosures that usually dominate the detection results and in turn obscure the true threats. Most existing works focus only on privacy disclosure detection. Those approaches can tell the user that App X has sent privacy-sensitive data Z to location Y. However, many apps have to use private user data for their important features. For example, a user has to send his or her GPS location to Google Maps.
The growth of the smartphone application (i.e., app) market has been truly astonishing, as reflected in the ever-increasing user population and the constantly expanding offering of apps that satisfy virtually all forms of users' digital needs. As apps are used for more and more private and privileged tasks, concerns are also rising about the consequences of failure to protect or respect users' privacy (i.e., transferring private data to remote entities or publicizing it). As a result, many approaches have been proposed to automatically uncover privacy disclosures in Android apps, falling into two major categories: static control and data flow analysis, and dynamic data flow tracking.
Although previous works successfully revealed the pervasiveness of privacy disclosures in apps and made significant progress towards the automatic detection of privacy disclosures, two major shortcomings remain to be addressed: (1) relatively low coverage of data flows; (2) inability to judge the legitimacy of detected flows. The first shortcoming prevents current data-flow analysis from identifying complex data flows, such as conditional or joint flows, which are frequently seen in Android apps. Conditional flows are unique to Android apps and are caused by generic system APIs that can access a variety of data sources and sinks, including sensitive ones (e.g., contacts and sensitive content providers), which are determined solely by the runtime parameters (e.g., URIs and flags). Joint flows consist of two or more sub-flows, implicitly connected outside of app code (e.g., inside a database, the file system, or the OS), which may form a channel at runtime and disclose private data. The second shortcoming often results in inflated detection results containing too many false alerts (e.g., benign or functional privacy disclosures). These alerts usually do not represent violations of user privacy and therefore distract or even overwhelm human users and analysts when interpreting detection results. Our study shows that more than 67% of app privacy disclosures found using conventional methods are in fact legitimate (i.e., necessary to the apps' core functionalities). For example, navigation apps need to report the user's current location to remote servers for up-to-date maps and real-time road conditions. From now on, we use the term privacy disclosure to generally describe apps' actions that propagate private data to external entities. We reserve the term privacy leak only for describing a privacy disclosure that cannot be intuitively justified by the app's intended functionalities.
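To make the two kinds of complex flows concrete, the following is a minimal Python sketch, not the analysis described in this work. It assumes a simplified model in which each sub-flow is a (source, destination) pair that an analysis has already extracted. The prefix table, the `Flow` class, and both function names are hypothetical and introduced only for illustration. The first function shows why a conditional flow can only be classified once the runtime URI is known. The second shows how two sub-flows that meet in a shared file combine into a single disclosure.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

# Illustrative (assumed) table: URI prefixes of content providers treated as
# sensitive. A real analysis would need a far more complete mapping.
SENSITIVE_URI_PREFIXES = {
    "content://com.android.contacts": "contacts",
    "content://sms": "sms",
    "content://call_log": "call_log",
}


def resolve_conditional_source(uri):
    """Classify a generic query by its URI argument.

    Returns the sensitive data type the query reads, or None when the
    URI does not match any known sensitive provider.
    """
    for prefix, label in SENSITIVE_URI_PREFIXES.items():
        if uri.startswith(prefix):
            return label
    return None


@dataclass(frozen=True)
class Flow:
    """One sub-flow found inside app code (hypothetical representation)."""
    src: str
    dst: str


def join_flows(flows, sources, sinks):
    """Stitch sub-flows that share an intermediate store (file, database).

    Returns the set of (source, sink) pairs reachable by chaining flows.
    """
    graph = defaultdict(set)
    for flow in flows:
        graph[flow.src].add(flow.dst)

    disclosures = set()
    for source in sources:
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in sinks:
                    disclosures.add((source, nxt))
                queue.append(nxt)
    return disclosures


if __name__ == "__main__":
    # Conditional flow: the same generic call is sensitive or not
    # depending only on the URI passed at runtime.
    print(resolve_conditional_source("content://sms/inbox"))
    print(resolve_conditional_source("content://com.example.notes/items"))

    # Joint flow: neither sub-flow alone connects a source to a sink.
    sub_flows = [
        Flow("contacts", "file:cache/tmp.txt"),
        Flow("file:cache/tmp.txt", "network"),
    ]
    print(join_flows(sub_flows, {"contacts"}, {"network"}))
```

In this sketch, an analysis that inspects each sub-flow in isolation reports nothing, because the first ends at a file and the second starts at one. Only after the two are joined through the shared file does the contacts-to-network disclosure appear. Whether that disclosure is a privacy leak in the sense defined above is a separate question that this sketch does not address.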