This disclosure is directed to privacy violation detection of a mobile application program.
Applications for mobile devices frequently demand access to private information. This includes unique device and user identifiers, such as the phone number or IMEI number (identifying the physical device); social and contacts data; the user's location; audio (microphone) and video (camera) data; etc.
While private information often serves the core functionality of a mobile application, it may also serve other purposes, such as advertising, analytics or cross-application profiling. From the outside, the user is typically unable to distinguish legitimate usage of their private information from illegitimate scenarios, such as sending of the IMEI number to a remote advertising website to create a persistent profile of the user.
Existing platforms provide limited protection against privacy threats. Both the Android and the iOS platforms mediate access to private information via a permission model. Each permission is mapped to a designated resource and, once granted, the permission holds for all application behaviors and resource accesses involving that resource.
In Android®, permissions are granted or denied at installation time. In iOS, permissions are granted or denied upon first access to the respective resource. Hence, neither platform can disambiguate legitimate from illegitimate usage of a resource once an application is granted the corresponding permission.
The shortcomings of mobile platforms in ensuring user privacy have led to a surge of research on real-time privacy monitoring. Two main approaches have been proposed, both of which are brittle.
One technique in this research is information-flow tracking, often in the form of taint analysis. Private data, such as data obtained via privacy sources (e.g., TelephonyManager.getSubscriberId(), which reads the device's IMSI), is labeled with a taint tag denoting its source. The tag is then propagated along data-flow paths within the code. Any such path that ends at a release point, or privacy sink (e.g., WebView.loadUrl(...), which sends out an HTTP request), triggers a leakage alarm. The tainting approach effectively reduces leakage judgments to boolean reachability queries. This approach is challenged by covert channels and implicit flows, as well as by data that is transformed in custom ways. These challenges are not merely theoretical, but occur in practice (e.g., in malware and in ad libraries).
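The tag propagation described above, and the way an implicit flow escapes it, can be illustrated with a minimal sketch. The sketch is written in Python for brevity rather than against the Android APIs. The names Tainted, read_imsi, load_url and copy_via_branches, the IMSI value and the URL are all hypothetical and serve only as stand-ins for a privacy source, a privacy sink and a control-flow-based copy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tainted:
    """A string value paired with the set of privacy sources it derives from."""
    value: str
    tags: frozenset = field(default_factory=frozenset)

    def concat(self, other):
        # Explicit data flow: the result carries the tags of both operands.
        other = other if isinstance(other, Tainted) else Tainted(other)
        return Tainted(self.value + other.value, self.tags | other.tags)


def read_imsi():
    # Hypothetical stand-in for a privacy source; the IMSI value is made up.
    return Tainted("310150123456789", frozenset({"IMSI"}))


def load_url(url):
    # Hypothetical stand-in for a privacy sink; returns the tags that reached it.
    return set(url.tags) if isinstance(url, Tainted) else set()


def copy_via_branches(secret):
    # Implicit flow: each digit is rebuilt through control flow only,
    # so no tagged value is ever copied into the result.
    out = ""
    for ch in secret.value:
        for digit in "0123456789":
            if ch == digit:
                out += digit
    return Tainted(out)


imsi = read_imsi()
prefix = Tainted("http://ads.example/?id=")
explicit = prefix.concat(imsi)
implicit = prefix.concat(copy_via_branches(imsi))

explicit_alarm = load_url(explicit)  # tags reaching the sink via explicit flow
implicit_alarm = load_url(implicit)  # tags reaching the sink via implicit flow
```

In this sketch both requests carry the same URL, yet only the explicit flow delivers a tag to the sink. The alarm is a reachability judgment on tags, so the copy made through branching goes unreported.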
Another technique that has recently been proposed is a data-centric analysis in which only privacy sources and sinks are monitored, without tracking intermediate flow steps between the source and sink statements. Privacy enforcement is based on a comparison between the values arising at the source and sink points. This enables elimination of false positives if only a small amount of data from the source ends up reaching the sink. Also, some of the instrumentation overhead is obviated, though there is still the need to instrument source and sink APIs. A limitation of this approach is that certain values arising both at the source and at the sink are in fact benign, leading to false alarms. Yet another limitation is that this approach handles only standard encryption/encoding/hashing schemes (e.g., SHA-1 or Base64 encoding) and cannot handle custom data transformations.
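The value comparison described above can be sketched as follows, again in Python for brevity. The function names known_encodings and detect_leak, the IMEI value and the URL are hypothetical. String reversal stands in for a custom transformation, and the set of standard schemes is limited to plain text, Base64 and SHA-1 for illustration.

```python
import base64
import hashlib


def known_encodings(secret):
    # Standard representations the monitor knows how to recompute.
    raw = secret.encode("utf-8")
    return {
        "plain": secret,
        "base64": base64.b64encode(raw).decode("ascii"),
        "sha1": hashlib.sha1(raw).hexdigest(),
    }


def detect_leak(source_value, sink_value):
    # Compare the sink value against each known encoding of the source value;
    # no intermediate data flow between source and sink is tracked.
    encodings = known_encodings(source_value)
    return sorted(name for name, enc in encodings.items() if enc in sink_value)


IMEI = "356938035643809"  # made-up value read at the source
URL = "http://ads.example/?id="  # hypothetical sink destination

plain_hit = detect_leak(IMEI, URL + IMEI)
b64_hit = detect_leak(IMEI, URL + known_encodings(IMEI)["base64"])
custom_miss = detect_leak(IMEI, URL + IMEI[::-1])  # reversed: a custom transform
```

Under these assumptions the plain and Base64-encoded values are matched at the sink, whereas the reversed value is not, since reversal is not among the schemes the monitor recomputes.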
Both of the above existing approaches are limited due to their focus on dataflow and/or data transformations. These are properties that are difficult to reason about directly, rendering both approaches brittle.