Field of the Invention
The present invention generally relates to the field of data management. Specifically, the present invention relates to a system for managing personal data in digital scenarios.
Description of the Related Art
Presently, data are an essential resource in a wide range of digital scenarios, comprising computer systems, communication networks, engineering processing systems, wireless networks, electronic commerce systems, social networks, as well as in the emerging “smart city” and “smart space” scenarios.
Through data mining and social mining techniques, data have become an invaluable source of information for portraying and describing communities and individuals (hereinafter, “users”) from multiple perspectives. Within these scenarios, one of the most interesting classes of data is represented by the so-called Personal Data (PD), i.e., data related to, and/or (e.g., dynamically) generated by single individuals, which describe their socio-demographic information, their actions and activities, their preferences, their behavior, their life-style and context, and so on.
PD may comprise static data (e.g., socio-demographic information) as well as dynamic data (e.g., position expressed via GPS) related to a specific user, either explicitly provided by the user himself/herself, or implicitly collected through a user device and/or a sensor, or generated through the use of personal services. Dynamic data comprise in turn “real-time” data, i.e., data representing the current, instantaneous state of the user or of her/his user devices and/or sensors (e.g., the current GPS location thereof or the Web page the user is browsing), as well as “historic” data, i.e., an archive of past streams of real-time data, which describe the evolution of the state of the user and the “transactions” in which such user has been involved.
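By way of illustration only, the distinction between static data, real-time data and historic data may be sketched as a simple data model. The following Python sketch is not part of any known system; the class names (`Sample`, `DynamicDatum`, `PersonalData`), the attribute names and the sample values are all hypothetical and are chosen solely to mirror the terminology used above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class Sample:
    """One timestamped observation of a dynamic datum (hypothetical type)."""
    timestamp: datetime
    value: Any


@dataclass
class DynamicDatum:
    """A dynamic PD item: the most recent sample plays the role of the
    'real-time' datum, all earlier samples form the 'historic' archive."""
    name: str
    samples: List[Sample] = field(default_factory=list)

    def record(self, value: Any, timestamp: datetime) -> None:
        # Keep samples ordered by time so the last one is always the newest.
        self.samples.append(Sample(timestamp, value))
        self.samples.sort(key=lambda s: s.timestamp)

    def real_time(self) -> Optional[Sample]:
        return self.samples[-1] if self.samples else None

    def historic(self) -> List[Sample]:
        return self.samples[:-1]


@dataclass
class PersonalData:
    """PD of one user: static attributes plus named dynamic data."""
    user_id: str
    static: Dict[str, Any] = field(default_factory=dict)
    dynamic: Dict[str, DynamicDatum] = field(default_factory=dict)


# Illustrative usage with made-up values.
profile = PersonalData(user_id="user-1", static={"birth_year": 1980})
gps = DynamicDatum("gps_position")
gps.record((45.07, 7.69), datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc))
gps.record((45.08, 7.70), datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc))
profile.dynamic[gps.name] = gps
```

In this sketch, each new sample recorded for a dynamic datum replaces the previous one as the real-time value, while the previous one moves into the historic archive.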
Collections of PD can be organized and managed both as collections of homogeneous data of a same kind and/or of a same source and/or with the same informative content regarding different users (e.g., either the whole data of the totality of customers/users or the data of specific sub-groups or communities), and as collections of heterogeneous data of different types and/or from different sources and/or with different informative content concerning one (the same) specific user. In the latter case, the collection of heterogeneous data concerning a specific user is often referred to as the “digital footprint” of the user. The digital footprint comprises the whole set of data trails left by the interactions of a user in any digital scenario, as well as any data that can be used to portray and describe such user.
Presently, the amount of PD available and generated on a daily basis is rapidly growing due to the increasing number of online and digital activities (e.g., with a digital identity) enabled by a widespread adoption of user devices (e.g., smartphones, tablets). Through the user devices, people are now able to access a large number of online services and interact with a large number of real-world services (e.g., payments, ticketing, check-in, searches). Moreover, the amount of PD available and generated on a daily basis is also growing due to the pervasiveness of sensors, either in the surrounding environment of users or integrated with user devices (such as smartphones, tablets or wearable accessories, like wristbands), which enable collecting contextual information (such as locations, environmental conditions) or physiological information in a completely transparent way.
In view of the above, PD are nowadays mostly generated (or automatically sensed) during the interaction of users with their user devices. Thanks to the introduction of new user devices provided with advanced sensing features and increased computational power, and thanks to the ever increasing number of mobile/online applications, the generated PD will further increase in quantity, variety and quality (e.g., frequency, granularity and precision).
Manufacturers of user devices or operating systems host on their platforms applications (APPs) which can access/record PD by interacting with the physical resources of the user device (e.g., sensors), or by retrieving the PD already stored in the device itself (e.g., by accessing the memory or the file system thereof). Such APPs may also generate new (kinds of) PD by explicitly requesting them from users, by implicitly collecting them from the users' behavior, device interactions and activities while using the APP itself, or by combining and processing PD from different sources.
Manufacturers store the available APPs for their devices/operating systems in publicly available APP marketplaces where developers can freely publish their APPs and where users can download (for free or by paying for a license) published APPs. Usually, the access of the APP to PD may be granted by users at the time of installation of the APP itself on the user device, or once and for all by properly setting the user device. Moreover, granting the access to PD may also imply the request of a wide set of permissions on how these PD can be processed, stored and shared by the APP provider.
Typically, PD collected (accessed or generated) by APPs installed on a user device are diverse and highly dynamic, and may be exploited to describe the behavior of the user owning the same user device. For example, PD may comprise records of the activities carried out by the user of the user device, together with information exchanged during the activities, as well as records of the locations thereof.
On one hand, PD are of paramount value for companies providing services (both in the real and digital worlds). Indeed, such PD may be exploited to gain a deep understanding of the needs and behaviors of people and to create novel, more personalized offers and APPs. Moreover, PD may be sold for advertisement and/or statistical research purposes. Similarly, PD represent an invaluable opportunity and an irreplaceable resource also for public administrations in order to provide effective services to users and to improve (in terms of costs and efficiency) existing services to citizens.
On the other hand, unfortunately, the current scenario still does not fully allow an effective, efficient, controlled and fair (with respect to users) exploitation of this opportunity. On the contrary, the management and usage of such PD are raising new concerns about privacy and the need for new technological and regulatory solutions to give users more control over their data life-cycle. In fact, data are currently gathered and managed following a so-called “application-centric” paradigm, where PD of a single user (even with a single user device) are collected independently by the different APPs installed on his/her user device(s).
Therefore, from a technical perspective, PD independently collected by different APPs are spread and fragmented, since the PD are collected and stored separately (locally or remotely) by different APP owners according to the access granted by the users and to the specific terms and conditions. Moreover, PD collected by different APPs are often redundant, since the same PD are accessed and replicated several times by different APPs, and unreliable, since PD are not stored in a single storage entity but instead multiple copies of the same information can be stored in different storage entities. Hence, PD can turn out to be inconsistent, out of date or noisy.
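Purely as an illustrative sketch of the redundancy and inconsistency described above, the following Python fragment models three APPs that each keep their own copy of some PD of the same user, and then reports which data are replicated and which replicas disagree. The APP names, the attribute names, the values and the functions `find_replicas` and `find_conflicts` are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple

# (user id, attribute name) identifies one datum in this sketch.
Key = Tuple[str, str]

# Hypothetical snapshot: each APP stores its own, independent copy of PD.
app_stores: Dict[str, Dict[Key, Any]] = {
    "app_a": {("user-1", "home_city"): "Turin",
              ("user-1", "email"): "u1@example.org"},
    "app_b": {("user-1", "home_city"): "Milan"},
    "app_c": {("user-1", "email"): "u1@example.org"},
}


def find_replicas(stores: Dict[str, Dict[Key, Any]]) -> Dict[Key, Dict[str, Any]]:
    """Return the data held by more than one APP, with the value each APP holds."""
    copies: Dict[Key, Dict[str, Any]] = defaultdict(dict)
    for app, store in stores.items():
        for key, value in store.items():
            copies[key][app] = value
    return {key: held for key, held in copies.items() if len(held) > 1}


def find_conflicts(replicas: Dict[Key, Dict[str, Any]]) -> Dict[Key, Dict[str, Any]]:
    """Return the replicated data whose copies disagree (values must be hashable)."""
    return {key: held for key, held in replicas.items()
            if len(set(held.values())) > 1}


replicas = find_replicas(app_stores)
conflicts = find_conflicts(replicas)
```

Here both attributes are redundant, being held by two APPs each, but only `home_city` is inconsistent, because the two copies carry different values and nothing indicates which one is current.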
The current way PD are collected by different APPs causes several disadvantages also from the user perspective. Indeed, PD are often not transparently collected by APPs. Even if the user of a user device has to grant the access to the source of the PD (e.g., the sensor or the file system), the majority of users have no control over, or knowledge of, the quantity and quality of the collected PD (e.g., in which cases and with which frequency and accuracy PD are collected), the nature and quantity of PD generated by the various APPs, and the way PD are transmitted and replicated. Moreover, users are usually unaware of the exact informative content and potential of the specific PD collected and processed by an APP. Users are often not enabled to have the whole picture and history of PD granted to the different APPs, and are not enabled to keep track of their usage. In particular, when replicated remotely by a service, PD can be mined, analyzed, processed or distributed in a way that is not transparent to the user, who lacks control over (e.g., the ability to delete) her/his data.
In order to overcome the limitations of such application-centric paradigm, “user-centric” paradigms for personal data management have been recently proposed by several initiatives such as, e.g., the “Rethinking Personal Data Project” promoted by the World Economic Forum (World Economic Forum, Rethinking Personal Data: Strengthening Trust, (January 2012), www.weforum.org/issues/rethinking-personal-data). These initiatives promote the introduction of higher levels of transparency and the possibility for users to have an effective control over the lifecycle of their PD (e.g., over their collection, storage, processing, sharing, exploitation, etc.). Moreover, several surveys showed that people empowered with a real control over their PD lifecycle are encouraged to share them more with other people, organizations and applications. In this perspective, the user-centric paradigm will enable a virtuous ecosystem of personal data management, where users are encouraged to unlock the potential of their PD and where a larger availability and variety of PD will enable and encourage the growth of an unpredictable set of personal (tailored) and/or social applications and services, leveraging a rich variety of user or aggregated information.
Known solutions embracing a user-centric approach for the PD management provide for the use of a so-called Personal Data Store (PDS). A PDS is defined as a secure digital space (owned and controlled by the user) acting as a repository for PD, provided with a wide set of collection and management features. Several providers are starting to deliver to users PDS services for the management and exploitation of their PD; existing PDS platforms provide a set of functionalities enabling the data owners to have control over the entire lifecycle of their PD, and capabilities to control the access of specific services to the stored data, for instance based on user-defined policies or rules.
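As a minimal sketch only, and without reference to the actual interface of any of the known PDS platforms, a repository in which access by services is governed by user-defined rules might look as follows in Python. The class `PersonalDataStore`, the exception `AccessDenied`, the method names and the service name are hypothetical; a real PDS would additionally require authentication, secure storage and auditing, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


class AccessDenied(Exception):
    """Raised when a service reads a datum the owner has not granted to it."""


@dataclass
class PersonalDataStore:
    """Hypothetical in-memory PDS: one owner, one rule set per service."""
    owner: str
    _data: Dict[str, Any] = field(default_factory=dict)
    # User-defined policy: service name -> set of data keys it may read.
    _policies: Dict[str, Set[str]] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def grant(self, service: str, key: str) -> None:
        self._policies.setdefault(service, set()).add(key)

    def revoke(self, service: str, key: str) -> None:
        self._policies.get(service, set()).discard(key)

    def read(self, service: str, key: str) -> Any:
        # Access is denied by default; only explicit grants allow a read.
        if key not in self._policies.get(service, set()):
            raise AccessDenied(f"{service} may not read {key}")
        return self._data[key]

    def delete(self, key: str) -> None:
        # Deleting a datum also removes every grant that referred to it.
        self._data.pop(key, None)
        for allowed in self._policies.values():
            allowed.discard(key)


# Illustrative usage with made-up values.
pds = PersonalDataStore(owner="user-1")
pds.put("gps_position", (45.07, 7.69))
pds.grant("navigation_service", "gps_position")
position = pds.read("navigation_service", "gps_position")
pds.revoke("navigation_service", "gps_position")
```

The deny-by-default check in `read` reflects the idea that the data owner, rather than each individual service, decides which data are accessible and can withdraw that decision at any time.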
Several known platforms already implement PDS-based services, such as “Danube” (http://projectdanube.org) and “Higgins” (http://eclipse.org/higgins/). Moreover, several companies are starting to provide commercial PDS-like services, such as “Personal.com” (www.personal.com) and “Mydex” (mydex.org).
Other known PDS platforms comprise the platform developed by the IST TAS3 Project (http://www.tas3.eu), and the related open source software ZXID (www.zxid.org), and the OpenPDS developed by MIT (http://openpds.media.mit.edu/).