The big data universe is growing aggressively, with an estimated market of 50 billion dollars by next year. Big data platforms such as Hadoop and Spark are being widely adopted by both academia and industry. End users have to trust the providers of the big data platforms that host their data. Such trust rests on an underlying assumption that the platforms or their security methods will never be compromised; however, unexpected issues such as insider attacks or control-flow attacks due to programmer errors can happen in any system at any time.
Insider attacks within an organization typically involve an employee stealing data using USB drives or masquerading as another employee to gain unauthorized access to data. Such attacks can be hard to detect and almost impossible to prevent. With the increase in popularity of concepts such as differential privacy in the big data universe, the biggest concern for these platforms is data loss or data theft; hence, they need to be able to identify an attack on the data as soon as it happens.
The drawings illustrate only example embodiments and are therefore not to be considered limiting of the scope described herein, as other equally effective embodiments are within the scope and spirit of this disclosure. The elements and features shown in the drawings are not necessarily drawn to scale, emphasis instead being placed upon clearly illustrating the principles of the embodiments. Additionally, certain dimensions may be exaggerated to help visually convey certain principles. In the drawings, similar reference numerals between figures designate like or corresponding, but not necessarily the same, elements.