The present invention relates to improving performance in data analytics and data storage environments, and more particularly, this invention relates to semantic- and user-aware admission control for performance management in data analytics and data storage environments.
Big data is a term generally used to describe data sets that are so large and/or complex that conventional data storage, retrieval, and/or analytics processes are inadequate and/or inefficient at manipulating the data sets. Most types of access and/or analytics may be hindered by the size of the data sets, including, but not limited to, analyzing, capturing, searching, sharing, storing, transferring, and copying the data, as well as maintaining the privacy of the data. The term may also refer to extracting value from large amounts of data. In order to extract this value, the data must be stored and accessible within a big data environment, such as a distributed storage system.
In conventional big data environments, a plurality of local and/or remote storage devices are used to store large amounts of data. Some exemplary big data environments include, but are not limited to, distributed data storage systems, server farms, data centers, cloud environments, Hadoop environments, etc. Many of these big data environments are also capable of analyzing large amounts of data that may be spread out across many storage devices. These big data environments receive many requests for access to the data that is stored therein. In some instances, the big data environments may also receive requests for analytics to be performed on the data stored therein. A data analytics platform, such as Hadoop, Spark, etc., may be deployed in the big data environment to perform data analytics on the data stored therein. A data analytics application may be run on top of the platform.
There are two broad categories of approaches to performance management in big data environments, and in distributed storage systems in particular: (a) quality of service control and resource allocation applied to jobs after they have entered the system, and (b) admission control applied to jobs before they enter the system. However, conventional admission control routines are limited in their functionality to ensuring that a requester is authorized to access the data that is being requested, and ensuring that the request will not expose the data and/or the big data environment to a security breach.
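By way of illustration only, the limited scope of a conventional admission control routine as described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from any particular system: the names `Request`, `ConventionalAdmissionController`, `acl`, and `exposes_data` are hypothetical, and the security check is reduced to a single flag assumed to be set by some upstream scan.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """A hypothetical access or analytics request arriving at the environment."""
    user: str
    dataset: str
    # Assumption: an upstream security scan marks requests that would
    # expose the data or the environment to a breach.
    exposes_data: bool = False


@dataclass
class ConventionalAdmissionController:
    """Admits a request based only on authorization and security checks."""
    # Hypothetical access control list: dataset name -> set of authorized users.
    acl: dict = field(default_factory=dict)

    def admit(self, request: Request) -> bool:
        # Check 1: the requester is authorized to access the requested data.
        authorized = request.user in self.acl.get(request.dataset, set())
        # Check 2: the request does not expose the data to a security breach.
        safe = not request.exposes_data
        # Nothing about system load or expected performance impact is
        # considered before the job enters the system.
        return authorized and safe


controller = ConventionalAdmissionController(acl={"sales": {"alice"}})
```

In this sketch, `controller.admit(Request("alice", "sales"))` admits the request, while a request from an unlisted user, or one flagged as exposing data, is rejected. The decision does not depend on the current state of the environment, which is the limitation noted above.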