Distributed data environments, such as Apache™ Hadoop®, enable distributed processing of large data sets across clusters of computers. Each cluster of computers is often associated with a tenant, and a single distributed data environment is often associated with multiple tenants. The tenants typically access the same set of resources included in the distributed data environment.
Distributed data environments also include components that are shared by the multiple tenants. The components include YARN™, MapReduce™, Cassandra™, Hive™, Spark™, ZooKeeper™, Flume™, Impala™, Kafka™, Sqoop™ and Sentry™.
A cluster of computers includes one or more computers, and each computer includes one or more applications. At times, an application operates in a resource-draining manner, meaning that the application uses a relatively large amount of the distributed data environment's resources. Because those resources are shared by the multiple tenants, a resource-draining application may cause failures to occur within the distributed data environment.
It may be difficult to determine which applications are resource-draining and which applications are operating efficiently. Therefore, a system and method for determining the efficiency of an application would be desirable. It would be further desirable for the system and method to remediate resource-draining applications. Such a system and method may reduce the number of failures that occur within the distributed data environment.