Technical Field
The present invention generally relates to garbage collection in computing systems, and more particularly to reducing minor garbage collection overhead.
Description of the Related Art
Generational garbage collection (GC) is an example of a GC policy and is widely used in modern memory management systems. In generational GC, objects are created in a “nursery” space. A garbage collector mostly performs “minor” collections, which garbage collect only the nursery by copying live objects from one region of the nursery to another. If objects are copied a certain number of times in the nursery space (that is, if they survive long enough), they are promoted to the “tenure” space. The efficiency of generational GC rests on the generational hypothesis that most objects die young.
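The nursery and promotion behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names ToyHeap, Obj, tenure_age, and minor_collect are hypothetical, liveness is supplied by the caller as a set of names rather than found by tracing references, and no real collector is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    age: int = 0  # number of times this object was copied in the nursery


@dataclass
class ToyHeap:
    tenure_age: int = 2  # hypothetical promotion threshold (assumption)
    nursery: list = field(default_factory=list)
    tenure: list = field(default_factory=list)
    copies: int = 0  # total nursery-to-nursery copies performed

    def allocate(self, name):
        # New objects are always created in the nursery.
        obj = Obj(name)
        self.nursery.append(obj)
        return obj

    def minor_collect(self, live):
        """Collect the nursery; `live` is the set of names still in use."""
        survivors = []
        for obj in self.nursery:
            if obj.name not in live:
                continue  # dead object: reclaimed without being copied
            if obj.age >= self.tenure_age:
                self.tenure.append(obj)  # copied often enough: promote
            else:
                obj.age += 1
                self.copies += 1
                survivors.append(obj)  # copied to the other nursery region
        self.nursery = survivors


heap = ToyHeap(tenure_age=2)
heap.allocate("cached")
for i in range(3):
    heap.allocate(f"temp{i}")            # short-lived object
    heap.minor_collect(live={"cached"})  # only "cached" is still in use
```

In this run the short-lived objects are reclaimed at the first collection after their allocation, while the long-lived object is copied at each minor collection until it reaches the assumed threshold and is promoted.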
Apache Spark is an in-memory cluster computing framework for performing big data analytics on a large distributed system. Spark's programming model is based on Resilient Distributed Datasets (RDDs). An RDD is a logical distributed collection of data partitioned over multiple machines. An RDD offers two types of operations, namely transformations and actions. An application program is represented as a Directed Acyclic Graph (DAG) that, in turn, represents the chain of data transformations applied to RDDs. A node in the DAG corresponds to an RDD (a computation result) and an edge in the DAG corresponds to an operation (a computation task). Since an RDD is immutable, RDDs that are reused many times should be long-lived in the heap, whereas RDDs used only as intermediate results are disposable, which means that the objects in these RDDs die young.
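The relationship between transformations, actions, and the DAG can be sketched with a toy model. The class ToyRDD below is hypothetical and is not Spark's API: it runs on a single machine, ignores partitioning, and exists only to show that a transformation adds a node and an edge to the lineage while an action walks the lineage to produce a result.

```python
class ToyRDD:
    """Immutable node in a lineage DAG (illustrative only, not Spark's API)."""

    def __init__(self, source=None, parent=None, op=None):
        self._source = source  # input data, set only on a root node
        self._parent = parent  # upstream node, or None for a root node
        self._op = op          # operation on the edge from parent to this node

    # Transformations: return a new node and compute nothing yet.
    def map(self, f):
        return ToyRDD(parent=self, op=lambda xs: (f(x) for x in xs))

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda xs: (x for x in xs if pred(x)))

    def _compute(self):
        # Recompute this node from its lineage each time it is needed.
        if self._parent is None:
            return iter(self._source)
        return self._op(self._parent._compute())

    # Actions: evaluate the lineage and return a concrete result.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


numbers = ToyRDD(source=range(1, 7))
squares = numbers.map(lambda x: x * x)         # intermediate node
evens = squares.filter(lambda x: x % 2 == 0)   # intermediate node
result = evens.collect()                       # action triggers evaluation
total = squares.count()                        # squares is reused here
```

Each transformation leaves its input untouched and creates a new node, so a node such as squares that is consumed by more than one downstream computation corresponds to a reused RDD, while nodes consumed once correspond to disposable intermediate results.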
A single minor collection is fast and has low overhead. However, minor collections occur frequently because immutable RDDs are repeatedly generated and reclaimed in the nursery space. As a consequence, minor collection overhead dominates the total GC pause time. Moreover, although some RDDs are clearly used several times, the Java Virtual Machine (JVM®) runtime does not know which objects are short-lived and which are long-lived, so a minor collection performs a copying garbage collection operation even on objects that are long-lived. Thus, there is a need for reducing minor garbage collection overhead.