Off-chip tessellation stores tessellation-related data in remote (e.g., off-chip) memory. This allows tessellation work to execute across compute units and enables tessellation redistribution for better load balancing. That is, the data is stored in a location that more than one compute unit or shader engine can access and process. Off-chip tessellation has been supported across multiple generations of graphics processing units (GPUs). However, redistributing tessellation work for load balancing introduces inefficiencies and latencies.