The present invention relates to information integration and, more specifically, to compiling specifications into efficient run-time queries and optimization steps that improve the run-time performance of entity population by exploiting the parallel group-by capabilities of MapReduce systems.