Modern graphics processing units (GPUs) include highly parallel architectures capable of executing hundreds or even thousands of threads substantially simultaneously. In such architectures, bypass circuits have been implemented to enable data scheduled for storage in the register file to be routed, during the same clock cycle, to the input of a data path for reuse. Typically, however, such bypass mechanisms are implemented in logic located in the register file of an integrated circuit. When groups of related threads are executed within these processors, multiple copies of the logic perform essentially the same operation, creating redundant logic that increases the complexity of the integrated circuit. Thus, there is a need for addressing this issue and/or other issues associated with the prior art.