Conventional graphics processors perform stencil testing after fragment shading. Shaded fragments that fail the stencil test specified by a stencil function are rejected and are not written to a frame buffer. Shading fragments that are never written to the frame buffer is inefficient, because the work spent on those fragments may reduce the throughput of the graphics processor. Furthermore, memory bandwidth is consumed reading texture data, depth values, or stencil values for fragments that are subsequently rejected during stencil testing. In conventional graphics processors, rendering performance may be limited by memory bandwidth. In such systems, rendering performance may be improved by reducing the number of memory accesses needed to process fragments that will be rejected during stencil testing. There is thus a need for performing an early stencil test to reject fragments prior to shading.
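The cost difference described above can be illustrated with a small software model. The following is a minimal sketch, not a description of any particular hardware: the names `stencil_pass`, `shade`, and `render`, the set of stencil functions, and the counters are all hypothetical and introduced only for illustration. The sketch also assumes that shading does not change the outcome of the stencil test, so that the test may be moved ahead of shading without altering the rendered result.

```python
import operator

# Hypothetical stencil functions: each compares the masked reference value
# with the masked value stored in the stencil buffer.
STENCIL_FUNCS = {
    "never": lambda ref, stored: False,
    "always": lambda ref, stored: True,
    "equal": operator.eq,
    "notequal": operator.ne,
}


def stencil_pass(func, ref, stored, mask=0xFF):
    """Return True if the fragment passes the stencil test."""
    return STENCIL_FUNCS[func](ref & mask, stored & mask)


def shade(frag, stats):
    """Stand-in for fragment shading; counts the work and one texture read."""
    stats["shaded"] += 1
    stats["texture_reads"] += 1
    return (frag["x"], frag["y"], 255)  # placeholder color


def render(fragments, stencil_buffer, func, ref, early):
    """Process fragments with the stencil test before (early) or after shading."""
    stats = {"shaded": 0, "texture_reads": 0, "written": 0}
    frame_buffer = {}
    for frag in fragments:
        pos = (frag["x"], frag["y"])
        if early and not stencil_pass(func, ref, stencil_buffer[pos]):
            continue  # rejected before any shading work is done
        color = shade(frag, stats)
        if not early and not stencil_pass(func, ref, stencil_buffer[pos]):
            continue  # rejected after shading; the shading work is wasted
        frame_buffer[pos] = color
        stats["written"] += 1
    return frame_buffer, stats


# Example: a 4x4 target whose stencil buffer holds 1 in the left half only.
stencil = {(x, y): (1 if x < 2 else 0) for x in range(4) for y in range(4)}
frags = [{"x": x, "y": y} for x in range(4) for y in range(4)]

late_fb, late_stats = render(frags, stencil, "equal", 1, early=False)
early_fb, early_stats = render(frags, stencil, "equal", 1, early=True)
print("late :", late_stats)
print("early:", early_stats)
```

In this model both orderings write the same eight pixels, but the late ordering shades all sixteen fragments while the early ordering shades only the eight that pass, which is the saving in shading work and texture reads that the paragraph above motivates.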