In-loop filtering in HEVC is more sophisticated than in H.264: in addition to the deblocking filter, HEVC introduces a sample adaptive offset (SAO) filter. This paper proposes a high-performance, area-efficient VLSI architecture for an HEVC SAO encoder that supports 4K at 60 frames per second (fps) for next-generation Ultra HDTV at a 200 MHz clock. The design processes a largest coding unit (LCU) of size 64×64 in fewer than 1600 cycles in all scenarios. To reach the desired area and performance goals, the proposed solution combines several VLSI-level optimizations: 2D block-based processing with a three-stage pipeline for statistics generation, SAO operation for the encoder along with decode in a single LCU stage, multiple engines for rate-distortion (RD) offset and SAO type calculation, and a unified engine for luma and chroma. The design also provides a list of software-configurable overrides and hardware statistics to further tune video quality for a given product in the field. The final design in a 28 nm CMOS process is expected to occupy around 0.15 mm² after actual place and route. The proposed design handles 4K at 60 fps, is fully compliant with the HEVC video standard specification, and achieves a bit-rate saving of 4 to 7%, depending on the encoder configuration.
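The relationship between the 200 MHz clock, the 4K at 60 fps target, and the 1600-cycle figure can be checked with a short calculation. The sketch below is illustrative only and is not taken from the proposed design: it assumes that "4K" means a 3840×2160 frame, that a partial LCU row at the bottom of the frame costs as much as a full one, and that LCUs are processed back to back with no other overhead. The function name `lcu_cycle_budget` is hypothetical.

```python
import math


def lcu_cycle_budget(width, height, fps, clock_hz, lcu_size=64):
    """Return the average number of clock cycles available per LCU.

    Assumption: partial LCUs at the frame edge are counted as full LCUs.
    """
    lcus_per_row = math.ceil(width / lcu_size)   # 3840 / 64 = 60
    lcu_rows = math.ceil(height / lcu_size)      # 2160 / 64 = 33.75 -> 34
    lcus_per_second = lcus_per_row * lcu_rows * fps
    return clock_hz / lcus_per_second


# Assumed 3840x2160 frame at 60 fps with a 200 MHz clock.
budget = lcu_cycle_budget(3840, 2160, 60, 200_000_000)
print(f"{budget:.0f} cycles available per LCU")  # prints "1634 cycles available per LCU"
```

Under these assumptions, roughly 1634 cycles are available per 64×64 LCU, so a worst case of fewer than 1600 cycles per LCU fits within the real-time budget with a small margin.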