Graphics processing units (GPUs) are emerging as powerful massively parallel systems. The introduction of application programming interfaces (APIs) for general-purpose computation on GPUs, for example the Compute Unified Device Architecture (CUDA) from NVIDIA, also makes GPUs an attractive choice for high-performance numerical and scientific computing. Sparse matrix-vector multiplication (SpMV) is a heavily used kernel in scientific computing. However, its indirect and irregular memory accesses result in more memory accesses per floating-point operation than in dense kernels, so optimization of the SpMV kernel remains a significant challenge on any architecture under existing approaches. For example, existing approaches do not take into account the various architectural constraints when optimizing memory access patterns.
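The indirect access pattern described above can be illustrated with a minimal sequential sketch. It assumes the matrix is stored in compressed sparse row (CSR) form, a common sparse format that the text does not itself name. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative choices, not taken from the source. The sketch is written in Python for readability and is not a GPU kernel.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A held in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: the index into x is itself loaded from col_idx,
            # so reads of x follow the sparsity pattern and are irregular.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
```

In this sketch each multiply-add (two floating-point operations) loads three items: the value, its column index, and the entry of `x` selected by that index. A dense matrix-vector product needs no index load and reads `x` contiguously, which is one way to see why SpMV performs more memory accesses per floating-point operation.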