A graphics processing unit (GPU) may be nominally configured with a certain amount of local or dedicated memory, (hereinafter referred to as local), to service operations performed on the GPU. For example, the local memory may be dynamic random access memory. The GPU, which is a byte addressable device, may also have access to non-volatile memory (NVM), which is a type of block addressable memory. In the event that the GPU or certain applications require a transfer of data between the NVM and the local memory, an operating system (OS), display driver, device driver or similar hardware/software entity running on a host computing system typically controls or manages the data transfer process. This data transfer process entails a two hop process; first from the NVM to system memory, and then from the system memory to the local memory. In particular, the NVM data must be first transferred into the system memory via a NVM controller's block input/output (I/O) file transfer mechanism. The GPU can then access the data from the system memory. This involves at least using the system memory and results in increased traffic and congestion.