With advancements in server, network, and storage technologies, hardware-based network functions are being transitioned to software-based network functions running on standard high-volume servers. To meet performance requirements, software-based network functions typically require more central processing unit (CPU) cycles than their hardware-based counterparts. Alternatively, general purpose graphics processing units (GPGPUs) may be used for network packet processing workloads. The GPGPU performance of a single network packet processing application (e.g., deep packet inspection (DPI), a firewall, encryption/decryption, layer-3 forwarding, etc.) that has exclusive access to a GPGPU is relatively predictable. However, the level of performance can become more difficult to predict as additional network packet processing applications utilize the GPGPU as an offloading engine or an accelerator. For example, a GPGPU-accelerated application may not be aware of, and/or may not be able to communicate with, another GPGPU-accelerated application, which can result in inefficient and/or uncoordinated usage of the GPGPU. More specifically, if a first GPGPU-accelerated application is fully utilizing the resources of the GPGPU, offloading a second GPGPU-accelerated application to the same GPGPU may result in performance degradation due to, for example, resource contention.