Model-free reinforcement learning (RL) techniques may be employed in certain types of systems to determine an optimal system control policy by actively exploring an environment. However, it may be challenging to apply conventional RL approaches to determine control policies usable for autonomous control of vehicles, due to the potentially negative consequences of extensive active exploration of all the actions available to the vehicle. In addition, conducting active exploration in a manner that helps ensure vehicle safety may exact a high computational cost. As an alternative, model-based RL techniques may be employed to determine an optimal control policy without active exploration by utilizing an accurate dynamics model of the environment in which the vehicle operates. However, the complex environment in which an autonomous vehicle operates may be very difficult to model accurately.