Model-free reinforcement learning (RL) techniques may be employed in certain types of systems to determine an optimal control policy by actively exploring an environment. However, it may be challenging to apply conventional RL approaches to determining control policies for autonomous vehicles, because extensive active exploration of all the actions available to the vehicle may have negative consequences. In addition, conducting active exploration in the manner needed to help ensure vehicle safety may exact a high computational cost. Model-based RL techniques may be used as an alternative, but may require an accurate system dynamics model of the environment in which the vehicle operates, and the complex environment in which an autonomous vehicle operates may be very difficult to model accurately.