Deep learning-based devices, such as autonomous vehicles, generally determine the next action to be taken by a short-term planning based on information from several frames to dozens of frames of recent images inputted.
For example, in the case of the autonomous vehicles, based on segmentation images or meta data like bounding boxes and left/right direction of detected objects, which are information obtained from each frame, actions of three dimensional real-valued vectors, i.e., (i) an amount of change in a steering angle, (ii) a pressure on a brake pedal, (iii) a pressure on an accelerator pedal, are outputted, and accordingly, the autonomous vehicles will autonomously drive themselves in response to the actions.
This deep learning-based device should be learned to determine appropriate actions according to the input state, and there are various learning methods, but currently, on-policy reinforcement learning is generally used.
Also, the deep learning-based device can be learned in a real world, but it is difficult to acquire diversified training data, and it is time-consuming and costly.
Therefore, a method for learning the deep learning-based devices in a virtual world has been proposed.
However, when learning in the virtual world, there is a problem in a reliability of the learning results due to a gap between virtual environment and real environment.
For example, in case of the deep learning-based device is an autonomous vehicle, the physics engine of a virtual world simulator may be configured to output “a next state with a changed velocity, a location, and surroundings of the vehicle” if “actions of rotating a steering wheel by a certain angle and stepping on the brake pedal with a certain pressure” are taken in response to “a state of a current velocity and its surroundings”, thereby learning the autonomous vehicle.
However, the larger a discrepancy between the next state created by the physics engine of the virtual world simulator and the next state of a real world is, the less optimal the optimal action having been learned in the virtual world is in the real world.
For example, in the case of the autonomous vehicle, the vehicle has learned a proper action that can avoid an accident in a dangerous situation in the virtual world, but nevertheless an accident in the real world occurs even though the vehicle takes the proper action in a same situation.