Transfer learning: what I learned
Definition: improving the performance of an agent on a target task for which only limited data is available by first training on a source dataset with more data available. Less target data is therefore needed.
Domain adaptation: a subset of transfer learning in which enough source-domain data is available and the same task is performed on the source data, but we have very little target data.
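A minimal sketch of that setup, assuming PyTorch and hypothetical source_loader / target_loader data loaders (all names and hyperparameters here are illustrative, not taken from any of the papers below):

# Transfer-learning sketch: pretrain on a large source set, then fine-tune on a small target set.
import torch
import torch.nn as nn

def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
train(model, source_loader, epochs=20, lr=1e-3)  # plenty of source data (e.g. simulation)
train(model, target_loader, epochs=5, lr=1e-4)   # limited target data, shorter fine-tuning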
Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey
Different methods of transfer learning:
-Zero-shot transfer: the simulation depicts reality accurately enough that the policy transfers directly, without further training on real data
-Domain randomization: heavily randomize the source-domain data so that, despite the bias between the model and the real world, the resulting distribution hopefully covers the real data distribution.
For robotics applications we distinguish perception randomization, where sensor data are randomized, from dynamics randomization, where the dynamical model of the robot itself is randomized. The scenarios can also be randomized to cover real-world scenarios (see the sketch after this list).
-Domain adaptation: unifying the source and target feature spaces. Not relevant here, since it implies learning partly from real data, which is excluded in our case.
-Other methods specific to reinforcement learning.
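As a rough illustration of perception and dynamics randomization, here is a minimal sketch; Sim, policy, run_episode and every parameter range are hypothetical stand-ins for whatever simulator is actually used:

# Domain-randomization sketch: re-randomize the simulator for every training episode.
import random

def randomized_sim():
    sim = Sim()  # hypothetical simulator object
    # perception randomization: perturb what the sensors see
    sim.light_intensity = random.uniform(0.3, 2.0)
    sim.camera_exposure = random.uniform(0.5, 1.5)
    sim.sensor_noise_std = random.uniform(0.0, 0.05)
    # dynamics randomization: perturb the dynamical model of the robot itself
    sim.link_mass_scale = random.uniform(0.8, 1.2)
    sim.joint_friction = random.uniform(0.5, 1.5)
    sim.actuator_delay_ms = random.randint(0, 30)
    return sim

for episode in range(1000):
    env = randomized_sim()
    run_episode(policy, env)  # hypothetical rollout of the learning agent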
Real-world robotics perception and control using synthetic data
Training (in our case evaluating) the agent on various dynamical models can make the controller robust to modeling errors and improve transfer.
Domain randomization is effective for object detection and grasping.
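A small sketch of what evaluating a fixed controller over varied dynamical models could look like; make_env, evaluate and controller are hypothetical, and the perturbation ranges are only examples:

# Robustness check: evaluate one controller across a grid of perturbed dynamics parameters.
import numpy as np

mass_scales = np.linspace(0.7, 1.3, 7)      # +/- 30% around the nominal mass
friction_scales = np.linspace(0.5, 1.5, 7)

returns = []
for m in mass_scales:
    for f in friction_scales:
        env = make_env(mass_scale=m, friction_scale=f)  # perturbed dynamical model
        returns.append(evaluate(controller, env, episodes=10))

print("worst-case return:", min(returns))   # sensitivity to modeling errors
print("mean return:", float(np.mean(returns)))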
Enhanced Transfer Learning for Autonomous Driving with Systematic Accident Simulation
Real data is available for autonomous driving. The goal of this paper is to show that first training on simulated data with a high proportion of edge cases, and then retraining on real-world data, is beneficial to the agent in real-world situations for handling anomalous driving scenarios.
Traffic scenario generation:
Enhanced Transfer Learning for Autonomous Driving with Systematic Accident Simulation (2020)
Methodology:
-use of a video game engine for visual and physics simulation (Unity)
-instead of simulating an all-safe driving scenario as close as possible to reality (domain adaptation), the goal is to cover as much of the parameter domain as possible (domain randomization) for selected scenarios.
-accident scenarios are generated by initializing pre-crash behaviour parameters and sampling them from distributions (see the sampling sketch at the end of this section).
-pre-crash scenarios are extracted from studies by the NHTSA (National Highway Traffic Safety Administration).
-pre-crash conditions cover vehicle-on-vehicle collisions only (no pedestrians), at intersections or on highways
-the distribution of scenarios generated is informed by the NHTSA report
-6 general parameters:
-speed: distribution justified by US Department of Transportation data
-car mass: according to NHTSA data
-fog: arbitrary distribution skewed toward clear visibility (half normal distribution)
-brake force: normal distribution
-lane change distance: the distance travelled horizontally by the car to complete a lane change, normal distribution
-vertical offset: the distance travelled vertically for a lane change, normal distribution
-other parameters, such as road slipperiness, were difficult to simulate and were not modelled
-the weights of the model trained with these simulation scenarios are then used to initialize learning on real data.
-results: much faster convergence with fewer epochs, and improved steering prediction (31% closer to human behaviour than the baseline)
-What we can learn from this: if this methodology is good enough for RL, we should be able to employ it for validation purposes. We could take advantage of the simulation phase to skew scenarios toward edge cases that are more collision-prone and generate our scenarios with domain randomization. We could then move toward a more realistic distribution of scenarios in HiL testing and then in physical simulation (domain adaptation).
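To make the scenario-generation idea concrete, here is a minimal sampling sketch; the parameter names follow the list above, but every distribution value and the edge-case filter are illustrative assumptions, not the paper's numbers:

# Scenario-sampling sketch: draw accident-scenario parameters from simple distributions.
import numpy as np

rng = np.random.default_rng(0)

def sample_scenario():
    return {
        "speed_kmh":        rng.normal(55.0, 15.0),     # shape informed by traffic statistics
        "car_mass_kg":      rng.normal(1500.0, 300.0),
        "fog_density":      abs(rng.normal(0.0, 0.1)),  # half-normal, skewed toward clear visibility
        "brake_force":      rng.normal(0.7, 0.15),
        "lane_change_dist": rng.normal(30.0, 5.0),
        "vertical_offset":  rng.normal(3.5, 0.5),
    }

scenarios = [sample_scenario() for _ in range(10000)]
# optionally skew the set toward collision-prone edge cases, e.g. low visibility or weak braking
edge_cases = [s for s in scenarios if s["fog_density"] > 0.15 or s["brake_force"] < 0.5]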