POLSAB - Policy Learning for Safe Autonomous Behavior (proposal)

POLSAB aims to advance the state of the art in safe imitation learning for high-dimensional domains with an end-to-end approach. We focus on two main applications: autonomous robot navigation and self-driving car simulations. Designing efficient and safe policies (mappings from observations to actions) for these tasks requires more than behavioral cloning, which simply applies supervised learning to a labelled dataset of expert demonstrations.
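
To make the distinction concrete, the sketch below shows what behavioral cloning amounts to: a small policy network fitted by plain supervised regression on recorded observation-action pairs. The framework (PyTorch), network size, dimensions, and random placeholder data are illustrative assumptions, not the project's actual setup.

    import torch
    import torch.nn as nn

    obs_dim, act_dim = 16, 2                  # illustrative sizes, not the real sensor/action spaces
    policy = nn.Sequential(                   # small policy network: observation -> action
        nn.Linear(obs_dim, 64), nn.Tanh(),
        nn.Linear(64, act_dim),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # Placeholder "expert" demonstrations; in practice these are logged observation-action pairs.
    expert_obs = torch.randn(1024, obs_dim)
    expert_act = torch.randn(1024, act_dim)

    for epoch in range(100):
        predicted = policy(expert_obs)                         # actions proposed by the clone
        loss = nn.functional.mse_loss(predicted, expert_act)   # match the expert's actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Such a policy is only ever trained on states the expert visited, which is exactly why the feedback problem described next appears once it is deployed in closed loop.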

Control tasks typically suffer from cascading errors. These arise when the controller's policy does not account for the feedback loop created by its own mistakes: small deviations from the desired reference (the street lane, or the robot's path) feed back into the policy through subsequent observations and compound over time, until no valid corrective action remains.
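
The effect can be reproduced with a toy calculation (the horizon and error rate below are hypothetical numbers chosen for illustration): if the cloned policy makes a mistake with small probability at each step and has no demonstration data to recover from once it leaves the expert's states, the per-step error compounds over the length of the trajectory.

    import random

    random.seed(0)
    T, eps, episodes = 100, 0.01, 10_000   # horizon, per-step error rate, Monte Carlo runs
    failures = 0
    for _ in range(episodes):
        for t in range(T):
            if random.random() < eps:      # one small mistake on a demonstrated state...
                failures += 1              # ...drops the agent into states with no data,
                break                      # where it never recovers for the rest of the episode
    print(f"per-step error rate: {eps:.0%}, trajectory failure rate: {failures / episodes:.0%}")
    # The expected failure rate is 1 - (1 - eps)**T, roughly 63% for T = 100 and eps = 1%.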

To address these problems, this project will use the recently introduced Generative Adversarial Imitation Learning (GAIL) framework to learn robust policies by imitation for both the robot and the simulated vehicle. To minimize the risk of high-cost events (accidents), the risk-averse version of GAIL will be extended to our application domains.
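
The sketch below shows the structure of the adversarial training loop behind GAIL under simplifying assumptions: the dimensions and data batches are placeholders, the discriminator labelling convention (expert = 1, policy = 0) is one common implementation choice, and a plain REINFORCE estimator stands in for the trust-region policy update used in practice.

    import torch
    import torch.nn as nn

    obs_dim, act_dim = 16, 2                                  # illustrative sizes
    D = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    policy_mean = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
    d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)
    p_opt = torch.optim.Adam(policy_mean.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    # Placeholder batches; in the project these come from demonstrations and environment rollouts.
    expert_s, expert_a = torch.randn(256, obs_dim), torch.randn(256, act_dim)

    for it in range(500):
        # Sample actions from the current (Gaussian) policy on a batch of visited states.
        states = torch.randn(256, obs_dim)                    # stand-in for rollout states
        dist = torch.distributions.Normal(policy_mean(states), 0.1)
        actions = dist.sample()

        # 1) Discriminator step: expert pairs labelled 1, policy pairs labelled 0.
        d_loss = bce(D(torch.cat([expert_s, expert_a], dim=1)), torch.ones(256, 1)) \
               + bce(D(torch.cat([states, actions], dim=1)), torch.zeros(256, 1))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # 2) Policy step: reward the policy for state-action pairs the discriminator
        #    mistakes for expert behaviour, via a simple REINFORCE estimator.
        with torch.no_grad():
            reward = torch.log(torch.sigmoid(D(torch.cat([states, actions], dim=1))) + 1e-8)
        log_prob = dist.log_prob(actions).sum(dim=1, keepdim=True)
        p_loss = -(reward * log_prob).mean()
        p_opt.zero_grad()
        p_loss.backward()
        p_opt.step()

The risk-averse extension mentioned above modifies the surrogate objective so that rare, high-cost trajectories are penalised more heavily; that part is not shown here.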

A second approach will build on the recent option-critic framework in Reinforcement Learning (RL), in which a reward function (a scalar measure of the quality of the agent's behavior) is defined and robust control policies are learned by trial and error. This method also introduces temporal abstraction into the policy by building hierarchies of behaviors over time, which helps reinforcement learning scale up.
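
The sketch below illustrates the two ingredients this paragraph refers to, using a hypothetical grid-world example: a hand-written reward function that scores the agent's behavior, and options (an intra-option policy plus a termination condition) as the unit of temporal abstraction. Option-critic goes further and learns the intra-option policies, the termination functions, and the policy over options directly from the reward signal, which is omitted here.

    import random
    from dataclasses import dataclass
    from typing import Callable

    State = tuple   # e.g. an (x, y) grid position; purely illustrative

    @dataclass
    class Option:
        name: str
        policy: Callable[[State], str]        # intra-option policy: state -> primitive action
        terminates: Callable[[State], bool]   # termination condition beta(s)

    # Two hand-written options for a toy 6x6 grid world (assumed layout).
    go_right = Option("go_right", lambda s: "right", lambda s: s[0] >= 5)
    go_up    = Option("go_up",    lambda s: "up",    lambda s: s[1] >= 5)

    def step(state, action):
        """Toy dynamics and reward function: +1 when the goal corner (5, 5) is reached."""
        x, y = state
        x, y = (x + 1, y) if action == "right" else (x, y + 1)
        return (x, y), (1.0 if (x, y) == (5, 5) else 0.0)

    state, total_reward = (0, 0), 0.0
    for _ in range(4):                                # high-level decisions
        option = random.choice([go_right, go_up])     # policy over options (random here, learned in option-critic)
        while not option.terminates(state):           # run the chosen option until it terminates
            state, reward = step(state, option.policy(state))
            total_reward += reward
    print(state, total_reward)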

Finally, this project will contribute to research in safe AI by investigating risk-sensitive methods in applications where safety is of paramount importance, positioning Luxembourg as an important player in billion-dollar industries built around safe, robust AI agents in real-world settings (e.g. trading, autonomous driverless vehicles, service robotics).

SUPPORT (if the project is realized with FNR support):

  • FNR Luxembourg
  • SnT/University of Luxembourg
  • Institute for Robotics and Process Control at the Technische Universität Braunschweig (Prof. Dr. Jochen Steil)
  • Google DeepMind (Dr. Raia Hadsell)

PLATFORMS TO BE USED IN THE PROJECT:

  • Turtlebot Waffle with camera and LiDAR sensors for autonomous navigation
  • CARLA: simulation for self-driving/autonomous cars in urban environments
  • TORCS: simulation for racing/road autonomous cars