Skip directly to content


Online recurrent neural network learning for control of nonlinear plants in oil and gas production platforms

on Wed, 10/17/2018 - 12:54

This research line aims at designing adaptive controllers by using Echo State Networks (ESN) as a efficient data-driven method for training recurrent neural networks capable of controlling complex nonlinear plants, with a focus on oil and gas production platforms from Petrobras.

The resulting ESN-based controllers should learn inverse models of the controlled plant in an online fashion by interacting with the industrial plant and observing its dynamical behaviors.

In collaboration with supervised Master Student Jean P. Jordanou.

Well model. Figure by Jahanshahi et al. (2012).          


Manifold connecting two oil wells and a riser. Figure by Jordanou.

Scheme of Adaptive ESN-based controller and nonlinear plant. Figure by Jordanou

POLSAB aims at advancing the state-of-the-art in safe imitation learning for high-dimensional domains in an end- to-end approach. We focus on two main applications: autonomous robot navigation and self-driving car simulations. In order to design efficient and safe policies (which map observations to actions) for these tasks, it is necessary more than just using behavioral cloning which basically applies supervised learning on a labelled dataset.

Control tasks usually have the issue of cascading errors. This happens when the controller's policy does not take into account the feedback loops of controller mistakes: little deviations from the desired reference track (the street lane, or the robot path) causes the error to feedback into the policy as new observations arise, until no valid action is possible anymore.

In order to fix these problems, in this project, we will use a recently introduced framework called Generative Adversarial Imitation Learning (GAIL) for learning robust policies by imitation learning for the robot and the simulated vehicle. To minimize the risk of high-cost events (accidents), the risk-averse version of GAIL will be extended to our application domains.

A second approach will tackle the recent framework of option-critic in Reinforcement Learning (RL), where by defining a reward function (a qualitative measure of the agent's behavior) it is possible to learn robust control policies by trial and error. This method also takes into account temporal abstractions in the policy mappings by creating hierarchies of behaviors in time, which makes it possible to scale up reinforcement learning.

Finally, this project will contribute to research in safe AI by investigating risk-sensitive methods in applications where this is of paramount importance in order to position Luxembourg as a important player in billion dollars industries: safe, robust AI agents in real-world settings (e.g. trading, autonomous driverless vehicles, service robotics).

SUPPORT (if project is realized with FNR support):

  • FNR Luxembourg
  • SnT/University of Luxembourg
  • Institute for Robotics and Process Control at the Technische Universität Braunschweig (Prof. Dr. Jochen Steil)
  • Google DeepMind (Dr. Raia Hadsell)



Turtlebot Waffle with camera and LiDAR sensors for autonomous navigation



CARLA: simulation for self-driving/autonomous cars in urban environments


TORCS: simulation for racing/road autonomous cars


Generative Modeling of Autonomous Robots and their Environments using Reservoir Computing

on Wed, 01/20/2016 - 21:02

Autonomous mobile robots form an important research topic in the field of robotics due to their near-term applicability in the real world as domestic service robots. These robots must be designed in an efficient way using training sequences. They need to be aware of their position in the environment and also need to create models of it for deliberative planning. These tasks have to be performed using a limited number of sensors with low accuracy, as well as with a restricted amount of computational power. In this contribution we show that the recently emerged paradigm of Reservoir Computing (RC) is very well suited to solve all of the above mentioned problems, namely learning by example, robot localization, map and path generation. Reservoir Computing is a technique which enables a system to learn any time-invariant filter of the input by training a simple linear regressor that acts on the states of a highdimensional but random dynamic system excited by the inputs. In addition, RC is a simple technique featuring ease of training, and low computational and memory demands.

Keywords: reservoir computing, generative modeling, map learning, T-maze task, road sign problem, path generation







Related Publications

  1. Eric AntoneloBenjamin Schrauwen and Jan Van Campenhout Generative Modeling of Autonomous Robots and their Environments using Reservoir Computing Neural Processing Letters, Vol. 26(3), pp. 233-249 (2007)   


Technologies used: C++, TCP/IP sockets, Linux, Qt, Qwt. 
Developed from 2001 to 2010.
Two different software programs were developed during my undergraduate and Master studies: SINAR, a simulator that shows graphically the representation of the environment and the simulation in real time; and CONAR, the autonomous  controller that receives sensor data (from SINAR) and output the actuators data (to SINAR). Simulations with multiple robots can be done if more than one controller (CONAR) connects to SINAR. The communication between both programs is represented on the following figure:

The communication protocol is implemented using TCP/IP sockets. Thus, several controllers can run under different computers over a network (distributing the computing load through different nodes of a network). Both simulator programs were developed under the Linux operating system; the graphical interface was developed with Qt library and some graphical plots were created with Qwt library. C++ was the programming  language used to create both programs.


SINAR is a simulator for autonomous robot navigation experiments. Its graphical user interface contains menus, command bars, and the environment display. 

The user can create simulation environments merely by clicking and moving the mouse cursor in the display area. Objects are inserted, resized, translated and rotated simply by moving the mouse device. An object can be a obstacle, a target or even a robot. The user can also edit an object by changing its color and type of movement (for moving objects).

Environments can be saved in files and posteriorly they can be loaded for being used in simulations. Before a simulation starts, one or more controllers (CONAR) should be connected to the SINAR software. The user can control the simulation by activating appropriate (button) commands: start, pause and finish.

During a simulation, sensor data can be viewed graphically through plots in real time:

The performance of the robot (number of collisions, number of target captures, and number of executed iterations) can be verified in real time as well.

There are two modes of simulation: orginary mode and sophisticated mode. In the former mode, the environment display is updated at each iteration such that the user can view graphically the progress of the simulation. Furthermore,  the user can move any object in the environment in real time.

In the latter mode, the simulation is accomplished implicitily (not graphically) and is composed of a set of experiments configured by a specific C++ script. The script determines the sequence and the duration of simulation experiments (considering that each experiment uses distinct environments), besides the number of repetitions for a sequence of experiments. In the sophisticated mode, all generated data are recorded in files:  the trajectory of the robot and the performance measures (number of captures, collisions and their respective iteration time); the representation of the final state of the environment in PNG format and the performance plot (also called learning evolution graphic) are also generated automatically. The controller data (neural networks states) are also saved in an automatic way since the script tells CONAR to save its state when each simulation is finished.



CONAR is a program that simulates the brain of a robot located in the SINAR environment. After receiving sensor data (distance, color and contact) from its respective robot in the SINAR environment, it sends actuator data (direction adjustment and velocity adjustment) to the same robot. This cycle is kept until the simulations ends.

The graphical interface of CONAR is shown on the following figure. Parameters of the controller can be adjusted before the simulation and in real time; commands can be activated by clicking on buttons: connect to SINAR, apply parameters changes in real time, generate performance data and plots for recording in files, save neural networks state, exit simulation. Furthermore, some neural networks in the controller can be disabled in real time (so that it outputs null (zero)): IP, IC, RR and AR networks.

   In addition, neural networks state can be viewed graphically in real time. In the following figures, a neuron is represented by a circle. In addition, the more black a neuron is, stronger is its output.

In above figure, it is shown a representation of PI repertoire neurons. A small red square inside a circle means that a neuron has already been winner during a learning event.

The next figure shows AR, RR and actuator neurons. The energy levels (degree of activity) of AR or RR neurons are represented by a thick line next to the respective neurons.

In following picture, it is shown the graphical representation of output of winner neurons in PI repertoire (each line represents the winner neuron output in a column: the first line corresponds to the first column and so on).


To see a video of a simulation run, check out this page: Reinforcement learning of robot behaviors

Related publications

  1. Eric AntoneloAlbert-Jan BaerveldtThorsteinn Rognvaldsson and Mauricio Figueiredo Modular Neural Network and Classical Reinforcement Learning for Autonomous Robot Navigation: Inhibiting Undesirable Behaviors Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1225-1232 (2006)    
  2. Eric Antonelo A Neural Reinforcement Learning Approach for Behavior Acquisition in Intelligent Autonomous Systems Master thesis, Halmstad University (2006)    
  3. Eric AntoneloMauricio FigueiredoAlbert-Jan Baerlveldt and Rodrigo Calvo Intelligent autonomous navigation for mobile robots: spatial concept acquisition and object discrimination Proceedings of the 6th IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), pp. 553-557 (2005)    
  4. Eric Antonelo and Mauricio Figueiredo Autonomous intelligent systems applied to robot navigation: spatial concept acquisition and object discrimination Proceedings of the 2nd National Meeting of Intelligent Robotics (II ENRI) in the Congress of the Brazilian Computer Society (in Portuguese), pp. (2004)  

Learning navigation attractors for mobile robots with reinforcement learning and reservoir computing

on Wed, 12/16/2015 - 16:48

Autonomous robot navigation in partially observable environments is a complex task because the state of the environment can not be completely determined only by the current sensory readings of a robot. This work uses the recently introduced paradigm for training recurrent neural networks (RNNs), called reservoir computing (RC), to model multiple navigation attractors in partially observable environments. In RC, the RNN with randomly generated fixed weights, called reservoir, projects the input into a high-dimensional dynamic space. Only the readout output layer is trained using standard linear regression techniques, and in this work, is used to approximate the state-action value function. By using a policy iteration framework, where an alternating sequence of policy improvement (samples generation from environment interaction) and policy evaluation (network training) steps are performed, the system is able to shape navigation attractors so that, after convergence, the robot follows the correct trajectory towards the goal. The experiments are accomplished using an e-puck robot extended with 8 distance sensors in a rectangular environment with an obstacle between the robot and the target region. The task is to reach the goal through the correct side of the environment, which is indicated by a temporary stimulus previously observed at the beginning of the episode. We show that the reservoir-based system (with short-term memory) can model these navigation attractors, whereas a feedforward network without memory fails to do so.

Reservoir Computing network as a function approximator for reinforcement learning tasks with partially observable environments. The reservoir is a dynamical system of recurrent nodes. Solid lines represent connections which are fixed. Dashed lines are the connections to be trained


Motor primitives or basic behaviors: left, forward and right.


A sequence of robot trajectories as learning evolves, using the ESN. Each plot shows robot trajectories in the environment for several episodes during the learning process. In the beginning, exploration is high and several locations are visited by the robot. As the simulation develops, two navigation attractors are formed to the left and to the right so that the agent receives maximal reward.