In this work we tackle the road sign problem with Reservoir Computing (RC) networks. In the T-maze task (a particular form of the road sign problem), a robot in a T-shaped environment must reach the correct goal (the left or right arm of the T-maze) depending on a previously received input sign. It is a control task in which the delay between receiving the sign and producing the required response (e.g., turning right or left) is a crucial factor. Delayed-response tasks like this one form a temporal problem that RC networks handle very well. Reservoir Computing is a biologically plausible technique that overcomes problems of earlier algorithms such as Backpropagation Through Time, which exhibits slow convergence (or none at all) during training. RC is a new concept that includes a fast and efficient training algorithm. We show that this simple approach can solve the T-maze task efficiently.
Video showing trained RC network controlling the robot:
Publications
Eric Antonelo, Benjamin Schrauwen and Dirk Stroobandt. Mobile Robot Control in the Road Sign Problem using Reservoir Computing Networks. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 911-916 (2008)
Eric Antonelo, Benjamin Schrauwen and Jan Van Campenhout. Generative Modeling of Autonomous Robots and their Environments using Reservoir Computing. Neural Processing Letters, Vol. 26(3), pp. 233-249 (2007)
Reservoir Computing (RC) techniques use a fixed (usually randomly created) recurrent neural network, or more generally any dynamic system, operating at the edge of stability; only a static linear readout layer is trained, using standard linear regression methods. In this work, RC is used for detecting complex events in autonomous robot navigation. This can be extended to robot localization tasks based solely on data from a few low-range, high-noise sensors. After learning, the robot thus builds an implicit map of the environment that is used for efficient localization by simply processing the stream of distance sensor readings. These techniques are demonstrated both in a simple simulation environment and in the physically realistic Webots simulation of the commercially available e-puck robot, using several complex and even dynamic environments.
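The idea of a fixed random reservoir with a trained linear readout can be illustrated with a minimal sketch. It assumes an echo-state-style network with tanh units and a ridge-regression readout, applied to a toy task (recalling the input from a few steps earlier). The function names, reservoir size, scaling values and the toy task are all illustrative assumptions, not the setup used in the experiments described here.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius=0.9, input_scale=0.2, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_units, n_inputs))
    w = rng.normal(size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with an input sequence and record its states."""
    x = np.zeros(w.shape[0])
    states = np.zeros((len(inputs), w.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(w_in @ u + w @ x)
        states[t] = x
    return states

def add_bias(states):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([states, np.ones((len(states), 1))])

def train_readout(states, targets, ridge=1e-6):
    """Fit the linear readout by ridge regression (the only trained part)."""
    x = add_bias(states)
    return np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ targets)

def predict(states, w_out):
    return add_bias(states) @ w_out

def demo(delay=3, n_steps=2000, washout=100):
    """Toy delayed-recall task: the target is the input from `delay` steps ago."""
    rng = np.random.default_rng(1)
    u = rng.uniform(-1.0, 1.0, size=(n_steps, 1))
    y = np.roll(u, delay, axis=0)
    y[:delay] = 0.0
    w_in, w = make_reservoir(n_inputs=1, n_units=100)
    states = run_reservoir(w_in, w, u)
    split = n_steps // 2
    # Train on the first half (skipping the initial transient), test on the rest.
    w_out = train_readout(states[washout:split], y[washout:split])
    pred = predict(states[split:], w_out)
    return float(np.mean((pred - y[split:]) ** 2))

if __name__ == "__main__":
    print("test MSE on delayed recall:", demo())
```

Because the recurrent weights stay fixed, training reduces to a single linear solve, which is what makes the approach fast compared with gradient-based training of the whole recurrent network.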
Videos showing data generation for event detection and localization:
Autonomous mobile robots form an important research topic in the field of robotics due to their near-term applicability in the real world as domestic service robots. These robots must be designed in an efficient way using training sequences. They need to be aware of their position in the environment and also need to create models of it for deliberative planning. These tasks have to be performed using a limited number of low-accuracy sensors and a restricted amount of computational power. In this contribution we show that the recently emerged paradigm of Reservoir Computing (RC) is very well suited to solving all of the above-mentioned problems, namely learning by example, robot localization, and map and path generation. Reservoir Computing is a technique that enables a system to learn any time-invariant filter of the input by training a simple linear regressor acting on the states of a high-dimensional but random dynamic system excited by the inputs. In addition, RC is a simple technique featuring ease of training and low computational and memory demands.
Eric Antonelo, Benjamin Schrauwen and Jan Van Campenhout. Generative Modeling of Autonomous Robots and their Environments using Reservoir Computing. Neural Processing Letters, Vol. 26(3), pp. 233-249 (2007)
Title of Master thesis: A Neural Reinforcement Learning Approach for Intelligent Autonomous Navigation Systems
Classical reinforcement learning mechanisms and a modular neural network are unified to conceive an intelligent autonomous system for mobile robot navigation. The design aims at inhibiting two common navigation deficiencies: generation of unsuitable cyclic trajectories and ineffectiveness in risky configurations. Different design elements are combined to tackle these navigation difficulties, for instance: 1) a neuron parameter that simultaneously memorizes neuron activity and functions as a learning factor, 2) reinforcement learning mechanisms that adjust neuron parameters (not only synapse weights), and 3) an inner-triggered reinforcement. Simulation results show that the proposed system circumvents difficulties caused by specific environment configurations, improving the relation between collisions and captures.
Video (inhibiting unsuitable cyclic trajectories through reinforcement learning):
The robot starts out not knowing what it should do in the environment, but as time passes, we can see that it interacts with the environment by colliding with obstacles and capturing targets (yellow boxes). Each collision elicits an appropriate innate response, i.e., aversion. As more collisions take place, its neural network learns to associate obstacles (and their blue color) with aversion behaviors, so that it can steer away from obstacles (emergent behavior). The same process occurs for target capture, which becomes associated with attraction behavior through learning. In the end, the robot can navigate the environment efficiently, capturing targets and effectively suppressing the cyclic trajectories common to such reactive systems.
Video (robot cooperation; each robot trained with previous neural network architecture)
The intelligent autonomous system corresponds to a neural network arranged in three layers (Fig. 4). The first layer contains two neural repertoires: the Proximity Identifier repertoire (PI) and the Color Identifier repertoire (CI). Distance sensors stimulate the PI repertoire, whereas color sensors feed the CI repertoire. Both repertoires also receive stimuli from contact sensors. The second layer is composed of two neural repertoires: the Attraction repertoire (AR) and the Repulsion repertoire (RR). Each one establishes connections with both networks in the first layer as well as with the contact sensors. The actuator network, connected to the AR and RR repertoires, outputs the adjustment to the robot's direction.
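The wiring described above can be sketched as a plain feed-forward pass. This is only an illustration of the connectivity between sensors, repertoires and the actuator network: the class name, layer sizes, random weights and tanh activation are all assumptions, and the reinforcement learning mechanisms that adjust the neuron parameters are not modelled.

```python
import numpy as np

class RepertoireNetwork:
    """Hypothetical three-layer wiring: (PI, CI) -> (AR, RR) -> actuator."""

    def __init__(self, n_distance, n_color, n_contact, n_units=8, seed=0):
        rng = np.random.default_rng(seed)

        def weights(rows, cols):
            # Small random weights stand in for learned connections (assumption).
            return rng.normal(scale=0.1, size=(rows, cols))

        # Layer 1: PI sees distance + contact sensors, CI sees color + contact sensors.
        self.w_pi = weights(n_units, n_distance + n_contact)
        self.w_ci = weights(n_units, n_color + n_contact)
        # Layer 2: AR and RR each see both first-layer repertoires plus contact sensors.
        self.w_ar = weights(n_units, 2 * n_units + n_contact)
        self.w_rr = weights(n_units, 2 * n_units + n_contact)
        # Layer 3: the actuator network combines AR and RR into one direction adjustment.
        self.w_act = weights(1, 2 * n_units)

    def step(self, distance, color, contact):
        """Map one set of sensor readings to a direction adjustment in [-1, 1]."""
        pi = np.tanh(self.w_pi @ np.concatenate([distance, contact]))
        ci = np.tanh(self.w_ci @ np.concatenate([color, contact]))
        layer1 = np.concatenate([pi, ci, contact])
        ar = np.tanh(self.w_ar @ layer1)
        rr = np.tanh(self.w_rr @ layer1)
        return float(np.tanh(self.w_act @ np.concatenate([ar, rr]))[0])

if __name__ == "__main__":
    net = RepertoireNetwork(n_distance=4, n_color=3, n_contact=2)
    print(net.step(np.ones(4), np.zeros(3), np.zeros(2)))
```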