
Machine Learning

Physics-Informed Neural Nets for Control of Dynamical Systems

on Tue, 09/28/2021 - 16:00

Physics-informed neural networks (PINNs) embed known physical laws into the training of deep neural networks, ensuring that the networks respect the physics of the process while reducing the demand for labeled data. For systems represented by Ordinary Differential Equations (ODEs), the conventional PINN takes continuous time as its input variable and outputs the solution of the corresponding ODE. In their original form, PINNs neither allow control inputs nor can they simulate over long time horizons without serious degradation of their predictions. In this context, this work presents a new framework, Physics-Informed Neural Nets for Control (PINC): a novel PINN-based architecture that is amenable to control problems and able to simulate over longer time horizons that are not fixed beforehand. The framework adds new inputs to account for the initial state of the system and the control action. In PINC, the response over the complete time horizon is split so that each smaller interval constitutes a solution of the ODE conditioned on fixed values of the initial state and control action for that interval. The whole response is formed by feeding the predicted terminal state back as the initial state for the next interval. This proposal enables optimal control of dynamic systems, integrating a priori expert knowledge and data collected from plants into control applications. We showcase our proposal in the control of two nonlinear dynamic systems: the Van der Pol oscillator and the four-tank system.
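To make this concrete, below is a minimal PINC-style sketch in PyTorch for the Van der Pol oscillator (one of the showcased systems). The network maps (t, y(0), u) to y(t); the physics loss penalizes the ODE residual obtained by automatic differentiation, and an initial-condition loss ties the prediction at t = 0 to y(0). Layer sizes, the collocation scheme, and all constants are illustrative assumptions, not the paper's exact setup.

import torch
import torch.nn as nn

class PINC(nn.Module):
    """Maps (t, y0, u) to the predicted state y(t), t in [0, T]."""
    def __init__(self, state_dim=2, control_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + state_dim + control_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, t, y0, u):
        return self.net(torch.cat([t, y0, u], dim=-1))

def van_der_pol(y, u, mu=1.0):
    """ODE right-hand side: y1' = y2, y2' = mu*(1 - y1^2)*y2 - y1 + u."""
    y1, y2 = y[..., :1], y[..., 1:]
    return torch.cat([y2, mu * (1 - y1**2) * y2 - y1 + u], dim=-1)

def physics_loss(model, t, y0, u):
    """ODE residual at collocation points, via automatic differentiation."""
    t = t.requires_grad_(True)
    y = model(t, y0, u)
    dydt = torch.stack([
        torch.autograd.grad(y[..., i].sum(), t, create_graph=True)[0].squeeze(-1)
        for i in range(y.shape[-1])
    ], dim=-1)
    return ((dydt - van_der_pol(y, u)) ** 2).mean()

def initial_loss(model, y0, u):
    """At t = 0 the network must reproduce the given initial state."""
    t0 = torch.zeros(y0.shape[0], 1)
    return ((model(t0, y0, u) - y0) ** 2).mean()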

MPC: representation of the output prediction at a given time instant, where the proposed control actions generate a predicted behavior that reduces the distance between the model's predicted trajectory and a reference trajectory (a schematic sketch of this cost follows the figure):

mpc_pred.png
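As a rough illustration of the objective in the figure, the sketch below sums tracking and control-effort costs over the prediction horizon. The names predict, y_ref, and the weights q and r are illustrative assumptions, not from the paper; predict would be the PINC net rolled forward one interval of Ts seconds per step.

import numpy as np

def mpc_cost(u_seq, y0, predict, y_ref, q=1.0, r=0.1):
    """Tracking + control-effort cost over the prediction horizon.

    u_seq  : (M, control_dim) candidate control sequence
    y0     : state at the start of the current MPC iteration
    predict: one-interval model, e.g. the PINC net in self-loop mode
    y_ref  : (M, state_dim) reference trajectory
    """
    cost, y = 0.0, y0
    for k in range(len(u_seq)):
        y = predict(y, u_seq[k])   # roll the model forward Ts seconds
        cost += q * np.sum((y - y_ref[k]) ** 2) + r * np.sum(u_seq[k] ** 2)
    return cost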


The PINC network takes the initial state y(0) of the dynamic system and the control input u as inputs, in addition to the continuous scalar time t. Both y(0) and u can be multidimensional. The output y(t) corresponds to the state of the dynamic system as a function of t ∈ [0, T], conditioned on the initial state y(0) and the control input u. The deep network is fully connected, even though not all connections are shown:

pinc_net.png

Below, the modes of operation of the PINC network. (a) The PINC net operates in self-loop mode, using its own output prediction as the next initial state after T seconds. This mode is used within one iteration of MPC to generate a trajectory until the MPC prediction horizon completes (the predicted output in the first figure). (b) Block diagram of PINC connected to the plant. One pass through the diagram's arrows corresponds to one MPC iteration, which applies a control input u for Ts seconds to both the plant and the PINC network. Note that the initial state of the PINC net is set to the real output of the plant. In practice, these two modes of operation alternate in MPC (optimization over the prediction horizon, then application of the control action).

(a) pinc_feedback.png

(b) pinc_plant.png

Below, the representation of a trained PINC network evolving through time in self-loop mode (Figure (a) above) for trajectory generation over the prediction horizon; a code sketch of this rollout follows the figure. The top dashed black curve corresponds to a predicted trajectory y of a hypothetical dynamic system in continuous time. The states y[k] are snapshots of the system in discrete time k, positioned at the equidistant vertical lines. Between two vertical lines (during the inner continuous interval between steps k and k + 1), the PINC net learns the solution of an ODE with t ∈ [0, T], conditioned on a fixed control input u[k] (blue solid line) and initial state y(0) (green thick dashed line). The control action u[k] changes at the vertical lines and is kept fixed for T seconds, and the initial state y(0) in the interval between steps k and k + 1 is updated to the last state of the previous interval k − 1 (indicated by the red curved arrow). The PINC net can directly predict the states at the vertical lines without inferring intermediate states t < T, as numerical simulation would. Here, we assume that T = Ts and, thus, the number of discrete timesteps M equals the length of the MPC prediction horizon.

pinc_evolution.png
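In code, this self-loop mode is a short loop: the sketch below assumes a trained net pinc (as in the PINC sketch above) and queries it only at t = T, which is exactly what lets PINC jump from one vertical line to the next without intermediate states.

import torch

def self_loop_rollout(pinc, y0, u_seq, T):
    """Chain M short-horizon ODE solutions into one long trajectory."""
    states, y = [], y0
    # query the net only at the end of each interval, t = T
    tT = torch.full((y0.shape[0], 1), float(T))
    for u in u_seq:             # u[k] is held fixed within interval k
        y = pinc(tT, y, u)      # terminal state becomes next initial state
        states.append(y)
    return torch.stack(states)  # snapshots y[1], ..., y[M]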

Online recurrent neural network learning for control of nonlinear plants in oil and gas production platforms

on Wed, 10/17/2018 - 12:54

This research line aims at designing adaptive controllers by using Echo State Networks (ESNs), an efficient data-driven method for training recurrent neural networks, to control complex nonlinear plants, with a focus on oil and gas production platforms from Petrobras.

The resulting ESN-based controllers learn inverse models of the controlled plant in an online fashion, by interacting with the industrial plant and observing its dynamical behavior.
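A minimal sketch of this kind of online learning, assuming a standard ESN whose readout is adapted by recursive least squares (RLS); the reservoir size, spectral radius, and forgetting factor are illustrative choices, not the project's actual hyperparameters.

import numpy as np

rng = np.random.default_rng(0)
N, n_in, n_out = 200, 3, 1              # reservoir size, inputs, control outputs

W_in = rng.uniform(-0.5, 0.5, (N, n_in))
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius below 1

x = np.zeros(N)                         # reservoir state
W_out = np.zeros((n_out, N))            # readout, the only trained part
P = np.eye(N) * 100.0                   # RLS inverse correlation matrix
lam = 0.999                             # forgetting factor

def step(inp, target):
    """One interaction step: update reservoir, predict, adapt readout."""
    global x, W_out, P
    x = np.tanh(W @ x + W_in @ inp)
    y = W_out @ x                       # predicted control action
    k = P @ x / (lam + x @ P @ x)       # RLS gain
    W_out += np.outer(target - y, k)    # move readout toward observed target
    P = (P - np.outer(k, x @ P)) / lam
    return y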

In collaboration with supervised Master's student Jean P. Jordanou.

Well model. Figure by Jahanshahi et al. (2012).          

 

Manifold connecting two oil wells and a riser. Figure by Jordanou.

Scheme of the adaptive ESN-based controller and nonlinear plant. Figure by Jordanou.

State-of-the-art Artificial Intelligence method for detecting that you are really you, and not some intruder, when entering the code on your mobile phone.

Technologies used:
Python (backend & custom neural network model);
Java (Android app frontend);

Developed in 2016/2017.
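Details are in the PDF linked below. Purely as a hypothetical illustration of this kind of authentication (the actual model and features are not public), one could feed keystroke-timing features of the code entry to a small classifier:

import numpy as np
from sklearn.neural_network import MLPClassifier

def timing_features(press_times, release_times):
    """Hold time per key plus flight time between consecutive keys."""
    holds = np.asarray(release_times) - np.asarray(press_times)
    flights = np.asarray(press_times[1:]) - np.asarray(release_times[:-1])
    return np.concatenate([holds, flights])

# X: feature vectors from many code entries; y: 1 = genuine user, 0 = other.
# Training data would come from an enrollment phase on the phone.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
# clf.fit(X, y); clf.predict([timing_features(press, release)])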

 

More information:  TigerAI_info.pdf


Learning navigation attractors for mobile robots with reinforcement learning and reservoir computing

on Wed, 12/16/2015 - 16:48

Autonomous robot navigation in partially observable environments is a complex task, because the state of the environment cannot be completely determined from the robot's current sensory readings alone. This work uses the recently introduced paradigm for training recurrent neural networks (RNNs), called reservoir computing (RC), to model multiple navigation attractors in partially observable environments. In RC, an RNN with randomly generated, fixed weights, called the reservoir, projects the input into a high-dimensional dynamic space. Only the readout layer is trained, using standard linear regression techniques; in this work, it approximates the state-action value function. Within a policy iteration framework, which alternates policy improvement (sample generation from environment interaction) and policy evaluation (network training) steps, the system shapes navigation attractors so that, after convergence, the robot follows the correct trajectory towards the goal. The experiments use an e-puck robot extended with 8 distance sensors in a rectangular environment with an obstacle between the robot and the target region. The task is to reach the goal through the correct side of the environment, indicated by a temporary stimulus observed at the beginning of the episode. We show that the reservoir-based system (with short-term memory) can model these navigation attractors, whereas a feedforward network without memory fails to do so.
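A sketch of this function approximator, assuming a standard ESN-style reservoir with a ridge-regression readout that outputs one Q-value per basic behavior (left, forward, right); all sizes and constants are illustrative.

import numpy as np

rng = np.random.default_rng(1)
N, n_sensors = 400, 8                   # reservoir size, distance sensors

W_in = rng.uniform(-1, 1, (N, n_sensors))
W = rng.standard_normal((N, N))
W *= 0.95 / np.max(np.abs(np.linalg.eigvals(W)))  # echo state property

def run_reservoir(sensor_seq):
    """Drive the fixed reservoir with sensor readings; keep all states."""
    x, states = np.zeros(N), []
    for s in sensor_seq:
        x = np.tanh(W @ x + W_in @ s)   # short-term memory lives in x
        states.append(x.copy())
    return np.asarray(states)

def fit_readout(states, q_targets, ridge=1e-6):
    """Policy evaluation: linear regression from states to Q-targets."""
    A = states.T @ states + ridge * np.eye(N)
    return np.linalg.solve(A, states.T @ q_targets).T  # (n_actions, N)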

Reservoir Computing network as a function approximator for reinforcement learning tasks in partially observable environments. The reservoir is a dynamical system of recurrent nodes. Solid lines represent fixed connections; dashed lines are the connections to be trained.

 

Motor primitives or basic behaviors: left, forward and right.

 

A sequence of robot trajectories as learning evolves, using the ESN. Each plot shows robot trajectories in the environment over several episodes of the learning process. In the beginning, exploration is high and the robot visits many locations. As the simulation develops, two navigation attractors form to the left and to the right, so that the agent receives maximal reward.

 

Supervised Learning of Internal Models for Autonomous Goal-Oriented Robot Navigation using Reservoir Computing

on Wed, 12/16/2015 - 14:45

In this work, we propose a hierarchical architecture that constructs internal models of a robot's environment for goal-oriented navigation through an imitation learning process. The proposed architecture is based on the Reservoir Computing paradigm for training Recurrent Neural Networks (RNNs). It is composed of two randomly generated RNNs (called reservoirs): one for modeling the localization capability and one for learning the navigation skill. The localization module is trained to detect the current and previously visited rooms based only on 8 noisy infra-red distance sensors. These predictions, together with the distance sensors and the desired goal location, are used by the navigation network to steer the robot through the environment in a goal-oriented manner. The training of this architecture is performed in a supervised way (with example trajectories created by a supervisor), using linear regression on the reservoir states. The reservoir thus acts as a temporal kernel, projecting the inputs to a rich feature space whose states are linearly combined to generate the desired outputs. Experimental results on a simulated robot show that the trained system can localize itself within both simple and large unknown environments and navigate successfully to the desired goals.
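The general recipe can be sketched as follows, under stated assumptions (random fixed reservoirs, readouts fit by ridge regression); the hierarchical wiring is that the localization readout's room prediction is appended to the navigation reservoir's input.

import numpy as np

rng = np.random.default_rng(2)

def make_reservoir(n_in, N=300, rho=0.95):
    """Randomly generated RNN with fixed weights."""
    W_in = rng.uniform(-1, 1, (N, n_in))
    W = rng.standard_normal((N, N))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def collect_states(W_in, W, inputs):
    """Run the reservoir over an input sequence and stack its states."""
    x, X = np.zeros(W.shape[0]), []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
        X.append(x.copy())
    return np.asarray(X)

def ridge_fit(X, Y, reg=1e-6):
    """Linear readout on reservoir states (the only trained weights)."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)

# Localization: 8 sensors -> one-hot room label.
# Navigation: [sensors, predicted room, goal] -> motor commands,
# trained on example trajectories from a supervisor.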


  1. Eric Antonelo and Benjamin Schrauwen. On Learning Navigation Behaviors for Small Mobile Robots with Reservoir Computing Architectures. IEEE Transactions on Neural Networks and Learning Systems, Vol. 26, pp. 763-780 (2014). DOI: 10.1109/TNNLS.2014.2323247.
  2. Eric Antonelo and Benjamin Schrauwen. Supervised Learning of Internal Models for Autonomous Goal-Oriented Robot Navigation using Reservoir Computing. IEEE International Conference on Robotics and Automation, Proceedings, pp. 6 (2010).

 
