Reinforcement Learning with PDEs
Previously we discussed applying reinforcement learning to Ordinary Differential Equations (ODEs) by integrating ODEs within gymnasium. ODEs are a powerful tool that can describe a wide range of systems, but they only involve derivatives with respect to a single independent variable (typically time). Partial Differential Equations (PDEs) are differential equations involving derivatives with respect to multiple variables and can cover a far broader range of more complex systems. Often, ODEs are special cases of, or special assumptions applied to, PDEs.
PDEs include Maxwell’s Equations (governing electricity and magnetism), the Navier-Stokes equations (governing fluid flow for aircraft, engines, blood, and other cases), and the Boltzmann equation for thermodynamics. PDEs can describe systems such as flexible structures, power grids, manufacturing, or epidemiological models in biology. They can represent highly complex behavior; the Navier-Stokes equations describe the eddies of a rushing mountain stream. Their capacity for capturing and revealing more complex behavior of real-world systems makes these equations an important topic of study, both for describing systems and for analyzing known equations to make new discoveries about systems. Entire fields (such as fluid dynamics, electrodynamics, and structural mechanics) are devoted to the study of just a single set of PDEs.
This increased complexity comes with a cost: the systems captured by PDEs are much more difficult to analyze and control. ODEs are also described as lumped-parameter systems; the parameters and variables that describe them are “lumped” into a discrete point (or a small number of points for a coupled system of ODEs). PDEs are distributed-parameter systems that track behavior throughout space and time. In other words, the state space for an ODE is a relatively small number of variables, such as time and a few system measurements at a specific point. For PDE/distributed-parameter systems, the state space can approach infinite dimensions, or be discretized for computation into millions of points for each time step. A lumped-parameter system controls the temperature of an engine based on a small number of sensors. A PDE/distributed-parameter system would manage the temperature dynamics across the entire engine.
As with ODEs, many PDEs must be analyzed (aside from special cases) through modelling and simulation. However, due to the higher dimensions, this modelling becomes far more complex. Many ODEs can be solved through straightforward application of algorithms like MATLAB’s ode45 or SciPy’s solve_ivp. PDEs are instead modelled across grids or meshes, where the PDE is simplified to an algebraic equation (such as through a Taylor-series expansion) at each point on the grid. Grid generation is a field of its own, as much art as science, and an ideal (or even usable) grid can vary greatly based on problem geometry and physics. Grids (and hence problem state spaces) can number in the millions of points, with computation times running into days or weeks, and PDE solvers are often commercial software costing tens of thousands of dollars.
Controlling PDEs presents a far greater challenge than ODEs. The Laplace transform that forms the basis of much classical control theory is a one-dimensional transformation. While there has been some progress in PDE control theory, the field is not as comprehensive as for ODE/lumped systems. For PDEs, even basic controllability or observability assessments become difficult as the state space to assess grows by orders of magnitude and fewer PDEs have analytic solutions. By necessity, we run into design questions: what part of the domain needs to be controlled or observed? Can the rest of the domain be in an arbitrary state? What subset of the domain does the controller need to operate over? With key tools in control theory underdeveloped, and new problems presented, applying machine learning has become a major area of research for understanding and controlling PDE systems.
Given the importance of PDEs, there has been research into developing control strategies for them. For example, Glowinski et al. developed an analytical adjoint-based method rooted in advanced functional analysis and relying on simulation of the system. Other approaches, such as those discussed by Kirsten Morris, apply estimation to reduce the order of the PDE and facilitate more traditional control approaches. Botteghi and Fasel have begun to apply machine learning to the control of these systems (note: this is only a very brief glimpse of the research). Here we will apply reinforcement learning to two PDE control problems. The diffusion equation is a simple, linear, second-order PDE with a known analytic solution. The Kuramoto–Sivashinsky (K-S) equation is a much more complex fourth-order nonlinear equation that models instabilities in a flame front.
For both equations we use a simple, small square domain of grid points. We target a sinusoidal pattern along a line down the middle of the domain by controlling inputs along the left and right sides. The controller’s inputs are the values in the target region and the (x, y) coordinates of the control points. Training the algorithm requires simulating the system’s development through time under the control inputs. As discussed above, this means a grid where the equation is solved at each point and then iterated through each time step. I used the py-pde package to create a training environment for the reinforcement learner (thanks to the developer of this package for his prompt feedback and help!). With the py-pde environment in place, the approach proceeded as usual for reinforcement learning: the particular algorithm develops a guess at a controller strategy; that strategy is applied at small, discrete time steps, providing control inputs based on the current state of the system and earning some reward (in this case, the root mean square difference between the target and current distributions).
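To make that loop concrete, here is a minimal sketch of how a single rollout could be scored. It assumes a simulate_step helper that advances the py-pde grid by one control interval (a sketch of it appears later, after the diffusion setup); the names rms_reward, evaluate_controller, observe, and simulate_step are illustrative choices, not the exact functions from my implementation.

import numpy as np

def rms_reward(current_column, target_column):
    # Negative root mean square difference between the achieved and
    # target distributions in the target region (the central column).
    return -np.sqrt(np.mean((current_column - target_column) ** 2))

def evaluate_controller(controller, simulate_step, observe, state, target,
                        t_end=15.0, dt=0.1):
    # Roll out one episode: apply the candidate controller every dt and
    # accumulate the reward used to rank it (GP) or train it (SAC).
    t, total_reward = 0.0, 0.0
    while t < t_end:
        # The controller maps the current observation to control inputs
        # applied along the left and right sides of the domain.
        controls = controller(observe(state))
        state, t = simulate_step(state, controls, t, dt)
        total_reward += rms_reward(observe(state), target)
    return total_reward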
Unlike previous cases, I only present results from the genetic-programming controller. I developed code to apply a soft actor-critic (SAC) algorithm to run as a container on AWS SageMaker. However, a full execution would take about 50 hours and I didn’t want to spend the money! I looked for ways to reduce the computation time, but eventually gave up due to time constraints; this article was already taking long enough to get out with my job, military reserve duty, family visits over the holidays, civic and church involvement, and not leaving my wife to take care of our baby boy alone!
First we will discuss the diffusion equation:

∂u(x, t)/∂t = μ ∆u(x, t)

with x as a two-dimensional Cartesian vector and ∆ the Laplace operator. As mentioned, this is a simple second-order (second derivative) linear partial differential equation in time and two-dimensional space. μ is the diffusion coefficient, which determines how fast effects travel through the system. The diffusion equation tends to wash out (diffuse!) effects on the boundaries throughout the domain and exhibits stable dynamics. The PDE is implemented as shown below, with grid, equation, boundary conditions, initial conditions, and target distribution:
import numpy as np
import pde
from pde import CartesianGrid, ScalarField, DiffusionPDE

# 20 x 20 grid on the unit square; bounded in x, periodic in y
grid = CartesianGrid([[0, 1], [0, 1]], [20, 20], periodic=[False, True])
# random initial condition
state = ScalarField.random_uniform(grid, 0.0, 0.2)

# boundary conditions: fixed value on the left/right edges, periodic in y
bc_left = {"value": 0}
bc_right = {"value": 0}
bc_x = [bc_left, bc_right]
bc_y = "periodic"
eq = DiffusionPDE(diffusivity=0.1, bc=[bc_x, bc_y])

# explicit Euler solver and a stepper for advancing the state in time
solver = pde.ExplicitSolver(eq, scheme="euler", adaptive=True)
stepper = solver.make_stepper(state, dt=1e-3)

# target distribution: a sinusoid along the central column
target = 1.0 * np.sin(2 * grid.axes_coords[1] * np.pi)
The problem is sensitive to the diffusion coefficient and the domain size; a mismatch between the two washes out the control inputs before they can reach the target region unless the simulation is run for a long time. The control input was updated and the reward evaluated every 0.1 time units up to an end time of T = 15.
Due to the py-pde package architecture, the control is applied to the column just inside the boundary. Structuring py-pde to update the boundary condition each time step resulted in a memory leak, and the py-pde developer advised using a stepper function as a work-around, which doesn’t allow updating the boundary condition. This means the results aren’t exactly physical, but they do display the basic principle of PDE control with reinforcement learning.
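As a sketch of that work-around, the control values can be written directly into the columns just inside the left and right edges of the field’s data array before each call to the stepper. This assumes the stepper created by solver.make_stepper above, called as stepper(state, t_start, t_end); the exact indexing and the split of the control vector are my simplifying assumptions, not a copy of the article’s implementation.

def simulate_step(state, controls, t, dt=0.1):
    # Write the control values into the first interior column on the left
    # and right sides (the boundary itself cannot be updated when using
    # the stepper work-around), then advance the solver by one interval.
    n_y = state.data.shape[1]
    state.data[1, :] = controls[:n_y]    # column just inside the left edge
    state.data[-2, :] = controls[n_y:]   # column just inside the right edge
    t = stepper(state, t, t + dt)        # stepper from solver.make_stepper above
    return state, t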
The GP algorithm was able to arrive at a final reward (the summed mean square error over all 20 points in the central column) of about 2.0 after about 30 iterations with a 500-tree forest. The results are shown below as the target and achieved distributions in the target region.
Figure 1: Diffusion equation; target distribution in green, achieved distribution in red. Provided by author.
Now the more interesting and complex K-S equation:

∂u/∂t = -½|∇u|² - ∆u - ∆²u

Unlike the diffusion equation, the K-S equation displays rich dynamics (as befits an equation describing flame behavior!). Solutions may include stable equilibria or travelling waves, but with increasing domain size all solutions eventually become chaotic. The PDE implementation is given by the code below:
from pde import PDE

# 20 x 20 grid on a 10 x 10 domain, periodic in both directions
grid = pde.CartesianGrid([[0, 10], [0, 10]], [20, 20], periodic=[True, True])
state = ScalarField.random_uniform(grid, 0.0, 0.5)

bc_x = "periodic"
bc_y = "periodic"
# K-S equation: du/dt = -|grad u|^2 / 2 - laplace(u + laplace(u))
eq = PDE({"u": "-gradient_squared(u) / 2 - laplace(u + laplace(u))"}, bc=[bc_x, bc_y])

solver = pde.ExplicitSolver(eq, scheme="euler", adaptive=True)
stepper = solver.make_stepper(state, dt=1e-3)

# target distribution along the central column
target = 1.0 * np.sin(0.25 * grid.axes_coords[1] * np.pi)
Control inputs are capped at ±5. The K-S equation is naturally unstable; if any point in the domain exceeds ±30, the iteration terminates with a large negative reward for causing the system to diverge. Experiments with the K-S equation in py-pde revealed strong sensitivity to the domain size and number of grid points. The equation was run to T = 35, with the control inputs and reward both updated every dt = 0.1.
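A small sketch of how those limits might be enforced inside the rollout is below, using the ±5 cap and ±30 divergence threshold from above; the helper name and the exact penalty value are illustrative assumptions.

import numpy as np

CONTROL_LIMIT = 5.0          # control inputs capped at +/-5
DIVERGENCE_LIMIT = 30.0      # terminate if any grid point exceeds +/-30
DIVERGENCE_PENALTY = -100.0  # illustrative large negative reward

def apply_limits(state, controls):
    # Cap the candidate control inputs before injecting them into the field.
    controls = np.clip(controls, -CONTROL_LIMIT, CONTROL_LIMIT)
    # Check the whole domain for divergence; the episode ends with a large
    # negative reward if the K-S dynamics have blown up.
    diverged = bool(np.any(np.abs(state.data) > DIVERGENCE_LIMIT))
    penalty = DIVERGENCE_PENALTY if diverged else 0.0
    return controls, diverged, penalty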
Here the GP algorithm had more trouble arriving at a solution than it did for the diffusion equation. I chose to manually stop execution when the solution became visually close; again, we are looking for general principles here. For the more complex system, though, the controller works better, likely because the K-S equation is so dynamic that the control inputs can have a bigger impact. However, when evaluating the solution over different run times, I found it was not stable; the algorithm learned to arrive at the target distribution at a particular time, not to stabilize at that solution. The algorithm converged to the solution below, but, as the successive time steps show, the solution is unstable and begins to diverge with increasing time.
Figure 2: K-S equation. Green: target; yellow, red, magenta, cyan, blue for T = 10, 20, 30, 40. Provided by author.
Careful tuning of the reward function would help obtain a solution that holds longer, reinforcing how vital a correct reward function is. Also, in all these cases we aren’t arriving at perfect solutions; but, especially for the K-S equation, we are getting decent solutions with comparatively little effort compared to non-RL approaches for tackling these sorts of problems.
The GP approach takes longer to solve more complex problems and has trouble handling large sets of input variables. With larger input sets, the equations it generates become longer, which makes them less interpretable and slower to compute; the solution equations had scores of terms rather than the dozen or so in the ODE systems. Neural-network approaches handle large input sets more easily, since the input variables only directly affect the size of the input layer. Further, I suspect that neural networks will handle more complex and larger problems better, for reasons discussed in previous posts. Because of that, I did develop Gymnasium environments for the py-pde diffusion problem, which can easily be adapted to other PDEs per the py-pde documentation. These environments can be used with different neural-network-based reinforcement learning algorithms, such as the SAC implementation I developed (which, as discussed, runs but takes time).
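For reference, here is a minimal sketch of what such a Gymnasium environment can look like. It follows the standard gymnasium.Env reset/step interface and reuses the py-pde diffusion setup from earlier; the class name DiffusionControlEnv, the choice of observation and action spaces, and the step internals are simplifying assumptions for illustration, not the exact environment I wrote.

import gymnasium as gym
import numpy as np
import pde
from gymnasium import spaces

class DiffusionControlEnv(gym.Env):
    # Sketch of a Gymnasium wrapper around the py-pde diffusion setup above.

    def __init__(self, dt=0.1, t_end=15.0):
        self.dt, self.t_end = dt, t_end
        self.grid = pde.CartesianGrid([[0, 1], [0, 1]], [20, 20], periodic=[False, True])
        self.target = np.sin(2 * self.grid.axes_coords[1] * np.pi)
        # Observation: the 20 values in the central (target) column.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(20,), dtype=np.float64)
        # Action: control values for the left and right input columns.
        self.action_space = spaces.Box(-5.0, 5.0, shape=(40,), dtype=np.float64)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = pde.ScalarField.random_uniform(self.grid, 0.0, 0.2)
        eq = pde.DiffusionPDE(diffusivity=0.1, bc=[[{"value": 0}, {"value": 0}], "periodic"])
        solver = pde.ExplicitSolver(eq, scheme="euler", adaptive=True)
        self.stepper = solver.make_stepper(self.state, dt=1e-3)
        self.t = 0.0
        return self._observe(), {}

    def step(self, action):
        # Inject the control into the interior columns and advance the PDE.
        self.state.data[1, :] = action[:20]
        self.state.data[-2, :] = action[20:]
        self.t = self.stepper(self.state, self.t, self.t + self.dt)
        reward = -np.sqrt(np.mean((self._observe() - self.target) ** 2))
        terminated = self.t >= self.t_end
        return self._observe(), reward, terminated, False, {}

    def _observe(self):
        # Values in the central column, matching the target region.
        return self.state.data[self.state.data.shape[0] // 2, :].copy()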
Adjustments could also be made to the genetic programming approach. For example, a vector representation of the inputs could reduce the size of the solution equations. Duriez et al.1 propose using the Laplace transform to introduce derivatives and integrals into the genetic programming equations, broadening the function spaces they can explore.
The ability to tackle more complex problems is important. As discussed above, PDEs can describe a wide range of complex phenomena. Currently, controlling these systems usually means lumping parameters. Doing so leaves out dynamics, and so we end up working against such systems rather than with them. Trying to control or manage them then means higher control effort, missed efficiencies, and increased risk of failure (small or catastrophic). Better understanding and control alternatives for PDE systems could unlock major gains in engineering fields where marginal improvements have been the standard, such as traffic, supply chains, and nuclear fusion, since these systems behave as high-dimensional distributed-parameter systems. They are highly complex, with nonlinear and emergent phenomena, but have large available data sets: ideal conditions for machine learning to move past current barriers in understanding and optimization.
For now, I have only taken a very basic look at applying ML to controlling PDEs. Follow-ons to the control problem include not just different systems, but also optimizing where in the domain the control is applied, experimenting with reduced-order observation spaces, and optimizing the control for simplicity or control effort. In addition to improved control efficiency, as discussed in Brunton and Kutz2, machine learning can also be used to derive data-based models of complex physical systems and to determine reduced-order models, which shrink the state space and may be more amenable to analysis and control by traditional or machine learning methods. Machine learning and PDEs is an exciting area of research, and I encourage you to see what the professionals are doing!
1. Duriez, Thomas, Steven L. Brunton, and Bernd R. Noack. Machine Learning Control: Taming Nonlinear Dynamics and Turbulence.
2. Brunton, Steven L., and J. Nathan Kutz. Data-Driven Science and Engineering.