LQR Control

A basic introduction to LQR control. The text below is largely based on the document "Linear Quadratic Regulator" by M. S. Triantafyllou.

LQR Brief Overview

Linear Quadratic Regulator

Introduction

The Linear Quadratic Regulator (LQR) is a well-known method for computing optimal feedback gains that yield a stable, high-performance closed-loop system.

Full-State Feedback

For the derivation of the linear quadratic regulator we consider a linear system in state-space representation:

$$\dot{x} = Ax + Bu$$
$$y = Cx, \quad C = I_{n \times n}$$

which essentially means that full-state feedback is available (all $n$ states are measurable).

The feedback gain is a matrix $K$, and the feedback control action takes the form:

$$u = K(x_{ref} - x)$$

The closed-loop system dynamics are then written as:

$$\dot{x} = (A - BK)x + BK x_{ref}$$

where $x_{ref}$ is the vector of desired states and serves as the external input to the closed-loop system. The "A-matrix" of the closed-loop system is $(A - BK)$, while its "B-matrix" is $BK$. The closed-loop system has exactly the same number of inputs and outputs, $n$. The column dimension of $B$ equals the number of channels available in $u$, and must match the row dimension of $K$. Pole placement is the process of placing the poles of $(A - BK)$ at stable, suitably damped locations in the complex plane.
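As a concrete illustration, the following Python sketch forms the closed-loop matrices and simulates regulation toward $x_{ref}$; the double-integrator plant and the hand-picked (not optimal) gain are assumed values for illustration only.

import numpy as np

# Hypothetical double integrator: x = [position, velocity].
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[4.0, 2.8]])            # hand-picked stabilizing gain

A_cl = A - B @ K                      # closed-loop "A-matrix"
B_cl = B @ K                          # closed-loop "B-matrix"
print(np.linalg.eigvals(A_cl))        # poles in the open left half-plane

# Simulate u = K (x_ref - x) with forward-Euler integration.
x = np.array([0.0, 0.0])
x_ref = np.array([1.0, 0.0])
dt = 0.01
for _ in range(1000):
    u = K @ (x_ref - x)
    x = x + dt * (A @ x + B @ u)
print(x)                              # approaches x_ref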

The Maximum Principle

Towards a generic procedure for solving optimal control problems, we derive a methodology based on the calculus of variations. The problem statement for a fixed end time $t_f$ is:

choose $u(t)$ to minimize
$$J = \psi(x(t_f)) + \int_{t_0}^{t_f} L(x(t), u(t), t)\, dt$$
subject to
$$\dot{x} = f(x(t), u(t), t), \quad x(t_0) = x_0$$

where $\psi(x(t_f))$ is the terminal cost; the total cost $J$ is the sum of the terminal cost and an integral along the way. We assume that $L(x(t), u(t), t)$ is nonnegative. The first step is to augment the cost using the costate vector $\lambda(t)$:

$$\bar{J} = \psi(x(t_f)) + \int_{t_0}^{t_f} \left( L + \lambda^T (f - \dot{x}) \right) dt$$

Here $\lambda(t)$ may be an arbitrary expression of our choosing, since it multiplies $f - \dot{x} = 0$. Along the optimum trajectory, variations in $J$, and hence $\bar{J}$, must vanish. This follows from the fact that $J$ is continuous in $x$, $u$, and $t$. We write the variation as:

$$\delta \bar{J} = \psi_x \delta x(t_f) + \int_{t_0}^{t_f} \left[ L_x \delta x + L_u \delta u + \lambda^T f_x \delta x + \lambda^T f_u \delta u - \lambda^T \delta \dot{x} \right] dt$$

where subscripts denote partial derivatives. The last term above can be evaluated using integration by parts as:

$$-\int_{t_0}^{t_f} \lambda^T \delta \dot{x}\, dt = -\lambda^T(t_f)\, \delta x(t_f) + \lambda^T(t_0)\, \delta x(t_0) + \int_{t_0}^{t_f} \dot{\lambda}^T \delta x\, dt,$$

so that

$$\delta \bar{J} = \psi_x(x(t_f))\, \delta x(t_f) + \int_{t_0}^{t_f} \left( L_u + \lambda^T f_u \right) \delta u\, dt + \int_{t_0}^{t_f} \left( L_x + \lambda^T f_x + \dot{\lambda}^T \right) \delta x\, dt - \lambda^T(t_f)\, \delta x(t_f) + \lambda^T(t_0)\, \delta x(t_0).$$

The last term is zero, since we cannot vary the initial value of the state by changing something later in time. This form of $\delta \bar{J}$ indicates that there are three components of the variation that must independently be zero:

$$L_u + \lambda^T f_u = 0$$
$$L_x + \lambda^T f_x + \dot{\lambda}^T = 0$$
$$\psi_x(x(t_f)) - \lambda^T(t_f) = 0$$

The second and third requirements are met by explicitly setting:

$$\dot{\lambda}^T = -L_x - \lambda^T f_x$$
$$\lambda^T(t_f) = \psi_x(x(t_f)).$$

The evolution of $\lambda$ is given in reverse time, from the final state to the initial one. Here we see the primary difficulty of solving optimal control problems: the state propagates forward in time, while the costate propagates backward. The state and costate are coordinated through the above equations.
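For reference, these conditions are often stated compactly via the Hamiltonian; this reformulation is standard in the optimal control literature, though not used in the source text:

$$H(x, u, \lambda, t) = L(x, u, t) + \lambda^T f(x, u, t)$$
$$\dot{\lambda}^T = -H_x, \qquad H_u = 0, \qquad \lambda^T(t_f) = \psi_x(x(t_f))$$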

Gradient Method Solution for the General Case

Numerical solutions to the general problem are iterative, and the simplest approach is the gradient method. Its steps are as follows:

  1. For a given $x_0$, pick a control history $u(t)$.
  2. Propagate $\dot{x} = f(x, u, t)$ forward in time to create a state trajectory.
  3. Evaluate $\psi_x(x(t_f))$, and propagate the costate backward in time from $t_f$ to $t_0$.
  4. At each time step, choose $\delta u = -K(L_u + \lambda^T f_u)^T$, where $K$ is a positive scalar or a positive definite matrix in the case of multiple input channels.
  5. Let $u = u + \delta u$.
  6. Go back to step 2 and repeat the loop until the solution has converged; a sketch of this loop follows below.

The first three steps are consistent in the sense that $x$ is computed directly from $x(t_0)$ and $u$, and $\lambda$ is computed from $x$ and $x(t_f)$. All of $\delta \bar{J}$ except the integral involving $\delta u$ is therefore eliminated explicitly. The choice of $\delta u$ in step 4 then achieves $\delta \bar{J} < 0$ unless $\delta u = 0$, in which case the problem is solved.
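The following Python sketch runs this loop on a scalar example; the plant $\dot{x} = ax + bu$, the weights, the horizon, and the step size are all assumed values for illustration, not from the source.

import numpy as np

# Scalar problem: minimize 0.5 * integral(q*x^2 + r*u^2) dt
# subject to xdot = a*x + b*u, x(0) = x0, with zero terminal cost (psi = 0).
a, b = -1.0, 1.0            # assumed plant
q, r = 1.0, 0.1             # assumed running-cost weights
x0, tf, N = 1.0, 5.0, 500   # initial state, horizon, time grid
dt = tf / N
step = 0.5                  # gradient step size K (may need tuning)

u = np.zeros(N)             # step 1: initial control guess
for _ in range(200):
    # Step 2: propagate the state forward in time (forward Euler).
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        x[k + 1] = x[k] + dt * (a * x[k] + b * u[k])
    # Step 3: propagate the costate backward from lambda(tf) = psi_x = 0,
    # using lambdadot = -L_x - lambda*f_x = -q*x - a*lambda.
    lam = np.empty(N + 1)
    lam[-1] = 0.0
    for k in range(N, 0, -1):
        lam[k - 1] = lam[k] + dt * (q * x[k] + a * lam[k])
    # Step 4: the gradient L_u + lambda*f_u at each time step.
    grad = r * u + b * lam[:-1]
    # Step 5: update the control.
    u -= step * grad
    # Step 6: repeat until converged.
    if np.max(np.abs(grad)) < 1e-6:
        break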

LQR Solution

In the case of the Linear Quadratic Regulator (with zero terminal cost), we set $\psi = 0$ and

$$L = \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u$$

where the requirement that $L \geq 0$ implies that $Q$ and $R$ are positive semidefinite; for the feedback law below to be well defined, $R$ must in fact be positive definite. In the case of linear plant dynamics we also have:

$$L_x = x^T Q$$
$$L_u = u^T R$$
$$f_x = A$$
$$f_u = B$$

so that:

$$\dot{x} = Ax + Bu, \quad x(t_0) = x_0$$
$$\dot{\lambda} = -Qx - A^T \lambda, \quad \lambda(t_f) = 0$$
$$Ru + B^T \lambda = 0.$$

Since the equations are linear, we try a solution of the form $\lambda = Px$. Inserting this into the $\dot{\lambda}$ equation, then using the $\dot{x}$ equation and substituting $u = -R^{-1} B^T P x$, we obtain:

$$\dot{P} x + P A x + A^T P x + Q x - P B R^{-1} B^T P x = 0$$

This has to hold for all $x$, so it is in fact a matrix equation, the matrix Riccati equation. The steady-state solution satisfies:

$$P A + A^T P + Q - P B R^{-1} B^T P = 0$$
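As a numerical illustration (an assumed sketch, not from the source text), integrating the matrix Riccati equation backward in time from $P(t_f) = 0$ drives $P$ toward this steady-state solution; the double-integrator plant below is a hypothetical example.

import numpy as np
import scipy.linalg

# Assumed plant and weights: double integrator, Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
R_inv = np.linalg.inv(R)

# Integrate Pdot = -(P A + A^T P + Q - P B R^-1 B^T P) backward in time,
# starting from the terminal condition P(tf) = 0 (zero terminal cost).
dt, tf = 1e-3, 10.0
P = np.zeros((2, 2))
for _ in range(int(tf / dt)):
    P += dt * (P @ A + A.T @ P + Q - P @ B @ R_inv @ B.T @ P)

print(P)
print(scipy.linalg.solve_continuous_are(A, B, Q, R))  # steady-state solution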

Optimal Full-State Feedback

This equation is the Matrix Algebraic Riccati Equation (MARE), whose solution $P$ is needed to compute the optimal feedback gain $K$. The MARE is easily solved by standard numerical tools in linear algebra. The equation $Ru + B^T \lambda = 0$ gives the feedback law:

$$u = -R^{-1} B^T P x$$

Properties and Use of the LQR

  • Static Gain: The LQR generates a static gain matrix $K$, which is not a dynamical system. Hence, the order of the closed-loop system is the same as that of the plant.
  • Robustness: The LQR achieves an infinite upward gain margin; the classical guarantee also includes a gain-reduction margin of 1/2 and at least 60 degrees of phase margin in each channel.
  • Output Variables: When we want to conduct output regulation (and not state regulation), we set $Q = C^T Q' C$, as in the sketch below.
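A minimal sketch of this output-weighting construction (the output matrix and weight are hypothetical values for illustration):

import numpy as np

# Double-integrator example where the output y = C x is position only.
C = np.array([[1.0, 0.0]])     # output matrix
Q_prime = np.array([[10.0]])   # output weight Q'
Q = C.T @ Q_prime @ C          # equivalent state weight Q = C^T Q' C
# Q is positive semidefinite: it penalizes only the measured output.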

Python Implementation

Simple Python code for the continuous-time and discrete-time LQR functions:

import numpy as np
import scipy.linalg

def lqr(A, B, Q, R):
    """Solve the continuous-time LQR controller.

    dx/dt = A x + B u
    cost = integral x.T*Q*x + u.T*R*u
    """
    # Ref.: Bertsekas, p. 151.
    # First, solve the continuous-time algebraic Riccati equation.
    P = scipy.linalg.solve_continuous_are(A, B, Q, R)
    # Compute the LQR gain K = R^-1 B^T P.
    K = scipy.linalg.solve(R, B.T @ P)
    # Eigenvalues of the closed-loop system matrix A - B K.
    eig_vals = scipy.linalg.eigvals(A - B @ K)
    return K, P, eig_vals

def dlqr(A, B, Q, R):
    """Solve the discrete-time LQR controller.

    x[k+1] = A x[k] + B u[k]
    cost = sum x[k].T*Q*x[k] + u[k].T*R*u[k]
    """
    # Ref.: Bertsekas, p. 151.
    # First, solve the discrete-time algebraic Riccati equation.
    P = scipy.linalg.solve_discrete_are(A, B, Q, R)
    # Compute the LQR gain K = (B^T P B + R)^-1 B^T P A.
    K = scipy.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    # Eigenvalues of the closed-loop system matrix A - B K.
    eig_vals = scipy.linalg.eigvals(A - B @ K)
    return K, P, eig_vals
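A short usage sketch (the double-integrator plant and the weights are arbitrary assumed values):

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # state weight
R = np.array([[1.0]])                    # control weight

K, P, eig_vals = lqr(A, B, Q, R)
print("K =", K)                          # optimal gain, u = -K x
print("closed-loop poles:", eig_vals)    # all in the open left half-plane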