Linear Quadratic Regulator#
Problem Formulation#
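For reference (stated here to fix notation, matching the symbols used in the code below), the discrete-time, finite-horizon LQR problem is:

\[
\min_{x_{1:N},\; u_{1:N-1}} \; \frac{1}{2} x_N^\top Q_N x_N + \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^\top Q_k x_k + \frac{1}{2} u_k^\top R_k u_k \right)
\quad \text{s.t.} \quad x_{k+1} = A_k x_k + B_k u_k,
\]

with \(Q_k \succeq 0\) and \(R_k \succ 0\).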
Note
\(Q_k\) and \(R_k\) are defined and tuned by the designer, so they can always be chosen to satisfy the requirements (positive semi-definite \(Q_k\), positive definite \(R_k\)).
The LQR formulation is applicable to a wide range of linear and non-linear systems, and it is computationally tractable. This is why it is so prominent in this field.
The time-varying description above is general. For example, linearizing the dynamics about the knot points of a reference trajectory leads to an LQR formulation that can be applied to a non-linear system.
The problem is “time invariant” if \(A_k = A\), \(B_k = B\), \(Q_k = Q\), and \(R_k = R\). This is also applicable to surprisingly many non-linear problems: it is done by linearizing about a set point.
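Concretely, given non-linear dynamics \(x_{k+1} = f(x_k, u_k)\) and a set point \((x^*, u^*)\) with \(f(x^*, u^*) = x^*\), the time-invariant matrices are the Jacobians (a standard construction, included here for completeness):

\[
A = \frac{\partial f}{\partial x}\bigg|_{x^*,\, u^*}, \qquad
B = \frac{\partial f}{\partial u}\bigg|_{x^*,\, u^*},
\]

and the LQR controller acts on the deviations \(\Delta x_k = x_k - x^*\), \(\Delta u_k = u_k - u^*\).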
There are many extensions to LQR such as infinite horizon, stochastic etc.
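As a taste of the infinite-horizon extension: iterating the Riccati recursion (derived later in these notes) to a fixed point yields a single constant gain. Here is a NumPy sketch on the double-integrator system used later in these notes (illustrative only, not part of the course's Julia code):

```python
import numpy as np

# Double-integrator dynamics and weights (same values as the examples below)
h = 0.1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h * h], [h]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

# Infinite-horizon LQR: iterate the Riccati recursion until P stops changing,
# giving one constant feedback gain K instead of a time-varying K_k.
P = Q.copy()
for _ in range(5000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_new = Q + A.T @ P @ (A - B @ K)
    if np.max(np.abs(P_new - P)) < 1e-12:
        P = P_new
        break
    P = P_new

# The constant-gain closed loop x_{k+1} = (A - B K) x_k is stable
rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("K =", K, " spectral radius =", rho)
```

The iteration converges here because \((A, B)\) is controllable and \(Q \succ 0\); the spectral radius of \(A - BK\) comes out below one, so the constant gain stabilizes the cart from any initial state.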
Solving LQR via Indirect Shooting#
Plugging the cost and dynamics into Pontryagin's Principle, we get the necessary conditions

\[
\lambda_N = Q_N x_N, \qquad
\lambda_k = Q_k x_k + A_k^\top \lambda_{k+1}, \qquad
u_k = -R_k^{-1} B_k^\top \lambda_{k+1},
\]

together with the dynamics \(x_{k+1} = A_k x_k + B_k u_k\).
Algorithm
Start with initial \(U_{1:N-1}\)
Simulate (“rollout”) to get \(X_{1:N}\)
Backward pass to get \(\lambda\) and \(\Delta u\)
Rollout with line search on \(\Delta u\)
Go to 3. until convergence.
Example: Double Integrator#
Dynamics of a frictionless cart (a double integrator) could be given by

\[
\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u,
\]

where the state \(x\) stacks position and velocity and \(u\) is the force per unit mass. Discretizing this with time step \(h\) (exactly, using a zero-order hold on \(u\)) gives

\[
x_{k+1} = \begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \tfrac{1}{2}h^2 \\ h \end{bmatrix} u_k
\]
import Pkg; Pkg.activate(@__DIR__);
using LinearAlgebra
using SparseArrays
using Plots
function cost(params::NamedTuple, X::Matrix, U::Matrix)
n, N = size(X) # states and time steps
Q = params.Q
R = params.R
Qn = params.Qn
# Terminal cost
J = 0.5 * X[:, end]' * Qn * X[:, end]
# Step cost
for k = 1:(N-1)
J += 0.5 * X[:, k]' * Q * X[:, k] + 0.5 * U[:, k]' * R * U[:, k]
end
return J
end
function rollout(params::NamedTuple, X::Matrix, U::Matrix)
n, N = size(X)
A = params.A
B = params.B
X1 = zeros(size(X))
X1[:, 1] = X[:, 1] # Initial condition
for k = 1:(N-1)
X1[:, k+1] = A * X1[:, k] + B * U[:, k]
end
return X1
end
# Discrete dynamics
h = 0.1 # time step
A = [1 h; 0 1]
B = reshape([0.5*h*h; h], (2, 1))
# Cost weights
Q = 1.0*I(2)
R = 0.1
Qn = 1.0*I(2)
# Package it
params = (A=A, B=B, Q=Q, R=R, Qn=Qn)
# Horizon
n = 2 # number of states
m = 1 # number of controls
Tfinal = 5.0 # final time #try larger values
N = Int(Tfinal/h)+1 # number of time steps
T = Array(range(0,h*(N-1), step=h));
# Initial guesses
x0 = [1.0; 0] # Initial conditions
X = repeat(x0, 1, N)
U = zeros(1, N-1)
Δu = ones(size(U)) # all the Δu_k
λ = zeros(n, N) # all the λ_k
tol = 1e-2
iterations = 0
max_iter = 1000
solved = false
X = rollout(params, X, U) # initial rollout
@show cost(params, X, U) # initial cost
for i = 1:max_iter
# Backward pass to compute λ and Δu
λ[:, N] = Qn * X[:, N]
for k = N-1:-1:1
Δu[:, k] = -(U[:, k] + (R \ B')*λ[:, k+1])
λ[:, k] .= Q * X[:, k] + A' * λ[:, k+1]
end
# Forward pass with line search to compute x
α = 1.0
b = 1e-2 # line search tolerance
unew = U + α .* Δu
xnew = rollout(params, X, unew)
line_search_success = false
for j = 1:max_iter
# Roll out a new trajectory
α = 0.5 * α
unew = U + α .* Δu
xnew = rollout(params, X, unew)
# Accept the step once it satisfies the Armijo sufficient-decrease condition
if cost(params, xnew, unew) < cost(params, X, U) - b * α * (Δu * Δu')[1]
line_search_success = true
break
end
end
if !line_search_success
error("Line search failed")
end
U .= unew
X .= xnew
iterations += 1
if maximum(abs.(Δu)) < tol
solved = true
break
end
end
if !solved
error("No solution found, try increasing max_iter")
end
@show iterations;
@show cost(params, X, U);
cost(params, X, U) = 25.5
iterations = 664
cost(params, X, U) = 6.658144576322122
# Plot
l = @layout [a ; b]
p1 = plot(T, X', label=["x1" "x2"])
p2 = plot(T[1:end-1], U', label="u", xlabel="Time")
plot(p1, p2, layout = l)
Practical#
This is great: we can solve an LQR problem. But let's take a quick look at the practicalities of this approach.
TODO
Show “back-prop” through this suffers from vanishing gradients
What is the time complexity?
Solving LQR via QP#
Let's explore the structure of the OC program. If we stack all the decision variables into a vector \(z\), we see the program has a familiar structure:

\[
\min_z \; \frac{1}{2} z^\top H z \quad \text{s.t.} \quad C z = d,
\]
where the decision variables and weights are

\[
z = \begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix},
\qquad
H = \begin{bmatrix} R & & & & \\ & Q & & & \\ & & R & & \\ & & & \ddots & \\ & & & & Q_N \end{bmatrix},
\]
and the constraints are

\[
C = \begin{bmatrix}
B & -I & & & & \\
& A & B & -I & & \\
& & & \ddots & & \\
& & & A & B & -I
\end{bmatrix},
\qquad
d = \begin{bmatrix} -A x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
This is just a QP. At this point we could throw it at any old solver, but let’s be a little more rigorous.
The Lagrangian of the QP is

\[
\mathcal{L}(z, \lambda) = \frac{1}{2} z^\top H z + \lambda^\top (C z - d),
\]
and the KKT conditions are

\[
\begin{bmatrix} H & C^\top \\ C & 0 \end{bmatrix}
\begin{bmatrix} z \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 0 \\ d \end{bmatrix}.
\]
This is one big linear system. So that means we could get the solution through one Newton’s method iteration! Let’s give it a go.
# Discrete dynamics
h = 0.1 # time step
A = [1 h; 0 1]
B = [0.5*h*h; h]
n = 2 # number of states
m = 1 # number of controls
Tfinal = 5.0 # final time
N = Int(Tfinal/h)+1 # number of time steps
thist = Array(range(0,h*(N-1), step=h));
# Cost weights
Q = sparse(1.0*I(2))
R = sparse(0.1*I(1))
Qn = sparse(1.0*I(2))
#Cost function
function J(xhist,uhist)
cost = 0.5*xhist[:,end]'*Qn*xhist[:,end]
for k = 1:(N-1)
cost = cost + 0.5*xhist[:,k]'*Q*xhist[:,k] + 0.5*(uhist[k]'*R*uhist[k])[1]
end
return cost
end
# Cost
H = blockdiag(R, kron(I(N-2), blockdiag(Q,R)), Qn);
# Constraints
C = kron(I(N-1), [B -I(2)])
for k = 1:N-2
C[(k*n).+(1:n), (k*(n+m)-n).+(1:n)] .= A
end
d = [-A*x0; zeros(size(C,1)-n)];
# Solve the linear system
KKT = [H C'; C zeros(size(C, 1), size(C, 1))]
@show size(KKT)
y = KKT \ [zeros(size(H, 1)); d]
# Get state history
z = y[1:size(H, 1)] # states and controls [u0,x1,u1,...,xN]
Z = reshape(z, n + m, N - 1)
xhist = Z[m+1:end, :]
xhist = [x0 xhist]
# Get control history
uhist = Z[1, :];
J(xhist, uhist)
size(KKT) = (250, 250)
6.658133166380833
# Plot
times = range(0,h*(N-1), step=h)
l = @layout [a ; b]
p1 = plot(times, xhist', label=["x1" "x2"])
p2 = plot(times[1:end-1], uhist, label="u", xlabel="Time")
plot(p1, p2, layout = l)
In the problem solution above we used `sparse` to exploit the fact that most of the entries in the matrices are zero. Let's look at the sparsity pattern of the KKT matrix.
KKT
250×250 SparseMatrixCSC{Float64, Int64} with 844 stored entries:
⎡⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢳⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠹⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢧⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⢷⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠹⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢧⡀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⢳⡄⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠹⣆⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⣦⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⢳⡄⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠹⣆⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢧⎥
⎢⠙⠲⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠈⠙⠶⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠈⠙⠳⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠶⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠶⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠲⣤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠶⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠶⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎦
TODO
Discuss and show the improvement of sparse over dense matrices (wall clock time)
Show time complexity
Show the solution with the JuMP interface and compare wall clock time?
Solving LQR via Riccati Recursion#
The KKT system is definitely sparse and very structured.
Note
Exploiting structure and sparsity is extremely important in solving OC problems.
Let's explore the backward pass in more detail, using the last few rows of the KKT system (take \(N = 4\) for concreteness). Starting with the row containing \(Q_N\) (the cost on \(x_4\)):

\[
Q_N x_4 - \lambda_4 = 0 \quad \Rightarrow \quad \lambda_4 = Q_N x_4
\]

Now the row containing \(u_3\):

\[
R u_3 + B^\top \lambda_4 = 0
\]

Plugging in the dynamics \(x_4 = A x_3 + B u_3\):

\[
R u_3 + B^\top Q_N (A x_3 + B u_3) = 0
\]

And solving this system for \(u_3\):

\[
u_3 = -(R + B^\top Q_N B)^{-1} B^\top Q_N A \, x_3 \;\equiv\; -K_3 x_3
\]

Let's keep going with the row for \(x_3\):

\[
Q x_3 + A^\top \lambda_4 - \lambda_3 = 0
\quad \Rightarrow \quad
\lambda_3 = \big(Q + A^\top Q_N (A - B K_3)\big) x_3 \;\equiv\; P_3 x_3
\]

This continues on so that we have the recursive relationship

\[
P_N = Q_N, \qquad
K_k = (R + B^\top P_{k+1} B)^{-1} B^\top P_{k+1} A, \qquad
P_k = Q + A^\top P_{k+1} (A - B K_k),
\]

with \(\lambda_k = P_k x_k\) and \(u_k = -K_k x_k\). This is called Riccati recursion.
We can solve the QP by doing a backward Riccati recursion followed by a forward rollout to compute \(X_{2:N}\) and \(U_{1:N-1}\).
Importantly, we also get a feedback policy \(u_k = -K_k x_k\), which we could employ directly to control a system. No tracking controller necessary.
Note
Our dense QP solve has complexity \(O(N^3(n+m)^3)\).
TODO confirm: the sparse QP solve has complexity that is closer to superlinear in \(N\)?
The Riccati solution has complexity \(O(N(n+m)^3)\), which is linear in the horizon length \(N\).
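To sanity-check the equivalence of the two approaches (and as a starting point for timing experiments), here is a NumPy sketch, separate from the Julia code above, that builds the dense KKT system and runs the Riccati recursion for the same double-integrator problem and confirms they return the same controls:

```python
import numpy as np

# Double-integrator LQR problem (same values as the Julia examples above)
h, N = 0.1, 51
n, m = 2, 1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h * h], [h]])
Q, R, Qn = np.eye(n), 0.1 * np.eye(m), np.eye(n)
x0 = np.array([1.0, 0.0])

# Dense KKT solve: z = [u_1, x_2, u_2, ..., x_N]
nz = (N - 1) * (n + m)
H = np.zeros((nz, nz))
C = np.zeros(((N - 1) * n, nz))
d = np.zeros((N - 1) * n)
d[:n] = -A @ x0
for k in range(N - 1):
    H[k*(n+m):k*(n+m)+m, k*(n+m):k*(n+m)+m] = R
    H[k*(n+m)+m:(k+1)*(n+m), k*(n+m)+m:(k+1)*(n+m)] = Qn if k == N - 2 else Q
    C[k*n:(k+1)*n, k*(n+m):k*(n+m)+m] = B          # B u_{k+1}
    C[k*n:(k+1)*n, k*(n+m)+m:(k+1)*(n+m)] = -np.eye(n)  # -x_{k+2}
    if k > 0:
        C[k*n:(k+1)*n, (k-1)*(n+m)+m:k*(n+m)] = A  # A x_{k+1}
KKT = np.block([[H, C.T], [C, np.zeros((C.shape[0], C.shape[0]))]])
y = np.linalg.solve(KKT, np.concatenate([np.zeros(nz), d]))
u_qp = y[:nz].reshape(N - 1, n + m)[:, :m]

# Backward Riccati recursion, then a forward rollout with u_k = -K_k x_k
P = Qn
Ks = []
for _ in range(N - 1):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    Ks.append(K)
Ks = Ks[::-1]  # now ordered K_1 ... K_{N-1}
x, u_ric = x0, []
for k in range(N - 1):
    u = -Ks[k] @ x
    u_ric.append(u)
    x = A @ x + B @ u
u_ric = np.array(u_ric)

print("max |u_qp - u_riccati| =", np.max(np.abs(u_qp - u_ric)))
```

The two control trajectories agree to numerical precision, as the derivation above predicts; only the cost of obtaining them differs.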
Example#
# Cost weights
Q = Array(1.0*I(2))
R = 0.1 #Array(1.0*I(1))
Qn = Array(1.0*I(2))
P = zeros(n,n,N)
K = zeros(m,n,N-1)
P[:,:,N] .= Qn
#Backward Riccati recursion
for k = (N-1):-1:1
K[:,:,k] .= (R + B'*P[:,:,k+1]*B)\(B'*P[:,:,k+1]*A)
P[:,:,k] .= Q + A'*P[:,:,k+1]*(A-B*K[:,:,k])
end
#Forward rollout starting at x0
xhist = zeros(n,N)
xhist[:,1] = x0
uhist = zeros(m,N-1)
for k = 1:(N-1)
uhist[:,k] .= -K[:,:,k]*xhist[:,k]
xhist[:,k+1] .= A*xhist[:,k] + B*uhist[k]
end
times = range(0,h*(N-1), step=h)
l = @layout [a ; b]
p1 = plot(times, xhist', label=["x1" "x2"])
p2 = plot(times[1:end-1], uhist', label="u", xlabel="Time")
plot(p1, p2, layout = l)
TODO
Show time complexity