Clojure Numerics, Part 5 - Orthogonalization and Least Squares


October 17, 2017



How to solve linear systems that have many solutions, or those that have no solutions at all? That's a theme for a thick math textbook, of course, but from the programmer's point of view we are interested in the practical matters, so I'll stick to the main point. When I have fewer equations than there are unknowns, or I have too many equations, what can I do in Clojure to make those unknowns known? What functions do I call?

Before I continue, a few reminders:

The namespaces we'll use:

(require '[uncomplicate.neanderthal
           [native :refer [dge dv]]
           [core :refer [mm trans view-sy view-tr axpy nrm2 mm! copy submatrix]]
           [linalg :refer [qrf org svd sv ls!]]])

Let's begin

I wrote about solving linear systems, and I think it was clear that, while there are complications along the way when the specific numbers are a bit funky, it is otherwise quite a straightforward business. Many problems are not structured so well. What about the case when we have more equations than there are unknowns? Such systems are overdetermined; there are so many conditions that there probably isn't any solution that satisfies them all. In that case, we have to choose the solution that is somehow "least bad". What counts as "least bad" is a matter of its own. On top of that, since I am a practical programmer who prefers Clojure, I also require an elegant programming API to find the solution. And, since I accept nothing less than the best from Neanderthal, I want it to be superfast…

The Full-rank Least Squares

Let's consider the problem first. I have a system \(Ax=b\), where \(A\in{R^{m\times{n}}}\) is a data matrix holding coefficients, and \(b\in{R^m}\) is an observation vector, while \(m\geq{n}\). Such a system is called overdetermined, and usually it has no exact solution.

Since we can't find the exact solution, we opt for the next best thing, the "nearest" solution. Recall from the Clojure Linear Algebra Refresher series that in linear algebra we use norms to measure distance. Therefore, the nearest solution is the one with the smallest norm \({\lVert Ax-b \rVert}_p\). But which norm to choose? 1-norm? ∞-norm? Different norms will give different answers. The matter is, fortunately, settled by the fact that minimizing the 1-norm or the ∞-norm is too complicated, so our best buddy the Euclidean norm is the winner.

Minimization of the Euclidean norm, \(\min {\lVert Ax-b \rVert}_2\), is called the least squares problem. It is convenient because this function is differentiable, and the 2-norm is preserved under orthogonal transformations (see the Clojure Linear Algebra Refresher series for the basics; the specific details follow here shortly).
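
To make the norm-preservation claim concrete, here is a minimal sketch of my own (not part of the original walk-through): a rotation matrix is orthogonal, and multiplying a vector by it leaves the 2-norm unchanged. It uses only functions we've already required (dge, mm, nrm2).

(let [theta 0.7
      q (dge 2 2 [(Math/cos theta) (Math/sin theta)
                  (- (Math/sin theta)) (Math/cos theta)])
      x (dge 2 1 [1 2])]
  ;; both norms should be the same up to rounding, about 2.236
  [(nrm2 x) (nrm2 (mm q x))])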

When \(A\) is full-rank (\(\mathrm{rank}(A)=n\)), there is a unique least squares solution, and it can be found by solving the symmetric positive definite system \(A^TAx_{LS}=A^Tb\). Remember one of the previous articles in this series? We talked about those systems there.

Let's try a random example in Clojure:

First, I'll construct a (randomly chosen) matrix a, and make sure its rank is 2 (here, by doing SVD):

(def a (dge 3 2 [1 2 3 -1 1 -1]))
(svd a)
#uncomplicate.neanderthal.internal.common.SVDecomposition{:sigma #RealDiagonalMatrix[double, type:gd mxn:2x2, offset:0]
   ▧                       ┓
   ↘       3.79    1.63
   ┗                       ┛
, :u nil, :vt nil, :master true}

There are 2 non-zero singular values there, so the rank is 2, which is what I wanted to check.
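
If you want to automate that eyeballing, here is a small helper of my own (a sketch, not something Neanderthal provides): count the singular values above a tolerance. It assumes entry, mrows and ncols from uncomplicate.neanderthal.core, which we haven't required above.

(require '[uncomplicate.neanderthal.core :refer [entry mrows ncols]])

(defn est-rank
  "Estimates the rank of m as the number of singular values above tol."
  [m tol]
  (let [sigma (:sigma (svd m))]
    (count (filter #(< tol (entry sigma % %))
                   (range (min (mrows m) (ncols m)))))))

(est-rank a 1e-10)
;; => 2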

Next, I'll construct \(A^TA\):

(def ata (mm (trans a) a))
ata
#RealGEMatrix[double, mxn:2x2, layout:column, offset:0]
   ▥       ↓       ↓       ┓
   →      14.00   -2.00
   →      -2.00    3.00
   ┗                       ┛

I created ata as a general matrix, so we can both look at it and see that it is symmetric, as promised by math theory. Now I'll make it symmetric (in the Clojure data structure sense). Of course, in a real project, I'd skip those intermediate steps.

(def b (dge 3 1 [1 -1 -2]))
(def solution (sv (view-sy (mm (trans a) a)) (mm (trans a) b)))
solution
#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

This should be the LS solution, and we'll check later whether a more general method gives the same answer. Yes, there are more general methods, so what I've shown you until now is just a learning-oriented warm-up. Usually you do not need to be so creative, and can just go to method X and say "Solve this for me!".

An interesting side note here is that solving these normal equations is connected to solving gradient equations, which might be an interesting topic if you're into machine learning. I'll have more to say on that when I finish these more general topics in the Linear Algebra series.

When we find the LS solution \(x_{LS}\), we can also compute the minimum residual \(r_{LS} = b-Ax_{LS}\). Its norm tells us how far the system we solved exactly is from the original system.

(axpy -1 (mm a solution) b)
#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →       1.18
   →       0.47
   →      -0.71
   ┗               ┛

The norm of this residual seems quite high here. We'll check later whether our solution is correct.

(nrm2 (axpy -1 (mm a solution) b))
1.459992790176863

This method is quite OK for full-rank systems, but even then, it is sensitive to the presence of small singular values (recall the last article about Singular Value Decomposition).
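
Here is a rough illustration of why, as a sketch of my own: forming \(A^TA\) squares the condition number, \(\kappa(A^TA)=\kappa(A)^2\), so a small singular value hurts twice. We can estimate \(\kappa(A)\) from the singular values we already computed (entry is assumed to come from uncomplicate.neanderthal.core).

(require '[uncomplicate.neanderthal.core :refer [entry]])

(let [sigma (:sigma (svd a))
      s-max (entry sigma 0 0)   ;; largest singular value, ~3.79
      s-min (entry sigma 1 1)]  ;; smallest singular value, ~1.63
  ;; cond(a) is modest here, so squaring it is harmless, but with a
  ;; tiny singular value the squared value would blow up
  {:cond-a (/ s-max s-min)
   :cond-ata (Math/pow (/ s-max s-min) 2)})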

Another method is the full normal equations algorithm, which consists of Cholesky factorization, matrix multiplication, and matrix-vector multiplication. I'll skip it, because there is a better and more general method.

These better methods usually rely on the QR factorization.

QR Factorization

The idea behind the QR factorization is similar to the idea behind the LU and Cholesky factorizations that we met in previous articles: destructure matrix \(A\) into a product of a few matrices that have special properties (symmetric, triangular, positive definite, etc.) and then use that destructured form in "easier" computations. In the QR, the trick is to find an orthogonal matrix \(Q\) and an upper triangular \(R\), such that \(A=QR\). Recall from the refresher series that \(Q\) is orthogonal if \(Q^{-1}=Q^T\).

There is a proven theorem (look it up in a math textbook if you're interested in the details :) that for \(A\in{R^{m\times{n}}}\), there exist an orthogonal \(Q\in{R^{m\times{m}}}\) and an upper triangular \(R\in{R^{m\times{n}}}\) such that \(A=QR\), the QR factorization of \(A\).

Since Q is orthogonal, and orthogonal transformations preserve rank, shape, and distance, we can intuitively guess that such a factorization has many properties useful in rank-sensitive operations. I'll plead with you to trust the theory and believe me that this works quite well without further convincing (if you want to know more, the textbooks are at your disposal).
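
If you'd rather see a quick numerical hint than take my word for it, here is a sketch of my own that jumps ahead a bit to the qrf and org functions (both covered properly below): recover Q from the QR factorization of a and check that \(Q^TQ\) is (numerically) the identity.

(let [q (org (qrf a))]
  ;; should print a 2x2 matrix that is the identity up to rounding error
  (mm (trans q) q))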

Most of these techniques are based on partitioning the columns of \(Q\) (and, correspondingly, the rows of \(R\)) into a form such as

\begin{equation} A=QR= [Q_1 | Q_2] \begin{bmatrix} R_1\\0\\ \end{bmatrix} \end{equation}

and then using \(Q_1\) and \(R_1\) further. What is important to me as a programmer is, mostly:

  • To keep in mind that the QR factorization is the key step for these kinds of problems
  • To know what can be computed from QR factorization, and which method does it.
  • To keep in mind that, while very powerful, QR-based algorithms are not almighty; there are cases when further options should be explored.

The complexity of the various flavors varies, but it boils down to \(O(mn^2)\).

But how to solve the system with it?

From \(Ax=QRx=b\), we have \(Q^{T}Ax=Rx\) and

\begin{equation} Q^Tb= \begin{bmatrix} c\\d\\ \end{bmatrix} \end{equation}

so \(R_1x_{LS}=c\). Since \(c\) is readily computed by a matrix multiplication, \(R_1\) is upper triangular, and triangular systems are quite easy to solve, this is exactly what we were looking for.

Here I call Neanderthal's qrf function, which computes the factorization and maintains all crucial parts of it. Please note that the matrix :or and the vector :tau together hold Q and R in encoded form. Using the raw :or directly would produce gibberish. Other functions in Neanderthal expect exactly this raw structure and know how to work with it, so do not mess with its contents yourself.

(def qr (qrf a))
qr
#uncomplicate.neanderthal.internal.common.OrthogonalFactorization{:eng #object[uncomplicate.neanderthal.internal.host.mkl.DoubleGEEngine 0x4e780e3a "uncomplicate.neanderthal.internal.host.mkl.DoubleGEEngine@4e780e3a"], :or #RealGEMatrix[double, mxn:3x2, layout:column, offset:0]
   ▥       ↓       ↓       ┓
   →      -3.74    0.53
   →       0.42   -1.65
   →       0.63   -0.01
   ┗                       ┛
, :jpiv nil, :tau #RealBlockVector[double, n:2, offset: 0, stride:1]
[   1.27    2.00 ]
, :master true, :fresh #atom[true 0x4a43cc2b], :m 3, :n 3, :or-type :qr, :api-orm #function[uncomplicate.neanderthal.internal.api/eval6950/fn--7034/G--6891--7047], :api-org #function[uncomplicate.neanderthal.internal.api/eval6950/fn--7129/G--6923--7138]}

Did you know that even the mm! method can work with Q encoded in this form? It is important, since we need to compute \(Q^Tb\). Even transpose works with QR, although it is not a proper matrix itself:

(def cd (mm! 1 (trans qr) (copy b)))
cd
#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →       1.87
   →       0.61
   →      -1.46
   ┗               ┛

\(R_1\) is just the upper triangular part of the qr instance's :or matrix, so:

(let [r1 (view-tr (:or qr) {:uplo :upper})]
  (sv r1 (submatrix cd 0 0 2 1)))
#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

Yep, we got the same result using a different method. Please note that the norm of the third component, \(d\), equals the minimum residual \({\rho_{LS}= \lVert d \rVert}_2\) that we computed with the first method.
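
As a quick cross-check (a sketch of mine, not in the original walk-through), the norm of that last component of cd should match the residual norm we computed with the first method.

(nrm2 (submatrix cd 2 0 1 1))
;; => should be about 1.46, the same as (nrm2 (axpy -1 (mm a solution) b)) above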

Least Squares in one quick sweep

Let's check with the function we could have used at the start, ls. Yes, there is a simple almighty method that makes all this fuss unnecessary. But if I had just shown you that, what would we have learned about the fine points of solving these kinds of systems and the many gotchas that await us there?

So:

(ls! (copy a) (copy b))
#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   →      -1.46
   ┗               ┛

Hey, all the methods that we used gave the same solution (in the first two rows; the third row is the same \(d\) component we've just discussed)! That's a good sign.

Let's see whether we can recover the original Q and try with it, just to further our understanding:

(let [q (org qr)
      r1 (view-tr (:or qr) {:uplo :upper})]
  (sv r1 (mm (trans q) b)))
#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

This is the same result indeed! This is good enough for me, today at least.

Please note that I copied all these instances in code, but for best performance, you'd probably use the destructive variants of those methods.
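
For illustration, here is a sketch of what the destructive route might look like; this is my reading of ls!'s in-place behavior (it overwrites both arguments), so double-check it against the documentation before relying on it in real code.

(let [a' (dge 3 2 [1 2 3 -1 1 -1])
      b' (dge 3 1 [1 -1 -2])]
  ;; ls! works in place: a' is overwritten with factorization details and
  ;; b' with the solution, so only do this with data you can discard
  (ls! a' b')
  ;; the first 2 rows of b' now hold the least squares solution
  (submatrix b' 0 0 2 1))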

Next installments…

I'll probably take some time to write some more about problems connected to least squares, and their solutions. Or I'll skip right to eigen-problems. We'll see. Anyway, more Clojure & Neanderthal stuff follows soon.
