# Clojure Numerics, Part 2 - General Linear Systems and LU Factorization

September 7, 2017

Solving systems of linear equations is a staple food of linear algebra. It can be applied as a part of many machine learning tasks, although it is not always obvious to spot the opportunity. Here, we explore how triangular systems are the foundation that we need to internalize well. We concentrate on computational details, and transformations of general systems to triangular systems . Neanderthal offers many functions to help us in this quest.

Before I continue, a few reminders:

- You should keep a linear algebra textbook around. I recommend Linear Algebra With Applications, Alternate Edition by Gareth Williams (see more in part 1 of Clojure Linear Algebra Refresher).
- Include Neanderthal library in your project to be able to use ready-made high-performance linear algebra functions.
- Read my Clojure Linear Algebra Refresher series.

The namespaces we'll use:

(require '[uncomplicate.neanderthal [native :refer [dv dge fge dtr]] [core :refer [col nrm1 copy nrm1 view-tr]] [linalg :refer [sv! sv trf tri tri! trs trs! det con]] [aux :refer [permute-rows!]]] '[uncomplicate.commons.core :refer [with-release]] '[uncomplicate.clojurecl.core :as clojurecl])

## General Linear Systems

I might have forgot many things that I learned in school, but I remember how to intuitively solve simple systems of equations involving a handful of those \(x\), \(y\), and \(z\) using Gaussian elimination. If it rings a bell, but not particularly loudly, I recommend visiting that Wikipedia link for a quick refresher. If you are wondering what the hell I'm talking about, read the first chapter of Linear Algebra With Applications before you continue.

So, we start with a number of equations, let's say 3 equations, and a (hopefully) matching number of unknowns, and combine those equations in such a way that we eliminate some unknowns from each equation, until in one equation we are left with just one unknown, in the second with two unknowns, and in the third three. The first equation is then trivial to solve. We substitute the solution in the second, and find the second unknown, and do the same for the third. When done in school with pen and paper, it looked like a logical but tedious exercise in finding ways to combine those equations cleverly. Later, we learned that the system can be described by a matrix, and the same process can be described more formally as a series of operations on the rows of the matrix.

Let's repeat this simple example from an earlier blog post, Clojure Linear Algebra Refresher (3) - Matrix Transformations:

(let [a (dge 3 3 [1 0 1 1 -1 1 3 1 4]) y (dge 3 1 [11 -2 9]) b (copy y)] (sv! a b) [a y b])

'(#RealGEMatrix(double mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → 1.00 1.00 3.00 → 0.00 -1.00 1.00 → 1.00 -0.00 1.00 ┗ ┛ #RealGEMatrix(double mxn:3x1 layout:column offset:0) ▥ ↓ ┓ → 11.00 → -2.00 → 9.00 ┗ ┛ #RealGEMatrix(double mxn:3x1 layout:column offset:0) ▥ ↓ ┓ → 17.00 → -0.00 → -2.00 ┗ ┛ )

We could conclude that a general system can be easily described by matrices, and solved using the sv! function, and happily proceed punching in our fantastic Clojure code. However, numerical computing is full of traps for the careless programmer, rather than an idealized mathematical magic box. If it could compute idealized formulas from the book, this would be the last post in this series. Since it only approximately computes these numbers with floating point numbers, we'll have to learn how to set up the computations according to the problem, so that we minimize the accumulation of errors. Additionally, since those computations can stretch for a long wall clock time, we'll have to learn to exploit the structure of the problem at hand (if there is any to be exploited) and use the appropriate structures and algorithms to complete the task as fast as possible.

## Triangular Systems

When it comes to complexity, the happiest problem to have at hand is when I can describe it by a triangular matrix.

Like this one, for example:

\begin{equation} \begin{bmatrix} 2 & 0 & 0\\ 1 & 5 & 0\\ 3 & 2 & 1\\ \end{bmatrix} \begin{bmatrix}x_1\\x_2\\x_3\\\end{bmatrix} = \begin{bmatrix}6\\3\\4\\\end{bmatrix} \end{equation}
We can see that we don't even have to do any of the procedures learned at school to transform the equations
into a convenient form; it's already pretty convenient! We can immediately find \(x_1=3\), then substitute it
and find \(x_2\), then substitute it and find \(x_3\). The first lesson, thus, is that, when we *know* that our
problem can be described by a triangular matrix, we should try triangular matrices (dense tr, banded tb, or packed tp)
instead the default (and the most versatile) general dense matrix ge.

(let [a (dtr 3 [2 1 3 5 2 1]) b (dge 3 1 [6 3 4])] [a b (sv a b)])

'(#RealUploMatrix(double type:tr mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → 2.00 * * → 1.00 5.00 * → 3.00 2.00 1.00 ┗ ┛ #RealGEMatrix(double mxn:3x1 layout:column offset:0) ▥ ↓ ┓ → 6.00 → 3.00 → 4.00 ┗ ┛ #RealGEMatrix(double mxn:3x1 layout:column offset:0) ▥ ↓ ┓ → 3.00 → 0.00 → -5.00 ┗ ┛ )

At the first glance, this example is the same as the previous, which used a general dense matrix. And it is, with a distinction that Neanderthal will choose a much faster algorithm for the triangular system than for the general system. For such a small dimension as this (3x3) it does not matter, but with just a bit more complex systems it matters in two important ways:

- Solving triangular systems is quite efficient, and does not require any factorizations (to make the system triangular), so it can be noticeably faster.
- By not requiring factorizations, it leaves less room for rounding errors and other pitfalls of working with floating point numbers.

In the last example, I've used the pure sv function, rather than the destructive sv!, just to show different possibilities. You should choose which one to use in your code based on the problem at hand, of course.

## Multiple linear systems

Often, the problem consists of applying the same operation when one of the parameters is the same. For example, I can have many linear systems at hand, that can be described with the same matrix on the left-hand side, but have different right-hand sides. The first thought of most programmers might be "I'll use loop/map/doall", but we can do even better: the sv and sv! functions could solve all these systems in one call! That's how Neanderthal usually works: if there is something that works with a vector, there is a similar thing that works with a matrix (whose columns or rows are those vectors that you could loop over but should avoid to do it).

In this case:

(let [a (dtr 3 [2 1 3 5 2 1]) b (dge 3 4 [6 3 4 7 8 5 -3 -1 2 0 9 -1])] [a b (sv a b)])

'(#RealUploMatrix(double type:tr mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → 2.00 * * → 1.00 5.00 * → 3.00 2.00 1.00 ┗ ┛ #RealGEMatrix(double mxn:3x4 layout:column offset:0) ▥ ↓ ↓ ↓ ↓ ┓ → 6.00 7.00 -3.00 0.00 → 3.00 8.00 -1.00 9.00 → 4.00 5.00 2.00 -1.00 ┗ ┛ #RealGEMatrix(double mxn:3x4 layout:column offset:0) ▥ ↓ ↓ ↓ ↓ ┓ → 3.00 3.50 -1.50 0.00 → 0.00 0.90 0.10 1.80 → -5.00 -7.30 6.30 -4.60 ┗ ┛ )

Four systems solved in one quick sweep!

Functions that deal with matrices can be broadly classified as Level 1, Level 2, and Level 3. Level 1 indicates \(O(n)\) complexity, Level 2 - \(O(n^2)\), and Level 3 - \(O(n^3)\) complexity. While a problem that can be solved by Level 1 is obviously the most desirable problem to have, Level 3 is a level where most flops can be used optimally and most gain can be get compared to a do-it-yourself code. In general, it is considerably faster to call one Level 3 function, than to call Level 2 function \(n\) times.

Solving a *triangular* system is a Level 2 function. Solving multiple systems moves to Level 3.

I'd take this as a valuable lesson (was it lesson 2?).

## LU Factorization

When we have a triangular system, we can solve it easily in \(O(n^2)\). What do we do if our system is
not triangular? Well, obviously, call sv. By going the easy route, we'd leave a lot of potential on the table.
Here's why: remember the Gaussian elimination from the start of the article? We first do some procedure on
our system to make it triangular, and then another to find a solution. What if we can keep
the result of that first procedure, and *reuse* it many times later? That's exactly what LU factorization is for.

LU factorization is practically the triangularization of a general matrix into a couple of triangular matrices,
that can later used in its place for solving linear systems *and other useful things*. It is an algorithmic
description of Gaussian elimination.

Look at this example (\(A=LU\)):

\begin{equation} \begin{bmatrix} 3 & 5\\ 6 & 7\\ \end{bmatrix}= \begin{bmatrix} 1 & 0\\ 2 & 1\\ \end{bmatrix} \begin{bmatrix} 3 & 5\\ 0 & -3\\ \end{bmatrix} \end{equation}
A general matrix has been described as a product of two triangular matrices: **L**, the *lower* triangle,
and **U**, the *upper* triangle.

How do we now proceed to use them? Our simple triangular system has only one triangle, what do we do with two?

The original system is \(Ax=b\). We proceed as follows: we decide that \(y=Ux\) and then \(Ax=LUx=Ly=b\). First we solve the system \(Ly=b\), then \(Ux=y\), and we have our solution. Once we have triangularized the original general system, we can solve it by solving two triangular systems!

Lesson number 3? Reuse the results as much as you can, since they are so expensive to compute.

LU factorization seems quite straightforward once it's been laid out, but I've seen so many people trying to
solve this problem as described in math textbooks: they see the formula \(Ax=b\), (correctly) conclude
that \(x=A^{-1}y\), and (correctly but unwisely) start the quest of computing the inverse matrix.
The inverse matrix is something that is *really hard to compute*, not only because it requires lots
of FLOPs, but even more because there is a large possibility that the result will be rather
imprecise.

**Computing \(A^{-1}\) is seldom necessary!** The right approach is to solve the related system of linear equations.

If the determinant of \(A\) is not 0, LU factorization exists. When \(A\) is non-singular, then the factorization is unique.

On with our simple example, but in Clojure. We do a LU factorization by calling the function trf!, or its
pure cousin trf. TRF stands for *TRiangular Factorization*.

(let [a (dge 3 3 [1 0 1 1 -1 1 3 1 4]) lu (trf a)] [(:lu lu) (view-tr (:lu lu) {:uplo :lower :diag :unit}) (view-tr (:lu lu) {:uplo :upper})])

'(#RealGEMatrix(double mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → 1.00 1.00 3.00 → 0.00 -1.00 1.00 → 1.00 -0.00 1.00 ┗ ┛ #RealUploMatrix(double type:tr mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → ·1· * * → 0.00 ·1· * → 1.00 -0.00 ·1· ┗ ┛ #RealUploMatrix(double type:tr mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → 1.00 1.00 3.00 → * -1.00 1.00 → * * 1.00 ┗ ┛ )

We called trf, and got a record that contains (among other things that we'll look at later) a general matrix
accessible through the `:lu`

key. The `:lu`

general matrix contains *both* L and U. The reason for
that is that they perfectly fit together and can be tightly packed in memory and used efficiently in
further computations. Since we are never particularly interested in L and U themselves, but in the
results that we can get by supplying them to the other procedures, separating them at this point would
be both inefficient and necessary.

Maybe Lesson number 4 at this point? What looks more elegant at first glance (returning neatly separated
L and U so we can look at them and worship them in all their glory) is sometimes exactly the naive
and *wrong* choice.

But anyway, if we really need to (or just like to because of their artistic appeal) see L and U, Neanderthal offers
an easy way to do this: just take a view of `:upper`

or `:lower`

triangle of that general matrix, by
calling the view-tr function. Just for your information, you can also take different `view-XX`

of most
matrix structures in Neanderthal.

I know (by learning it from a numerical math textbook) that L is a lower triangular *unit* matrix
that have 1 on the diagonal, and U is a upper triangular. If L weren't unit-diagonal, L and U couldn't
have been so neatly packed into a general matrix. A nice and useful coincidence :)

The key difference to sv, again, is that we can *reuse* the LU to not only solve multiple systems, but to
compute the condition number (con), or even the notorious inverse matrix (tri; again, we rarely need the inverse).

Let's do that:

(let [a (dge 3 3 [1 0 1 1 -1 1 3 1 4]) b (dge 3 5 [11 -2 9 6 3 4 7 8 5 -3 -1 2 0 9 -1]) lu (trf a)] [(con lu (nrm1 a)) (det lu) (trs! lu b) (tri! lu)])

'(0.017857142857142856 1.0 #RealGEMatrix(double mxn:3x5 layout:column offset:0) ▥ ↓ ↓ ↓ ↓ ↓ ┓ → 17.00 17.00 23.00 -24.00 13.00 → -0.00 -5.00 -10.00 6.00 -10.00 → -2.00 -2.00 -2.00 5.00 -1.00 ┗ ┛ #RealGEMatrix(double mxn:3x3 layout:column offset:0) ▥ ↓ ↓ ↓ ┓ → 5.00 1.00 -4.00 → -1.00 -1.00 1.00 → -1.00 0.00 1.00 ┗ ┛ )

That LU was reused to compute the reverse condition number, and the determinant, and to solve five systems of linear equations, and, finally, to compute the inverse matrix. I find that quite neat.

## Pivoting

Solving things, in the sense of finding \(x\) that finely fits the numbers that you already have in the formula is usually clean and elegant on the page of a math textbook. However, for anything but very simple problems, the real world strikes mercilessly. The algorithms start getting slower much faster than the dimension increases. One remedy for that are various clever shortcuts, but that is something that often falls under the fields such as machine learning, so I'll skip that here. Another approach that works to a certain degree (and that machine learning has to build on top) is optimizing the code tightly for the hardware (Neanderthal helps with this effectively).

The other important issue from the real world is that the computer does not work with real numbers, but
an approximation of real numbers implemented as floating point. The implication is, of course, that
there are many pitfalls that can make results completely wrong. Fortunately, many of those pitfalls are
well researched by the field of numerical analysis, and applied in well written software, so the error
can be bounded and minimized, for most real world use cases. I'm reminding you of this just to make you aware;
although many algorithms that look relatively easy to implement on the surface, you should not try
to implement them yourself unless *you really know what you're doing*.

One of those pitfalls can be seen in *ill-conditioned* systems - the systems whose computed solutions
swing wildly when some of the parameters change only a little bit. The main reason for this is that,
due to limited precision of floating-point numbers, operations with one very large number with one very small
number might not have enough precision to take into account the smaller one. But, even when the system
is not ill-conditioned, vanilla Gaussian elimination is unstable, and can give poor results when a pivot in
the elimination is *small*.

This is indicated by very large entries in triangular factors. The solution is to exchange rows during the
elimination - *pivoting*. Then, the computed factorization is \(PA=LU\).

Fortunately for you, Neanderthal's functions take this into account, and manage pivots for you.
You only have to be aware of the existence of pivoting, because the LU matrix that the procedure returns
has its rows interchanged! You'll also get access to pivot index (:ipiv key). All other procedures
accept the *pivoted* LU. You do not need to convert anything, and in the case that you'd like to
look at L and U, you can use permute-rows! or swap-rows!to recover the desired matrix.

You can check whether the matrix is well-conditioned by computing the *inverse* condition number. Small
positive inverse condition number (thus, large condition number) indicates a well-conditioned matrix,
while a relatively large inverse is ill-conditioned.

## What's next

We've taken a look at solving general rectangular matrices and triangular matrices. That is quite useful. However, there are some more specialized methods that are faster or more robust when we know additional details about the matrix at hand. The system might be symmetric, or positive definite, etc. We'll look at this in the next article.