Clojure Numerics, Part 3 - Special Linear Systems and Cholesky Factorization

September 18, 2017

In the last article we have learned to solve general linear systems, assuming that the matrix of coefficients is square, dense, and unstructured. We have also seen how computing the solution is much faster and easier when we know that the matrix is triangular. These are pretty general assumptions, so we are able to solve any well-defined system. We now explore how additional knowledge about the system can be applied to make it faster. The properties that we are looking for are symmetry, definiteness, and bandedness.

Before I continue, a few reminders:

The namespaces we'll use:

(require '[uncomplicate.neanderthal
           [native :refer [dge dsy dsb dsp]]
           [linalg :refer [trf trs]]])

Symmetry

Recall from the last article that a general square system is solved by first doing LU factorization to destructure the system to two triangular forms (L and U), and then solve those two triangular systems (\(Ly=b\), then \(Ux=y\)). To fight with inherent instability of floating-point computations (as a non-perfect approximation of "ideal" real numbers), the algorithm does pivoting and row interchange. It is not a burden only because of the additional stuff (pivots) to carry around, but also because it requires additional memory reading and writing (we remember that this is more expensive that mere FLOPS). That's why we would like to use computational shortcuts that do less pivoting, or no pivoting at all.

One such shortcut is that if \(A\) is symmetric and has an LU factorization, \(U\) is a row scaling of \(L^T\). More precisely, \(A=LDL^T\) where \(D\) is a diagonal matrix. Since the system is symmetrical, \(A=UDU^T\), too. From theoretical perspective, it might not make much difference, but in implementation it is important. If the symmetric matrix data is stored in the lower triangle, that triangle will be (re)reused for storing the factorization. Likewise for upper symmetric matrix. Those two are equal but are obviously not the same (= vs identical? in Clojure).

It is simpler to do with Neanderthal than it looks from the previous description. Practically, there is nothing required of you, but to create a symmetric matrix. When you call the usual factorization and solver functions, this will be taken care of automatically.

(let [a (dsy 3 [3
                5 3
                -2 2 0]
             {:layout :row :uplo :lower})
      fact (trf a)
      b (dge 3 2 [-1 4
                  0.5 0
                  2 -1]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       3.00    *       *
   →       5.00    3.00    *
   →      -2.00    2.00    0.00
   ┗                               ┛
 #uncomplicate.neanderthal.internal.common.LUFactorization(:lu #RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       3.00    *       *
   →       5.00    3.00    *
   →       1.00   -1.00    4.00
   ┗                               ┛
  :ipiv #IntegerBlockVector(int  n:3  offset: 0  stride:1)(-2 -2 3)  :master true  :fresh #atom(true 0x6a306829)) #RealGEMatrix(double  mxn:3x2  layout:row  offset:0)
   ▤       ↓       ↓       ┓
   →      -0.53    0.50
   →       0.47    0.00
   →       0.88   -1.25
   ┗                       ┛
)

As you can see, it's all the same as for general dense matrices, with Neanderthal taking care to preserve the optimized symmetrical structure of :lu.

Positive definite systems and Cholesky factorization

The previous shortcut is cute, but nothing to write home about. Fortunately, there is a much better optimization available for a special subset of symmetric matrices - those that are positive definite.

A matrix \(A\in{R^{n\times{n}}}\) is positive definite if \(x^TAx>0\) for all nonzero \(x\in{R^n}\), positive semidefinite if \(x^TAx\geq{0}\), and positive indefinite if there are \(x_1,x_2\in{R^n}\) such that \((x_1^TAx_1)(x_2^TAx_2)<0\). Huh? So, is my system positive definite? How would I know that?

Before I show you that, let me tell you that the reason why symmetric positive definite matrices are handy is that for them, there is a special factorization available - Cholesky factorization - which preserves symmetry and definiteness. Now, it turns out that discovering whether a matrix is positive definite is not easy. That's why I will not try to explain here how to do that, nor it would help you in practical work. The important (and fortunate) thing is that you don't have to care; Neanderthal will determine that automatically, and return the Cholesky factorization if possible. If not, it will return the \(LDL^T\) (or \(UDU^T\))!

Let's see how to do this in Clojure:

(let [a (dsy 3 [1
                1 2
                1 2 3]
             {:layout :row :uplo :lower})
      fact (trf a)
      b (dge 3 2 [-1 4
                  0.5 0
                  2 -1]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    *       *
   →       1.00    2.00    *
   →       1.00    2.00    3.00
   ┗                               ┛
 #uncomplicate.neanderthal.internal.common.PivotlessLUFactorization(:lu #RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    *       *
   →       1.00    1.00    *
   →       1.00    1.00    1.00
   ┗                               ┛
  :master true  :fresh #atom(true 0x8cf263e)) #RealGEMatrix(double  mxn:3x2  layout:row  offset:0)
   ▤       ↓       ↓       ┓
   →      -2.50    8.00
   →       0.00   -3.00
   →       1.50   -1.00
   ┗                       ┛
)

Notice how the code is virtually the same as in the previous example. The only thing that is different is the data. In this example, Neanderthal could do the Cholesky factorization, instead of more expensive LU with symmetric pivoting. Later it adapted the solver to use the available factorization for solving the linear equation, but everything went automatically.

In fact, Cholesky factorization is a variant of LU, just like \(LDL^T\) is. The difference is that L and U in Cholesky are \(G\) and \(G^T\). Notice: L is G, U is a transpose of G, and there is no need for the D in the middle. Also, no pivoting is necessary, which makes Cholesky quite nice; compact and efficient.

There are more subtle controls related to this in Neanderthal; look at the treasure trove of API documentation and tests. Writing these blog posts takes time and energy, and now I don't feel like taking too much time delving more into details related to this. :)

Of course, sv is also available, as well as destructive variants of the functions I've shown, and with them, too, you can rely on Neanderthal to select the appropriate algorithm automatically. I didn't show those here because that is used virtually in the same way as in the examples I've shown in previous articles. Let it be a nice easy exercise to try them on your own.

Banded Systems

Are there more optimizations? Sure! One of them is for banded systems. A matrix is banded when all of its zero elements are concentrated in a narrow (relative to the whole matrix) band around the diagonal. Of course, the implementation does not care how narrow: even completely dense matrix could be stored as banded in a sense that the band covers the whole matrix. However, for the implementation to have more performance instead of much less, the band should not be too obese. For example, if the matrix is in \(R^{100\times{100}}\) a band 5 diagonals wide is obviously exploitable, while the band 50 elements wide is probably not (but test that yourself for your use cases).

Now, the cool thing is that the stuff I've shown you for the general, triangular, and symmetric matrices also work with banded matrices:

  • The most desirable case is triangular banded matrix, since it does not need to be factorized at all
  • Then, if the matrix is symmetric banded, Neanderthal offers Cholesky if possible, with the LU fallback.
  • And, for general banded matrices, it does the banded LU.

The best of all, it's all automatic (I'm only showing the symmetric case here):

(let [a (dsb 9 3 [1.0 1.0 1.0 1.0
                  2.0 2.0 2.0 1.0
                  3.0 3.0 2.0 1.0
                  4.0 3.0 2.0 1.0
                  4.0 3.0 2.0 1.0
                  4.0 3.0 2.0 1.0
                  4.0 3.0 2.0
                  4.0 3.0
                  4.0])
      fact (trf a)
      b (dge 9 3 [4.0  0.0  1.0
                  8.0  0.0  1.0
                  12.0  0.0  0.0
                  16.0  0.0  1.0
                  16.0  0.0  0.0
                  16.0  0.0 -1.0
                  15.0  1.0  0.0
                  13.0  1.0 -2.0
                  10.0  2.0 -3.0]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealBandedMatrix(double  type:sb  mxn:9x9  layout:column  offset:0)
   ▥       ↓       ↓       ↓       ↓       ↓       ─
   ↘       1.00    2.00    3.00    4.00    4.00    ⋯
   ↘       1.00    2.00    3.00    3.00    3.00
   ↘       1.00    2.00    2.00    2.00    2.00
   ↘       1.00    1.00    1.00    1.00    1.00
   ┗                                               ┛
 #uncomplicate.neanderthal.internal.common.PivotlessLUFactorization(:lu #RealBandedMatrix(double  type:sb  mxn:9x9  layout:column  offset:0)
   ▥       ↓       ↓       ↓       ↓       ↓       ─
   ↘       1.00    1.00    1.00    1.00    1.00    ⋯
   ↘       1.00    1.00    1.00    1.00    1.00
   ↘       1.00    1.00    1.00    1.00    1.00
   ↘       1.00    1.00    1.00    1.00    1.00
   ┗                                               ┛
  :master true  :fresh #atom(true 0x39b524ea)) #RealGEMatrix(double  mxn:9x3  layout:column  offset:0)
   ▥       ↓       ↓       ↓       ┓
   →       1.00    1.00    1.00
   →       1.00   -1.00    0.00
   →       ⁙       ⁙       ⁙
   →       1.00   -1.00    0.00
   →       1.00    1.00   -1.00
   ┗                               ┛
)

I created a banded symmetric matrix a, with dimensions \(9\times 9\). The band width is the main diagonal and 3 sub-diagonals. When it comes to storage, it means than instead of storing all 81 elements, only \(9+8+7+6=30\) non-zero elements are stored (even though the band was not particularly thin). When Neanderthal prints the matrix, it prints the diagonals horizontally (to avoid printing a bunch of zero entries). To avoid confusion, notice how Neanderthal prints the ↘ symbol for the printed rows and ↓ for columns to indicate that (in the case of column-major matrices) diagonals are printed horizontally, and columns vertically. Also note how this particular system has been positive definite, and we get a nice Cholesky, which preserves the band!

Packed matrices

As a bonus, let me mention that Neanderthal supports packed dense storage, which can come handy when the memory is scarce. If we work with dense symmetric or triangular matrices that can not be compressed with band because they do not have many zero elements, we can still save half the space by only storing lower or upper half and not storing the upper/lower part that is not accessed.

(let [a (dsp 3 [1
                1 2
                1 2 3]
             {:layout :row :uplo :lower})
      fact (trf a)
      b (dge 3 2 [-1 4
                  0.5 0
                  2 -1]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealPackedMatrix(double  type:sp  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    .       .
   →       1.00    2.00    .
   →       1.00    2.00    3.00
   ┗                               ┛
 #uncomplicate.neanderthal.internal.common.PivotlessLUFactorization(:lu #RealPackedMatrix(double  type:sp  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    .       .
   →       1.00    1.00    .
   →       1.00    1.00    1.00
   ┗                               ┛
  :master true  :fresh #atom(true 0x65376e65)) #RealGEMatrix(double  mxn:3x2  layout:row  offset:0)
   ▤       ↓       ↓       ┓
   →      -2.50    8.00
   →       0.00   -3.00
   →       1.50   -1.00
   ┗                       ┛
)

Hey, this example is virtually the same as when we used dense symmetric matrix! That's right - Neanderthal can sort these kind of things without bothering you! Now, how cool is that? I don't know, but it is certainly very useful…

Use packed storage with caution, though: it saves only half the space, while many important operations such as matrix multiplication are noticeably slower than when working with straight dense triangular or symmetric matrices. Some operations can be faster, so YMMV, and experiment with the use cases that you are interested in.

(Tri)diagonal storage

Everything described by now can be compacted and exploited even more. As an exercise, please look at diagonal (gd), tridiagonal (gt), diagonally dominant tridiagonal (dt), and symmetric tridiagonal (st) matrices, and try them out with linear solvers. Yes, they are also supported…

What to expect next

We've taken a look at solving dense, banded and packed systems described with general rectangular matrices, symmetric matrices, and triangular matrices. Now we can quickly and easily solve almost any kind of linear system that has unique solution. Hey, but what if our system has less equations than there are unknowns, or if we have too many equations? Stay tuned, since I'll discuss these topics soon. Somewhere in the middle, I'll probably squeeze in a post where I'll explain the details of those storage and matrix types, and how to read these symbols when matrices are printed (although I think it is intuitive enough that you've probably already picked the most important details up).

Until then, create a learning playground project, include Neanderthal, and have a happy hacking day!

Clojure Numerics, Part 3 - Special Linear Systems and Cholesky Factorization - September 18, 2017 - Dragan Djuric