Clojure Numerics, Part 4 - Singular Value Decomposition (SVD)
October 4, 2017
Today's article is a short one. Not because Singular Value Decomposition (SVD) is not important, but because it is so ubiquitous that we'll touch it in other articles where appropriate. The goal of this article is to give an overview and point to Neanderthal's functions that work with it. When you see SVD in the literature you read, you'll know where to look for the implementation; that's the idea.
Before I continue, a few reminders:
- Include the Neanderthal library in your project to be able to use ready-made high-performance linear algebra functions.
- Read articles in the introductory Clojure Linear Algebra Refresher series.
- This is the fourth article in a more advanced series. It is a good idea to read them all in sequence if this is your first contact with this kind of software.
The namespaces we'll use:
(require '[uncomplicate.neanderthal
           [native :refer [dge dgb dgd]]
           [core :refer [mm nrm2]]
           [linalg :refer [svd svd!]]])
Motivation
The main idea of many of the methods I've already mentioned, and of the ones we've yet to cover in this tutorial, is to decompose a matrix into a product of a few matrices with special properties, and then exploit these properties to get what we want. Not all decompositions, aka factorizations, are the same; some give more benefits than others. On the other hand, they may be more difficult to compute, or require yet another set of properties from the original matrix. Recall that for general matrices, we have to do LU decomposition with pivoting, but for positive definite matrices, pivoting is not needed, while for triangular matrices, no decomposition is needed at all.
As diagonal matrices are very convenient, and computationally efficient, it makes sense to have a method to decompose a matrix into some diagonal form. One such decomposition, a very powerful one, is the Singular Value Decomposition (SVD). It is relatively expensive to compute, but once we have it, it can be used to get answers for some difficult cases when other decompositions can not help us.
The Definition of Singular Value Decomposition
If \(A\) is a matrix in \(R^{m \times n}\), then there exist orthogonal matrices \(U\in{R^{m \times m}}\) and \(V\in{R^{n \times n}}\) and a diagonal matrix \(\Sigma\) such that \(U^TAV=\Sigma=diag(\sigma_1,\dots,\sigma_p) \in R^{m\times{n}}, p=min(m,n)\), where \(\sigma_1\geq \sigma_2 \geq \dots \geq \sigma_p \geq 0\).
Here, the columns of \(U\) are the left singular vectors.
\begin{equation} U= \begin{bmatrix} u_1 & | \dots | & u_m\\ \end{bmatrix} \in R^{m \times m} \end{equation}The columns of \(V\) are the right singular vectors.
\begin{equation} V= \begin{bmatrix} v_1 & | \dots | & v_n\\ \end{bmatrix} \in R^{n \times n} \end{equation}The elements of \(\Sigma\) are the singular values of \(A\).
Here's a simple example in Clojure code:
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])]
  (svd a true true))
#uncomplicate.neanderthal.internal.common.SVDecomposition{:sigma #RealDiagonalMatrix[double, type:gd mxn:3x3, offset:0]
   ▧                               ┓
   ↘       7.51    3.17    0.79
   ┗                               ┛
, :u #RealGEMatrix[double, mxn:4x3, layout:column, offset:0]
   ▥       ↓       ↓       ↓       ┓
   →      -0.49    0.66    0.51
   →      -0.53    0.23   -0.81
   →      -0.49   -0.23    0.25
   →      -0.49   -0.68    0.13
   ┗                               ┛
, :vt #RealGEMatrix[double, mxn:3x3, layout:column, offset:0]
   ▥       ↓       ↓       ↓       ┓
   →      -0.66    0.34   -0.67
   →      -0.73   -0.06    0.69
   →       0.19    0.94    0.29
   ┗                               ┛
, :master true}
Let's check whether it is true that \(U^TAV=\Sigma\) in this example.
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])
      usv (svd a true true)
      u (:u usv)
      vt (:vt usv)
      sigma (:sigma usv)]
  (mm u sigma vt))
#RealGEMatrix[double, mxn:4x3, layout:column, offset:0]
   ▥       ↓       ↓       ↓       ┓
   →       1.00   -1.00    4.00
   →       2.00   -2.00    3.00
   →       3.00   -1.00    2.00
   →       4.00   -1.00    1.00
   ┗                               ┛
Hurrah, it is equal to \(A\)! But, the original definition didn't say \(A=U\Sigma V^T\); it said \(U^TAV=\Sigma\). It's just a matter of a bit of arithmetic: since \(U\) and \(V\) are orthogonal, multiplying \(U^TAV=\Sigma\) by \(U\) from the left and by \(V^T\) from the right gives \(A=U\Sigma V^T\), so the two equations are equivalent.
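If you'd like to check the original form directly, here's a minimal sketch; it assumes that trans is also referred from uncomplicate.neanderthal.core (it is not in the requires above):
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])
      {:keys [u vt]} (svd a true true)]
  ;; U^T A V; the result should be (approximately) the diagonal matrix of singular values
  (mm (trans u) a (trans vt)))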
Properties of the SVD
\(Av_i=\sigma_i u_i\) and \(A^Tu_i=\sigma_i v_i\)
This is obvious from the definition, and it can be seen in the previous example, too.
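For instance, here's a sketch that checks \(Av_1=\sigma_1 u_1\) numerically; it assumes that mv, row, col, dia, scal, and axpy are also referred from uncomplicate.neanderthal.core, and entry from uncomplicate.neanderthal.real:
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])
      {:keys [sigma u vt]} (svd a true true)
      v1 (row vt 0)                 ;; first right singular vector (first row of V^T)
      u1 (col u 0)                  ;; first left singular vector
      sigma1 (entry (dia sigma) 0)] ;; largest singular value
  ;; the norm of A*v1 - sigma1*u1 should be close to zero
  (nrm2 (axpy -1.0 (scal sigma1 u1) (mv a v1))))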
\({\lVert A\rVert}_F = \sqrt{\sigma_1^2 + \dots + \sigma_p^2}\) and \({\lVert A\rVert}_2 = \sigma_1\)
If you're interested in the proof, look for it in a textbook. Here, we'll just see an example:
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])]
  [(nrm2 (:sigma (svd a)))
   (nrm2 a)])
[8.185352771872449 8.18535277187245]
Rank, range, and null space
If \(A\) has \(r\) positive singular values, then \(rank(A) = r\), \(null(A) = span\{v_{r+1},\dots,v_n\}\), and \(ran(A) = span\{u_1,\dots,u_r\}\).
And so on. There are more interesting properties like these. I hope that with the help of this guide, you can easily find out how to use them in Clojure when you learn them from the textbooks.
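As a small illustration, here's a sketch that estimates the rank by counting singular values larger than a small tolerance; it assumes dia and dim are also referred from uncomplicate.neanderthal.core, and entry from uncomplicate.neanderthal.real:
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])
      d (dia (:sigma (svd a)))
      tol 1e-10]
  ;; count the singular values that are (numerically) positive
  (count (filter #(< tol (entry d %)) (range (dim d)))))
For our example matrix, all three singular values are positive, so this evaluates to 3: the matrix has full column rank.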
Subspaces
Singular vectors and values are very useful in computations of subspaces (see Clojure Linear Algebra Refresher).
A convenient property of \(U\) and \(V\) is that they are orthogonal, and orthogonal transformations do not deform space.
If we partition
\begin{equation} U= \begin{bmatrix} U_r & | \tilde{U}_{m-r}\\ \end{bmatrix} \text{ and } V= \begin{bmatrix} V_r & | \tilde{V}_{n-r}\\ \end{bmatrix} \end{equation}then here are a few useful facts (a small code sketch follows the list):
- \(V_r V_r^T\) = projection on \(null(A)^{\perp} = ran(A^T)\),
- \(\tilde{V}_r \tilde{V}_r^T\) = projection on \(null(A)\),
- \(U_r U_r^T\) = projection on \(ran(A)\),
- \(\tilde{U}_r \tilde{U}_r^T\) = projection on \(ran(A)^{\perp} = null(A^T)\).
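Here's the promised sketch, computing the projector onto \(ran(A)\) as \(U_r U_r^T\). For our example matrix all three singular values are positive, so \(r=3\) and the \(U\) returned by svd already contains exactly the first \(r\) left singular vectors; the sketch again assumes trans is also referred from uncomplicate.neanderthal.core:
(let [a (dge 4 3 [1 2 3 4 -1 -2 -1 -1 4 3 2 1])
      u (:u (svd a true true))]
  ;; U_r * U_r^T is the 4x4 matrix that projects onto ran(A)
  (mm u (trans u)))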
Sensitivity of Square Linear Systems
There's a very important way in which SVD can help us with Linear Systems.
We start from \(Ax=b\), and then have \(x = A^{-1}b=(U \Sigma V^T)^{-1}b=\sum_{i=1}^n{\dfrac{u_i^T b}{\sigma_i}v_i}\).
Now, if some \(\sigma_i\) is small enough, a small change in \(A\) or \(b\) will cause large changes in \(x\), and the system becomes unstable.
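As a rough sketch of how that formula translates to code, here is a hypothetical solve of a small, made-up square system via its SVD; it additionally assumes dv from uncomplicate.neanderthal.native, mv, trans, and dia from uncomplicate.neanderthal.core, and div from uncomplicate.neanderthal.vect-math (in practice you'd normally prefer a dedicated solver):
(let [a (dge 3 3 [2 1 0 1 3 1 0 1 2])        ;; a made-up square matrix
      b (dv 1 2 3)                           ;; a made-up right-hand side
      {:keys [sigma u vt]} (svd a true true)
      z (div (mv (trans u) b) (dia sigma))]  ;; z_i = (u_i^T b) / sigma_i
  (mv (trans vt) z))                         ;; x = V z = sum_i z_i v_i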
One of the properties of the SVD that we skipped in the last section is that the smallest singular value is the 2-norm distance of \(A\) to the set of rank-deficient (singular) matrices. The closer the matrix is to that set, the more sensitive the system it describes becomes to small changes (such as those caused by imprecision in floating point operations).
It boils down to this: when no other method can reliably show us whether a system is ill-conditioned, SVD usually can. On the other hand, SVD is usually more computationally expensive than other factorizations, so we should take that into account and prefer lighter methods when they are good enough.
A precise measure of linear system sensitivity can be obtained by looking at the condition number. We've already dealt with that - we can get the condition number from triangular factorizations (see previous articles).
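Since the singular values come back in descending order, the 2-norm condition number \(\kappa_2(A) = \sigma_1/\sigma_n\) can also be read directly off the SVD. Here's a sketch, with the same assumptions about dia, dim, and entry as above, and the same made-up square matrix:
(let [a (dge 3 3 [2 1 0 1 3 1 0 1 2])
      d (dia (:sigma (svd a)))]
  ;; ratio of the largest to the smallest singular value
  (/ (entry d 0) (entry d (dec (dim d)))))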
A word or two about determinants should be said here. In school, I learned to compute the determinant when I want to check whether a system is solvable: if \(det(A)=0\), the system is singular. But what if the determinant is merely close to zero? Is \(A\) nearly singular? Unfortunately, we cannot say. The determinant is a poor tool for this, and in computational linear algebra, as I've shown you, we usually have much better tools.
And next…
Orthogonalization seems to be a quite powerful factorization method. We'll explore it in the next articles.