Hello World of Programming with Linear Algebra
Need help with your custom Clojure software? I'm open to (selected) contract work.May 13, 2021
Please share: Twitter.
These books fund my work! Please check them out.
Recent popularity of Machine Learning brought high performance computing on the radar of most programming. You've heard that math in general and linear algebra in particular is central in implementing and using this stuff. However, you might have a hard time connecting the linear algebra that you've learned in college (and probably forgot by now) to actual programming tasks where you'd use it now.
I hope that a simple Hello World example can at least give you a rough idea of how LA can be applied in programming. I won't dwell on the theory; let's see whether we can make this intuitive.
Just an ordinary domain model
Imagine this simplified code for inventory modeling. (When I say "simplified" i mean really simplified. We're using floats for prices (bad), we store the data in global state, the architecture is far from even a simple web application. But the model is familiar enough to a typical software developer.)
(def products {:banana {:price 1.3 :id :banana} :mango {:price 2.0 :id :mango} :pineapple {:price 1.9 :id :pineapple} :pears {:price 1.8 :id :pears}})
This (imaginary) application exists to track sales. A customer puts the desired products into a cart, we calculate the total price, and, later, perform the delivery. Each cart only stores the products' identifiers and quantities (in unspecified units; yes, it's super-simplified).
(def cart1 {:banana 10 :pineapple 7 :pears 3}) (def cart2 {:pineapple 3 :mango 9})
Having defined product and cart data, we write a function that,
given the products "database" and a cart, calculates the total price
of the products in the cart. The cart-price
function reduces all
[product quantity]
pairs in the cart
, by retrieving the appropriate
product map in the product-db
, and taking the value associated with
its :price
key. It multiplies that price with the quantity, and
accumulates it in total
.
(defn cart-price [product-db cart] (reduce (fn [total [product quantity]] (+ total (* (:price (product-db product)) quantity))) 0 cart))
Let's call this function with the available carts, and see it in action.
(cart-price products cart1)
=> 31.699999999999996
(cart-price products cart2)
=> 23.7
We hopefully have more than one order. Our code can easily process sequences of carts, and compute the total revenue.
(reduce + (map (partial cart-price products) [cart1 cart2]))
=> 55.39999999999999
It's all good; but what does it have to do with linear algebra?
A more general algorithm
In the previous implementation, we entangled the specifics of data storage and the algorithm that computes the total price. In this simple model, it's not much of a problem, but if the data model is more complex, and the algorithm not as simple as the straightforward map/reduce, this quickly leads to (at least) two problems:
- code becomes too complicated
- program performance degrades quickly
Let's first tackle the code complexity by extracting the computation logic from the domain into the abstract mathematical notion of vectors and operations on these vectors. In this particular example, vectors help us encapsulate a bunch of numbers as one atomic unit.
(def product-prices [1.3 2.0 1.9 1.8]) (def cart-vec-1 [10 0 7 3]) (def cart-vec-2 [0 9 3 0])
We recognize that the logic we've already developed for computing the total price matches a simple and well known mathematical operation, known as the dot product, a scalar product of two vectors.
(defn dot-product-vec [xs ys] (reduce + (map * xs ys)))
Given two vectors, [1 2 3]
and [4 5 6]
, the dot product computes
one number, a scalar, that represent a scalar product of these two vectors.
Right now, we don't even care about theoretical details of the dot
product; we recognize that it technically computes the same thing
that we need in our domain, and it seems useful. There are other ways to multiply vectors,
which return non-scalar structures.
(dot-product-vec [1 2 3] [4 5 6])
=> 32
We can see that, when applied to the vectors holding product prices and quantities, it returns the correct results that we've already seen.
(dot-product-vec product-prices cart-vec-1)
=> 31.699999999999996
(dot-product-vec product-prices cart-vec-2)
=> 23.7
Getting the total price requires another map/reduce, but we will quickly see that this, too, can be generalized.
(reduce + (map (partial dot-product-vec product-prices) [cart-vec-1 cart-vec-2]))
=> 55.39999999999999
A library of linear algebra operations
When we abstract away the specifics of the domain, we end up with a number of general operations that can be reused over and over, and combined into more complex, but still general, operations. Countless such operations have been studied and theoretically developed by various branches of mathematics and related applied disciplines for a long time. What's more, many have been implemented and optimized for popular hardware and software ecosystems, so our main task is to learn how to apply that vast resource to the specific domain problems.
Linear algebra is particularly well supported in implementations. Whenever we need to process arrays of numbers, it is likely that at least some part of this processing, if not all of it, can be described through vector, matrix, or tensor operations.
Vectors
Instead of developing our own naive implementations, we should reuse the well-defined data structures and functions provided by Neanderthal ().
Here we use vectors of double precision floating point numbers to represent products' prices and carts.
(def product-prices (dv [1.3 2.0 1.9 1.8])) (def cart-vctr-1 (dv [10 0 7 3])) (def cart-vctr-2 (dv [0 9 3 0]))
We use the general dot
function in the same way as the matching
function that we had implemented before.
(dot product-prices cart-vctr-1)
=> 31.7
(dot product-prices cart-vctr-2)
=> 23.7
Matrices
Once we start applying general operations, we can see new ways to improve our code, not so obvious at first.
Instead of maintaining sequences of vectors that represent carts, and coding custom functions to process these vectors, we can put that data into the rows of a matrix. All carts are now represented by one matrix, and each row of the matrix represents one cart.
(def carts (dge 2 4))
=> #RealGEMatrix[double, mxn:2x4, layout:column, offset:0] ▥ ↓ ↓ ↓ ↓ ┓ → 0.00 0.00 0.00 0.00 → 0.00 0.00 0.00 0.00 ┗ ┛
We could have populated the matrix manually, but, since we already have the data loaded in appropriate vectors, we can copy it, showing how these structures are related.
(copy! cart-vctr-1 (row carts 0))
=> #RealBlockVector[double, n:4, offset: 0, stride:2] [ 10.00 0.00 7.00 3.00 ]
(copy! cart-vctr-2 (row carts 1))
=> #RealBlockVector[double, n:4, offset: 1, stride:2] [ 0.00 9.00 3.00 0.00 ]
The following step is the usual opportunity for a novice to slip.
Should we now iterate the rows of our newly created matrix, calling
dot products on each row? No! We should recognize that the equivalent
operation already exists: matrix-vector multiplication, implemented
by the mv
function!
Most functions in this domain have
short names that might sound cryptic
until you get used to it. There is a
method to their naming, though, and
they are usually very descriptive
mnemonics. For example, mv
stands for
Matrix-Vector multiplication. You'll
guess that mm
is Matrix-Matrix multiplication
and so on. Like in mathematical
formulas, this naming makes for code
that can be viewed in a contained place
that can be grasped in one view.
(mv carts product-prices)
=> #RealBlockVector[double, n:2, offset: 0, stride:1] [ 31.70 23.70 ]
(asum (mv carts product-prices))
=> 55.4
Not only that the mv
operation is equivalent to multiple calls to
dot
, but it takes advantage of the structure of the matrix, and optimizes
the computation to the available hardware. This achieves much
better performance, which can compound to orders of magnitude in
improvements.
These improvements materialize in more serious examples. Any implementation of a small toy problem works fast.
…and more
Let's introduce a bit more complication. Say that we want to support
different discounts for each product, in the form of multipliers.
That gets us the price reductions, that we should subtract from the
price. An alternative way, shown in the following snippets is to
subtract the discount coefficients from 1.0
to get the direct multiplier
that gets us to the reduced price.
(def discounts (dv [0.07 0 0.33 0.25])) (def ones (entry! (dv 4) 1))
We can subtract two vectors by the axpy
function. axpy
stands for "scalar a times x plus y".
(axpy -1 discounts ones)
=> #RealBlockVector[double, n:4, offset: 0, stride:1] [ 0.93 1.00 0.67 0.75 ]
The mul
function multiplies its vector, matrix, or tensor arguments
element-wise, entry by entry.
(mul (axpy -1 discounts ones) product-prices)
=> #RealBlockVector[double, n:4, offset: 0, stride:1] [ 1.21 2.00 1.27 1.35 ]
The following code seamlessly incorporates this new part of the algorithm into the implementation that we already have.
(asum (mv carts (mul (axpy -1 discounts ones) product-prices)))
=> 46.87
Why stop here? Suppose that we want to simulate the effects of multiple discount combinations on the total price. As earlier, we put all these hypothetical discount vectors into a matrix, in this case, three samples that we'd like to investigate.
(def discount-mat (dge 4 3 [0.07 0 0.33 0.25 0.05 0.30 0 0.1 0 0 0.20 0.40]))
=> #RealGEMatrix[double, mxn:4x3, layout:column, offset:0] ▥ ↓ ↓ ↓ ┓ → 0.07 0.05 0.00 → 0.00 0.30 0.00 → 0.33 0.00 0.20 → 0.25 0.10 0.40 ┗ ┛
We have to subtract these numbers from 1.0
. Instead of populating
the matrix with 1.0
, we will demonstrate the outer product operation,
implemented by the function rk
. Given two vectors, it produces a matrix
that holds all combinations of the product of the entries of the vectors.
(rk ones (subvector ones 0 3))
=> #RealGEMatrix[double, mxn:4x3, layout:column, offset:0] ▥ ↓ ↓ ↓ ┓ → 1.00 1.00 1.00 → 1.00 1.00 1.00 → 1.00 1.00 1.00 → 1.00 1.00 1.00 ┗ ┛
We can also utilize rk
to "lift" product prices vector to a matrix
whose shape matches the shape of the discount combinations matrix.
(def ones-3 (subvector ones 0 3)) (def discounted-prices (mul (axpy -1 discount-mat (rk ones ones-3)) (rk product-prices ones-3)))
The result is a matrix of hypothetical discount prices that we'd like to simulate.
=> #RealGEMatrix[double, mxn:4x3, layout:column, offset:0] ▥ ↓ ↓ ↓ ┓ → 1.21 1.23 1.30 → 2.00 1.40 2.00 → 1.27 1.90 1.52 → 1.35 1.62 1.08 ┗ ┛
Now, the most interesting part: how do we calculate the totals from this matrix and the matrix of carts we've produced earlier. (Not) surprisingly, just a single operation, matrix multiplication, completes this task!
(mm carts discounted-prices)
=> #RealGEMatrix[double, mxn:2x3, layout:column, offset:0] ▥ ↓ ↓ ↓ ┓ → 25.05 30.51 26.88 → 21.82 18.30 22.56 ┗ ┛
Now we only need to sum the columns up to get the three final totals.
We won't do this column-by-column. Instead, we'll use the "mv
with
ones" approach we've already encountered. Note that we need to transpose
the matrix to match the desired structure.
(trans (mm carts discounted-prices))
=> #RealGEMatrix[double, mxn:3x2, layout:row, offset:0] ▤ ↓ ↓ ┓ → 25.05 21.82 → 30.51 18.30 → 26.88 22.56 ┗ ┛
And, the final answer is…
(mv (trans (mm carts discounted-prices)) (subvector ones 0 2))
=> #RealBlockVector[double, n:3, offset: 0, stride:1] [ 46.87 48.81 49.44 ]
Given three (or three million) possible discount combinations, we get a vector of the total revenue amounts. Of course, being a toy example, this code doesn't take into account that lower prices would (likely) induce more sales; let's not carry a Hello World example too far.
So, the first major benefit of using a library, such as Neanderthal (), based on many decades of numerical computing research and development is that we have access to a large treasure trove of useful, well thought, general functions for developing useful, general or customized, number processing algorithms.
Another major improvement is performance. Although toy examples may be implemented in any way you'd like, and they'd still work reasonably well, real-world data processing almost always involves either many data points, or many computation steps, or, often – both.
I'm not talking about a couple dozen percentages, but improvements of many orders of magnitude. But that's a story that has to be looked at in more depth. I've written two books where I go into much, much, more depth and width on this. You can also check some earlier articles on this blog; there's many examples that demonstrate how fast this approach is!