Map and Reduce Primitive Arrays Without Clojure Macros
July 5, 2018
I've just released a new version of Fluokitten, a library that enables "monadic" programming in Clojure. It sounds abstract, but it is really the opposite. I'll show you how you can use its fmap and fold functions as a universal replacement for Clojure's map and reduce. Not only do fmap and fold offer more functionality than map and reduce, they even support primitive Java arrays, and they are as fast as Clojure's macros amap and areduce. There are many more cool functions there - be sure to check the documentation and articles at Fluokitten's website.
What map and reduce lack
Clojure's map and reduce work with anything sequential, so they work with arrays too. However, there are problems (the first two are illustrated below):
- They turn everything into sequences and return a sequence.
- They box elements into objects, which degrades the performance of primitive arrays by an order of magnitude.
- They do not support in-place, mutable changes of the source Java array.
- They work only with sequential things.
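The first two points are easy to see in the REPL (these are plain Clojure results, no Fluokitten involved):

(class (map inc (double-array [1 2 3])))
;;=> clojure.lang.LazySeq   (the array turned into a sequence)
(class (first (map inc (double-array [1 2 3]))))
;;=> java.lang.Double   (each primitive double was boxed into an object)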
Fluokitten's elegance at full speed!
I'll just show you Fluokitten's map and reduce equivalents, although it has much more to offer. fmap is, roughly, a polymorphic alternative to map. Instead of converting data to a specific type, it calls a specialized implementation for each data type. fmap! is a destructive, in-place version of fmap. fold is a polymorphic alternative to reduce. foldmap is a combination of map and reduce that first maps over the source arguments, and then reduces the results of the mapping in the same pass.
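To make that concrete, here is a small sketch of fold and foldmap on plain vectors; the results are what the semantics described above imply:

;; fold is a polymorphic reduce: sum a vector.
(fold + 0 [1 2 3])
;;=> 6
;; foldmap multiplies pairwise and sums the products, in a single pass.
(foldmap + 0 * [1 2 3] [4 5 6])
;;=> 32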
In contrast to map and reduce:
- They keep the type of the object/container they work on.
- They can work with primitives without boxing.
- fmap! supports in-place, mutable changes (if applicable for the object's type).
- They work with any Clojure or Java object!
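A quick sketch of the first and third points (inc is not typehinted here, so its argument gets boxed, but the semantics are what matters):

(fmap inc [1 2 3])
;;=> [2 3 4]   (a vector in, a vector out)
(def a (double-array [1 2 3]))
(fmap! inc a)
(vec a)
;;=> [2.0 3.0 4.0]   (the array itself was changed in place)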
What's wrong with amap and areduce macros
Let's demonstrate the problems with map, reduce, amap, and areduce.
(require '[uncomplicate.commons.core :refer [double-fn]]
         '[uncomplicate.fluokitten.core :refer [fmap! fmap fold foldmap]])
(def arr (double-array [1 2 3]))
map returns a sequence:
(map inc arr)
(2.0 3.0 4.0)
The array is unchanged. This is normally desired in Clojure, but primitive arrays are explicitly built for mutability and speed, and sometimes we need to mutate them.
(get arr 0)
1.0
As for the speed, let's measure the mapping and reduction of an array of 100,000 elements with criterium. The canonical example is the dot product:

\begin{gather*}
\vec{x} = [x_1, x_2, \ldots, x_n]\\
\vec{y} = [y_1, y_2, \ldots, y_n]\\
\vec{x} \cdot \vec{y} = \sum_{i=1}^n x_i y_i
\end{gather*}

So, create two large arrays and a function for the dot product:
(def arr-a (double-array (range 100000)))
(def arr-b (double-array (range 100000)))
(defn dot-seq [xs ys]
  (reduce + (map * xs ys)))
;;(quick-bench (dot-seq arr-a arr-b))
(dot-seq arr-a arr-b)
3.3332833335E14
Execution time mean : 10.074604 ms
It's not that fast…
Rich Hickey knew that this was a problem, so he created the macros amap and areduce to help us with that:
(defn dot-areduce [xs ys]
  (areduce ^doubles xs idx res 0.0
           (+ res (* (aget ^doubles xs idx)
                     (aget ^doubles ys idx)))))
;;(quick-bench (dot-areduce arr-a arr-b))
(dot-areduce arr-a arr-b)
3.3332833335E14
100 times faster than the seq way!
Execution time mean : 94.521934 µs
Much faster, but more verbose. To avoid the reflection penalty, we had to sprinkle the code with ^doubles typehints. This means that our function can work only with doubles, and we would have to create separate versions for floats, longs, ints, etc. amap and areduce had to be macros, since working with primitives is explicit.
Even in Java, there are special methods for each type of primitive array: getFloat, getDouble, getLong, setDouble, etc. If amap and areduce were functions instead of macros, they would have to work with all types of data, and would have to box and unbox numbers on every call instead of keeping them primitive. This incurs a performance penalty of roughly an order of magnitude.
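To see the duplication in practice, here is what a long-array variant of dot-areduce would look like; it is the same code, just with different typehints (dot-areduce-longs is my name for this hypothetical copy):

(defn dot-areduce-longs [xs ys]
  (areduce ^longs xs idx res 0
           (+ res (* (aget ^longs xs idx)
                     (aget ^longs ys idx)))))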
Now, Fluokitten!
(defn p+ ^double [^double x ^double y] (+ x y))
(defn p* ^double [^double x ^double y] (* x y))
(defn dot-fluokitten [xs ys]
  (foldmap p+ 0.0 p* xs ys))
;;(quick-bench (dot-fluokitten arr-a arr-b))
(dot-fluokitten arr-a arr-b)
3.3332833335E14
Execution time mean : 78.358088 µs
Yesss! Even slightly faster than areduce! (Your mileage may vary, though.)
Faster than map/reduce, and often at the same speed as amap/areduce. If you're not careful, though, you might be hit by boxing; HotSpot is an unpredictable beast when it comes to these things. It is always safer to provide primitive typehinted functions as arguments: plain + and * work, but their arguments will be boxed. There are helper functions in the Uncomplicate commons library that can wrap any function into a primitive version.
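double-fn, which we required at the beginning, is one of those helpers: it wraps a function into a variant typehinted for primitive doubles. A minimal sketch (d+, d*, and dot-double-fn are my names for this example):

;; Wrap the ordinary + and * into primitive-double versions,
;; so we don't have to write hinted defns like p+ and p* by hand.
(def d+ (double-fn +))
(def d* (double-fn *))
(defn dot-double-fn [xs ys]
  (foldmap d+ 0.0 d* xs ys))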
Where it really shows its convenience is when we work with something a bit more sophisticated than mere + and *:
(defn pointless-id ^double [^double x]
  (Math/exp (Math/log x)))
(defn demo-amap [xs]
  (amap ^doubles xs idx xs
        (pointless-id (aget ^doubles xs idx))))
;;(quick-bench (demo-amap arr-a))
(take 3 (demo-amap arr-a))
(0.0 1.0 2.0)
Execution time mean : 5.186906 ms
Now, Fluokitten:
;;(quick-bench (fmap! pointless-id arr-b))
(take 3 (fmap! pointless-id arr-b))
(0.0 1.0 2.0)
The same speed as amap!
Execution time mean : 5.006114 ms
Meanwhile, if you use ordinary map:
;;(quick-bench (doall (map pointless-id arr-b)))
(doall (map pointless-id arr-b))
Execution time mean : 14.845975 ms
Three times slower. The JVM spent 5 ms on the actual computation, and twice that much on shuffling objects around. Sometimes that cost is not important, but in some applications it is.
And an honorary mention of Neanderthal
Of course, if you really need the speed, there is no better helper than Neanderthal:
(require '[uncomplicate.neanderthal
           [native :refer [dv]]
           [vect-math :refer [log! exp!]]])
(def v (dv (range 100000)))
;;(quick-bench (take 3 (exp! (log! v))))
(take 3 (exp! (log! v)))
#'user/v
(0.0 1.0 2.0)
Execution time mean : 110.770416 µs
50 times faster than arrays is quite all right. Keep in mind that it used four cores instead of just one…
Conclusions
Fluokitten helps me a lot. I use it in Neanderthal and other high-performance code to get the elegance of Clojure's higher order map and reduce, while keeping the speed of native and GPU backends. Check it out…