# Going faster than TensorFlow with Clojure


October 5, 2020

A few weeks ago, I showed you how simple Clojure's Deep Diamond is, even compared to Keras. I also mentioned that it's superfast, and you probably didn't believe me. Let's quickly compare the training time of the same convolutional neural network in Clojure and Keras!

## TL;DR Deep Diamond is much faster

In this article, we only measure performance on the CPU. Both libraries, Deep Diamond and Keras with TensorFlow, use Intel's oneDNN low-level performance library under the hood, and I confirmed that both installations exploit the AVX2 instructions available on my (old-ish) i7-4790K CPU, so the difference is entirely due to the higher-level implementations.
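If you want to run a similar sanity check on your own machine, here is a minimal sketch. It is not necessarily how the check above was done: it only tells you whether the CPU advertises AVX2, assuming a Linux system with `/proc/cpuinfo`, and the helper names `cpu_flags` and `has_avx2` are made up for this illustration. To see which instruction set oneDNN itself dispatches to, its verbose mode (the `ONEDNN_VERBOSE` environment variable in recent releases, `DNNL_VERBOSE` in older ones) is the more direct tool.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores report the same flags, so the first entry is enough.
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text):
    """True if the CPU described by the text advertises AVX2."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 available:", has_avx2(f.read()))
    except FileNotFoundError:
        # Assumption: Linux only; other systems need a different lookup.
        print("/proc/cpuinfo not found; this check is Linux-only")
```

A CPU that advertises AVX2 is necessary but not sufficient: the library build also has to contain AVX2 kernels, which is what the verbose output confirms.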

Deep Diamond completes this training in 368 seconds while Keras + TensorFlow takes 509 seconds.
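To put the two timings in proportion, here is the arithmetic as a small Python sketch. The only inputs are the two measurements quoted above; the function name `overhead` is just a label chosen for this illustration.

```python
deep_diamond_s = 368  # measured training time, seconds
keras_tf_s = 509      # measured training time, seconds


def overhead(baseline_s, other_s):
    """Return the extra seconds and the extra time as a fraction of the baseline."""
    extra = other_s - baseline_s
    return extra, extra / baseline_s


extra_s, extra_fraction = overhead(deep_diamond_s, keras_tf_s)
speedup = keras_tf_s / deep_diamond_s

print(f"extra time in Keras + TensorFlow: {extra_s} s")
print(f"relative to Deep Diamond's time:  {extra_fraction:.1%}")
print(f"speedup of Deep Diamond:          {speedup:.2f}x")
```

Measured against Deep Diamond's 368 seconds, the extra 141 seconds come to about 38%; measured against the Keras run, Deep Diamond saves about 28% of the time.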

TensorFlow has never had a reputation as the fastest deep learning library, but keep in mind that this reputation dates from before it integrated Intel's oneDNN. Now that all major frameworks support oneDNN, their underlying performance is usually on a more equal footing.

That's why this result is good. Even though all the high-performance operations are backed by the same native routines, Keras + TensorFlow still adds about 140 seconds of overhead, roughly 38% on top of Deep Diamond's running time (141 / 368). Not bad!

I know what you'll complain about: "Nobody trains their networks on CPU anyway. The GPU performance is what is relevant! Clojure certainly can't challenge TensorFlow there?" You're right about the first part; GPU performance is much more relevant. Let's keep the suspense then, until the next article, which will compare Deep Diamond to Keras + TensorFlow on the GPU with CUDA.

## Keras CNN in Python

I repeat the relevant model code for reference. We're interested in the running time of model.fit, with minimal verbosity, for 12 epochs.

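import time
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))  # from here on, layers mirror the Deep Diamond net
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))  # rate assumed; not present in the text above
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # rate assumed; not present in the text above
model.add(Dense(10, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              metrics=['accuracy'])

s = time.time_ns()  # x_train and y_train are the MNIST tensors prepared beforehand
model.fit(x_train, y_train,
          batch_size=128,
          verbose=2,
          epochs=12)
e = time.time_ns()
print((e - s) / 10**9, "seconds")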
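If you want to reuse the same measurement pattern around other calls, a small helper keeps the timing code out of the way. This is only a sketch, not part of the benchmark above: the name `stopwatch` and the toy workload are invented for the illustration, and it uses `time.perf_counter_ns`, a monotonic clock, where the code above uses the wall-clock `time.time_ns`. For a run lasting several minutes, the difference between the two clocks is negligible.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Store the elapsed seconds of the enclosed block in results[label]."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        # Recorded even if the block raises, so partial runs are still timed.
        results[label] = (time.perf_counter_ns() - start) / 1e9


timings = {}
with stopwatch("toy workload", timings):
    # Stand-in for model.fit(...); any code can go here.
    sum(i * i for i in range(100_000))

print(f"toy workload: {timings['toy workload']:.4f} seconds")
```

In the benchmark, the `with` block would wrap only the `model.fit` call, so that data loading and model construction stay outside the measured interval.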


## Deep Diamond CNN in Clojure

In Clojure, we're measuring the runtime of the train function.

(defonce net-bp
  (network (desc [128 1 28 28] :float :nchw)
           [(convo [32] [3 3] :relu)
            (convo [64] [3 3] :relu)
            (pooling [2 2] :max)
            (dropout)
            (dense [128] :relu)
            (dropout)
            (dense [10] :softmax)]))
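To get a feel for how much work this network does per image, it helps to trace the tensor shapes and parameter counts layer by layer. The sketch below does that in plain Python, without either library. It assumes unpadded convolutions with stride 1 and a pooling stride equal to the 2x2 window; these are assumptions of the sketch rather than something the code above states. The helper names `conv_out` and `trace` are made up for the illustration, and the dropout layers are left out because they change neither shapes nor parameter counts.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(height, width, channels):
    """Return (layer, output shape, trainable parameters) for each layer."""
    rows = []
    for filters in (32, 64):  # the two 3x3 convolutions
        params = 3 * 3 * channels * filters + filters  # weights + biases
        height, width = conv_out(height, 3), conv_out(width, 3)
        channels = filters
        rows.append((f"convo {filters}", (channels, height, width), params))
    # 2x2 max pooling, stride 2 (assumed); pooling has no parameters
    height, width = conv_out(height, 2, stride=2), conv_out(width, 2, stride=2)
    rows.append(("pooling", (channels, height, width), 0))
    features = channels * height * width  # flattened input of the first dense layer
    for units in (128, 10):
        rows.append((f"dense {units}", (units,), features * units + units))
        features = units
    return rows


rows = trace(28, 28, 1)
for name, shape, params in rows:
    print(f"{name:10} {str(shape):14} {params:>9,}")
total = sum(params for _, _, params in rows)
print(f"total trainable parameters: {total:,}")
```

Under these assumptions, the 28x28 input shrinks to 26x26 and then 24x24 in the convolutions, the pooling halves it to 12x12 with 64 channels, and almost all of the roughly 1.2 million parameters sit in the first dense layer.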