# Going faster than TensorFlow on the GPU with Clojure (GTX 1080Ti)

November 2, 2020

A few weeks ago I showed you how simple Clojure's Deep Diamond is, even compared to Keras. I also mentioned that it's superfast. Here's how fast it is on the GPU!

## TL;DR Much faster than Keras+TensorFlow on the GPU, too!

In the previous article, we only compared the libraries on the CPU. Deep Diamond was considerably faster: 368 seconds vs 509 seconds. Most readers were intrigued but, being skeptical as they should be, complained that CPU performance doesn't matter anyway, since everybody trains convolutional networks on a GPU. Let's do the GPU comparison, then.

Both Deep Diamond and Keras with TensorFlow use Nvidia's cuDNN low-level performance library under the hood, so any difference is due to the higher-level implementation.

Deep Diamond completes this training in 21 seconds, while Keras + TensorFlow takes 35 seconds. The gap has even widened in favor of Deep Diamond: the ratio is now 1.67, up from 1.38 on the CPU.
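For readers who want to check the arithmetic, here is a minimal sketch that recomputes both ratios from the timings quoted in this article. The `speedup` helper is a hypothetical name introduced only for this illustration.

```python
def speedup(slower_seconds, faster_seconds):
    """How many times faster the quicker run is than the slower one."""
    return slower_seconds / faster_seconds

# Timings quoted in the text: Keras + TensorFlow vs Deep Diamond.
cpu_ratio = speedup(509, 368)
gpu_ratio = speedup(35, 21)

print(f"CPU: {cpu_ratio:.2f}x, GPU: {gpu_ratio:.2f}x")
# prints "CPU: 1.38x, GPU: 1.67x"
```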

## Keras CNN in Python

I repeat the relevant model code for reference. We're interested in the running time of model.fit, with minimal verbosity, for 12 epochs. I'm using Nvidia's GTX 1080Ti GPU. The Keras code is taken from the official Keras examples.

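    import time
    import keras
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    # Layer stack restored from the official Keras MNIST CNN example (assumed, incl. dropout rates).
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=(28, 28, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  metrics=['accuracy'])

    s = time.time_ns()
    model.fit(x_train, y_train,
              batch_size=128,
              verbose=2,
              epochs=12)
    e = time.time_ns()
    print((e-s)/(10**9), " seconds")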
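The start/stop timing pattern used around model.fit can be wrapped in a small reusable helper. This is only a sketch: the `timed` context manager and the `results` dictionary are hypothetical names, and a dummy CPU workload stands in for the training call so that the snippet runs without Keras or a GPU.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

results = {}
with timed("dummy workload", results):
    # Stand-in for model.fit(...); replace with the real training call.
    total = sum(i * i for i in range(100_000))

print(f"{results['dummy workload']:.4f} seconds")
```

`time.perf_counter` is a monotonic clock meant for measuring intervals, so it is a reasonable alternative to subtracting two `time.time_ns()` readings.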


## Deep Diamond CNN in Clojure

In Clojure, we're measuring the runtime of the train function.

    (defonce net-bp
      (network (desc [128 1 28 28] :float :nchw)
               [(convo [32] [3 3] :relu)
                (convo [64] [3 3] :relu)
                (pooling [2 2] :max)
                (dropout)
                (dense [128] :relu)
                (dropout)
                (dense [10] :softmax)]))
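To get a feel for the size of this network, the sketch below traces the tensor shapes through the layers and counts the trainable parameters. It assumes unpadded, stride-1 convolutions (the Keras `Conv2D` default of `padding='valid'`), non-overlapping 2x2 pooling, and a bias on every layer; whether Deep Diamond's defaults match exactly is an assumption here, and the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# Spatial side of one 28x28 single-channel input image.
side_conv1 = conv_out(28, 3)          # after 3x3 convolution, 32 filters
side_conv2 = conv_out(side_conv1, 3)  # after 3x3 convolution, 64 filters
side_pool = side_conv2 // 2           # after 2x2 max pooling
flattened = side_pool * side_pool * 64

# Weights plus biases per layer (assumes every layer has a bias).
conv1_params = 3 * 3 * 1 * 32 + 32
conv2_params = 3 * 3 * 32 * 64 + 64
dense1_params = flattened * 128 + 128
dense2_params = 128 * 10 + 10
total_params = conv1_params + conv2_params + dense1_params + dense2_params

print(side_conv1, side_conv2, side_pool, flattened)  # prints "26 24 12 9216"
print(total_params)                                  # prints "1199882"
```

Under these assumptions, almost all of the parameters sit in the first dense layer, while the two convolutions, despite being small in parameter count, operate on the largest activations.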