Going faster than TensorFlow with Clojure

October 5, 2020

A few weeks ago I showed you how simple Clojure's Deep Diamond is, even compared to Keras. I also mentioned that it's superfast, and you probably didn't believe me. Let's quickly compare the training time of the same convolutional neural network in Clojure and Keras!

TL;DR Deep Diamond is much faster

In this article, we're only measuring performance on the CPU. Both libraries, Deep Diamond and Keras with TensorFlow, use Intel's oneDNN low-level performance library under the hood, and I confirmed that both installations exploit the AVX2 instructions available on my (old-ish) i7-4790k CPU, so the difference is due entirely to the higher-level implementations.
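
If you want to repeat the comparison on your own machine, a first sanity check is whether the CPU advertises AVX2 at all. The following is a minimal sketch, assuming a Linux system that exposes `/proc/cpuinfo`; the helper names `cpu_flags` and `supports_avx2` are made up for this illustration. It only shows what the kernel reports for the CPU, not whether a particular oneDNN build actually dispatches to AVX2 kernels.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_path="/proc/cpuinfo"):
    """True if the kernel reports AVX2; False if it does not or the file is missing."""
    path = Path(cpuinfo_path)
    if not path.exists():
        # Not Linux, or /proc is unavailable: this sketch cannot tell.
        return False
    return "avx2" in cpu_flags(path.read_text())


if __name__ == "__main__":
    print("AVX2 reported by kernel:", supports_avx2())
```

On a CPU such as the i7-4790k, the `flags` line should contain `avx2`.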

Deep Diamond completes this training in 368 seconds while Keras + TensorFlow takes 509 seconds.

TensorFlow is not known as the fastest deep learning library, but keep in mind that this reputation dates from before it integrated Intel's oneDNN. Now that all major frameworks support oneDNN, their underlying performance is usually on a more equal footing.

That's why this result is good. Even though all high-performance operations are backed by the same native library, Keras + TensorFlow still adds about 140 seconds of overhead, roughly 38% on top of Deep Diamond's running time. Not bad!
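
The overhead follows directly from the two measured times. Here is a small sketch of the arithmetic; the function name `overhead` is only for illustration, and the two inputs are the timings reported above.

```python
deep_diamond_s = 368  # Deep Diamond training time reported above, in seconds
keras_tf_s = 509      # Keras + TensorFlow training time reported above, in seconds


def overhead(fast_s, slow_s):
    """Return (extra seconds, extra time relative to the faster run, speedup factor)."""
    extra = slow_s - fast_s
    return extra, extra / fast_s, slow_s / fast_s


extra_s, relative, speedup = overhead(deep_diamond_s, keras_tf_s)
print(f"{extra_s} s extra, {relative:.1%} of the faster run, speedup {speedup:.2f}x")
# prints "141 s extra, 38.3% of the faster run, speedup 1.38x"
```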

I know what you'll complain about: "Nobody trains their networks on CPU anyway. The GPU performance is what is relevant! Clojure certainly can't challenge TensorFlow there?" You're right about the first part; GPU performance is much more relevant. Let's keep the suspense then, until the next article, which will compare Deep Diamond to Keras + TensorFlow on the GPU with CUDA.

Keras CNN in Python

I repeat the relevant model code for reference. We're interested in the running time of model.fit, with minimal verbosity, for 12 epochs.

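# Imports this excerpt relies on. x_train, y_train and num_classes come from
# the data-loading code, which is not repeated here.
import time
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import Adam

model = Sequential()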
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=Adam(learning_rate=0.01),
              metrics=['accuracy'])

s = time.time_ns()
model.fit(x_train, y_train,
          batch_size=128,
          verbose=2,
          epochs=12)
e = time.time_ns()
print((e-s)/(10**9), " seconds")
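
To get a feel for how much work each training step involves, the layer sizes of this model can be worked out by hand. The sketch below assumes the Keras defaults for everything the listing leaves unspecified (unpadded "valid" convolutions with stride 1, and pooling stride equal to the pool size) and assumes `num_classes` is 10, to match `(dense [10] :softmax)` in the Clojure version. The helper names are made up for this illustration.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel of a square 2D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit of a fully connected layer."""
    return n_in * n_out + n_out


num_classes = 10  # assumption, see the lead-in

side = 28                 # input images are 28 x 28 x 1
side = conv_out(side, 3)  # 26 after the first 3x3 convolution
side = conv_out(side, 3)  # 24 after the second 3x3 convolution
side = side // 2          # 12 after 2x2 max pooling
flat = side * side * 64   # 9216 values after Flatten

params = {
    "conv1": conv_params(3, 1, 32),            # 320
    "conv2": conv_params(3, 32, 64),           # 18496
    "dense1": dense_params(flat, 128),         # 1179776
    "dense2": dense_params(128, num_classes),  # 1290
}
total = sum(params.values())
print(flat, total)  # prints "9216 1199882"
```

Almost all of the roughly 1.2 million trainable parameters sit in the first dense layer, while the two convolutions are small in parameter count.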

Deep Diamond CNN in Clojure

In Clojure, we're measuring the runtime of the train function.

(defonce net-bp
  (network (desc [128 1 28 28] :float :nchw)
           [(convo [32] [3 3] :relu)
            (convo [64] [3 3] :relu)
            (pooling [2 2] :max)
            (dropout)
            (dense [128] :relu)
            (dropout)
            (dense [10] :softmax)]))

(defonce net (init! (net-bp :adam)))

(time (train net train-images y-train :crossentropy 12 []))

The books

The book Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, DNNL, Java, and Clojure teaches the nuts and bolts of neural networks and deep learning by showing you how Deep Diamond is built, from scratch, in interactive sessions. Each line of code can be executed and the results inspected in the plain Clojure REPL. The best way to master something is to build it yourself!

It's simple. But fast and powerful!

Please subscribe, read the drafts, get the full book soon, and support my work on this free open source library.

Going faster than TensorFlow with Clojure - October 5, 2020 - Dragan Djuric