CUDA 10 in Clojure


November 21, 2018



Support for CUDA 10 has just landed in ClojureCUDA with the latest version, 0.6.0. Install the CUDA 10 Toolkit, update your drivers, update the ClojureCUDA version in your project.clj, and you should be ready to go! I expect all your existing code to continue working without changes!
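As a rough sketch of that version bump, a project.clj might look like the following. The project name and the Clojure version are placeholders that I made up for illustration; the `uncomplicate/clojurecuda` coordinates are the ones the library is published under on Clojars, as far as I know, so double-check them against the ClojureCUDA documentation.

```clojure
;; Hypothetical project.clj; only the ClojureCUDA version comes from this post.
(defproject hello-cuda "0.1.0-SNAPSHOT"          ; placeholder project name
  :dependencies [[org.clojure/clojure "1.9.0"]    ; assumed Clojure version
                 [uncomplicate/clojurecuda "0.6.0"]])
```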

CUDA is the leading environment for high-performance computing on Nvidia GPU hardware. You've probably heard about it. What you may not know is that, instead of wrangling with a C++ toolchain, you can use it directly from your interactive Clojure REPL!

Other high-performance Clojure libraries, such as Neanderthal, take advantage of ClojureCUDA to deliver speed dynamically to your Clojure programs.

Check out these higher-level libraries to see how you can do fast calculations in a few lines of Clojure, without writing GPU code yourself. Still, even writing low-level code is not so scary when you have the full power of Clojure's interactive environment. Here is an interactive REPL session that builds a bulk increment function for a whole array.

Notice that I don't write any explanations for the following lines of code, and I bet you can understand what's happening.

Play with the environment

(require '[uncomplicate.clojurecuda.core :refer :all]
         '[uncomplicate.commons.core :refer :all])

(init)

true

(device-count)

(def my-nvidia-gpu (device 0))

#'user/my-nvidia-gpu

(info my-nvidia-gpu)

:async-engine-count
:managed-memory true
:multi-gpu-board false
:maximum-surface2d-layered-layers 2048
:maximum-texturecubemap-width 32768
:ecc-enabled false
:max-pitch 2147483647
:max-grid-dim-y 65535
:compute-mode :default
:can-map-host-memory true
:max-grid-dim-z 65535
:pci-bus-id-string 0000:02:00.0
:maximum-texture2d-mipmapped-width 32768
:texture-pitch-alignment
:kernel-exec-timeout false
:maximum-texture2d-linear-height 65000
:max-shared-memory-per-multiprocessor 98304
:total-mem 11721506816
:maximum-texture1d-layered-width 32768
:maximum-texturecubemap-layered-layers 2046
:maximum-texture3d-width 16384
:maximum-surface2d-layered-height 32768
:max-block-dim-z
:maximum-surface1d-width 32768
:maximum-surface3d-width 16384
:name GeForce GTX 1080 Ti
:maximum-texture3d-height-alternate 8192
:max-threads-per-multiprocessor 2048
:max-shared-memory-per-block 49152
:maximum-texture3d-width-alternate 8192
:compute-capability-major
:texture-alignment 512
:global-memory-bus-width 352
:maximum-surface2d-layered-width 32768
:memory-clock-rate 5505000
:maximum-surfacecubemap-layered-layers 2046
:maximum-surface2d-height 65536
:clock-rate 1683000
:concurrent-kernels
:compute-capability-minor
:maximum-texture2d-width 131072
:max-threads-per-block 1024
:maximum-texture1d-linear-width 134217728
:integrated false
:maximum-texture2d-layered-layers 2048
:max-block-dim-x 1024
:maximum-texture1d-mipmapped-width 16384
:maximum-texture2d-mipmapped-height 32768
:local-L1-cache-supported true
:maximum-surface1d-layered-layers 2048
:pci-bus-id
:maximum-texture1d-layered-layers 2048
:maximum-surfacecubemap-layered-width 32768
:max-grid-dim-x 2147483647
:maximum-texture2d-height 65536
:global-L1-cache-supported true
:maximum-texture2d-linear-pitch 2097120
:maximum-texturecubemap-layered-width 32768
:multi-gpu-board-group-id
:pci-domain-id
:maximum-surface3d-depth 16384
:maximum-surface2d-width 131072
:stream-priorities-supported true
:multiprocessor-count
:tcc-driver false
:warp-size
:unified-addressing true
:maximum-texture3d-height 16384
:L2-cache-size 2883584
:maximum-surfacecubemap-width 32768
:maximum-texture1d-width 131072
:maximum-surface1d-layered-width 32768
:maximum-surface3d-height 16384
:pci-device-id
:max-registers-per-block 65536
:max-block-dim-y 1024
:surface-alignment 512
:maximum-texture3d-depth-alternate 32768
:maximum-texture3d-depth 16384
:total-constant-memory 65536
:maximum-texture2d-linear-width 131072
:max-registers-per-multiprocessor 65536
:maximum-texture2d-layered-height 32768

Create the context

(def ctx (context my-nvidia-gpu))

#'user/ctx

(info ctx)

'(:dev-runtime-pending-launch-count 2048  :dev-runtime-sync-depth 2  :malloc-heap-size 8388608  :stack-size 1024  :api-version 3020  :stream-priority-range (0 -1)  :cache-config :prefer-none  :printf-fifo-size 1048576  :device #object(jcuda.driver.CUdevice 0x1db4190c "CUdevice[nativePointer=0x0]")  :shared-config :four-byte-bank-size)

(= ctx (current-context))

true

Play with memory

(def gpu-array (mem-alloc 1024))

#'user/gpu-array

(def main-array (float-array (range 256)))

#'user/main-array

(take 10 main-array)

(0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0)

(memcpy-host! main-array gpu-array)

#object[uncomplicate.clojurecuda.internal.impl.CULinearMemory 0x515e61e0 "uncomplicate.clojurecuda.internal.impl.CULinearMemory@515e61e0"]
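A side note on the number 1024 used with mem-alloc: the session is consistent with the size being given in bytes, since 256 single-precision floats take 4 bytes each. Here is a minimal sketch of that arithmetic in plain Clojure, runnable without a GPU. The helper `float-bytes` is a hypothetical name of mine, not part of ClojureCUDA.

```clojure
;; Hypothetical helper (not a ClojureCUDA function): bytes needed for n floats.
(defn float-bytes [n]
  (* n Float/BYTES)) ; Float/BYTES is 4 on the JVM (Java 8+)

;; The host array in this session holds 256 floats.
(float-bytes (count (float-array (range 256))))
;; => 1024
```

Computing the size from the host array, instead of typing the number by hand, keeps the two from drifting apart when the array length changes.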

Compute something already

extern "C"
__global__ void increment(int n, float *a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        a[i] = a[i] + 1.0f;
    }
};

(def kernel-source
  "extern \"C\"
   __global__ void increment (int n, float *a) {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) {
       a[i] = a[i] + 1.0f;
     }
   };")

(def hello-program (compile! (program kernel-source)))
(def hello-module (module hello-program))
(def increment (function hello-module "increment"))

#'user/kernel-source
#'user/hello-program
#'user/hello-module
#'user/increment

(launch! increment (grid-1d 256) (parameters 256 gpu-array))

nil

(take 12 (memcpy-host! gpu-array (float-array 256)))

(1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0)
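If you'd like to convince yourself that the kernel's index arithmetic touches every element exactly once, here is a small CPU-only model of it in plain Clojure. It is only a sketch: the function names are hypothetical, and the block size of 100 is an arbitrary value chosen so that it does not divide 256 evenly. It is not necessarily the block size that grid-1d picks.

```clojure
;; CPU-side model of: int i = blockIdx.x * blockDim.x + threadIdx.x;
(defn global-index [block-idx block-dim thread-idx]
  (+ (* block-idx block-dim) thread-idx))

;; All indices that pass the kernel's `if (i < n)` guard, in launch order.
;; The number of blocks is n / block-dim, rounded up.
(defn covered-indices [n block-dim]
  (let [blocks (quot (+ n (dec block-dim)) block-dim)]
    (for [b (range blocks)
          t (range block-dim)
          :let [i (global-index b block-dim t)]
          :when (< i n)]
      i)))

;; Three blocks of 100 threads would reach index 299; the guard stops at 255.
(= (covered-indices 256 100) (range 256))
;; => true

;; Plain-Clojure reference for what the kernel computes on each element.
(defn cpu-increment [xs]
  (map #(+ % 1.0) xs))

(take 3 (cpu-increment (range 256)))
;; => (1.0 2.0 3.0)
```

The guard matters whenever the number of launched threads is rounded up past the array length; without it, the extra threads would write outside the array.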

How to get involved

ClojureCUDA and other Uncomplicate libraries are open source. You can contribute with examples, bug fixes, new features, etc. I understand that you may need more time or knowledge to be able to do that.

That's why I've recently started a donation campaign on Patreon. If you don't have enough time to contribute yourself, you can help me have more time to work on this by donating a monthly amount of your choice.

You can even adopt a pet function of your own!

Today you can be a special Thanksgiving Donator.

Donate and become a Patron!

SmallFP ClojureCUDA talk

A few months ago, I visited the SmallFP & ClojuTRE conference in beautiful Helsinki, Finland. Maybe you'll find my talk interesting. Here's the link to the video on YouTube. And here are the slides.