CUDA 10 in Clojure
November 21, 2018
CUDA 10 support has just landed in ClojureCUDA with the latest version, 0.6.0. Install the CUDA 10 Toolkit, update your drivers, update the ClojureCUDA version in your project.clj, and you should be ready to go! I expect all your existing code to continue working without changes.
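For reference, the version bump in project.clj might look like the sketch below. The project name and the Clojure version are placeholders chosen for illustration, not taken from this post; only the ClojureCUDA version, 0.6.0, comes from the announcement above.

```clojure
;; project.clj (a minimal sketch; `my-gpu-app` and the Clojure version are placeholders)
(defproject my-gpu-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]
                 ;; the release that brings CUDA 10 support
                 [uncomplicate/clojurecuda "0.6.0"]])
```

If your project already depends on an older ClojureCUDA, changing that one version string should be the only edit needed here.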
CUDA is the leading environment for high-performance computing on Nvidia GPU hardware. You've probably heard about it. What you may not know is that, instead of wrangling with a C++ toolchain, you can use it directly from your interactive Clojure REPL!
Other Clojure high-performance libraries, such as Neanderthal, take advantage of ClojureCUDA to deliver speed dynamically to your Clojure programs.
Check out these higher-level libraries to see how you can do fast calculations with a few lines of Clojure, without writing GPU code yourself. But even low-level code is not so scary when you have the full power of Clojure's interactive environment. Here is an interactive REPL session that builds a bulk increment function for a whole array.
Notice that I don't write any explanations for the following lines of code, and I bet you can understand what's happening.
Play with the environment
(require '[uncomplicate.clojurecuda.core :refer :all] '[uncomplicate.commons.core :refer :all])
(init)
true
(device-count)
2
(def my-nvidia-gpu (device 0))
#'user/my-nvidia-gpu
(info my-nvidia-gpu)
:async-engine-count | 2 | :managed-memory | true | :multi-gpu-board | false | :maximum-surface2d-layered-layers | 2048 | :maximum-texturecubemap-width | 32768 | :ecc-enabled | false | :max-pitch | 2147483647 | :max-grid-dim-y | 65535 | :compute-mode | :default | :can-map-host-memory | true | :max-grid-dim-z | 65535 | :pci-bus-id-string | 0000:02:00.0 | :maximum-texture2d-mipmapped-width | 32768 | :texture-pitch-alignment | 32 | :kernel-exec-timeout | false | :maximum-texture2d-linear-height | 65000 | :max-shared-memory-per-multiprocessor | 98304 | :total-mem | 11721506816 | :maximum-texture1d-layered-width | 32768 | :maximum-texturecubemap-layered-layers | 2046 | :maximum-texture3d-width | 16384 | :maximum-surface2d-layered-height | 32768 | :max-block-dim-z | 64 | :maximum-surface1d-width | 32768 | :maximum-surface3d-width | 16384 | :name | GeForce GTX 1080 Ti | :maximum-texture3d-height-alternate | 8192 | :max-threads-per-multiprocessor | 2048 | :max-shared-memory-per-block | 49152 | :maximum-texture3d-width-alternate | 8192 | :compute-capability-major | 6 | :texture-alignment | 512 | :global-memory-bus-width | 352 | :maximum-surface2d-layered-width | 32768 | :memory-clock-rate | 5505000 | :maximum-surfacecubemap-layered-layers | 2046 | :maximum-surface2d-height | 65536 | :clock-rate | 1683000 | :concurrent-kernels | 1 | :compute-capability-minor | 1 | :maximum-texture2d-width | 131072 | :max-threads-per-block | 1024 | :maximum-texture1d-linear-width | 134217728 | :integrated | false | :maximum-texture2d-layered-layers | 2048 | :max-block-dim-x | 1024 | :maximum-texture1d-mipmapped-width | 16384 | :maximum-texture2d-mipmapped-height | 32768 | :local-L1-cache-supported | true | :maximum-surface1d-layered-layers | 2048 | :pci-bus-id | 2 | :maximum-texture1d-layered-layers | 2048 | :maximum-surfacecubemap-layered-width | 32768 | :max-grid-dim-x | 2147483647 | :maximum-texture2d-height | 65536 | :global-L1-cache-supported | true | :maximum-texture2d-linear-pitch | 
2097120 | :maximum-texturecubemap-layered-width | 32768 | :multi-gpu-board-group-id | 0 | :pci-domain-id | 0 | :maximum-surface3d-depth | 16384 | :maximum-surface2d-width | 131072 | :stream-priorities-supported | true | :multiprocessor-count | 28 | :tcc-driver | false | :warp-size | 32 | :unified-addressing | true | :maximum-texture3d-height | 16384 | :L2-cache-size | 2883584 | :maximum-surfacecubemap-width | 32768 | :maximum-texture1d-width | 131072 | :maximum-surface1d-layered-width | 32768 | :maximum-surface3d-height | 16384 | :pci-device-id | 0 | :max-registers-per-block | 65536 | :max-block-dim-y | 1024 | :surface-alignment | 512 | :maximum-texture3d-depth-alternate | 32768 | :maximum-texture3d-depth | 16384 | :total-constant-memory | 65536 | :maximum-texture2d-linear-width | 131072 | :max-registers-per-multiprocessor | 65536 | :maximum-texture2d-layered-height | 32768 |
Create the context
(def ctx (context my-nvidia-gpu))
#'user/ctx
(info ctx)
'(:dev-runtime-pending-launch-count 2048 :dev-runtime-sync-depth 2 :malloc-heap-size 8388608 :stack-size 1024 :api-version 3020 :stream-priority-range (0 -1) :cache-config :prefer-none :printf-fifo-size 1048576 :device #object(jcuda.driver.CUdevice 0x1db4190c "CUdevice[nativePointer=0x0]") :shared-config :four-byte-bank-size)
(= ctx (current-context))
true
Play with memory
(def gpu-array (mem-alloc 1024))
#'user/gpu-array
(def main-array (float-array (range 256)))
#'user/main-array
(take 10 main-array)
0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 |
(memcpy-host! main-array gpu-array)
#object[uncomplicate.clojurecuda.internal.impl.CULinearMemory 0x515e61e0 "uncomplicate.clojurecuda.internal.impl.CULinearMemory@515e61e0"]
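A small aside on the number 1024 above: as far as I can tell, mem-alloc takes its size in bytes, so 1024 is exactly the room needed for the 256 four-byte floats in main-array. The sketch below computes that size instead of hard-coding it. The names `element-count` and `bytes-needed` are mine, introduced only for illustration.

```clojure
;; Size, in bytes, of a buffer that holds 256 single-precision floats.
;; `element-count` and `bytes-needed` are illustrative names, not part of ClojureCUDA.
(def element-count 256)

;; Float/BYTES is the JVM constant for the size of one float (4 bytes).
(def bytes-needed (* element-count Float/BYTES))

bytes-needed
;; => 1024
```

The resulting number is what you would pass to mem-alloc if you wanted the buffer size to follow the array length.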
Compute something already
extern "C" __global__ void increment(int n, float *a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        a[i] = a[i] + 1.0f;
    }
};
(def kernel-source "extern \"C\" __global__ void increment (int n, float *a) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) { a[i] = a[i] + 1.0f; } };")
(def hello-program (compile! (program kernel-source)))
(def hello-module (module hello-program))
(def increment (function hello-module "increment"))
#'user/kernel-source
#'user/hello-program
#'user/hello-module
#'user/increment
(launch! increment (grid-1d 256) (parameters 256 gpu-array))
nil
(take 12 (memcpy-host! gpu-array (float-array 256)))
1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 | 11.0 | 12.0 |
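If you want to check your reading of the kernel without a GPU, here is a plain Clojure sketch that mimics its index arithmetic on the CPU. It is only a stand-in for illustration, not how ClojureCUDA or CUDA executes anything: the functions `blocks-needed` and `simulate-increment` are hypothetical helpers of mine, and the block size of 128 is an arbitrary assumption.

```clojure
(defn blocks-needed
  "Smallest number of blocks of block-dim threads that covers n elements."
  [n block-dim]
  (quot (+ n (dec block-dim)) block-dim))

(defn simulate-increment
  "CPU stand-in for the increment kernel: visits every (block, thread) pair,
  computes the global index the way the kernel does, and skips indices
  that fall outside the array."
  [n block-dim ^floats a]
  (doseq [block-idx (range (blocks-needed n block-dim))
          thread-idx (range block-dim)]
    ;; i = blockIdx.x * blockDim.x + threadIdx.x
    (let [i (+ (* block-idx block-dim) thread-idx)]
      ;; the same guard as `if (i < n)` in the kernel
      (when (< i n)
        (aset a i (float (+ (aget a i) 1.0))))))
  a)

(def result (simulate-increment 256 128 (float-array (range 256))))

(take 12 result)
;; => (1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0)

;; With 5 elements and blocks of 4 threads, 2 blocks (8 threads) are needed,
;; and the guard keeps the 3 surplus threads from touching anything.
(vec (simulate-increment 5 4 (float-array 5)))
;; => [1.0 1.0 1.0 1.0 1.0]
```

The second call shows why the kernel carries the `i < n` check: the number of launched threads is rounded up to whole blocks, so some threads may have no element to work on.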
How to get involved
ClojureCUDA and other Uncomplicate libraries are open source. You can contribute examples, bug fixes, new features, etc. I understand that you may not have the time or knowledge to do that.
That's why I've recently started a donation campaign: if you don't have enough time to contribute yourself, you can help me have more time to work on this by donating a monthly amount of your choice on Patreon.
You can even adopt a pet function of your own!
Today you can be a special Thanksgiving Donator.
SmallFP ClojureCUDA talk
A few months ago I visited the SmallFP & ClojuTRE conferences in beautiful Helsinki, Finland. Maybe you'll find my talk there interesting. Here's the link to the video on YouTube. And here are the slides.