Gemma 3 AI model in Clojure

December 9, 2025

Please share this post in your communities. Without your help, it will stay buried under tons of corporate-pushed, AI and blog-farm-generated slop, and very few people will know that this exists.

These books fund my work! Please check them out.

Recently I've been working on the ONNX runtime integration into Deep Diamond, backed by a grant sponsored by the Clojurists Together Foundation. In the past few articles, we've seen how ONNX models are integrated into Deep Diamond using only a single function, onnx, with almost no need for additional configuration (though it is available). I used a simple MNIST model in the demonstration. But can we now load and run inference on real-deal models, such as the open LLMs from Hugging Face? Let's see!

The Hugging Face model card has this to say about Gemma 3: "Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models." (etc., etc.) So, it seems to be something worth trying.

I'll try to be brief and skip the unnecessary talk. Let's just show the code, which I've lifted and adapted from Deep Diamond's midje tests.

What do we need for this? First, decide on the backend engine; this time we'll use tensors in main memory, backed by the oneDNN engine (DNNL).

(def fact (dnnl-factory))
(def neand-fact (neanderthal-factory fact))

Next, load and configure a particular flavor of Gemma 3 (a smaller one, only 1 billion parameters). The onnx function creates a generalized blueprint, which can create the actual functions when evaluated with the specific input tensors.

(def onnx-bp (onnx fact "data/gemma-3-1b-it-ONNX-GQA/onnx/model.onnx"
                   {:options (-> (options)
                                 (override-dimension! "batch_size" 1)
                                 (override-dimension! "sequence_length" 1)
                                 (override-dimension! "past_sequence_length" 1)
                                 (override-dimension! "total_sequence_length" 1))}))

Gemma 3 has 63 inputs and 61 outputs. We'll need to provide these, but even here we can automate some parts with Clojure, since the past-key-value inputs are pretty uniform. We only need to provide the inputs; the engine can create the outputs for us.

(def input-ids (tensor neand-fact [1 1] :long :nc))
(def position-ids (tensor neand-fact [1 1] :long :nc))
(def attention-mask (tensor neand-fact [1 1] :long :nc))
(def past-key-values (repeatedly 60 #(tensor fact [1 3 1 64] :float :nchw)))
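As a quick sanity check on the input count, here's a toy version of the same assembly with keywords in place of tensors (the keyword names are mine, purely illustrative): three named inputs followed by 60 uniform past-KV entries add up to the 63 inputs mentioned above.

```clojure
;; Toy illustration of the input plumbing: plain Clojure collection building,
;; with keywords standing in for the actual tensors.
(def toy-inputs
  (into [:input-ids :attention-mask :position-ids]
        (map #(keyword (str "past-kv-" %)) (range 60))))

(count toy-inputs) ;; => 63
```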

Next, create the executable instance model. Nothing too fancy here.

(def gemma-next! (onnx-bp (into [input-ids attention-mask position-ids] past-key-values)))

Now these inputs need to be initialized. Normally that would be done inside an LLM generation loop, but here we demonstrate only one step, so we transfer some mock data.

(transfer! [2] input-ids)
(transfer! [0] position-ids)
(transfer! [1] attention-mask)
(doseq [pkv past-key-values]
  (transfer! (repeat 0) pkv))

Aaaaand, we actually run the model by calling our gemma-next! function, which computes the next token.

(gemma-next!)

Now, hold on with the celebration. This does not actually return a full answer from the LLM. It only returns the next token, and in the form of a large tensor full of numbers. The information is there, but it needs to be extracted from these numbers into a string. Also, this is only one step; an LLM would typically run this in a loop and spew token after token. There's some more work to do until we get a ready-made, hands-off chatty LLM. But the main work has been done, and now it's a matter of setting it up properly, tokenizing the inputs, and calling it in a useful way! Still lots of work, but not the hardest parts :)
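To make those two missing pieces concrete, here is a plain-Clojure sketch, with no Deep Diamond involved. Picking the next token id is essentially an argmax over the logits, and the decoding loop just feeds that token back in until an end-of-sequence token or a step budget is reached. All names here are mine, and step-fn is a stand-in for a real call to gemma-next! plus the logits extraction; a real loop would also thread the KV cache and the attention mask through each step.

```clojure
;; Illustrative sketch only: `step-fn` stands in for gemma-next! plus
;; reading the logits out of the result tensor.

(defn argmax
  "Index of the largest value in a finite seq of numbers
  (i.e. the highest-scoring token id)."
  [logits]
  (first (apply max-key second (map-indexed vector logits))))

(defn greedy-generate
  "Greedy decoding skeleton: repeatedly pick the best next token and feed it
  back in, stopping at `eos-token` or after `max-steps` iterations.
  `step-fn` maps the current token id to the next step's logits."
  [step-fn start-token eos-token max-steps]
  (loop [token start-token, acc [start-token], n 0]
    (let [next-token (argmax (step-fn token))]
      (if (or (= next-token eos-token) (= n max-steps))
        acc
        (recur next-token (conj acc next-token) (inc n))))))
```

With a toy step-fn over a 10-token vocabulary that always scores token (inc current) highest, (greedy-generate toy-step 0 5 100) walks 0 → 1 → 2 → 3 → 4 and stops when it sees the end-of-sequence id 5, returning [0 1 2 3 4].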

I've applied for Clojurists Together yearly funding in 2026. If you are a Clojurists Together member, and would like to see continued development in this area, your vote can help me keep working on this :)

My goal with this funding in 2026 is to continue developing the Clojure AI, ML, and high-performance ecosystem of Uncomplicate libraries (Neanderthal and many more) on Nvidia GPUs, Apple Silicon, and traditional PCs. This year I will also focus on writing tutorials on my blog and creating websites for the projects involved, which is something I've wanted for years but didn't have time to do, because I spent all my time on programming.

Gemma 3 AI model in Clojure - December 9, 2025 - Dragan Djuric