Clojure on Fennel Part One: Persistent Data Structures

An exploration of building the ClojureFnl compiler and implementing high-performance persistent data structures on Lua/Fennel to replace inefficient copy-on-write methods.
Clojure on Fennel part one: Persistent Data Structures
Somewhere in 2019 I started a project that aimed to bring some of Clojure features to Lua runtime - fennel-cljlib.
It was a library for Fennel that implemented a basic subset of clojure.core namespace functions and macros.
My goal was simple - I enjoy working with Clojure, but I don’t use it for hobby projects, so I wanted Fennel to feel more Clojure-like, besides what it already provides for that.
This library grew over the years, I implemented lazy sequences, added immutability, made a testing library, inspired by clojure.test and kaocha, and even made a port of clojure.core.async.
It was a passion project, I almost never used it to write actual software.
One notable exception is fenneldoc - a tool for documentation generation for Fennel libraries.
And I haven’t seen anyone else use it for a serious project.
The reason for that is simple - it was an experiment. Corners were cut, and Fennel, being a Clojure-inspired lisp is not associated with functional programming the same way Clojure is. As a matter of fact, I wouldn’t recommend using this library for anything serious… yet.
Recently, however, I started a new project: ClojureFnl.
This is a Clojure-to-Fennel compiler that uses fennel-cljlib as a foundation.
It’s still in early days of development, but I’ve been working on it for a few months in private until I found a suitable way to make things work in March.
As of this moment, it is capable of compiling most of .cljc files I threw at it, but running the compiled code is a different matter.
I mean, it works to some degree, but the support for standard library is far from done.
;; Welcome to ClojureFnl REPL
;; ClojureFnl v0.0.1
;; Fennel 1.6.1 on PUC Lua 5.5
user=> (defn prime? [n]
(not (some zero? (map #(rem n %) (range 2 n)))))
#<function: 0x89ba7c550>
user=> (for [x (range 3 33 2)
:when (prime? x)]
x)
(3 5 7 11 13 17 19 23 29 31)
user=>
However, there was a problem.
My initial implementation of immutable data structures in the itable library had a serious flaw. The whole library was a simple hack based on the copy-on-write approach and a bunch of Lua metatables to enforce immutability. As a result, all operations were extremely slow. It was fine as an experiment, but if I wanted to go further with ClojureFnl, I had to replace it. The same problem plagued immutableredblacktree.lua, an implementation of a copy-on-write red-black tree I made for sorted maps. It did a full copy of the tree each time it was modified.
For associative tables it wasn’t that big of a deal - usually maps contain a small amount of keys, and itable only copied levels that needed to be changed.
So, if you had a map with, say, ten keys, and each of those keys contained another map with ten keys, adding, removing or updating a key in the outer map meant only copying these ten keys - not the whole nested map.
I could do that reliably, because inner maps were immutable too.
But for arrays the story is usually quite different. Arrays often store a lot of indices, and rarely are nested (or at least not as often as maps). And copying arrays on every change quickly becomes expensive. I’ve mitigated some of the performance problems by implementing my version of transients, however the beauty of Clojure’s data structures is that they’re quite fast even without this optimization.
Proper persistent data structures
Clojure uses Persistent HAMT as a base for its hash maps and sets, and a bit-partitioned trie for vectors. For sorted maps and sets, Clojure uses an immutable red-black tree implementation, but as far as I know it’s not doing a full copy of the tree, and it also has structural sharing properties.
I started looking into existing implementations of HAMT for Lua:
- hamt.lua (based on mattbierner/hamt) - seemed incomplete
- ltrie: no transients, no hashset, no ordered map, no compound vector/hash
- Michael-Keith Bernard’s gist: no custom hashing
I could use one of those, notably ltrie seemed the most appropriate one, but given that I’m working on a fennel library that I want later to embed into my Clojure compiler I needed a library implemented in Fennel.
So I made my own library: immutable.fnl. This library features HAMT-hash maps, hash-sets, and vectors, as well as a better implementation of a persistent red-black tree, and lazy linked lists.
Persistent Hash Map
I started the implementation with a Persistent HAMT with native Lua hashing. The data structure itself is a Hash Array Mapped Trie (HAMT) with 16-factor branching. Thus all operations are O(Log16 N), which is effectively O(1) for a practical amount of keys.
As far as I know, Clojure uses branching factor of 32, but for a Lua runtime this would mean that the popcount would be more expensive, and despite a shallower tree, each mutation would need to copy a larger sparse array. With branching factor of 16 a map with 50K entries is ~4 levels deep, which would be ~3 with 32 branching factor. So my logic was that it’ll be a compromise, especially since Lua is not JVM when it comes to performance.
Of course, it’s not as fast as a pure Lua table, which is to be expected. Lua tables are implemented in C, use efficient hashing, and dynamically re-allocated based on key count. So for my implementation most operations are a lot slower, but the total time for an operation is still usable.
Here are some benchmarks:
Median time over 7 rounds (1 warmup discarded), N = 50000 elements. GC stopped during measurement. Clock: os.clock (CPU). Runtime: Fennel 1.7.0-dev on PUC Lua 5.5
| Operation | Lua table | Persistent HashMap | Ratio | per op | |---|---|---|---|---| | insert 50000 random keys | 2.05 ms | 164.80 ms | 80.3x slower | 3.3 us | | lookup 50000 random keys | 0.83 ms | 92.51 ms | 110.8x slower | 1.9 us | | delete all | 0.78 ms | 170.78 ms | 219.8x slower | 3.4 us | | delete 10% | 0.14 ms | 19.50 ms | 136.4x slower | 3.9 us | | iterate 50000 entries | 1.74 ms | 6.64 ms | 3.8x slower | 0.133 us |
For transients the situation is a bit better, but not by much:
| Operation | Lua table | Transient HashMap | Ratio | per op | |---|---|---|---|---| | insert 50000 random keys | 2.05 ms | 89.17 ms | 43.5x slower | 1.8 us | | delete all | 0.76 ms | 104.31 ms | 138.0x slower | 2.1 us | | delete 10% | 0.16 ms | 12.71 ms | 82.0x slower | 2.5 us |
On LuaJIT numbers may seem worse, but per-operation cost is much lower, it’s just that native table operations are so much faster.
With a branching factor of 32 the situation gets worse on PUC Lua, but is slightly better on LuaJIT. So there’s still space for fine-tuning.
For hashing strings and objects I decided to use djb2 algorithm. The main reason to use it was that it can be implemented even if we don’t have any bit-wise operators, and Lua doesn’t have them in all of the versions. It only uses +, *, and %.
Source: Hacker News












