A consistent-hashing ring with virtual nodes, in pure Standard ML — distribute
keys across a changing set of nodes so that adding or removing a node remaps
only a small fraction (~1/N) of the keyspace instead of reshuffling everything.
Ring positions are deterministic CRC-32 hashes from
sml-codec (vendored), so the
mapping is reproducible, byte-identically under both MLton
and Poly/ML.
- 32 assertions, green on MLton and Poly/ML.
- Basis-library + vendored
sml-codeconly; deterministic across compilers. - Vendors
sml-codec(Layout B) underlib/github.com/sjqtentacles/sml-codec/, so the repo builds standalone.
No FFI, no IO inside the library, no wall-clock, no ambient randomness, and no
threads. A key's owner is a pure function of the node set and the replica
count — never of insertion order, run, machine, or compiler. The test suite and
the demo are byte-identical under MLton and Poly/ML. Ring positions are
Word32.word values from sml-codec's CRC-32, kept in Word32 and
bounds-checked before any narrowing, so the arithmetic is identical on MLton's
32-bit Int and Poly/ML; all assertions compare exact integers and strings,
never reals.
With smlpkg:
smlpkg add github.com/sjqtentacles/sml-consistenthash
smlpkg sync
Include the MLB from your own (it pulls in the vendored sml-codec):
local
$(SML_LIB)/basis/basis.mlb
lib/github.com/sjqtentacles/sml-consistenthash/... (via smlpkg)
in
...
end
This brings structure ConsistentHash (and the vendored Crc32) into scope.
(* build a ring with 100 virtual nodes per physical node *)
val ring =
let
val r0 = ConsistentHash.make { replicas = 100 }
val r1 = ConsistentHash.addNode r0 "cache-a"
val r2 = ConsistentHash.addNode r1 "cache-b"
val r3 = ConsistentHash.addNode r2 "cache-c"
in r3 end
(* which node owns a key? *)
val owner = ConsistentHash.getNode ring "user-42" (* SOME "cache-a" *)
(* the node roster, always sorted *)
val ns = ConsistentHash.nodes ring (* ["cache-a","cache-b","cache-c"] *)
(* two distinct nodes for replication *)
val replicas = ConsistentHash.getN ring "user-42" 2 (* ["cache-a","cache-c"] *)
(* removing a node only reassigns that node's keys *)
val smaller = ConsistentHash.removeNode ring "cache-b"type t
val make : { replicas : int } -> t (* replicas = virtual nodes per node *)
val addNode : t -> string -> t (* idempotent *)
val removeNode : t -> string -> t (* no-op if absent *)
val getNode : t -> string -> string option (* owner, NONE if ring empty *)
val nodes : t -> string list (* sorted, deterministic *)
val getN : t -> string -> int -> string list (* up to N distinct owners *)getNode ring key returns the first node clockwise from key's hash on the
ring, or NONE when the ring has no nodes. getN ring key n walks clockwise
collecting n distinct physical nodes (capped at the number of nodes), for
replication. make clamps a nonpositive replicas up to 1.
- Ring placement. Each physical node
nameis placed atreplicasring points, one per virtual node, at positionsCrc32.crc (name ^ "#" ^ i)foriin0 .. replicas-1. A keykis placed atCrc32.crc kand owned by the first ring point at or clockwise after that position (wrapping past the top of the ring back to the smallest point). - Determinism. The ring is rebuilt from the sorted, distinct node roster on
every membership change, and positions are sorted by
(Word32position, node name)`, so ownership never depends on insertion order. Re-adding an existing node is a no-op; removing an absent node is a no-op. - Virtual nodes. More
replicasspreads each physical node across more ring points, smoothing the keyspace each node owns (better balance) at the cost of a larger ring. - Portability. Positions stay in
Word/Word32; the ring is sorted with a Basis-only insertion sort (noListMergeSort); only ASCII-and#appear in generated strings.
make test # MLton
make test-poly # Poly/ML
make all-tests # both
make example # build + run examples/demo.sml
make clean
Both compilers run the same strict-TDD suite (32 assertions), all exact integer/string comparisons:
- membership —
getNodeisNONEon an empty ring andSOMEonce a node is present;nodesis sorted and deduplicated;removeNodedrops exactly one node (no-op if absent) and emptying the ring restoresNONE;getNreturns the requested number of distinct nodes, capped at the roster size, with its first element equal togetNode; - determinism — the same key maps to the same node across repeated lookups, and ownership is identical whether nodes were added in forward or reverse order;
- consistency (the core property) — adding a 5th node to a 4-node ring moves fewer than 40% of 2000 keys (ideal ~20%) and every moved key lands on the new node; removing a node reassigns only that node's keys, leaving every other key exactly where it was;
- balance — every key is always owned by some node, and a high virtual-node count gives a spread no worse than a single-replica ring, with no node owning more than 2x its fair share over 5000 keys.
This library depends on
sml-codec for a stable,
reproducible hash (Crc32.crc : string -> Word32.word). Its sources are
vendored verbatim under lib/github.com/sjqtentacles/sml-codec/ — the whole
module set (base16, base64, crc32, sha1, sha256, each .sig/.sml,
plus sml-codec.mlb and sources.mlb), so the .mlb resolves intact even
though only CRC-32 is used. src/consistenthash.mlb references that
sources.mlb first, then consistenthash.sig/consistenthash.sml; the
Poly/ML use-chain in the Makefile loads the vendored codec sources first (in
the order sources.mlb implies), then the consistent-hash sources, then the
test driver, in dependency order. sml.pkg records the dependency in its
require block so smlpkg sync can refresh it.
make example builds a 4-node ring (150 virtual nodes each) over 10,000 keys,
then adds and removes a node to show the churn (output is byte-identical under
MLton and Poly/ML):
=== sml-consistenthash demo ===================================
Ring of 4 nodes, 150 virtual nodes each, on top of sml-codec CRC-32.
Distributing 10000 keys.
Load per node (keys owned):
cache-a 2904 keys
cache-b 2378 keys
cache-c 2088 keys
cache-d 2630 keys
Sample lookups:
user-1 -> cache-a
user-42 -> cache-a
user-999 -> cache-d
Replication (getN k 2):
user-1 -> [cache-a, cache-c]
user-42 -> [cache-a, cache-c]
Add cache-e: 2185 of 10000 keys moved (21.8%).
Remove cache-b: 1563 of 10000 keys moved (15.6%).
===============================================================
CI builds Poly/ML 5.9.1 from source rather than using the Ubuntu package
(Poly/ML 5.7.1), whose X86 code generator crashes (asGenReg raised while compiling) on some code. See .github/workflows/ci.yml.
MIT — see LICENSE.