Skip to content

sjqtentacles/sml-consistenthash

Repository files navigation

sml-consistenthash

A consistent-hashing ring with virtual nodes, in pure Standard ML — distribute keys across a changing set of nodes so that adding or removing a node remaps only a small fraction (~1/N) of the keyspace instead of reshuffling everything. Ring positions are deterministic CRC-32 hashes from sml-codec (vendored), so the mapping is reproducible, byte-identically under both MLton and Poly/ML.

CI

Status

  • 32 assertions, green on MLton and Poly/ML.
  • Basis-library + vendored sml-codec only; deterministic across compilers.
  • Vendors sml-codec (Layout B) under lib/github.com/sjqtentacles/sml-codec/, so the repo builds standalone.

Purity

No FFI, no IO inside the library, no wall-clock, no ambient randomness, and no threads. A key's owner is a pure function of the node set and the replica count — never of insertion order, run, machine, or compiler. The test suite and the demo are byte-identical under MLton and Poly/ML. Ring positions are Word32.word values from sml-codec's CRC-32, kept in Word32 and bounds-checked before any narrowing, so the arithmetic is identical on MLton's 32-bit Int and Poly/ML; all assertions compare exact integers and strings, never reals.

Install

With smlpkg:

smlpkg add github.com/sjqtentacles/sml-consistenthash
smlpkg sync

Include the MLB from your own (it pulls in the vendored sml-codec):

local
  $(SML_LIB)/basis/basis.mlb
  lib/github.com/sjqtentacles/sml-consistenthash/... (via smlpkg)
in
  ...
end

This brings structure ConsistentHash (and the vendored Crc32) into scope.

Quick start

(* build a ring with 100 virtual nodes per physical node *)
val ring =
  let
    val r0 = ConsistentHash.make { replicas = 100 }
    val r1 = ConsistentHash.addNode r0 "cache-a"
    val r2 = ConsistentHash.addNode r1 "cache-b"
    val r3 = ConsistentHash.addNode r2 "cache-c"
  in r3 end

(* which node owns a key? *)
val owner = ConsistentHash.getNode ring "user-42"   (* SOME "cache-a" *)

(* the node roster, always sorted *)
val ns = ConsistentHash.nodes ring                  (* ["cache-a","cache-b","cache-c"] *)

(* two distinct nodes for replication *)
val replicas = ConsistentHash.getN ring "user-42" 2 (* ["cache-a","cache-c"] *)

(* removing a node only reassigns that node's keys *)
val smaller = ConsistentHash.removeNode ring "cache-b"

API (signature CONSISTENT_HASH)

type t

val make       : { replicas : int } -> t          (* replicas = virtual nodes per node *)
val addNode    : t -> string -> t                  (* idempotent *)
val removeNode : t -> string -> t                  (* no-op if absent *)
val getNode    : t -> string -> string option      (* owner, NONE if ring empty *)
val nodes      : t -> string list                  (* sorted, deterministic *)
val getN       : t -> string -> int -> string list (* up to N distinct owners *)

getNode ring key returns the first node clockwise from key's hash on the ring, or NONE when the ring has no nodes. getN ring key n walks clockwise collecting n distinct physical nodes (capped at the number of nodes), for replication. make clamps a nonpositive replicas up to 1.

Conventions

  • Ring placement. Each physical node name is placed at replicas ring points, one per virtual node, at positions Crc32.crc (name ^ "#" ^ i) for i in 0 .. replicas-1. A key k is placed at Crc32.crc k and owned by the first ring point at or clockwise after that position (wrapping past the top of the ring back to the smallest point).
  • Determinism. The ring is rebuilt from the sorted, distinct node roster on every membership change, and positions are sorted by (Word32 position, node name)`, so ownership never depends on insertion order. Re-adding an existing node is a no-op; removing an absent node is a no-op.
  • Virtual nodes. More replicas spreads each physical node across more ring points, smoothing the keyspace each node owns (better balance) at the cost of a larger ring.
  • Portability. Positions stay in Word/Word32; the ring is sorted with a Basis-only insertion sort (no ListMergeSort); only ASCII - and # appear in generated strings.

Build & test

make test        # MLton
make test-poly   # Poly/ML
make all-tests   # both
make example     # build + run examples/demo.sml
make clean

Both compilers run the same strict-TDD suite (32 assertions), all exact integer/string comparisons:

  • membershipgetNode is NONE on an empty ring and SOME once a node is present; nodes is sorted and deduplicated; removeNode drops exactly one node (no-op if absent) and emptying the ring restores NONE; getN returns the requested number of distinct nodes, capped at the roster size, with its first element equal to getNode;
  • determinism — the same key maps to the same node across repeated lookups, and ownership is identical whether nodes were added in forward or reverse order;
  • consistency (the core property) — adding a 5th node to a 4-node ring moves fewer than 40% of 2000 keys (ideal ~20%) and every moved key lands on the new node; removing a node reassigns only that node's keys, leaving every other key exactly where it was;
  • balance — every key is always owned by some node, and a high virtual-node count gives a spread no worse than a single-replica ring, with no node owning more than 2x its fair share over 5000 keys.

Vendoring

This library depends on sml-codec for a stable, reproducible hash (Crc32.crc : string -> Word32.word). Its sources are vendored verbatim under lib/github.com/sjqtentacles/sml-codec/ — the whole module set (base16, base64, crc32, sha1, sha256, each .sig/.sml, plus sml-codec.mlb and sources.mlb), so the .mlb resolves intact even though only CRC-32 is used. src/consistenthash.mlb references that sources.mlb first, then consistenthash.sig/consistenthash.sml; the Poly/ML use-chain in the Makefile loads the vendored codec sources first (in the order sources.mlb implies), then the consistent-hash sources, then the test driver, in dependency order. sml.pkg records the dependency in its require block so smlpkg sync can refresh it.

Example

make example builds a 4-node ring (150 virtual nodes each) over 10,000 keys, then adds and removes a node to show the churn (output is byte-identical under MLton and Poly/ML):

=== sml-consistenthash demo ===================================

Ring of 4 nodes, 150 virtual nodes each, on top of sml-codec CRC-32.
Distributing 10000 keys.

Load per node (keys owned):
  cache-a  2904 keys
  cache-b  2378 keys
  cache-c  2088 keys
  cache-d  2630 keys

Sample lookups:
  user-1 -> cache-a
  user-42 -> cache-a
  user-999 -> cache-d

Replication (getN k 2):
  user-1 -> [cache-a, cache-c]
  user-42 -> [cache-a, cache-c]

Add cache-e: 2185 of 10000 keys moved (21.8%).
Remove cache-b: 1563 of 10000 keys moved (15.6%).

===============================================================

Poly/ML note

CI builds Poly/ML 5.9.1 from source rather than using the Ubuntu package (Poly/ML 5.7.1), whose X86 code generator crashes (asGenReg raised while compiling) on some code. See .github/workflows/ci.yml.

License

MIT — see LICENSE.

About

Consistent hashing ring with virtual nodes in pure Standard ML (CRC32-based). Minimal remapping on membership change. MLton + Poly/ML.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors