feat: SQLite-backed state and durable work queues (honker) — v0.2.0#1

Merged
tomcasaburi merged 5 commits into master from feat/honker-sqlite-queue
May 27, 2026
Conversation

@tomcasaburi tomcasaburi commented May 27, 2026

Member

Summary

Moves the seeder's operational state and orchestration off in-memory JS (PQueue, setInterval, JSON state file) onto a single SQLite database backed by the honker extension (@russellthehippo/honker-node, pinned exactly to 0.3.3 — alpha software).

Bumps to v0.2.0 to semver-signal the structural change.

What changes for operators

  • Pubsub-routing re-provide storm on restart is gone. The 6h throttle now lives in a pubsub_routing_provides SQLite table that survives restart. Previously every restart fired ~40 redundant tracker announces within seconds.
  • Stale-pin GC works across restart. The set of currently-pinned CIDs per community lives in community_pins and survives restart. Previously a restart erased the in-memory tracking and old pages/post-update buckets leaked in kubo.
  • Ops visibility. `sqlite3 $SEEDER_DB_PATH "SELECT queue, COUNT(*) FROM _honker_live GROUP BY queue"` shows current queue depth. `SELECT * FROM _honker_dead` shows jobs that exhausted retries.
  • New env: SEEDER_DB_PATH (defaults to ./seeder.db; Docker image sets /data/seeder.db).
  • Auto-migration: existing seederState.json is read on first start into the communities table. The JSON file is left in place so users can roll back to 0.1.3 by deleting seeder.db.

What does NOT change

Throughput, memory, CPU, network behavior, the actual kubo/pkc seeding work — all identical.

Risk

honker is alpha (the maintainer says so). It's pinned exactly to 0.3.3 so future bumps are intentional, not transitive. The README now signals to users that this repo is experimental and supplemental — desktop apps (5chan Electron etc.) seed automatically and are the protocol's load-bearing seeders; this seeder is for operators who want consistent 24/7 contribution from a VPS.

Verified end-to-end locally on macOS arm64: full seed cycle, graceful SIGTERM, restart with both communities reporting pins unchanged (throttle + pin-tracking tables held), and orphan pin-op re-claimed by a fresh worker after visibility timeout.

Test plan

  • Existing npm test passes (8/8, helpers untouched)
  • Honker e2e probe: atomic outbox, claim/ack, scheduler tick, rollback drops both sides
  • Local e2e with bitsocial-cli's bundled daemon: discovery → subscribe → 2 community updates → pins + provides processed by workers
  • SIGTERM → state preserved → restart → no redundant work enqueued
  • Native module install works on darwin-arm64 (honker prebuilt; Docker covers linux-x64-gnu and linux-arm64-gnu prebuilts)
  • CI matrix expansion (linux-musl, Windows built from source) deferred until honker matures (no GitHub Releases yet, ~40 days old, solo maintainer)

Note

Medium Risk
Core runtime now depends on alpha honker/SQLite and a native module; seeding behavior is intended unchanged but persistence and shutdown paths are new operational surfaces.

Overview
v0.2.0 moves seeder orchestration and state off in-memory PQueue / setInterval / periodic seederState.json writes onto a single SQLite file (SEEDER_DB_PATH, default seeder.db) via honker (@russellthehippo/honker-node 0.3.3); p-queue is removed.

State: communities, community_pins, and pubsub_routing_provides tables back the seeded community list, durable pin tracking for stale-pin GC, and the 6h pubsub-routing re-provide throttle. seederState.communitiesSeeding is a getter/setter over SQLite; one-time import from legacy seederState.json when the DB is empty.

Work: Pin add/remove and pubsub routing provides are enqueued in honker queues with transactional enqueueTx alongside pin bookkeeping. start.js runs tick workers (discover, subscribe, pubsub, update check), pin/pubsub workers with retries, and a honker scheduler for periodic ticks; SIGINT/SIGTERM aborts workers and closes the DB.
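
The shutdown path described above can be pictured with a minimal sketch. This is not the code in start.js: `runWorker`, `installShutdown`, and `closeDb` are hypothetical names, and the array-backed queue is a stand-in for a honker queue.

```javascript
// Hypothetical sketch: a worker loop that exits when an AbortSignal fires.
// The array-backed queue is a stand-in, not honker's API.
async function runWorker(jobs, handle, signal, idleMs = 10) {
  let processed = 0
  while (!signal.aborted) {
    const job = jobs.shift()
    if (job === undefined) {
      // Nothing to do: sleep briefly, then re-check the signal.
      await new Promise((resolve) => setTimeout(resolve, idleMs))
      continue
    }
    await handle(job)
    processed += 1
  }
  return processed
}

// Wire SIGINT/SIGTERM to one AbortController shared by every worker.
function installShutdown(controller, closeDb) {
  const stop = () => {
    controller.abort() // workers leave their loops at the next check
    closeDb()          // assumed cleanup hook, e.g. closing the SQLite handle
  }
  process.once('SIGINT', stop)
  process.once('SIGTERM', stop)
  return stop
}
```

A fuller version would probably await the worker promises before calling `closeDb`, so that a job in flight finishes against an open database.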

Docs/ops: README adds supplemental/experimental positioning, a State section, and SEEDER_DB_PATH in Docker/compose; gitignore/dockerignore cover SQLite WAL/SHM files.

Reviewed by Cursor Bugbot for commit f58ec9a.

Summary by CodeRabbit

  • New Features

    • Migrated state storage from JSON files to persistent SQLite database for improved reliability.
    • Added durable work queues for pin operations and pubsub routing to prevent job loss.
    • Introduced new configuration parameters: MAX_COMMUNITIES, PIN_CONCURRENCY, and seeder update-check controls.
  • Documentation

    • Added operational guide explaining runtime state storage and database structure.
  • Chores

    • Version bumped to 0.2.0; added honker-node dependency.

Review Change Stack

Replaces the in-memory PQueues, the in-memory `pinsToRemove` and
`pubsubRoutingPinsLastQueuedAt` maps, and the JSON `seederState.json`
file with a single SQLite database at `SEEDER_DB_PATH` (defaults to
`./seeder.db`; the Docker image sets it to `/data/seeder.db`).

Pin add/remove and pubsub-routing-provide jobs are now durable work
queues from honker (@russellthehippo/honker-node, pinned exactly to
0.3.3 — alpha). Each handler runs inside a single `db.transaction()`
that commits the queue row alongside the bookkeeping table update via
`enqueueTx`, so a crash mid-handler either lands both or neither.
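
To make the "both or neither" guarantee concrete, here is a small in-memory sketch. It assumes nothing about honker's API: `TinyStore` and `recordPin` are hypothetical, and copying arrays stands in for what a real SQLite transaction does with rollback.

```javascript
// Hypothetical stand-in for a transactional store: two "tables" that
// commit together or not at all. Real code would rely on the SQLite
// transaction described above rather than copying data structures.
class TinyStore {
  constructor() {
    this.queue = []       // stands in for the queue table
    this.pins = new Map() // stands in for community_pins
  }

  transaction(fn) {
    // Stage writes on copies so a throw leaves the originals untouched.
    const tx = { queue: [...this.queue], pins: new Map(this.pins) }
    const result = fn(tx) // may throw; nothing is committed in that case
    this.queue = tx.queue
    this.pins = tx.pins
    return result
  }
}

function recordPin(store, communityKey, cid, { failAfterEnqueue = false } = {}) {
  return store.transaction((tx) => {
    tx.queue.push({ op: 'add', communityKey, cid })
    if (failAfterEnqueue) throw new Error('simulated crash mid-handler')
    tx.pins.set(`${communityKey}:${cid}`, true)
  })
}
```

If the handler throws after the enqueue but before the bookkeeping write, neither the queue entry nor the pin record becomes visible.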

Periodic ticks (discover, subscribe, pubsub join+routing-provide,
update-check) are registered with honker's leader-elected scheduler
instead of separate `setInterval` calls.

User-visible effects:

- The 6h pubsub-routing-provide throttle now survives restarts, so the
  seeder no longer re-announces every routing CID to the configured
  trackers within seconds of boot.
- The stale-pin set per community now survives restarts, so old pages
  and post-update buckets that were dropped from a community update
  get unpinned on the next tick instead of leaking in kubo.
- Existing `seederState.json` is auto-migrated into the `communities`
  table on first start; the JSON file is left in place for rollback.
- New env: `SEEDER_DB_PATH`. Inspect with
  `sqlite3 $SEEDER_DB_PATH "SELECT * FROM communities"`.

Verified end-to-end against the bundled bitsocial-cli daemon on
macOS arm64: full seed cycle (discovery → subscribe → 2 community
updates → 4 routing-provides + 2 content pins), graceful SIGTERM, and
restart with both communities reporting `pins unchanged` (throttle and
pin-tracking tables held across the restart). Orphan pin-op job from
the killed first instance was re-claimed by the second instance after
the visibility timeout.
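
The orphan re-claim behavior can be illustrated with an in-memory queue. This is only a sketch of the general visibility-timeout technique; the class, field, and method names are hypothetical and do not reflect honker's schema or API.

```javascript
// Hypothetical in-memory queue illustrating visibility timeouts.
class VisibilityQueue {
  constructor(visibilityTimeoutMs) {
    this.visibilityTimeoutMs = visibilityTimeoutMs
    this.jobs = [] // { id, payload, claimedBy, invisibleUntil }
    this.nextId = 1
  }

  enqueue(payload) {
    const job = { id: this.nextId++, payload, claimedBy: null, invisibleUntil: 0 }
    this.jobs.push(job)
    return job.id
  }

  // Claim the first job that is unclaimed or whose claim has expired.
  claim(workerId, nowMs) {
    const job = this.jobs.find((j) => j.invisibleUntil <= nowMs)
    if (!job) return null
    job.claimedBy = workerId
    job.invisibleUntil = nowMs + this.visibilityTimeoutMs
    return job
  }

  // Acknowledging removes the job, but only for the current claim holder.
  ack(jobId, workerId) {
    const index = this.jobs.findIndex((j) => j.id === jobId && j.claimedBy === workerId)
    if (index === -1) return false
    this.jobs.splice(index, 1)
    return true
  }
}
```

A worker that dies after `claim` never calls `ack`, so the job becomes claimable again once `invisibleUntil` has passed, which matches the behavior observed after the restart.
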
Adds an "Is this the only way to seed?" section near the top of the
README clarifying that:

- Desktop apps (5chan Electron, etc.) seed automatically while open
  and are the load-bearing seeders of the protocol.
- `bitsocial-seeder` is supplemental — useful for consistent 24/7
  seeding from a VPS but not required for the network to function.
- This repo is treated as experimental: releases are frequent, minor
  versions may change internals, and the project is a place to try
  ideas that benefit the protocol but are not on its critical path.

Also documents the new SQLite-backed state store, the auto-migration
from `seederState.json`, and the new `SEEDER_DB_PATH` env in a "State"
section under "Configuration".
Comment thread start.js
Comment thread lib/seed-communities.js
@coderabbitai

coderabbitai Bot commented May 27, 2026

Walkthrough

This PR refactors the seeder from interval-based polling with in-memory queues to a durable, distributed architecture: community state migrates from JSON files to SQLite with automatic on-startup conversion; work is now queued durably and processed by worker loops; and a honker-based scheduler orchestrates all background tasks with graceful shutdown and configurable intervals.

Changes

Durable Queue-Based Seeder Architecture

| Layer / File(s) | Summary |
| --- | --- |
| Configuration, deployment, and dependency updates: package.json, config.js, Dockerfile, docker-compose.yml, .dockerignore, .gitignore | Version bumped to 0.2.0 and honker-node added as a dependency. Config adds seeding.db.path sourced from SEEDER_DB_PATH environment variable. Docker/Compose updated to pass /data/seeder.db as the database path; database artifacts added to ignore files. |
| SQLite database initialization: lib/db.js | New module opens SQLite via honker with path from config (default seeder.db), runs initialization SQL to create communities and community_pins tables, and exports db and dbPath for shared use. |
| Persistent state management with JSON migration: lib/seeder-state.js | communitiesSeeding now reads/writes to the communities table with transactional upsert/delete logic. A one-time migration function seeds the database from legacy seederState.json on first startup if the table is empty. File-based persistence is removed. |
| Durable work queues and worker loops: lib/seed-communities.js | Replaces p-queue with honker-backed pinOpQueue and pubsubRoutingQueue. Community updates are transactional: they enqueue pin-op jobs and update community_pins atomically. Pubsub routing provides are throttled per (community_key, cid) using pubsub_routing_provides table. New worker framework claims jobs, processes pin operations and pubsub routing, and refreshes throttle timestamps; spawnPinWorkers starts bounded worker instances per config. |
| Discovery function export and worker integration: lib/discover-communities.js | discoverCommunitiesFromLists is now exported as the worker-callable API; the old interval-wrapper discoverCommunities is removed so scheduling moves to startup orchestration. |
| Update check scheduler removal: lib/update-check.js | The startUpdateChecks scheduler helper is removed; scheduling is now owned by start.js via honker, which invokes the remaining update-check utility functions. |
| Startup orchestration with honker scheduler: start.js | Completely rewritten to load environment via dotenv/config, add graceful shutdown with AbortController (closing DB and aborting workers on SIGINT/SIGTERM), define tick-queue workers for discovery/subscription/pubsub/update-check, and register periodic scheduler tasks. The scheduler runs with leader key bitsocial-seeder, orchestrates background work with configurable intervals, and is tied to the shutdown signal. |
| Documentation and operational guidance: README.md | New "Is this the only way to seed?" section frames the repo as supplemental and operator-focused. Configuration documentation expanded to include SEEDER_DB_PATH, MAX_COMMUNITIES, PIN_CONCURRENCY, and update-check settings. New "State" section explains SQLite-backed persistence, one-time JSON migration, state schema, and includes a sample sqlite3 query for inspection. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hop hop, the queues now run deep,
State in SQLite, no more JSON to keep,
Honker schedules the workers with care,
Durable and steady—seeding everywhere!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: SQLite-backed state persistence and durable work queues using honker, with version bump to 0.2.0. It is concise, specific, and clearly reflects the primary focus of the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


…hang

The startup `discoverTickQ.enqueue` was a one-shot, and the honker
scheduler wasn't started until after the wait loop. If that single
discover job failed (transient network blip, GitHub rate limit, etc.)
the wait loop polled forever with no retry path.

Re-enqueueing inside the wait loop restores the recovery behavior the
old setInterval-based discoverCommunities had before the honker
migration: every 10s while we have no communities yet, fire another
attempt. Once one succeeds the table is populated and the loop exits,
after which the scheduler takes over normal periodic discovery.
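
A minimal sketch of the retrying wait loop, assuming injected helpers so it can run without a database or network. `waitForCommunities`, `enqueueDiscover`, `countCommunities`, and `sleep` are hypothetical names, not the functions in start.js.

```javascript
// Hypothetical sketch of a startup wait loop that retries discovery.
// All three helpers are injected so the loop can be exercised in isolation.
async function waitForCommunities({ enqueueDiscover, countCommunities, sleep, retryMs = 10_000 }) {
  let attempts = 0
  while ((await countCommunities()) === 0) {
    // Fire another attempt on every pass; a one-shot enqueue before the
    // loop would leave no retry path if that single job failed.
    await enqueueDiscover()
    attempts += 1
    await sleep(retryMs)
  }
  return attempts
}
```

The loop exits as soon as one discovery attempt populates the table, at which point periodic scheduling can take over.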

Caught by Cursor Bugbot on PR #1.

The 6h throttle check + enqueueTx + pubsub_routing_provides upsert
appeared verbatim in both handleCommunityUpdate and
providePubsubTopicRoutingCids. Folded into one
`enqueueRoutingProvideIfStale(tx, communityKey, address, pin, now)`
helper so a future change to the throttle window, query, or upsert
only needs to land in one place.

No behavior change. Caught by Cursor Bugbot on PR #1.
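
For readers without the diff at hand, the shape of such a helper might look like the sketch below. It is a hypothetical reconstruction, not the merged code: `tx` is a plain object with a Map and an array instead of a honker transaction, the extra `intervalMs` parameter is an assumption added so the sketch needs no config module, and `now` is assumed to be in seconds.

```javascript
// Hypothetical reconstruction for illustration only.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000

function enqueueRoutingProvideIfStale(tx, communityKey, address, pin, now, intervalMs = SIX_HOURS_MS) {
  const key = `${communityKey}:${pin.cid}`
  const lastQueuedAt = tx.provides.get(key) // seconds since epoch, or undefined
  // `now` is assumed to be in seconds, so convert the window before comparing.
  const intervalSec = intervalMs / 1000
  if (lastQueuedAt !== undefined && now - lastQueuedAt < intervalSec) {
    return false // provided recently: skip
  }
  tx.queue.push({ communityKey, address, cid: pin.cid }) // stands in for enqueueTx
  tx.provides.set(key, now)                              // stands in for the upsert
  return true
}
```

Keeping the check, the enqueue, and the timestamp update in one function means both call sites share the same throttle window and the same unit handling.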

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
start.js (1)

140-151: 💤 Low value

Consider making scheduler intervals configurable.

The subscribe (10 min) and pubsub (1 min) tick intervals are hardcoded. For operational flexibility, consider exposing these as config options similar to discoverIntervalMs, allowing operators to tune based on their network conditions and load requirements.
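
One way to approach this suggestion is sketched below. The helper names and the environment variable names are hypothetical; the PR does not define them, and the defaults simply mirror the 10 min and 1 min values mentioned above.

```javascript
// Hypothetical helper: read a millisecond interval from an env-like object,
// falling back to a default when the value is missing or not a positive number.
function readIntervalMs(env, name, defaultMs) {
  const raw = env[name]
  if (raw === undefined || raw === '') return defaultMs
  const parsed = Number(raw)
  return Number.isFinite(parsed) && parsed > 0 ? parsed : defaultMs
}

// Env variable names below are illustrative assumptions.
function loadTickIntervals(env) {
  return {
    subscribeTickIntervalMs: readIntervalMs(env, 'SUBSCRIBE_TICK_INTERVAL_MS', 10 * 60 * 1000),
    pubsubTickIntervalMs: readIntervalMs(env, 'PUBSUB_TICK_INTERVAL_MS', 60 * 1000),
  }
}
```

Calling `loadTickIntervals(process.env)` would then give values that can be passed to the scheduler in place of the literals.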

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@start.js` around lines 140 - 151, The subscribe-tick and pubsub-tick
intervals are hardcoded; make them configurable by introducing new config values
(e.g., subscribeTickIntervalMs and pubsubTickIntervalMs) with defaults of
10*60*1000 and 60*1000 respectively (similar to discoverIntervalMs), read them
from the existing config object or ENV, and replace the literal values in the
scheduler.add calls that create the 'subscribe-tick' and 'pubsub-tick' jobs so
schedule: everyS(...) uses the configured intervals; ensure validation/coercion
to numbers and fallback to the defaults if not provided.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/seeder-state.js`:
- Around line 60-62: When keptKeys is empty the current if block skips deletion
and stale rows remain; update the logic around keptKeys/tx.execute to ensure
deletion runs even when keptKeys.length === 0 by invoking
tx.execute(buildDeleteNotIn(0), []) (or a dedicated delete-all SQL helper) in
the else path so buildDeleteNotIn is always executed and persisted rows are
cleared; modify the code that references keptKeys, tx.execute, and
buildDeleteNotIn to add this else-case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a19b524-775c-411f-b0ac-3292b2477c2f

📥 Commits

Reviewing files that changed from the base of the PR and between a5eac63 and 12063f4.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (13)
  • .dockerignore
  • .gitignore
  • Dockerfile
  • README.md
  • config.js
  • docker-compose.yml
  • lib/db.js
  • lib/discover-communities.js
  • lib/seed-communities.js
  • lib/seeder-state.js
  • lib/update-check.js
  • package.json
  • start.js

Comment thread lib/seeder-state.js
…h []

Previously the DELETE branch only ran when keptKeys was non-empty, so
`seederState.communitiesSeeding = []` silently kept stale rows. The
setter signature suggests "this replaces the list" so an empty input
should empty the table. Today's callers guard against this path
(discoverCommunitiesFromLists skips the assignment when the merged
map is empty) so it's latent, but the setter shouldn't leave a
correctness footgun for future callers.

Caught by CodeRabbit on PR #1.
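
The fixed behavior can be sketched as follows. The PR names `buildDeleteNotIn` but does not show its body, so this is a hypothetical reconstruction; the `key` column name and the `planReplace` helper are assumptions.

```javascript
// Hypothetical reconstruction: build the DELETE that removes every row
// whose key is not in the kept list. An empty list clears the table.
function buildDeleteNotIn(count) {
  if (count === 0) return 'DELETE FROM communities'
  const placeholders = Array.from({ length: count }, () => '?').join(', ')
  return `DELETE FROM communities WHERE key NOT IN (${placeholders})`
}

// Returns the statement and parameters a setter would run for `keptKeys`.
function planReplace(keptKeys) {
  return { sql: buildDeleteNotIn(keptKeys.length), params: [...keptKeys] }
}
```

Handling the empty case explicitly avoids emitting `NOT IN ()`, which SQLite accepts but many other SQL dialects reject, and makes the "replace the list" semantics hold for an empty input.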
@tomcasaburi tomcasaburi merged commit 57a2cf6 into master May 27, 2026
5 checks passed
tomcasaburi added a commit that referenced this pull request May 27, 2026
tomcasaburi added a commit that referenced this pull request May 27, 2026
@tomcasaburi tomcasaburi deleted the feat/honker-sqlite-queue branch May 27, 2026 16:15

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit f58ec9a.

Comment thread lib/seed-communities.js

const allNewCids = new Set([...contentPins.map(p => p.cid), ...pubsubRoutingPins.map(p => p.cid)])
const now = Math.floor(Date.now() / 1000)
const provideIntervalMs = config.seeding.pubsubRoutingProvideIntervalMs


Unused variable assigned but never read

Low Severity

provideIntervalMs is assigned from config.seeding.pubsubRoutingProvideIntervalMs but never referenced anywhere in handleCommunityUpdate. The enqueueRoutingProvideIfStale function reads from config.seeding.pubsubRoutingProvideIntervalMs directly instead. This is a dead store left over from refactoring.
