feat: SQLite-backed state and durable work queues (honker) — v0.2.0 by tomcasaburi · Pull Request #1 · bitsocialnet/bitsocial-seeder

tomcasaburi · 2026-05-27T12:15:13Z

Summary

Moves the seeder's operational state and orchestration off in-memory JS (PQueue, setInterval, JSON state file) onto a single SQLite database backed by the honker extension (@russellthehippo/honker-node, pinned exactly to 0.3.3 — alpha software).

Bumps to v0.2.0 to semver-signal the structural change.

What changes for operators

- Pubsub-routing re-provide storm on restart is gone. The 6h throttle now lives in a `pubsub_routing_provides` SQLite table that survives restart. Previously every restart fired ~40 redundant tracker announces within seconds.
- Stale-pin GC works across restart. The set of currently-pinned CIDs per community lives in `community_pins` and survives restart. Previously a restart erased the in-memory tracking and old pages/post-update buckets leaked in kubo.
- Ops visibility. `sqlite3 $SEEDER_DB_PATH "SELECT queue, COUNT(*) FROM _honker_live GROUP BY queue"` shows current queue depth. `SELECT * FROM _honker_dead` shows jobs that exhausted retries.
- New env: `SEEDER_DB_PATH` (defaults to `./seeder.db`; the Docker image sets `/data/seeder.db`).
- Auto-migration: an existing `seederState.json` is read on first start into the `communities` table. The JSON file is left in place so users can roll back to 0.1.3 by deleting `seeder.db`.

What does NOT change

Throughput, memory, CPU, network behavior, the actual kubo/pkc seeding work — all identical.

Risk

honker is alpha (the maintainer says so). It's pinned exactly to 0.3.3 so future bumps are intentional, not transitive. The README now signals to users that this repo is experimental and supplemental — desktop apps (5chan Electron etc.) seed automatically and are the protocol's load-bearing seeders; this seeder is for operators who want consistent 24/7 contribution from a VPS.

Verified end-to-end locally on macOS arm64: full seed cycle, graceful SIGTERM, restart with both communities reporting pins unchanged (throttle + pin-tracking tables held), and orphan pin-op re-claimed by a fresh worker after visibility timeout.

Test plan

- Existing `npm test` passes (8/8, helpers untouched)
- Honker e2e probe: atomic outbox, claim/ack, scheduler tick, rollback drops both sides
- Local e2e with bitsocial-cli's bundled daemon: discovery → subscribe → 2 community updates → pins + provides processed by workers
- SIGTERM → state preserved → restart → no redundant work enqueued
- Native module install works on darwin-arm64 (honker prebuilt; Docker covers linux-x64-gnu and linux-arm64-gnu prebuilts)
- CI matrix expansion (Linux musl, Windows built from source) deferred until honker matures (no GitHub Releases yet, ~40 days old, solo maintainer)

Note

Medium Risk
Core runtime now depends on alpha honker/SQLite and a native module; seeding behavior is intended unchanged but persistence and shutdown paths are new operational surfaces.

Overview
v0.2.0 moves seeder orchestration and state off in-memory PQueue / setInterval / periodic seederState.json writes onto a single SQLite file (SEEDER_DB_PATH, default seeder.db) via honker (@russellthehippo/honker-node 0.3.3); p-queue is removed.

State: communities, community_pins, and pubsub_routing_provides tables back the seeded community list, durable pin tracking for stale-pin GC, and the 6h pubsub-routing re-provide throttle. seederState.communitiesSeeding is a getter/setter over SQLite; one-time import from legacy seederState.json when the DB is empty.

Work: Pin add/remove and pubsub routing provides are enqueued in honker queues with transactional enqueueTx alongside pin bookkeeping. start.js runs tick workers (discover, subscribe, pubsub, update check), pin/pubsub workers with retries, and a honker scheduler for periodic ticks; SIGINT/SIGTERM aborts workers and closes the DB.

Docs/ops: README adds supplemental/experimental positioning, a State section, and SEEDER_DB_PATH in Docker/compose; gitignore/dockerignore cover SQLite WAL/SHM files.
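The shutdown path mentioned in the Work paragraph above (SIGINT/SIGTERM aborts workers and closes the DB) could look roughly like the following. This is a minimal sketch, not the code in `start.js`: `createShutdown`, `workerLoop`, and `claimAndHandle` are hypothetical names, and the real workers claim jobs from honker queues rather than calling a plain callback.

```javascript
// Hypothetical sketch: one AbortController shared by every worker loop.
// `closeDb` stands in for whatever closes the SQLite handle.
function createShutdown(closeDb) {
  const controller = new AbortController()
  let triggered = false
  const trigger = () => {
    if (triggered) return // a second signal is a no-op
    triggered = true
    controller.abort() // workers see this at their next loop check
    closeDb()
  }
  return { signal: controller.signal, trigger }
}

// A worker keeps going until the shared signal is aborted.
// `claimAndHandle` is assumed to wait for a job and process it.
async function workerLoop(signal, claimAndHandle) {
  while (!signal.aborted) {
    await claimAndHandle()
  }
}

// Wiring it up (not executed here):
//   const { signal, trigger } = createShutdown(() => db.close())
//   process.once('SIGINT', trigger)
//   process.once('SIGTERM', trigger)
```

A production version would probably wait for in-flight handlers to finish before closing the database; the sketch closes it immediately to keep the control flow visible.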

Reviewed by Cursor Bugbot for commit f58ec9a. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

New Features
- Migrated state storage from JSON files to persistent SQLite database for improved reliability.
- Added durable work queues for pin operations and pubsub routing to prevent job loss.
- Introduced new configuration parameters: MAX_COMMUNITIES, PIN_CONCURRENCY, and seeder update-check controls.
Documentation
- Added operational guide explaining runtime state storage and database structure.
Chores
- Version bumped to 0.2.0; added honker-node dependency.

Replaces the in-memory PQueues, the in-memory `pinsToRemove` and `pubsubRoutingPinsLastQueuedAt` maps, and the JSON `seederState.json` file with a single SQLite database at `SEEDER_DB_PATH` (defaults to `./seeder.db`; the Docker image sets it to `/data/seeder.db`).

Pin add/remove and pubsub-routing-provide jobs are now durable work queues from honker (@russellthehippo/honker-node, pinned exactly to 0.3.3, alpha). Each handler runs inside a single `db.transaction()` that commits the queue row alongside the bookkeeping table update via `enqueueTx`, so a crash mid-handler either lands both or neither. Periodic ticks (discover, subscribe, pubsub join+routing-provide, update-check) are registered with honker's leader-elected scheduler instead of separate `setInterval` calls.

User-visible effects:
- The 6h pubsub-routing-provide throttle now survives restarts, so the seeder no longer re-announces every routing CID to the configured trackers within seconds of boot.
- The stale-pin set per community now survives restarts, so old pages and post-update buckets that were dropped from a community update get unpinned on the next tick instead of leaking in kubo.
- Existing `seederState.json` is auto-migrated into the `communities` table on first start; the JSON file is left in place for rollback.
- New env: `SEEDER_DB_PATH`. Inspect with `sqlite3 $SEEDER_DB_PATH "SELECT * FROM communities"`.

Verified end-to-end against the bundled bitsocial-cli daemon on macOS arm64: full seed cycle (discovery → subscribe → 2 community updates → 4 routing-provides + 2 content pins), graceful SIGTERM, and restart with both communities reporting `pins unchanged` (throttle and pin-tracking tables held across the restart). Orphan pin-op job from the killed first instance was re-claimed by the second instance after the visibility timeout.
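The stale-pin behaviour described in this commit message comes down to a set difference between the CIDs recorded for a community and the CIDs in its latest update. The sketch below illustrates that idea only; `diffPins` is a hypothetical helper, and the actual code reads the previous set from `community_pins` and enqueues pin-op jobs instead of returning arrays.

```javascript
// Hypothetical sketch: decide which CIDs to pin and which to unpin.
// `previousCids` stands in for the rows stored for one community,
// `newCids` for the CIDs referenced by the latest community update.
function diffPins(previousCids, newCids) {
  const previous = new Set(previousCids)
  const next = new Set(newCids)
  const toAdd = [...next].filter((cid) => !previous.has(cid))
  const toRemove = [...previous].filter((cid) => !next.has(cid))
  return { toAdd, toRemove }
}

// A page that dropped out of the update ends up in `toRemove`.
const example = diffPins(['cidA', 'cidB'], ['cidB', 'cidC'])
```

Because the previous set is persisted rather than held in memory, the same difference can still be computed after a restart, which is what stops old pages from leaking in kubo.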

Adds an "Is this the only way to seed?" section near the top of the README clarifying that:
- Desktop apps (5chan Electron, etc.) seed automatically while open and are the load-bearing seeders of the protocol.
- `bitsocial-seeder` is supplemental: useful for consistent 24/7 seeding from a VPS but not required for the network to function.
- This repo is treated as experimental: releases are frequent, minor versions may change internals, and the project is a place to try ideas that benefit the protocol but are not on its critical path.

Also documents the new SQLite-backed state store, the auto-migration from `seederState.json`, and the new `SEEDER_DB_PATH` env in a "State" section under "Configuration".

coderabbitai · 2026-05-27T12:45:22Z

📝 Walkthrough

Walkthrough

This PR refactors the seeder from interval-based polling with in-memory queues to a durable, distributed architecture: community state migrates from JSON files to SQLite with automatic on-startup conversion; work is now queued durably and processed by worker loops; and a honker-based scheduler orchestrates all background tasks with graceful shutdown and configurable intervals.

Changes

Durable Queue-Based Seeder Architecture

| Layer / File(s) | Summary |
| --- | --- |
| Configuration, deployment, and dependency updates `package.json`, `config.js`, `Dockerfile`, `docker-compose.yml`, `.dockerignore`, `.gitignore` | Version bumped to 0.2.0 and honker-node added as a dependency. Config adds `seeding.db.path` sourced from `SEEDER_DB_PATH` environment variable. Docker/Compose updated to pass `/data/seeder.db` as the database path; database artifacts added to ignore files. |
| SQLite database initialization `lib/db.js` | New module opens SQLite via honker with path from config (default `seeder.db`), runs initialization SQL to create `communities` and `community_pins` tables, and exports `db` and `dbPath` for shared use. |
| Persistent state management with JSON migration `lib/seeder-state.js` | `communitiesSeeding` now reads/writes to the `communities` table with transactional upsert/delete logic. A one-time migration function seeds the database from legacy `seederState.json` on first startup if the table is empty. File-based persistence is removed. |
| Durable work queues and worker loops `lib/seed-communities.js` | Replaces p-queue with honker-backed `pinOpQueue` and `pubsubRoutingQueue`. Community updates are transactional: they enqueue pin-op jobs and update `community_pins` atomically. Pubsub routing provides are throttled per `(community_key, cid)` using `pubsub_routing_provides` table. New worker framework claims jobs, processes pin operations and pubsub routing, and refreshes throttle timestamps; `spawnPinWorkers` starts bounded worker instances per config. |
| Discovery function export and worker integration `lib/discover-communities.js` | `discoverCommunitiesFromLists` is now exported as the worker-callable API; the old interval-wrapper `discoverCommunities` is removed so scheduling moves to startup orchestration. |
| Update check scheduler removal `lib/update-check.js` | The `startUpdateChecks` scheduler helper is removed; scheduling is now owned by `start.js` via honker, which invokes the remaining update-check utility functions. |
| Startup orchestration with honker scheduler `start.js` | Completely rewritten to load environment via `dotenv/config`, add graceful shutdown with `AbortController` (closing DB and aborting workers on SIGINT/SIGTERM), define tick-queue workers for discovery/subscription/pubsub/update-check, and register periodic scheduler tasks. The scheduler runs with leader key `bitsocial-seeder`, orchestrates background work with configurable intervals, and is tied to the shutdown signal. |
| Documentation and operational guidance `README.md` | New "Is this the only way to seed?" section frames the repo as supplemental and operator-focused. Configuration documentation expanded to include `SEEDER_DB_PATH`, `MAX_COMMUNITIES`, `PIN_CONCURRENCY`, and update-check settings. New "State" section explains SQLite-backed persistence, one-time JSON migration, state schema, and includes a sample `sqlite3` query for inspection. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hop hop, the queues now run deep,
State in SQLite, no more JSON to keep,
Honker schedules the workers with care,
Durable and steady—seeding everywhere!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: SQLite-backed state persistence and durable work queues using honker, with version bump to 0.2.0. It is concise, specific, and clearly reflects the primary focus of the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


…hang The startup `discoverTickQ.enqueue` was a one-shot, and the honker scheduler wasn't started until after the wait loop. If that single discover job failed (transient network blip, GitHub rate limit, etc.) the wait loop polled forever with no retry path. Re-enqueueing inside the wait loop restores the recovery behavior the old setInterval-based discoverCommunities had before the honker migration: every 10s while we have no communities yet, fire another attempt. Once one succeeds the table is populated and the loop exits, after which the scheduler takes over normal periodic discovery. Caught by Cursor Bugbot on PR #1.
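The recovery behaviour this commit message describes (re-enqueue a discover job every 10s while no communities are known) can be sketched as follows. This is an illustration under assumptions, not the code in `start.js`: `waitForCommunities`, `countCommunities`, `enqueueDiscover`, and `sleep` are hypothetical names injected as parameters so the loop can be exercised without a database or a queue.

```javascript
// Hypothetical sketch of a startup wait loop with a retry path.
// countCommunities: returns how many communities are currently stored.
// enqueueDiscover:  enqueues one discover job (may fail downstream).
// sleep:            resolves after the given number of milliseconds.
async function waitForCommunities({ countCommunities, enqueueDiscover, sleep, retryMs = 10000 }) {
  let attempts = 0
  while (countCommunities() === 0) {
    // Enqueue inside the loop: if the previous attempt failed,
    // the next iteration fires another one instead of polling forever.
    enqueueDiscover()
    attempts += 1
    await sleep(retryMs)
  }
  return attempts
}
```

With a one-shot enqueue placed before the loop, a single failed job leaves `countCommunities()` at zero indefinitely; moving the enqueue inside the loop is what bounds the damage of a transient failure to one retry interval.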

The 6h throttle check + enqueueTx + pubsub_routing_provides upsert appeared verbatim in both handleCommunityUpdate and providePubsubTopicRoutingCids. Folded into one `enqueueRoutingProvideIfStale(tx, communityKey, address, pin, now)` helper so a future change to the throttle window, query, or upsert only needs to land in one place. No behavior change. Caught by Cursor Bugbot on PR #1.
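The shape of the folded helper can be illustrated with in-memory stand-ins. The sketch below is not `enqueueRoutingProvideIfStale` itself: `enqueueProvideIfStaleSketch` is a hypothetical name, a `Map` plays the role of the `pubsub_routing_provides` table, and an array plays the role of the durable queue. It assumes timestamps in seconds and the interval in milliseconds, matching the `Math.floor(Date.now() / 1000)` and `pubsubRoutingProvideIntervalMs` names that appear in the review snippet.

```javascript
// Hypothetical sketch: throttle check, enqueue, and timestamp upsert in one place.
// provides:   Map standing in for the pubsub_routing_provides table
// queue:      array standing in for the durable routing-provide queue
// nowSec:     current time in seconds
// intervalMs: throttle window in milliseconds (6h in the PR)
function enqueueProvideIfStaleSketch(provides, queue, communityKey, cid, nowSec, intervalMs) {
  const rowKey = JSON.stringify([communityKey, cid])
  const lastQueuedAtSec = provides.get(rowKey)
  const stale =
    lastQueuedAtSec === undefined || (nowSec - lastQueuedAtSec) * 1000 >= intervalMs
  if (!stale) return false
  queue.push({ communityKey, cid })
  provides.set(rowKey, nowSec) // upsert the last-queued timestamp
  return true
}
```

In the real helper the enqueue and the upsert share one transaction, so a crash cannot record a timestamp without the job or the reverse; the in-memory version has no such guarantee and only shows the decision logic.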

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

start.js (1)
140-151: 💤 Low value

Consider making scheduler intervals configurable.

The subscribe (10 min) and pubsub (1 min) tick intervals are hardcoded. For operational flexibility, consider exposing these as config options similar to discoverIntervalMs, allowing operators to tune based on their network conditions and load requirements.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@start.js` around lines 140 - 151, The subscribe-tick and pubsub-tick
intervals are hardcoded; make them configurable by introducing new config values
(e.g., subscribeTickIntervalMs and pubsubTickIntervalMs) with defaults of
10*60*1000 and 60*1000 respectively (similar to discoverIntervalMs), read them
from the existing config object or ENV, and replace the literal values in the
scheduler.add calls that create the 'subscribe-tick' and 'pubsub-tick' jobs so
schedule: everyS(...) uses the configured intervals; ensure validation/coercion
to numbers and fallback to the defaults if not provided.
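The coercion-and-fallback step suggested in this nitpick might be sketched as below. The helper name `intervalFromEnv` and the environment variable names `SUBSCRIBE_TICK_INTERVAL_MS` and `PUBSUB_TICK_INTERVAL_MS` are assumptions for illustration; the PR does not define them.

```javascript
// Hypothetical sketch: coerce a raw env value to a positive number of
// milliseconds, falling back to the default when it is missing or invalid.
function intervalFromEnv(raw, defaultMs) {
  const parsed = Number(raw)
  return Number.isFinite(parsed) && parsed > 0 ? parsed : defaultMs
}

// Defaults mirror the hardcoded values named in the review comment.
function readTickIntervals(env) {
  return {
    subscribeTickIntervalMs: intervalFromEnv(env.SUBSCRIBE_TICK_INTERVAL_MS, 10 * 60 * 1000),
    pubsubTickIntervalMs: intervalFromEnv(env.PUBSUB_TICK_INTERVAL_MS, 60 * 1000),
  }
}
```

Passing the environment in as a parameter (for example `readTickIntervals(process.env)`) keeps the parsing logic easy to test in isolation.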

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/seeder-state.js`:
- Around line 60-62: When keptKeys is empty the current if block skips deletion
and stale rows remain; update the logic around keptKeys/tx.execute to ensure
deletion runs even when keptKeys.length === 0 by invoking
tx.execute(buildDeleteNotIn(0), []) (or a dedicated delete-all SQL helper) in
the else path so buildDeleteNotIn is always executed and persisted rows are
cleared; modify the code that references keptKeys, tx.execute, and
buildDeleteNotIn to add this else-case.

---

Nitpick comments:
In `@start.js`:
- Around line 140-151: The subscribe-tick and pubsub-tick intervals are
hardcoded; make them configurable by introducing new config values (e.g.,
subscribeTickIntervalMs and pubsubTickIntervalMs) with defaults of 10*60*1000
and 60*1000 respectively (similar to discoverIntervalMs), read them from the
existing config object or ENV, and replace the literal values in the
scheduler.add calls that create the 'subscribe-tick' and 'pubsub-tick' jobs so
schedule: everyS(...) uses the configured intervals; ensure validation/coercion
to numbers and fallback to the defaults if not provided.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a19b524-775c-411f-b0ac-3292b2477c2f

📥 Commits

Reviewing files that changed from the base of the PR and between a5eac63 and 12063f4.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (13)

.dockerignore
.gitignore
Dockerfile
README.md
config.js
docker-compose.yml
lib/db.js
lib/discover-communities.js
lib/seed-communities.js
lib/seeder-state.js
lib/update-check.js
package.json
start.js

…h [] Previously the DELETE branch only ran when keptKeys was non-empty, so `seederState.communitiesSeeding = []` silently kept stale rows. The setter signature suggests "this replaces the list" so an empty input should empty the table. Today's callers guard against this path (discoverCommunitiesFromLists skips the assignment when the merged map is empty) so it's latent, but the setter shouldn't leave a correctness footgun for future callers. Caught by CodeRabbit on PR #1.
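One way to make the empty-list case explicit is to have the SQL builder handle a zero count itself. The sketch below is an assumption about how such a builder could look, not the repo's `buildDeleteNotIn`: the function name carries a `Sketch` suffix, and the column name `key` is hypothetical.

```javascript
// Hypothetical sketch: build the DELETE statement for "remove every
// community whose key is not in the kept list". With zero kept keys the
// statement deletes all rows, so assigning [] really empties the table.
function buildDeleteNotInSketch(keptCount) {
  if (keptCount === 0) {
    return 'DELETE FROM communities'
  }
  const placeholders = Array.from({ length: keptCount }, () => '?').join(', ')
  return `DELETE FROM communities WHERE key NOT IN (${placeholders})`
}
```

Special-casing zero avoids relying on `NOT IN ()` with an empty list, which not every SQL dialect accepts.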


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-05-27T16:19:35Z

+
+  const allNewCids = new Set([...contentPins.map(p => p.cid), ...pubsubRoutingPins.map(p => p.cid)])
+  const now = Math.floor(Date.now() / 1000)
+  const provideIntervalMs = config.seeding.pubsubRoutingProvideIntervalMs


Unused variable assigned but never read

Low Severity

provideIntervalMs is assigned from config.seeding.pubsubRoutingProvideIntervalMs but never referenced anywhere in handleCommunityUpdate. The enqueueRoutingProvideIfStale function reads from config.seeding.pubsubRoutingProvideIntervalMs directly instead. This is a dead store left over from refactoring.


tomcasaburi added 2 commits May 27, 2026 19:13

cursor Bot reviewed May 27, 2026


Comment thread start.js

Comment thread lib/seed-communities.js

tomcasaburi added 2 commits May 27, 2026 23:05

coderabbitai Bot reviewed May 27, 2026


Comment thread lib/seeder-state.js

tomcasaburi merged commit 57a2cf6 into master May 27, 2026
5 checks passed

tomcasaburi deleted the feat/honker-sqlite-queue branch May 27, 2026 16:15

cursor Bot reviewed May 27, 2026

tomcasaburi merged 5 commits into master from feat/honker-sqlite-queue
