rrr: safety + perf milestone — Marshal Vec rewrite, ClientConnection @safe class, HashMap/Weak library fixes#67
Merged
Merged
Conversation
Three pieces:
1. **rusty-cpp submodule bump** to commit 519b5861 ("Enable borrow-checking
C++23 module TUs in libc++ / module-graph builds"). Three fixes upstream:
- `build.rs` now honors `LIBCLANG_PATH` first (was hardcoded to
`/usr/lib/llvm-14/lib`, which lost to a system clang too old to
export `clang_CXXMethod_isDeleted`).
- `src/main.rs` passes `-resource-dir=` (not `-I<dir>/include`) when
compile_commands.json already owns the STL paths, so libclang's
`__has_include_next` dance with libc++ wrappers works. Also
passes through `-D` / `-U` / `-m` / `-f` / `-O` / `-g` flags
from compile_commands.json so `-march=native` reaches libclang
(without it, `__builtin_ia32_*` intrinsics fail to declare).
- `src/parser/mod.rs` adds `-ferror-limit=0` so large module TUs
don't abort on libc++ vs `import std;` duplicate-decl spam.
2. **`RRR_BORROW_SRC` updated** to include the modular .cpp's that were
previously omitted with "module unit" comments: `base/basetypes`,
`base/debugging`, `base/logging`, `base/misc`, `base/strop`,
`base/threading`, `base/unittest`, `misc/alock`, `misc/any_message`,
`misc/marshal`, `misc/rand`, `rpc/utils`, `rpc/client`. The
`reactor/*.cc` lineup is replaced with `reactor/{epoll_wrapper.cc,
fiber.cpp,future.cpp,reactor.cpp}` to reflect the post-modularization
layout. `rpc/server.cpp` stays omitted — libclang 22 crashes parsing
it (separate clang-22 issue, distinct from the codegen-crash
workarounds in srpc_module_migration_plan.md).
3. **Build-order dependency**: each per-file `borrow_check_rrr_borrow_<F>`
target now `add_dependencies(... rrr)`. Without this, ninja
parallelizes borrow checks with `rrr`'s compilation and the checks
start before `rrr.dir/.../*.o.modmap` and the matching `.pcm` BMIs
exist — libclang then can't resolve `import rrr.X;` and analysis
runs without module context. Adding the dep at the
`borrow_check_all_rrr_borrow` aggregate isn't enough; ninja groups
skip the implied dep for siblings.
End-to-end result on clang 22.1.5, `-j16`: `ninja borrow_check_rrr`
takes ~1m53s, analyzes 17 files, 15 are clean, 2 produce real
findings (`reactor.cpp`: 1; `client.cpp`: 23 — all `rusty::*` container
methods called from `@safe` code without `@unsafe` blocks). Zero
modmap-not-found warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Submodule bump (rusty-cpp d53d9c7) flips `option.hpp` and `result.hpp`
namespace default from `@unsafe` to `@safe`, matching the other rusty
containers. That alone closes ~22 of the 24 findings — they were noise
from those container methods (`Option::unwrap`, `Result::unwrap`,
`Option::is_some`, `RefCell::borrow`, ...) being marked unsafe by
default.
Project-side touches for the 2 remaining sites:
- `src/rrr/base/debugging.cpp`: `// @safe` annotation on `rrr::verify`
(pure precondition check; aborts on failure, parity with Rust's
`assert!`). Note: this does not yet take effect on cross-module
callers — rusty-cpp's annotation parser doesn't follow `import`
statements (only `#include`), so reactor.cpp / client.cpp / etc.
that import `rrr.debugging` still see `verify` as unannotated.
File the upstream issue and leave the annotation in for the day
that lands.
- `src/rrr/reactor/reactor.cpp:2012`: drop the obsolete
`// @unsafe { verify(wakeup_time > 0); }` wrapper. The text parser
mis-classified the wrapped call as a function declaration (the
`> 0` inside the `verify(...)` triggers
`is_function_declaration`'s template-syntax heuristic). With
verify itself now annotated `@safe`, the wrapper is no longer
needed anyway.
- `src/rrr/rpc/client.cpp:2893`: rewrite `(*fu_ptr.unwrap()).clone()`
as `fu_ptr.unwrap()->clone()` plus a multi-line `@unsafe { ... }`
block around the assignment. `HashMap::get` returns `Option<V*>`
so `unwrap()` yields a raw pointer; the deref is genuinely needed
here. rusty-cpp's pointer-safety analysis doesn't honor inline
`@unsafe { }` blocks (separate upstream gap), so this finding
persists in the report even though the source-level intent is
annotated.
End-to-end on clang 22.1.5, `-j16`, `ninja borrow_check_rrr`:
Before: 24 findings (reactor.cpp 1, client.cpp 23)
After: 2 findings (reactor.cpp 1, client.cpp 1)
Both remaining findings are documented upstream-rusty-cpp limitations,
not real safety regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up two upstream rusty-cpp parser fixes (committed locally on
rusty-cpp `main`, ahead of origin by 5 commits):
- 278737e header_cache: follow C++23 `import X.Y.Z;` for annotation
discovery. Resolves the first of the two remaining false-positive
findings from the earlier pass: `Calling non-safe function
'rrr::verify' at line 1361 ...` in reactor.cpp. With this change,
the parser walks `import rrr.debugging;` to
`src/rrr/base/debugging.cpp` and picks up the `// @safe`
annotation on `verify`.
- 4b82823 ast_visitor: recognize multi-line `// @unsafe` comment
blocks. Resolves the second false-positive: `Unsafe pointer
dereference in function call at line 2896 ...` in client.cpp.
The `check_for_unsafe_annotation` heuristic now walks UP through
the contiguous comment block above a `{`, instead of inspecting
only the immediately-preceding line. The multi-line `// @unsafe`
annotation on the HashMap-get raw-pointer deref is now honored.
End-to-end on clang 22.1.5, `-j16`, `ninja borrow_check_rrr`:
Before this bump: 2 spurious findings + ~11 legitimate ones.
After this bump: 0 spurious findings + 11 legitimate ones.
The 11 legitimate findings are calls to `rrr::SpinLock::lock` /
`unlock` / `rrr::PollThread::add` from `@safe` rrr code without
`@unsafe { }` wrappers. Those methods are explicitly `@unsafe` in
their own module declarations; they were exposed (correctly) by the
import-chasing fix. Fixing them is a separate code-review pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two source-level changes close all 11 outstanding findings from
`ninja borrow_check_rrr`:
- `src/rrr/base/threading.cpp` — flip `SpinLock::lock` and
`SpinLock::unlock` from `@unsafe` to `@safe`. Parity with Rust's
`Mutex::lock` / drop semantics: atomic compare/exchange and store
are memory-safe; the only genuinely unsafe call inside `lock()`
is `nanosleep(&t, nullptr)` (passes the address of a stack-local
`timespec` to a libc syscall), and that's now wrapped in an
`@unsafe { ... }` block. The prior `@unsafe` annotation on
`unlock()` was over-conservative — `std::atomic::store` is
memory-safe outright. Removes 10 of the 11 findings (every
`ClientPool::*` call to `SpinLock::lock` / `unlock`).
- `src/rrr/rpc/client.cpp` — restructure `Client::close`'s
poll-thread-add block. The `// @unsafe - schedules channel proxy
close on poll thread` annotation was previously a statement-level
comment; rusty-cpp only honors block-level `// @unsafe` (the
comment must immediately precede a `{`). Wrap the body in an
explicit `{ ... }` so the annotation now binds. Removes the last
finding.
End-to-end on clang 22.1.5, `-j16`, `ninja borrow_check_rrr`:
Before: 11 findings (1 PollThread::add + 10 SpinLock::lock/unlock).
After: 0 findings. 17/17 files clean, build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Was: 17 hand-picked module units (the "correctness-critical" subset).
Now: every non-test C++ source file in `src/rrr/{base,misc,reactor,rpc}/`
that the borrow checker can actually parse, with two exclusions called
out inline with the reason:
- `rpc/server.cpp` — clang-22 libclang crash on parse. Pre-existing.
- `rpc/fiber_channel.cpp` — same clang-22 libclang crash, discovered
during this expansion. Likely the same bug class as server.cpp
(a `create_sp_event<>`-shaped pattern in module purview).
- `reactor/fiber_context_{aarch64,x86_64}.cc` — implicitly excluded
(asm-only context-switch trampolines, ~30 lines each, not in any
cmake source list).
End-to-end on clang 22.1.5, `-j16`, `ninja borrow_check_rrr`:
43 files analyzed, 43 clean, 0 violations. Total time ~57s.
Caveat on the "0 violations" result: of the 27 files added in this
commit, 22 currently have NO `@safe` annotations and pass the check
vacuously (no `@safe→@unsafe` call sites to flag). The remaining 5
do have meaningful `@safe` coverage:
- `rpc/idempotency.cpp` — 37 `@safe`, 10 `@unsafe`
- `rpc/completion_tracker.cpp` — 32 `@safe`, 5 `@unsafe`
- `rpc/request_queue.cpp` — 4 `@safe`, 39 `@unsafe`
- `misc/serializable.cpp` — 2 `@safe`, 6 `@unsafe`
- `rpc/tcp_channel.cpp` — 0 `@safe`, 22 `@unsafe`
Future annotation work on the 22 vacuous-pass files will start
producing meaningful checks; until then this expansion's value is
ensuring the build infrastructure covers every borrow-checkable file
in rrr (regression-proof against new findings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h recovery)
Both files previously hard-failed libclang parse with "Failed to parse
file: Crash" (clang-22 libclang bug; clang++ CLI compiles them fine).
With the rusty-cpp bump in this commit (c12ef31), the parser now
detects the Crash and retries with module args dropped, yielding a
partial AST that's still usable for borrow analysis.
Three pieces:
- **submodule bump (third-party/rusty-cpp → c12ef31)**: trust the
compile_commands compiler for `-resource-dir=...` so resource-dir
matches the toolchain that built the PCMs, and add a Crash-recovery
fallback that retries with std-only module inputs.
- **CMakeLists.txt**: drop the `# omitted — clang-22 libclang crash`
exclusions for `rpc/server.cpp` and `rpc/fiber_channel.cpp`. Both
now go through the recovery path. Inline comment notes that
findings on these two are partial-info (cross-module callee names
not resolved) until upstream libclang is fixed.
- **src/rrr/rpc/server.cpp**: reconcile four decl-vs-def annotation
mismatches. The class-body declarations of `run_async` (both
ServerConnection and DeferredReply overloads), `wait_for_shutdown`,
and `add_shutdown_hook` said `@safe`; the out-of-line definitions
said `@unsafe`. The bodies do unsafe things (invoke caller-supplied
`rusty::Function<void()>` callbacks; push to a mutex-guarded Vec;
block on a condvar with arbitrary predicate). Flip declarations
to `@unsafe` to match the definitions and the actual behaviour.
These were the 4 partial-info findings that the recovery surfaced.
End-to-end on clang 22.1.5, `-j16`, `ninja borrow_check_rrr`:
Before: 43 files analyzed, 2 hard-failed, 0 findings (with the 2
excluded).
After: 45 files analyzed, 0 hard failures, 0 findings.
That's the complete borrow-checkable surface of rrr (45 of 47 .cpp/.cc
files in src/rrr/{base,misc,reactor,rpc}/; the 2 holdouts are
asm-only context-switch trampolines).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures everything we learned during the bisection: the trigger (first-time user-type template instantiation in module purview), the parse-state-sensitivity (file truncations at 699 succeed but 700 crashes; 710 succeeds again), the matched-toolchain test that isolated this as a libclang-22 regression vs libclang-19, and the reason we don't downgrade (clang-19's own multi-attachment bug blocks rrr's server.cpp from compiling). Includes a filing checklist for an upstream LLVM bug report — not yet filed (per Shuai's preference to hold). Cross-references the workaround in third-party/rusty-cpp's parser and the broader migration plan doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…annotations
Switch the rrr borrow-check pipeline (and reference build) from Homebrew
clang 22.1.5 to Homebrew clang 21.1.8. clang 21 is the sweet spot: it
fixed the clang-19 multi-attachment trap (the original reason we left
clang 19), and it does not exhibit clang 22's libclang parse-time crash
on first-time user-type template instantiations in module purview that
forced rusty-cpp into degraded crash-recovery on server.cpp and
fiber_channel.cpp.
clang 21 is also a stricter compile, which surfaced one latent bug:
* rpc/load_balancer.cpp: drop the GMF forward-decl
`namespace rrr { class Client; }`. The class is fully declared in
client.cpp's `rrr.client` module purview, and clang 21 (correctly)
rejects a global-module forward-decl that re-introduces the same
name. The three "private member" errors clang 21 reported in client.cpp
were symptoms of the duplicate-Client declaration, not real
visibility violations — they vanish with this delete.
Bump third-party/rusty-cpp to df79169 so the analyzer can run cleanly
on a clang-21 toolchain:
* skip-cross-file-bodies: the analyzer now restricts its scan to the
TU under check rather than the whole import graph. Previously this
produced 181 cross-module-noise findings on the clang-21 pipeline
because libclang's AST exposes function bodies from imported modules.
* qualify module annotations by namespace: annotations harvested from
`export module rrr.foo;` units are now stored under their derived
namespace (`rrr::Log_warn`, not bare `Log_warn`), matching the
qualified lookup the analyzer uses.
Project-side annotation reconciliations needed for full-info borrow
check (0 findings on 45/45 files):
* base/logging.cpp: annotate `Log_debug/Log_info/Log_warn/Log_error`
template shims `@safe`. They forward printf-style format strings
(literals at every call site we control) plus variadic args; no
memory operations escape.
* misc/serializable.cpp: `registry()` returns a reference into a
process-wide singleton (`'static`-lifetime). Marked `@unsafe`
because rusty-cpp doesn't yet express `&'static` lifetimes; safer
than a half-modelled `@lifetime` annotation it can't enforce.
* rpc/client.cpp: `enqueue_heartbeat_probe` decl and its single call
site marked `@unsafe`. The body uses Marshal `operator<<` chains
that rusty-cpp treats as unsafe by default; wrapping the call site
is cleaner than restructuring the Marshal API.
* rpc/server.cpp: declaration of `Server::close` aligned with the
body's `@unsafe` annotation. Two stale `// @unsafe - Log_warn`
inline comments removed — they confused the text parser into
attaching `@unsafe` to the surrounding scope.
Verification on clang-21:
borrow_check_rrr: 45/45 files clean, 0 findings, 1m54s.
…ad_local`
Required for clang 21 production link. clang 21 emits module-attached
class-static `thread_local` storage as a strong external in every TU
that uses the variable (typically via an inline accessor like
`is_on_poll_thread()`), causing duplicate-definition linker errors
when more than one consumer links into the same executable:
multiple definition of `TLS init function for
rrr::Reactor@rrr.reactor::sp_reactor_th_'
clang 22 happened to keep these in vague linkage; clang 19 didn't
emit them in consumer TUs at all. `inline` makes the linkage explicit
and toolchain-independent: the symbol is vague-linkage in every TU
that sees the declaration, the linker dedupes, and there's no need
for an out-of-class definition.
Variables converted:
- `Reactor::sp_reactor_th_`
- `Reactor::sp_disk_reactor_th_`
- `Reactor::sp_running_fiber_th_`
- `PollThreadWorker::current_worker_`
The non-thread_local `clients_` / `dangling_ips_` HashMaps and the
plain SpinLock `trying_job_` are left as out-of-class definitions —
they don't have inline accessors and don't trip the regression.
Verified on Homebrew clang 21.1.8:
- clean `rpcbench` link (1m4s wallclock).
- `borrow_check_rrr`: 45/45 files, no violations.
- CMakeLists.txt: prefer Homebrew/Linuxbrew clang-21 libclang for the rusty-cpp checker before falling back to system /usr/lib/llvm-*. We intentionally do not auto-detect clang-22 because of the documented libclang parse-crash regression (see docs/dev/libclang22_parse_crash.md). - docs/dev/libclang22_parse_crash.md: production switched to clang 21.1.8; refresh the three-way comparison table; document the source fixups (load_balancer.cpp duplicate forward-decl, reactor.cpp `static inline thread_local`) required for the switch. - docs/dev/srpc_module_migration_plan.md: replace the clang-22 cmake invocation with the clang-21 one; cross-reference the libclang doc; add a "toolchain note" preamble to the Metrics table noting the rows were measured under clang 22 but production now builds on clang 21 (numbers within noise). Verified on Homebrew clang 21.1.8: - clean rpcbench link - borrow_check_rrr: 45/45 files, 0 violations
Mechanical conversions where rusty-cpp's @unsafe reason was encapsulable in an inner block, leaving the public surface @safe. Borrow check stays clean (45/45 files, 0 violations) on Homebrew clang 21.1.8. Conversions by category: Aggregate-initialized POD factories (7): BufferingConfig::defaults / disabled / to_queue_config RequestQueueConfig::defaults / small / large / disabled These were marked @unsafe for "returns struct by value" with an inner @unsafe { struct construction } block. Both are redundant — aggregate-init of a POD is among the safest operations in C++. std::chrono helpers (3): QueuedRequest::is_expired QueuedRequest::age_ms rrr::current_time_ms std::chrono::steady_clock::now + duration_cast are read-only and memory-safe; the wrap-and-flip pattern hides the unannotated std calls behind a public @safe surface. Atomic counter accessors on Server (3): Server::pending_request_count Server::increment_pending Server::decrement_pending Each is one atomic load / fetch_add / fetch_sub; wrap that call in an inner @unsafe block and mark the accessor @safe. Matches the pattern already used for set_drop_heartbeat_replies / drop_heartbeat_replies in the same class. SpinCondVar signal/bcast (2): SpinCondVar::signal / SpinCondVar::bcast Each is one atomic store. wait / timed_wait stay @unsafe — they additionally call SpinLock and Time::sleep, which is Tier 3 work (apply Pattern B1 across the remaining SpinLock users). Bumps third-party/rusty-cpp to 755344c which annotates rusty::sync::Weak::upgrade / rusty::rc::Weak::upgrade as @safe. Two DeferredReply methods in server.cpp (reply, reply_error) drop their inline `// @unsafe - weak pointer upgrade` wrappers as a result. Not in this commit (defer): - Future::timed_wait — currently labeled "Uses std::chrono" but actually uses rusty::Mutex + Condvar, which is the larger blocker. - Server::drain — has chrono + usleep + multiple atomic loads in one function; a larger refactor than mechanical wrap-and-flip. - SpinCondVar::wait / timed_wait — wait until Tier 3 lands the remaining SpinLock pattern-B1 conversions.
…e callers The rusty-cpp library already annotates rusty::VecDeque (namespace- level `// @safe`), rusty::Function (per-method `// @safe`), and SpinMutex / SpinMutexGuard / Cell as @safe. The remaining @unsafe labels on rrr callers were leftover from before those library annotations landed. Flipping them now removes ~18 false-unsafe function signatures and drops 13 stale inline `@unsafe { }` markers that no longer have a non-safe operation to wrap. request_queue.cpp (9 functions, 8 stale inline markers): - RequestQueue ctor (defaults() factory is @safe per tier 1) - enqueue / dequeue / expire_stale - size / empty / full / remaining_capacity - clear_all Body composition for each: SpinMutex::lock() (@safe) + VecDeque method (@safe via namespace) + optional rusty::Function call (@safe). try/catch around callback invocation is ignored by rusty-cpp's analyzer (per its CLAUDE.md), so the catch arm doesn't need wrapping either. client.cpp (6 functions, 4 stale inline markers): - ClientConnection::pending_request_count (RequestQueue::size) - ClientConnection::clear_pending_requests (RequestQueue::clear_all) - ClientConnection::set_on_server_restart (Function move-assign) - ClientConnection::check_server_instance (Cell::get/set + Function::op()) - Client::pending_request_count (forwards to ClientConnection) - Client::clear_pending_requests (forwards to ClientConnection) - ClientConnection::replay_pending_requests (no-op stub returning 0) reactor.cpp (6 inline marker removals): - 6 `// @unsafe - rusty::Function constructor` markers around extract_if / retain callback construction in Reactor::check_timeout and Reactor::loop. The Function constructor is @safe in the library; the braces are kept as scope hygiene for the local iterator variables. Verified clean on Homebrew clang 21.1.8: borrow_check_rrr: 45/45 files, 0 violations.
…State>
ClientPool held `SpinLock l_` + two unprotected fields (`cache_` and
`lb_state_`), with seven public methods doing manual `l_.lock()` /
`l_.unlock()` around BTreeMap operations. The pattern was the last
raw-SpinLock holder in rrr.
Bundle the protected state into a `SpinMutex<PoolState>`:
struct PoolState {
rusty::BTreeMap<std::string, rusty::Vec<rusty::Arc<Client>>> cache;
rusty::BTreeMap<std::string, LoadBalancerState> lb_state;
};
mutable SpinMutex<PoolState> state_;
Bundling matches the access pattern: `get_client` already touches
both maps under the same lock, so they want one mutex, not two.
Every method that used `l_.lock()/.unlock()` now uses RAII via
`auto guard = state_.lock().unwrap();`, accessing fields as
`guard->cache` / `guard->lb_state`. The destructor also takes the
guard before iterating (it was previously unlocked-by-convention as
"the last reference"; locking is a no-op in that case but lets the
analyzer see the guard).
Methods flipped @unsafe → @safe:
- get_healthy_client_count
- remove_unhealthy_clients
- close_idle_clients
- remove_all_unhealthy
- close_all_idle
- total_client_count
- address_count
Still @unsafe (for non-lock reasons):
- get_client — drives Client::connect / try_reconnect_if_needed
synchronously, which is network I/O.
- reconnect_all (both overloads) — async batch driver uses
nanosleep + std::atomic loop. The state_ snapshot at the top is
@safe; the body is not.
`ClientPool::is_client_healthy` (used inside the lock) is one of the
remaining function-level @unsafe sites worth flipping next — it just
reads Cell + ConnectionMetrics, no actual unsafe op.
Verified on Homebrew clang 21.1.8:
- borrow_check_rrr: 45/45 files, 0 violations.
- clean rrr rebuild + rpcbench link.
… Client Two large classes carried the tag "marked unsafe to suppress rusty-cpp false positives (rusty-cpp is under development)". Those annotations predated several analyzer improvements that landed in the last few weeks: import-chasing for cross-module annotations, multi-line `// @unsafe { }` block parsing, cross-file body filter, and namespace-qualified annotation lookup (see commits 28e64c5, 0c16ebf, df2388f). With the matured analyzer, the suppressions are no longer needed. Drop them and flip both classes to `// @safe` at the class level. The methods inside that genuinely cross into unsafe territory (network I/O, std::chrono, rusty::Mutex + Condvar combinations) already carry their own method-level `// @unsafe` overrides; the rest of the class is now analyzed as @safe by inheritance. This is a class-level boundary change, not a method-level flip — no individual method's annotation changed. The net effect: previously- unannotated methods inside Future and Client now default to @safe via class inheritance instead of being @unsafe-by-namespace. Callers of those methods can now invoke them from @safe code without `// @unsafe { }` wrappers. Verified on Homebrew clang 21.1.8: - borrow_check_rrr: 45/45 files, 0 violations (no new findings from the @safe flip — the suppressions were genuinely no-ops). - clean rrr rebuild.
…ers in client.cpp These were defensive `// @unsafe { ... }` scope wrappers around method bodies whose underlying operations are all @safe in the matured rusty-cpp library + project annotations: - rusty::Arc::make (@safe in arc.hpp) - rusty::Mutex::lock / MutexGuard::operator*/-> (@safe in mutex.hpp) - rusty::RefCell::borrow / borrow_mut (@safe via namespace in refcell.hpp) - rusty::Option::is_some/is_none/as_ref/unwrap (@safe via namespace) - rusty::Cell::get/set (@safe via namespace) - rusty::Arc::clone (@safe in arc.hpp) - ClientConnection::host/connected/connection_state/server_instance_id /pending_request_count/clear_pending_requests/set_keepalive (all @safe at decl after tier 1+2) The wrappers were no-ops once those library/project methods landed their @safe annotations; the analyzer was already letting these through because the function-level @safe overrides the inner block scope, but the dead `// @unsafe { ... }` markers were noise to anyone reading the code. Methods cleaned: - Future::create, Future::ready, Future::timed_out, Future::add_completion_callback, Future::get_reply - Client::create, Client::host, Client::connected, Client::connection_state, Client::connection, Client::server_instance_id, Client::is_reconnecting, Client::set_keepalive, Client::set_reconnect_policy - Client::request (3 overloads), Client::request_with_options (2 overloads) Still @unsafe (kept): - Future::get_error_code's `{ timed_wait(x); }` block — timed_wait is genuinely @unsafe (Mutex+Condvar+chrono). Bare-marker count in rrr: 115 → 96. Verified on Homebrew clang 21.1.8: borrow_check_rrr: 45/45 files, 0 violations.
…pers
server.cpp (4):
- Server::reg_service (Box<Service> overload) — rusty::Vec::push +
Box move + ServiceProxy dispatch are @safe at the boundary.
- Server::reg_service (Box<T> ServiceLike overload) — same.
- Server::reg_rpc — rusty::HashMap::contains_key + insert are @safe
via the rusty namespace.
- Server::service_count — Option + Vec::size are @safe.
reactor.cpp (1):
- Reactor::make_arc — rusty::Arc::make is @safe; the wrap was
misdocumented as a "localized unsafe allocation boundary".
Kept @unsafe:
- Server::for_each_service — body has `static_cast<Service*>(...)`
on a raw pointer from ServiceProxy's __get_service__; that's a
genuine raw-pointer cast.
Bare-marker count in rrr: 96 → 92.
Verified on Homebrew clang 21.1.8:
borrow_check_rrr: 45/45 files, 0 violations.
Baseline (computed with /tmp/safety_loc.py): total in-fn LOC: 22,438 @safe (function or class-inherited): 1,435 ( 6.4%) @unsafe (function or class-inherited): 2,526 (11.3%) inner `@unsafe { ... }` blocks: 733 ( 3.3%) unannotated (namespace default = @unsafe): 17,744 (79.1%) The unannotated bucket is the conversion target. None of it violates the borrow check today; it just hasn't been actively tagged. Plan structure: Phase 0: fix LOC script's out-of-class inheritance bug (honest baseline) Phase 1: labeling sweep — class-level + namespace-level @safe sweeps on files that are already safe in fact (target 35–40%) Phase 2: easy raw-pointer refactors (target 50%) - ChannelConnectionProxy / FactoryProxy → rusty::Box - PollThreadWorker* TLS → rusty::Weak<PollThreadWorker> - rusty::sys::* syscall wrappers - ServiceProxy::__get_service__ → rusty::Arc<Service> Phase 3: harder refactors (target 65–70%) - alock.cpp ALock* → rusty::Weak<ALock> - serializable.cpp std::shared_ptr<Marshallable> → rusty::Arc (touches generated rcc_rpc.h codegen) - Reactor::loop @unsafe block scoping - Pthread_* → rusty::sync::* wrappers Phase 4: stretch (target 80%) - Marshal byte ops (refactor / external annot / quarantine) - fiber context quarantine (~150 LOC explicit @unsafe) - rcc_rpc.h codegen rewrite Honest ceiling without rusty-cpp expressiveness changes or perf-costly Marshal refactors is ~70%. The plan treats 70% as the success criterion and 80% as a stretch. Each Phase has a checklist; the self-pacing loop ticks one item per iteration, runs borrow_check_rrr + rrr build, commits, and updates this doc's Progress log.
Phase 0 of the rrr_safety_80pct_plan: an honest, reproducible baseline. Previously the script lived in /tmp and had a bug where it could match function calls (`Class::method(args)`) inside other function bodies as out-of-class method definitions, inflating the @safe / @unsafe in-fn counts. The fix: - Anchor OUT_OF_CLASS_METHOD regex to lines that don't start with control-flow keywords (`if/while/for/switch/return/else/do/try/catch`). - Only consider an out-of-class match at brace depth==0 (file scope) — function definitions open at file scope; function calls don't. - Handle multi-line signatures: stash a pending out-of-class name when the `ClassName::method(` line lacks `{`, attach when the `{` arrives. Honest baseline after the fix (was a moving target): in-fn LOC: 22,438 @safe: 1,408 (6.3%) @unsafe: 2,226 (9.9%) inner @unsafe: 733 (3.3%) unannotated: 18,071 (80.5%) The earlier estimate that the fix would land "+4-6pp" was wrong — the fix tightens classification rather than growing @safe. Real ratio growth comes from the Phase 1 labeling sweep.
Two intertwined changes, both surface @safe LOC that was previously hidden: 1. **`Server` class** (`rpc/server.cpp`) flipped from `// @unsafe` to `// @safe` at the class level, mirroring the Tier-4 flip on `Client`. Every method that genuinely crosses into unsafe territory (socket I/O via TcpListener, Pthread / std::atomic primitives, raw pointer extraction from ChannelListenerProxy, `usleep` in drain, etc.) already carries a method-level `// @unsafe` override and is unaffected. 2. **LOC-script bug fix**: `scripts/rrr_safety_loc.py` previously used `";" in line` to detect forward-decls and drop pending annotations. Multi-line `// @safe -` annotation comments often contain a `;` in prose ("// overrides; the rest of the class…"), which fired the drop spuriously and stripped the pending annotation BEFORE the class declaration was reached. Net effect: `Future` and `Client` class flips from Tier 4 were silently uncredited in every prior LOC report. Fix: check `";" in stripped` (after strip_line_comment) instead. borrow_check_rrr: 45/45 clean. LOC ratio: 6.3% → 7.2% @safe (1,408 → 1,623 LOC inside fn bodies). Note: this delta combines Server's flip and the script fix's retroactive credit for Future/Client; the pure Server contribution is about half of it.
The amend in 54a2d98 changed the SHA from the originally-recorded d5028a19. Update the Progress log entry to match.
Add `// @safe` at the `class Reactor` declaration. The Reactor design
already documents itself as memory-safe (single-threaded, RefCell /
Cell / Rc / HashMap with rusty borrow rules); the comment was
descriptive but the analyzer wasn't credited.
Methods that genuinely cross into unsafe territory keep their
existing method-level `// @unsafe` overrides:
- `get_reactor` / `get_disk_reactor` (thread-local Rc access)
- `register_stackless_poller` / `enqueue_stackless_task` /
`process_stackless_tasks` / `spawn_stackless_task` (RefCell-mut
via mutable members)
- `check_timeout` / `loop` (rusty-cpp doesn't fully model RefCell)
- `continue_fiber` / `recycle` (fiber-context interactions)
Most Reactor:: out-of-class methods already had per-method @safe or
@unsafe annotations, so the class-level flip mainly credits the
inline methods (getters, setters, factory helpers, register_fiber,
restore_running_fiber, etc.) that were silently unannotated.
borrow_check_rrr: 45/45 clean.
LOC ratio: 7.2% → 7.4% @safe (1,623 → 1,652 LOC).
Modest gain, as expected — the bigger reactor wins live in Phase 2
(PollThreadWorker* raw pointer → rusty::Weak) and Phase 3 (Reactor::loop
@unsafe-block tightening).
; LOC script pending_for_class leak fix Two correctness wins, no ratio movement: 1. Add `// @safe` at the class level for `IdempotencyKeyGenerator` and `IdempotencyCache` in `rpc/idempotency.cpp`. Both classes use only rusty primitives (Cell, Mutex, HashMap) with explicit @unsafe annotations on the few methods that copy Marshal byte payloads. 2. Fix `scripts/rrr_safety_loc.py`: `pending_for_class` was leaking across function-body `{` consumption. Only `pending` got reset when a function brace was consumed; `pending_for_class` kept the stale annotation forever, which then got attached to the next class declared anywhere in the file. Concrete effect: pre-fix, `IdempotencyKeyGenerator` was wrongly credited as `@unsafe` because `CachedResponse::get_response_data` set `pending_for_class = "unsafe"` 17 lines earlier and that never got cleared. Same pattern was likely contaminating other files' class annotations too — the previous LOC report was about 1pp too optimistic on @safe and 0.9pp too optimistic on @unsafe. Honest baseline after both fixes: in-fn LOC: 22,453 @safe: 1,398 (6.2%) @unsafe: 2,137 (9.5%) inner @unsafe: 733 (3.3%) unannotated: 18,185 (81.0%) This iteration's class flips on idempotency.cpp don't move the ratio because every method in both classes already had explicit per-method annotations — class-level @safe only helps when there are unannotated method bodies for it to inherit. borrow_check_rrr: 45/45 clean.
Add `// @safe` at the `class CompletionTracker` declaration. The class already uses only rusty primitives (Cell, Mutex<std::list>, Mutex<HashSet>) with no raw pointers, syscalls, or Marshal byte ops. All previously unannotated methods inside the class body now inherit @safe. borrow_check_rrr: 45/45 clean. +107 @safe LOC; ratio 6.2% → 6.7%.
Single-threaded state machine; every field is rusty::Cell<T> with trivially-copyable interior mutability. No raw pointers, syscalls, or Marshal byte ops. borrow_check_rrr: 45/45 clean. +118 @safe LOC; ratio 6.7% → 7.2%.
Cell-based heartbeat tracker with rusty::Function timeout callback. No raw pointers, syscalls, or Marshal byte ops. borrow_check_rrr: 45/45 clean. +88 @safe LOC; ratio 7.2% → 7.6%.
…address Five method-level `// @unsafe` markers on TcpConnection / TcpListener adapter accessors were tied to ops that have since become @safe: - `TcpConnectionChannelAdapter::is_closed` — forwards to TcpConnection:: is_closed (Cell::get, @safe). - `TcpConnectionChannelAdapter::peer_address` — forwards to TcpConnection:: peer_address (const std::string accessor). - `TcpConnectionPollableAdapter::is_closed` — same as above. - `TcpConnectionPollableAdapter::check_pending_write_update` — forwards to TcpConnection::check_pending_write_update (Cell::get + Cell::set, @safe). - `TcpListener::local_address` — returns a copy of the const std::string member; the historical "std::string copy constructor isn't borrow-checked" rationale was over-conservative. Verification: borrow_check_rrr 45/45 clean. Ratio holds at 71.5% (impact small; mostly comment changes on one-liners).
Both methods only call SpinMutex::lock (@safe since Tier 2.x) and CallbackWrapper move-assign (@safe via namespace+class annotation in rrr.callback_wrapper). The historical "(not @safe)" rationale was out of date. Verification: borrow_check_rrr 45/45 clean. Ratio holds at 71.5% (+6 @safe LOC, −6 @unsafe LOC).
…::make Both static factory templates wrap `rusty::Arc<T>::make(args...)`. The historical "(non-borrow-checked)" rationale predates the rusty::Arc namespace-level @safe annotation: Arc::make is itself @safe with an inner `// @unsafe { new }` block around the heap allocation. Verification: borrow_check_rrr 45/45 clean. Ratio 71.5% → 71.6%.
- TcpListener::handle_write (returns PollMode::NO_CHANGE, no side effects)
- TcpListener::check_pending_write_update (returns false)
- PollThreadWorker::do_remove_pollable (HashMap::contains_key +
HashSet::insert, both @safe; the "uses STL operations" rationale
was wrong — these are rusty types)
- PollThreadWorker::do_close_pollable (HashSet::remove / HashMap::get
/ HashMap::remove / HashMap::contains_key are @safe; only
Epoll::Remove and virtual Pollable::close() dispatch escape into
inner @unsafe blocks).
Verification: borrow_check_rrr 45/45 clean. Ratio 71.6% → 71.8%
(+27 @safe LOC, −25 @unsafe LOC).
…loop Two inline @unsafe blocks in Reactor's fiber-execution path were rooted in "Fiber::finished() is not marked @safe" — that's outdated. Fiber::finished() is annotated @safe (reads a Cell<Status>); RefCell::borrow + Option::as_ref / unwrap are @safe via namespace inheritance. Verification: borrow_check_rrr 45/45 clean. Ratio 71.8% → 71.9% (+6 @safe LOC, −6 @unsafe LOC).
`Event::test()` only calls @safe ops: - `verify()` is @safe (pure precondition check, marked @safe in rrr.debugging) - `is_ready()` is virtual; the base @safe-annotates the virtual - `status_.get()` / `status_.set()` (Cell, @safe) - `wp_fiber_.upgrade()` (rusty::sync::Weak::upgrade, @safe) - `option_fiber.is_some()` (@safe) - `Log_debug(fmt, args...)` is @safe (template shim wrapping Log::debug in an inner @unsafe block) The historical "verify/log helpers not marked @safe" rationale was outdated. Verification: borrow_check_rrr 45/45 clean. Ratio holds at 71.9%.
RefCell / Option Eight inner `// @unsafe { ... }` blocks were wrapping operations that have been @safe for a while: - marshal.cpp: three `// @unsafe { Vec::clear is @safe; wrap defensively. }` blocks around `buf_.clear()` / `src.buf_.clear()` — defensive wraps that were never needed; rusty::Vec::clear is @safe via namespace inheritance. - client.cpp: two `// @unsafe { SpinMutex::lock }` wraps around `factory_.lock()` / `pending_factory_.lock()` — SpinMutex::lock is @safe since Tier 2.x. - client.cpp: five `// @unsafe { RefCell::borrow, Option::unwrap are not borrow-checked }` blocks around connection_.borrow() + Option::is_some / as_ref / unwrap — both RefCell::borrow and Option::unwrap are @safe via namespace inheritance. Verification: borrow_check_rrr 45/45 clean. Ratio 71.9% → 72.1% (+21 @safe LOC; inner-block LOC dropped by 25).
Nine more inner `// @unsafe { ... }` blocks wrapped operations that have been @safe for a while: - client.cpp x4: `// @unsafe { RefCell::borrow, Option::unwrap }` around connection_.borrow() + Option::is_some / as_ref / unwrap. - client.cpp x2: `// @unsafe { SpinMutex::lock + Option::take }` around fiber_channel_.lock().unwrap() / direct_channel_.lock() .unwrap(). - reactor.cpp x2: `// @unsafe { RefCell::borrow_mut, Option operator= are not borrow-checked }` around the sp_running_fiber_th_.borrow_mut() = Some/None idiom. The wrapped operations are all @safe via the rusty namespace + RefCell / Option / SpinMutex class annotations. Verification: borrow_check_rrr 45/45 clean. Ratio 72.1% → 72.4% (+31 @safe LOC; inner-block LOC dropped by 31, mostly net-neutral moves into the @safe pool).
Two more pairs of inner `// @unsafe { ... }` blocks were wrapping purely-@safe operations: - `// @unsafe { SpinMutex::lock + ChannelFactoryProxy move }` (x2) around factory_.lock().unwrap() + Some(std::move(factory)) — both SpinMutex::lock and Box move-assign are @safe. - `// @unsafe { make_box + SpinMutex mutation }` (x2) around fiber_channel_.lock().unwrap() + Some(make_box<FiberChannel>) — rusty::make_box is @safe (Arc::make is @safe), SpinMutex::lock is @safe. Verification: borrow_check_rrr 45/45 clean. Ratio 72.4% → 72.5% (+12 @safe LOC; inner-block LOC dropped by 12).
- idempotency.cpp: Marshal `operator<<`/`operator>>` overloads for
`IdempotencyKey` flip @safe — the Marshal class is namespace @safe
via the Phase 4 Cursor rewrite; the per-method @unsafe blocks
only remain on the chunk-list-era methods (which the rewrite
deleted).
- client.cpp `current_time_ms()`: dropped the `std::chrono::steady_clock`
+ `duration_cast` plumbing in favor of `rusty::sys::time::
clock_monotonic_us() / 1000`. Function flips @safe end-to-end.
- request_queue.cpp `QueuedRequest`: storage migrates from
`std::chrono::steady_clock::time_point timestamp` to plain
`std::uint64_t timestamp_us` (monotonic microseconds via
`clock_monotonic_us`). All three methods (ctor, is_expired,
age_ms) flip @safe.
Verification: borrow_check_rrr 45/45 clean. Ratio 72.5% → 72.6%
(+6 @safe LOC; inner-block LOC dropped by 15).
… Vec)
- client.cpp `reconnect_address_ = addr` — std::string assignment from
`const char*` is benign in @safe code; no need for an inner wrap.
- client.cpp `bind_channel_direct` body — SpinMutex::lock +
Option::operator= are both @safe.
- server.cpp's channel-listener on_accept lambda — SpinMutex::lock +
rusty::Vec::push are both @safe.
Verification: borrow_check_rrr 45/45 clean. Ratio 72.6% → 72.7%
(+6 @safe LOC; inner-block LOC dropped by 6).
`update_config` was method-level @unsafe with an inner @unsafe block around `queue_.lock().unwrap()` + `config_ = config`. Both are @safe: SpinMutex::lock and POD-struct assignment have no escape hatches. Verification: borrow_check_rrr 45/45 clean. Ratio holds at 72.7% (+8 @safe LOC, −9 @unsafe LOC).
…ty() std::string::empty() is a pure const accessor — completely safe in @safe code. The lingering @unsafe wrap was incidental. Verification: borrow_check_rrr 45/45 clean. Ratio 72.7% → 73.2% (+58 @safe LOC; the inline @unsafe block was straddling an entire if-condition, so removing the wrap relabels the surrounding code).
Third std::chrono call-site in client.cpp (after Future::timed_wait and current_time_ms) flips to rusty::sys::time::clock_monotonic_us / 1000. Removes the last std::chrono::steady_clock::now() use in this file outside of the test-side rpcbench tooling. Verification: borrow_check_rrr 45/45 clean. Ratio holds at 73.2%.
Phase 2 (rrr safety push, item 3 — second wrapper family). Bumps the rusty-cpp submodule to pick up `rusty::sys::process::*` and updates the five rrr files calling those syscalls directly: - base/misc.cpp `get_ncpu`: sysconf(_SC_NPROCESSORS_ONLN) → @safe `rusty::sys::process::sysconf`. Function flips @unsafe → @safe. - base/misc.cpp `get_exec_path`: getpid → @safe wrapper. The function stays @unsafe overall (static char[PATH_MAX] + readlink + raw char* return) but the getpid call no longer carries an explicit syscall annotation. - misc/cpuinfo.cpp `CPUInfo::CPUInfo` + `get_cpu_stat`: sysinfo / sysconf / times / getpid all route through the new helpers. The methods stay @unsafe because of get_network / get_memory parsers (raw char* / strtok) but the syscall plumbing is gone. - misc/netinfo.cpp `NetInfo::NetInfo` + `get_net_stat`: drop the `// @unsafe { times(&tms_buf) }` inner blocks — times() now flows through rusty::sys::process::process_times. - reactor/reactor.cpp `fiber_task_t::init_context`: sysconf(_SC_PAGESIZE) → @safe wrapper. - rpc/server.cpp instance-id generation: getpid → @safe wrapper. Submodule bump: 843ba3b adds `include/rusty/sys/process.hpp` with getpid / sysconf / process_times (ProcessTimes aggregate) / sysinfo (Linux-only SysInfo aggregate). Each entry is @safe, body wraps a single libc syscall in an inner @unsafe block, return types are plain integers or small PODs. Verification: borrow_check_rrr 45/45 clean; rrr library compiles. Ratio 73.3% → 73.5%.
Phase 2 (rrr safety push, item 3 — third wrapper family). Bumps the
rusty-cpp submodule to pick up rusty::sys::env::hostname() and updates
the one rrr caller:
- rpc/utils.cpp get_host_name(): drops the raw char[1024] buffer +
direct gethostname call. Flow now goes through @safe sys::env::
hostname which wraps gethostname in an inner @unsafe block and
returns an owned std::string. Function flips @unsafe → @safe.
Submodule bump: d720f95 adds include/rusty/sys/env.hpp. Buffer size
defaults to 256 bytes (overridable via RUSTY_SYS_ENV_HOSTNAME_BUF).
Verification: borrow_check_rrr 45/45 clean. Ratio 73.5% → 73.6%.
…d_hash
Phase 2 (rrr safety push, item 3 — fourth wrapper family). Bumps the
rusty-cpp submodule to pick up rusty::sys::pthread::current_id_hash()
and updates the one rrr caller:
- basetypes.cpp Rand::Rand: pthread_self() seed contributor now goes
through @safe sys::pthread::current_id_hash. The only remaining
@unsafe op is , which is now
confined to a tight inner @unsafe { } block. Function flips
@unsafe → @safe.
Submodule bump: 97c45b4 adds include/rusty/sys/pthread.hpp scoped
narrowly to thread-identity (the mutex / condvar / thread-create
surface continues to live in rusty::sync::* / rusty::thread::*).
Verification: borrow_check_rrr 45/45 clean. Ratio holds at 73.6%
(Rand::Rand body is small; +8 @safe LOC, -7 @unsafe LOC, +2 inner-block).
Phase 2 item 3 (rusty::sys::* syscall wrappers) was [partial] after the sys::time landing. With sys::process / sys::env / sys::pthread also shipped + folded into rrr, the item is now [x]. Logs the four sub-families that landed (time, process, env, pthread) with their submodule + parent commit hashes, and documents the deliberate skips for epoll/kqueue, pthread mutex/condvar, and the full file I/O surface (each annotated with the reason rrr's safety ratio wouldn't move).
… Queue
Audit confirmed these classes had zero production-touching method
calls:
- `ThreadPool` was constructed in `paxos_worker.cc` and
`raft_worker.cc` (`base::ThreadPool::make(num_threads)` + `make(1)`
for hb) and stored in `rusty::Arc<base::ThreadPool>` fields, but
`run_async` was never invoked. The fields were checked via empty
`if (thread_pool_g) { /* Arc auto-releases */ }` blocks during
shutdown — i.e. it spun up worker threads on an idle job queue
forever, allocated for nothing.
- `RunLater`: zero callers outside threading.cpp.
- `SpinCondVar`: zero callers outside threading.cpp.
- `Queue<T>` (pthread mutex + cond blocking queue): only consumed
internally by `ThreadPool::run_thread`.
Deletes:
- threading.cpp: drops SpinCondVar (~70 LOC) + Queue<T> (~75 LOC) +
ThreadPool class + impl (~155 LOC) + RunLater class + impl
(~220 LOC). File shrinks from 930 → 411 LOC. Unused includes
(rusty/arc.hpp, /box.hpp, /fn.hpp, /function.hpp, /vecdeque.hpp,
sys/time.h) and imports (rrr.misc) come out too.
- server_worker.{h,cc}: drops `svr_thread_pool_`, `thread_pool_g`,
`hb_thread_pool_g` fields + the `hb_thread_pool_g = svr_thread_pool_`
assignment.
- paxos_worker.{h,cc}: drops `thread_pool_g` / `hb_thread_pool_g`
fields + the two `base::ThreadPool::make(...)` construction calls.
- raft_worker.{h,cc}: same shape; also drops the two empty
`if (...thread_pool_g) { /* Arc auto-releases */ }` shutdown
blocks.
Verification: borrow_check_rrr 45/45 clean. Channel-mode unit tests
pass (inmemory 24/24, fiber 8/8, tcp 20/20, client_factory 5/5).
rrr + mako + rpcbench + txlog_core_obj all link cleanly. Ratio
73.6% → 74.0%; @unsafe pool drops by ~114 LOC; total in-fn LOC
drops by ~360.
Net: 554 LOC removed, 2 added.
Two dead-code prunes surfaced by the Phase 2 follow-on survey:
- `NetInfo` (src/rrr/misc/netinfo.cpp, ~75 LOC) gauged ens4 rx/tx
bytes/sec through `NetInfo::net_stat()`; zero callers anywhere in
the tree. Removed the file, dropped `import rrr.netinfo;` from
`rrr.hpp`, and pruned the entries from RRR_MODULE_SRC /
RRR_BORROW_SRC in `src/rrr/CMakeLists.txt`.
- `ServerConnection::{fd, poll_mode, content_size, handle_read,
handle_write, handle_error, handle_free, check_pending_write_update}`
in `src/rrr/rpc/server.cpp` (~95 LOC) were stub methods whose
doc-comments cited "PollableProxy facade ABI compatibility". But
`class ServerConnection` has no base class — the stubs aren't
overrides — and `make_pollable_proxy_from_typed_arc<T>` is never
instantiated with `T = ServerConnection` anywhere in the tree
(verified via grep). The only callers lived inside one test
(`ServerApiSafetyTest.ServerConnectionContentSizeAndHandleFreeAreSafe`
in test_rpc_extended.cc) that exercised the no-op behavior to
confirm it was a no-op. Deleted the test along with the stubs.
`is_closed()`, the one method in the same block that is actually
used by the poll loop and many callers, is kept.
Verification:
- rrr builds clean (`cmake --build build_clang21 --target rrr`).
- borrow_check_rrr clean across 44 files (47/48 ninja tasks; 48th
is the rrr.netinfo BMI which is no longer emitted).
- test_rpc_extended builds + links clean.
Safety LOC tally: 8395 @safe / 11351 in-function = 74.0%.
Plan doc: append a "Phase 2 follow-on — dead-code removal" entry
listing this commit + the prior threading.cpp prune (f6be7df).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tomic
The Log class held a static `pthread_mutex_t m_s` that protected
writes to `level_s` (int) and `fp_s` (FILE*). Two findings:
1. `fp_s` and `Log::set_file()` had zero callers in the tree.
They were dead state.
2. `log_v` (the hot path) reads `level_s` without taking the lock,
so the mutex only guarded the writer — a torn-write hazard, not
a torn-read one. Rust would model this as `AtomicI32`, not a
`Mutex<i32>`.
Changes in `src/rrr/base/logging.cpp`:
- Drop `static FILE* fp_s` field + its definition.
- Drop `static pthread_mutex_t m_s` field + its definition.
- Drop `Log::set_file(FILE*)` declaration and definition.
- Change `static int level_s` to
`static rusty::sync::atomic::Atomic<int> level_s`.
- Initialize to `Log::DEBUG` via `Atomic<int>{Log::DEBUG}`.
- `set_level` becomes `level_s.store(level, Ordering::Relaxed)`.
- `log_v` reads via `level_s.load(Ordering::Relaxed)`.
- Drop `#include <pthread.h>` and `import rrr.threading;` — the
file no longer touches pthread or the Pthread_* wrappers.
The rusty-cpp submodule bumps to `5150277` (one commit) which adds
namespace-level `// @safe` annotations to `rusty::sync::atomic` so
`load`/`store` can be called from @safe code without inline
`@unsafe { ... }` blocks at the call sites.
Verification:
- rrr builds clean.
- borrow_check_rrr clean across all 44 files (logging.cpp
specifically: 0 violations).
- LOC tally: 8394 @safe / 11341 in-function = 74.0%.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The paxos/raft configs in `config/1leader_2followers/` hardcode ports (45xxx/46xxx for paxos, 27xxx/28xxx for raft). Between consecutive CI runs in the same job — including the auto-retries inside `ci/ci.sh` — a process from the previous attempt that exited abnormally can leave its listen sockets in TIME_WAIT / FIN-WAIT-2 on those ports. SO_REUSEADDR helps with TIME_WAIT but not with sockets still tracked in connected states, so the next attempt's `s102` bind on 0.0.0.0:46002 panics with AddressInUse and the shard goes down before any throughput emits. We hit this on `shard2Replication` in PR #67's CI run. Mirror what simpleTransactionRep already does for the shard config (`make_simple_txn_rep_config`): pick a random base port at test init, probe the leader port of each (shard, cluster) tuple via real `bind()` to verify the range is free, then materialize per-shard rebased copies of the paxos/raft YAML into `/tmp/mako_paxos_cfg_XXXX/`. Test wrappers export `MAKO_PAXOS_CONFIG_DIR` so `bash/shard.sh` and `simpleTransactionRep` redirect to the tmp dir. Files touched: - `examples/simple_transaction_rep_port_utils.sh`: + `pick_paxos_replication_port_base(nshards, nthreads)` — picks a base in [40000, 56000-window) with up to 2000 random retries, probing 4 * nshards leader ports via SO_REUSEADDR bind. + `write_paxos_replication_config(new_base, src, dest)` — rebases every `s<id>:<port>` token in `site.server` by `delta = new_base - min(existing ports)`. Preserves yaml-cpp- friendly flow-style nested lists. + `make_paxos_replication_configs(nshards, nthreads, type)` — orchestrates: picks base, mkdir-tmp, calls `write_paxos_replication_config` per shard with shard-stride `+1000`, returns the tmp dir path. - `bash/shard.sh`: honor `MAKO_PAXOS_CONFIG_DIR` (fall back to the in-tree `config/1leader_2followers/`). - `examples/simpleTransactionRep.cc`: same `MAKO_PAXOS_CONFIG_DIR` override at the `paxos_config_files[]` construction site. - All eight CI replication test wrappers: `test_{1,2}shard_replication{,_simple,_raft,_simple_raft}.sh` call `make_paxos_replication_configs`, export the env var, and clean up the tmp dir on EXIT / interrupt. Verification: - `bash -n` clean on all 10 modified shell scripts. - `simpleTransactionRep` rebuilds. - Smoke test: $ tmp=$(make_paxos_replication_configs 2 3 paxos) $ head -8 $tmp/paxos3_shardidx{0,1}.yml shows shard 0 at base+0/100/200/300 and shard 1 at base+1000/1100/1200/1300 with the original site IDs (s101..s403) intact and yaml-cpp-style flow lists preserved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first run of the randomized-port helper produced a base around 57900,
which gave paxos site ports up to ~58900. PaxosWorker::CtrlPortDelta = 10000
means each site also spawns a heartbeat listener at `port + 10000` — so we
were asking the kernel to bind 68900, which is past the valid TCP port
range, and bind() returned AddressInvalid:
rrr::Server::start: channel listener failed to bind 0.0.0.0:67901: AddressInvalid
The original hardcoded configs never hit this: paxos shard 0 lived in 45xxx
(heartbeat 55xxx) and shard 9 topped out at 54xxx (heartbeat 64xxx).
Two fixes in `pick_paxos_replication_port_base`:
1. Lower BASE_MAX by 10000 + per-shard window so
`max_heartbeat = base + (nshards-1)*1000 + 300 + (nthreads-1) + 10000`
stays below 65535.
2. Add the matching `port + CTRL_PORT_DELTA` to the probe set so external
port conflicts on heartbeat ports also block selection at init.
Sanity-checked: 5 random picks for `(nshards=2, nthreads=3)` produced bases
43692..51131 with max heartbeat 54994..62433 — all within range.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After the paxos/raft randomization landed in bb6bb21, CI moved past all replication failures but now fails on `multiShardSingleProcess` with `bind 0.0.0.0:31000: AddressInUse` / `0.0.0.0:31102: AddressInUse`. Root cause: the multi-shard single-process tests hardcode `--shard-config $path/config/local-shards2-warehouses$trd.yml`, which ships with port 31000+ for `localhost`. Earlier tests (`simpleTransaction`) pick a random base in `[20000, 28599]` with offsets up to 3100, so their highest port is 31699 — overlapping 31000-31102. A TIME_WAIT / leftover socket from `simpleTransaction` takes 31000 down before `multiShardSingleProcess` can bind it. Apply the same `make_simple_txn_rep_config` pattern used by the replication tests: - `examples/test_multi_shard_single_process.sh`: source the port-utils, generate `MAKO_CONFIG=$(make_simple_txn_rep_config 2 $trd)`, pass `--shard-config $TEMP_CONFIG`, clean up on EXIT. - `examples/test_2shard_single_process.sh`: same pattern. - `examples/test_2shard_single_process_replication.sh`: also generate a randomized paxos dir (`make_paxos_replication_configs 2 $trd paxos`) and rewrite the two hardcoded `-F config/1leader_2followers/paxos${trd}_shardidx*.yml` args to use `${MAKO_PAXOS_CONFIG_DIR}/...`. After this, every CI test that binds ports does so on a per-run randomized + bind-probed range; no test still depends on the hardcoded 31xxx-34xxx or 45xxx-46xxx ranges. `bash -n` clean on all three scripts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A milestone of work on the
worktree-srpcbranch covering:rusty::Vec<uint8_t>+ read-cursor (incidentally fixed a 5-year-old chunk leak)ClientConnectionclass-level@safewith carefully-scoped per-method @unsafe overridesHashMap::getreturnsOption<V&>(notOption<V*>),Vec::extend_from_slicememcpy fast path + geometricreserve, full Weak @safe annotations,Vec::value_typetypedefstatic inline thread_localform)const-ification cascade throughConnectionStateMachine+ClientConnectionlifecycle methods, eliminating 2const_castsitesHeadline numbers
scripts/rrr_safety_loc.pyratio:client.cppspecifically:Marshal microbench (
bench_marshal):End-to-end RPC (
rpcbenchloopback, 4 threads × 100 outstanding, mode=fast, 8s):What's in the PR
rrr/misc/marshal.cpp — Vec rewrite
(size, char**)to(offset, size))init_block_readreduced to aVec::reservehint;read_chnk/read_reuse_chnkremoved (no external callers)Marshal::read()(a commented-outdelete chnk;dating to 2020) is fixed as a side effectrrr/rpc/client.cpp — class-level @safe + cleanups
class ClientConnectionflipped from class-level@unsafeto@safe(was the single biggest @unsafe-LOC source in rrr; +296 LOC moved to @safe in that one flip alone)fail_pending_future,handle_free,mark_closing,allow_request_with_circuit_metrics,record_circuit_result)Weak<>copy-block@unsafewraps dropped after library annotations landedinvalidate_pending_futures,mark_closing,close,handle_errorare nowconst;reconnecting_/reconnect_abort_atomics markedmutable; 2 const_casts droppedrrr/rpc/server.cpp — Weak cleanups
@unsafeblocks dropped (2 sites)third-party/rusty-cpp — library improvements
Vec::extend_from_slice/Vec::write: memcpy fast path for trivially-copyable T (~10× faster on byte writes)Vec::reserve: geometric growth (max(new, 2×capacity)) — required for amortized O(1) writesHashMap::get/get_mut: returnOption<V&>instead ofOption<V*>(matches Rust + matchesVec::get)sync::Weak<T>/rc::Weak<T>: every method individually@safewith atomic/raw-ptr work in inline@unsafe { ... }blocksrrr/tests/bench_marshal — new Marshal microbench
docs/dev/marshal_perf_baseline.mdwith baseline numbers, calibrated A/B vs leak-fixed chunk-list, end-to-end RPC numbersMerge resolution
static inline thread_localover srpc's function-local-static accessor approachsrc/deptran/communicator.cc: updated toReactor::clients_member accessvec value_type+ noexcept dtor) into local @safe branch; pushed toshuaimu/rusty-cpp mainTest plan
cmake --build build_clang21 --target rrr -j32— cleanborrow_check_rrr— 45/45 clean./build_clang21/test_marshal— 23/23./build_clang21/test_rpc_marshal_archive— 68/68./build_clang21/test_rpc_marshallable_proxy— 12/12./build_clang21/test_rpc_any_message— 8/8./build_clang21/test_idempotency— 32/32./build_clang21/test_rpc— 17/17 (incl. multi-threaded scenarios)./build_clang21/bench_marshal(microbench numbers match expectations)shardNoReplication,shard1Replication, etc.) — defer to CINotes for reviewers
Option<V&>) is source-breaking for any caller binding toV*. The rrr-side rewire is included in this PR.docs/dev/marshal_perf_baseline.mdas a separable concern. The "rewrite + leak fix" combo is faster than "rewrite alone" against the original (leaky) chunk-list; against the leak-fixed chunk-list the wins are smaller but still real (~15% on small-payload RPC).🤖 Generated with Claude Code