Skip to content

[MEDIUM] WithdrawalAuditLog is in-memory only — process crash loses every withdrawal event #38

@Alqku

Description

@Alqku

Overview

WithdrawalAuditLog (crates/tools/src/withdrawal_audit.rs lines 50–95) keeps its entire audit history in-process memory as Vec<WithdrawalLogEntry>. Any process crash, restart, or container eviction loses every withdrawal event — Requested, Approved, Submitted, Rejected. The audit log is the primary non-blockchain record of admin actions on creator withdrawals; losing it silently breaks compliance and post-incident forensics.

Evidence

// crates/tools/src/withdrawal_audit.rs
#[derive(Default, Debug, Clone)]
pub struct WithdrawalAuditLog {
    entries: Vec<WithdrawalLogEntry>,
}

impl WithdrawalAuditLog {
    pub fn log(&mut self, ...) {
        self.entries.push(WithdrawalLogEntry { ..., timestamp: Utc::now().timestamp(), ... });
    }
    // No fsync, no DB call, no network write.
}

The struct is also non-#[serde(Serialize)]-friendly (note types), so a "rotate to JSON on shutdown" strategy would need new machinery. Nothing in crates/tools wires any persistence or shutdown hook.

Impact

  • An admin who revokes a withdrawal but loses the log can't prove what was revoked.
  • A nightly cron restart of the orchestration pod drops a day of audit trails.
  • Combined with chrono::Utc::now() as the timestamp source, the logs are not even synchronised to a single time authority between processes (multi-instance deployments diverge).
  • Disagrees with the intent implied by use cases like "freeze the contract" — there is no forensic trail to rely on.

Recommended Approach

Two-layer approach:

  1. Short term — append-only on-disk sink. Add WithdrawalAuditLog::flush_to_disk(&self, path: &str) -> Result<()> and a periodic tokio::task in worker_logger.rs (or a new audit_writer module) that calls it every N entries / T seconds. Append JSON lines, set 0o600 on the file, and rotate with size-bounded files (audit-YYYY-MM-DD.jsonl).
  2. Medium term — durable store. Persist to SQLite or Postgres so multi-host deployments converge on the same log. This requires schema work and migrations; track as a separate issue.

Also: replace chrono::Utc::now() with an injected Clock trait so audit timestamps can be deterministic in tests and aligned with the Soroban-ledger time the corresponding on-chain WithdrawalRequested/WithdrawalApproved event carries. Currently the in-memory log's timestamp and the on-chain event timestamp come from independent clocks — they cannot be cross-checked.

Acceptance Criteria

  • WithdrawalAuditLog::flush_to_disk writes valid JSON lines with restrictive file permissions and an atomic append
  • Periodic flush is invoked from a real worker loop, not just behind a tokio::test
  • Clock injection is in place: tests run with a fixed-time clock; production uses chrono::Utc::now() only via the trait default
  • Crash-restart-fork test: log is rewritten on next start without truncation
  • Documentation in docs/deployment.md describes log rotation policy and on-disk schema

Affected Files

  • crates/tools/src/withdrawal_audit.rs
  • crates/tools/src/worker_logger.rs (or a new audit_writer module)
  • crates/tools/src/main.rs (wire the periodic flusher)
  • crates/tools/Cargo.toml (any new dependency, e.g. sqlx if SQLite is chosen)
  • docs/deployment.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    help wantedExtra attention is neededrefactoringCode restructuring without behavioral change

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions