Add Unix socket listener with proper HBA support#175
Open
Conversation
0a555a4 to
65d09ad
Compare
added 9 commits
April 6, 2026 12:43
Client connections via Unix domain socket (unix_socket_dir config option). Bypasses TCP stack entirely — useful for benchmarking and local deployments. Changes: - unix_socket_dir config creates .s.PGSQL.<port> socket (pgbench-compatible) - HBA local rules now correctly match Unix socket connections (previously local rules were always skipped) - HBA host/hostssl/hostnossl rules only match TCP connections - max_connections enforced for Unix socket clients - Socket file cleaned up on shutdown - is_unix flag threaded through check_hba -> Client::startup
- BDD test: psql via Unix socket reaches PostgreSQL backend
(pg_backend_pid proves real backend, not synthetic response)
- CHANGELOG.md with known limitations
- Fix duplicate comment in annotated config for unix_socket_dir
- Add ${PG_TEMP_DIR} placeholder for BDD tests
Needed: compare all three poolers (pg_doorman, pgbouncer, odyssey) via Unix domain socket alongside existing TCP benchmarks. What changed: bench.feature runs Unix socket tests first — simple, extended, prepared protocols with 1/40/120/500/10000 clients, with and without reconnect. Odyssey gets a dedicated unix-socket-only instance (it cannot bind TCP and Unix socket in one process). Benchmark tables now use ▲/▼ arrows for faster visual scanning of results.
Needed: AWS Fargate bench run failed because odyssey refused to start the unix-socket-only instance with "unix_socket_mode is not set". Added unix_socket_mode "0644" to the dedicated odyssey instance config.
Needed: during the long benchmark run the polling loop only printed "Task status is RUNNING" with no insight into what the bench was doing. Hard to tell if it was making progress or stuck. Now every 5th poll (~2.5 minutes) the loop fetches new log events from CloudWatch and prints them inline, so the workflow output shows the bench output as it happens.
Needed: the .s.PGSQL.<port> socket file inherited the process umask after bind, so the access surface depended on whoever launched pg_doorman. There was no way to declare the intended permissions in the config; operators had to umask-wrap systemd units or rely on parent directory permissions. What changed: a new general.unix_socket_mode setting (octal string, default "0600") fixes the permission bits on the socket file immediately after bind. The mode is validated at config load — invalid octal, empty strings, and bits above 0o777 (setuid/setgid/sticky) are rejected upfront rather than failing later inside the listener. The applied mode is logged at startup. To grant a Unix group, set "0660"; to allow any local user, set "0666".
Needed: cargo clippy --tests --deny "warnings" failed with 34 errors across BDD test helpers and unit-test modules, blocking CI lint gate. What changed: applied clippy's recommended fixes - converted manual range checks to RangeInclusive::contains, replaced loop+match patterns with while let, swapped &[b'X'] for b"X" byte strings, removed useless format!/return/lifetimes, derived Default where trivial, used struct update syntax instead of post-construction field assignment, dropped unnecessary i32 casts, and switched from DoubleEndedIterator::last() to next_back(). No test logic or assertions were modified.
Needed: previous run hit the 120-minute polling timeout at 124 minutes of ECS RUNNING state — the bench had finished test 226/240 and only needed ~7 more minutes. Adding the unix socket suite (90 new pgbench runs at -T 30s each) pushed the wall-clock past the old 120-min ceiling. The pre-unix-socket bench took ~1.2 hours, so a 3-hour ceiling gives comfortable headroom for future additions. Also: aws logs get-log-events was called without --start-from-head and without pagination, returning only the tail of CloudWatch events. For a multi-MB log stream this could chop off the ===BEGIN_BENCHMARK_RESULTS=== marker even on a successful run, so the markdown file was at risk of silent truncation. Now we paginate from the start until nextForwardToken stops advancing.
Needed: a 2-hour bench produces hundreds of MB of CloudWatch events (the unix-socket odyssey instance alone logs every connection accept and disconnect). Paginating the whole stream from the start with get-log-events takes 30+ minutes and times out the workflow before the artifact is uploaded. What changed: switched to filter-log-events which accepts a CloudWatch filter pattern and a time window. Two new optional inputs let the caller skip the noisy setup phase or grab a specific minute range. With a focused pattern (e.g. "?Scenario ?pgbench ?error") the workflow finishes in under a minute even on the largest streams.
fc2df36 to
7c96ac6
Compare
added 15 commits
April 6, 2026 13:53
Needed: the previous fix paginated the entire CloudWatch stream from the start to avoid truncating the markdown result. For a 2-hour bench the stream is hundreds of MB (mostly odyssey accept/disconnect spam), so the download step crawled for 30+ minutes and could exceed the GitHub Actions runner timeout. What changed: a single filter-log-events call locates the timestamp of the ===BEGIN_BENCHMARK_RESULTS=== marker server-side, then we only fetch events from that timestamp onward and stop as soon as the END marker lands in the file. The download finishes in seconds regardless of stream size while still capturing the full markdown payload. If the marker is missing (bench did not finish) we fall back to the last 200 events for context.
Needed: 3.4.0 is not released yet, so a 3.4.1 entry on top of it was incorrect. The items attributed to 3.4.1 (log readability, auth failure IP logging, replenish noise, runtime log level control, etc.) all belong to the same unreleased 3.4.0 train.
Needed: 3.4.0 is still unreleased, so a separate Unreleased section above it was redundant. The Unix socket listener, HBA local rule matching, and unix_socket_mode entries all belong to the same 3.4.0 release train.
Needed: the new is_unix parameter turned check_hba into a security-critical matcher (local rules gate Unix sockets, host* rules gate TCP). The only existing tests exercised TCP paths, so any regression in the unix_socket listener's authentication path would have passed CI. What changed: thirteen unit tests spread across src/auth/hba.rs and src/config/tests.rs pin down the new semantics — local vs host rule transport matching, reject-before-trust ordering via Unix, scram-only rules returning NotMatched for md5 clients, and the legacy general.hba fallback allowing all Unix clients while still enforcing CIDR limits on TCP clients. The legacy branch became reachable from tests after extracting check_hba_with_general, a pure helper that takes an explicit General snapshot instead of pulling from the global config.
Needed: the listener bind path created the socket file at umask-derived rights (usually 0644), then ratcheted them down to unix_socket_mode via set_permissions. The window between bind and chmod allowed a concurrent local client to reach the listener with weaker permissions than the operator configured, silently bypassing the intended restriction and contradicting the changelog promise that the socket "does not depend on the process umask". What changed: before bind() the process temporarily tightens its umask to cover every bit the configured mode forbids, then restores the old umask once the file is in place. set_permissions still runs afterwards so operators can loosen the mode (e.g. 0660 with a group bit) — the guard only enforces a floor. Two serial unit tests pin the guard's restore-on-drop and bit-preservation behaviour, and two new BDD scenarios stat the socket file to verify both the 0600 default and the 0660 override are honoured end-to-end.
Needed: the listener bring-up called remove_file unconditionally, so pointing unix_socket_dir at a shared directory such as /var/run/postgresql would silently delete another service's live socket and cut off every local client before pg_doorman even bound its own. What changed: a dedicated prepare_unix_socket_path helper stats the target, probes it with a local connect, and only unlinks the file when the probe is refused (stale leftover from a crash). If a process is still listening, the helper returns a descriptive error and run_server exits with OSERR instead of clobbering the foreign inode. Three unit tests cover the missing-file, stale-file, and live-listener branches.
Needed: when an operator migrates to the Unix listener but keeps the legacy general.hba CIDR whitelist, the IP-based rule no longer covers local clients — check_hba_with_general returns Allow for every Unix connection. The silent privilege expansion is only visible by reading the HBA matcher source, so an operator can easily miss it. What changed: Config::validate now calls legacy_hba_bypassed_by_unix_socket and emits a warn! that tells the operator to move to pg_hba with explicit `local` rules if they want the whitelist to keep working. The check lives in a dedicated helper so unit tests cover the four branches (bypass detected, no unix dir, empty legacy list, pg_hba overrides) without touching the log stream.
Needed: the Unix accept branch dropped the socket the moment CURRENT_CLIENT_COUNT exceeded max_connections, so psql and similar clients saw "server closed the connection unexpectedly" and operators went hunting for a segfault or an OOM. The TCP branch had the right behaviour for years — it consumes the startup packet and returns a proper 53300 ErrorResponse — and Unix needs to match it. What changed: a new client_entrypoint_too_many_clients_already_unix reads the startup message (refusing a stray SSL request with a clear 08P01) and writes the same "sorry, too many clients already" error response that TCP emits. The Unix accept branch calls it instead of silently returning, so max_connections rejection is now observable end-to-end for both transports. Test coverage for the rejection BDD scenario is intentionally deferred: the scenario needs two concurrent Unix clients to cross the limit deterministically, and the current BDD harness has no helper for parallel psql processes. Follow-up will land once that helper exists.
Needed: on SIGUSR2 binary upgrade the successor rebinds the same .s.PGSQL.<port> file before the parent reaches its shutdown path. The parent then unlinked the path unconditionally, wiping the successor's inode and leaving the new listener nameless — every subsequent Unix client would see ENOENT. The changelog admitted a ~100ms refusal gap, but the real outcome was losing the Unix listener until a full restart. What changed: right after bind+chmod the listener startup captures the (dev, ino) of the socket file via a new UnixSocketOwnership helper. At shutdown we stat the path again and only call remove_file when the inode still matches; if it has changed we log that another process owns the path and leave the file alone. The decision lives in a pure UnixSocketOwnership::inspect helper covered by six unit tests — match, replacement, missing, skip-on-stat-error, plus a round-trip that actually re-binds the same path to exercise the full "successor replaces inode" scenario.
Needed: run_server inlined ~60 lines of bind+chmod+ownership+umask plumbing, which made the failure modes (missing directory, stale socket, chmod denied) unreachable from unit tests and noisy to read. What changed: create_unix_listener now owns the whole bring-up — prepare path, raise the umask, bind, drop the guard, chmod, capture ownership — and returns a Result so the caller keeps the exit-on- error policy without leaking OS exit codes into the helper. Three tokio tests exercise the happy path, the missing-directory error, and the group-readable mode override in a tempdir, so regressions in any of those branches trip CI instead of a production bring-up.
Needed: the HBA matcher and Client::startup took a (SocketAddr, ssl,
is_unix) triplet with two unrelated booleans sitting next to each
other — a classic swap-me footgun. On the Unix path callers had to
manufacture a 127.0.0.1:0 "fake_addr" to satisfy the TCP-shaped
signature, which then leaked into SHOW CLIENTS rows and HBA error
messages and made operators think unix clients were localhost TCP.
What changed: a new top-level ClientTransport enum (Tcp { peer, ssl }
or Unix) is threaded through PgHba::check_hba, check_hba_with_general,
the public config::check_hba wrapper, Client::startup, and both
client_entrypoint variants. is_unix(), is_tls(), hba_ip() and
peer_display() keep the call sites readable. Unix clients no longer
construct fake_addr at the entrypoint layer — the sentinel lives
exclusively inside Client::startup for struct-level compatibility
while a follow-up teaches Client to store a typed peer. Existing HBA
unit tests and config legacy-hba tests were ported to the new API
(nothing lost), the hba_eval_tests suite now builds its
ClientIdentifier via ClientTransport::Tcp, and test modules that
touch the filesystem were marked `#[serial]` to coexist with the
umask guard tests. 423 lib tests stay green.
Needed: the TCP and Unix entrypoints each inlined the same "call Client::startup, log connected, run handle, flush disconnect_stats" sequence. Same four locals, same branch shape, only the ClientTransport variant and the log label differed. The plain-continue-after-rejected- TLS branch in the TCP path was a third copy. Any change to the lifecycle meant hunting down three mirrored places. What changed: a new drive_authenticated_client generic helper takes the split stream halves, the transport descriptor, and a log label, then runs the full post-startup pipeline. The three former copies collapse to single call sites and client_entrypoint_unix shrinks to a straight dispatch on the three ClientConnectionType variants. No behavioural change — tests (incl. the 13 HBA unit tests) still pass.
Needed: both accept sites inlined the same match on Result<Option<ClientSessionInfo>, Error> — same identity formatting, same elapsed-time rendering, same info-vs-warn split. Any tweak to session logging meant changing two places in lockstep. What changed: a new log_session_end helper takes the entrypoint result, the connection_id, a peer label, the session start time, and the config flag, and produces the single-line disconnect log. The TCP accept branch passes addr.to_string(), the Unix branch passes the "unix:" sentinel — the only transport-specific bit left in the select arms. Spawned tasks shrink from ~45 lines each to ~25 with no behavioural change; ClientSessionInfo became pub to cross the module boundary. 423 lib tests green.
Needed: the unix-socket feature only carried the happy-path scenario plus the file-mode assertions added with the chmod fix. Critical HBA and TLS rules that reach the matcher only via the new ClientTransport plumbing had no end-to-end coverage, so a regression in is_unix handling could pass CI silently. What changed: three new scenarios on top of the existing happy path — "local reject blocks Unix", "host rule does not match Unix", and "only_ssl_connections does not block Unix". Each spins up pg_doorman with the relevant pg_hba snippet (or tls_mode = require) and asserts the expected outcome via psql over the socket. Two new step definitions in postgres_helper exercise the rejection path: psql_unix_connection_fails (no error string check) and psql_unix_connection_fails_with (substring match for structured error verification).
Needed: the BDD suite checked the trust path and the bare-rejection path through the new HBA local-rule plumbing, but never exercised a password-authenticated client coming in over the Unix socket. md5 is the most common production setup, so a regression in the auth pipeline for Unix would still pass CI. What changed: two new scenarios in unix-socket.feature reuse the existing tests/fixture.sql user (example_user_1 / md5...test) — one asserts SELECT 1 succeeds when the password matches, the other asserts the connection fails with the wrong password. Two new step definitions in postgres_helper supply the PGPASSWORD env var for both the success and failure paths.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
unix_socket_dirconfig creates.s.PGSQL.<port>socket file. Connect withpsql -h <dir>orpgbench -h <dir>.localrules now match Unix socket connections (previously parsed but ignored).host/hostssl/hostnosslmatch only TCP.is_unixflag threaded throughcheck_hba→Client::startupfor correct rule dispatch.Known limitations
unix_socket_permissions. Restrict access through directory permissions.SIGUSR2binary upgrade (~100ms reconnect window).only_ssl_connectionsdoes not reject Unix socket connections.