Goal
Make QUIC connection migration / NAT-rebind survive setWorkers > 1.
Single-worker migration already works (server commit 27f073c, RFC 9000 §9: feed ngtcp2 the real datagram source, sync conn->peer, drain on ps.path). But with a worker pool, each worker runs its own http3_listener_t with its own per-worker conn_map, and the kernel's SO_REUSEPORT group hashes by 4-tuple. When a client migrates (new source port → new 4-tuple), the rebound datagram can hash to a different worker that has no entry for that connection → it emits a stateless reset → the connection breaks.
This is the last open piece of the HTTP/3 migration story (roadmap #59, now closed).
Chosen approach — userspace forward (h2o / quicly style, not eBPF)
- nginx steers with eBPF (bpf_sk_select_reuseport); h2o encodes a thread id in the CID and forwards misrouted packets between threads over an fd array. We take the h2o style: portable, no root/eBPF.
- Encode the owning worker index into the SCID (the connection ID the server mints) in the high nibble of byte 0: scid[0] = (worker_index << 4) | (rand & 0x0f). HTTP3_SCID_LEN == 8, so the nibble is free and 60 random bits remain. This caps the pool at 16 workers (the same idea as quicly's 24-bit thread_id field, only narrower; widen the field if we ever need >16 H3 workers).
- A short-header (1-RTT) datagram with no local conn whose DCID names a different worker → forward the raw datagram to the owner; the owner injects it into its own http3_connection_dispatch from its own reactor thread (preserves the single-thread-per-ngtcp2_conn invariant). Forwards are rare (only migrated conns).
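- Long-header / Initial packets (vc.version != 0) are never forwarded: those are handshakes the receiving worker legitimately owns. The client's first Initial carries a client-chosen DCID with no worker nibble, and RFC 9000 §9 forbids the client from initiating migration before the handshake is confirmed.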
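A minimal sketch of the nibble encoding and the forwarding decision described above, in C. http3_dcid_decode_worker_index follows the helper proposed in the edit map below. scid_stamp_worker, route_datagram, the ROUTE_* values and the SCID_* constants are hypothetical names for illustration. The version and conn_found parameters stand in for vc.version and the result of the conn_map lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCID_WORKER_SHIFT 4   /* owner index lives in the high nibble of byte 0 */
#define SCID_MAX_WORKERS  16  /* 4 bits -> workers 0..15 */

/* Hypothetical: overwrite the high nibble of cid[0] with the owning worker,
 * keeping the random low nibble. Must run after the CID is filled with
 * randomness and before anything is derived from it (conn_map key,
 * stateless-reset token). */
static void scid_stamp_worker(uint8_t *cid, size_t len, unsigned worker_index)
{
    assert(len > 0 && worker_index < SCID_MAX_WORKERS);
    cid[0] = (uint8_t)((worker_index << SCID_WORKER_SHIFT) | (cid[0] & 0x0f));
}

/* Sketch of the proposed decode helper: owner index, or -1 for an empty CID. */
static int http3_dcid_decode_worker_index(const uint8_t *dcid, size_t len)
{
    return len == 0 ? -1 : (int)(dcid[0] >> SCID_WORKER_SHIFT);
}

typedef enum { ROUTE_LOCAL, ROUTE_FORWARD, ROUTE_STATELESS_RESET } route_t;

/* Hypothetical routing decision for one received datagram. */
static route_t route_datagram(uint32_t version, int conn_found,
                              const uint8_t *dcid, size_t dcidlen,
                              int my_worker, int worker_count)
{
    if (conn_found || version != 0)
        return ROUTE_LOCAL;            /* known conn, or long header: handle here */
    int owner = http3_dcid_decode_worker_index(dcid, dcidlen);
    if (owner >= 0 && owner != my_worker && owner < worker_count)
        return ROUTE_FORWARD;          /* short header owned by another worker */
    return ROUTE_STATELESS_RESET;      /* truly unknown, or spoofed/out-of-range nibble */
}
```

Keeping the random low nibble, rather than zeroing it, preserves as much entropy as the 4-bit field allows. A nibble naming a worker that does not exist falls through to the existing stateless-reset path instead of being forwarded.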
⛔ Blocker (the reason this is paused)
The original plan assumed the owner could drain a ThreadChannel non-blocking from inside the H3 poll callback. Verified in source: this is impossible as written.
- zend_async_channel_t ABI (php-async Zend/zend_async_API.h:1925-1931) has only send / receive / close. Both block:
  - thread_channel_send (thread_channel.c:95-115): pushes immediately when there is room, but on a full buffer registers a trigger and calls ZEND_ASYNC_SUSPEND() (needs a coroutine).
  - thread_channel_receive (thread_channel.c:142-185): pops immediately when data is present, but suspends when the buffer is empty.
- There is no try_send / receive_nowait / peek / count in the channel ABI (checked shipped php-release/include and live php-src/Zend).
- http3_listener_poll_cb is a reactor event callback, not a coroutine → it cannot suspend. send() from the poll callback on a full buffer would crash, and that is flood-triggerable (forged short-header packets with a spoofed worker nibble).
- The channel's own event base is not notified on send (only receiver_triggers of an already-parked receiver fire) → hanging a callback on channel.event won't wake on an incoming packet.
- The only non-blocking access that exists today is direct circular_buffer_pop/push under ch->mutex (as php-async thread_pool.c:808-820 thread_pool_drain_tasks does). The buffer + mutex fields are public in thread_channel.h, but circular_buffer_* (internal/) and ASYNC_MUTEX_* (zend_common.h) are not shipped in php-release/include → unusable from the server today.
Resume options (pick one before coding)
- (A) Extend ThreadChannel + drain coroutine. Add non-blocking async_thread_channel_try_send() / try_receive() to php-async thread_channel.{h,c} (lock + circular_buffer_*, drop-on-full for send, immediate-or-false for receive). Owner wakes via its own zend_async_trigger_event_t (uv_async_send, thread-safe, per-thread loop) and/or a parked drain coroutine. Keeps the documented "ThreadChannel + Z_PTR" direction and reuses the IS_PTR passthrough.
- (B) Self-rolled per-worker MPSC ring + trigger event in the server, no ThreadChannel. Forward-hook pushes (drop-on-full) and fires the trigger; owner drains in a plain event callback (no coroutine). More idiomatic to this codebase's "no coroutine in internal machinery" rule, but reimplements a small thread-safe queue and leaves the IS_PTR passthrough unused.
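Both options come down to the same small contract. The producer must never block: it drops the item when the queue is full. The consumer must drain until the queue is empty on every wake, because libuv may coalesce several uv_async_send() calls into a single callback. Below is a minimal sketch of that contract, assuming a plain pthread mutex around a fixed-size ring. The names fwd_queue_*, FWD_QUEUE_CAP and the inject callback are hypothetical. This is not php-async's circular_buffer_* / ASYNC_MUTEX_* API.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define FWD_QUEUE_CAP 256  /* hypothetical bound; full ring means drop, never block */

typedef struct {
    pthread_mutex_t lock;
    void *slots[FWD_QUEUE_CAP];
    size_t head;            /* index of the next item to pop */
    size_t count;           /* items currently queued */
    unsigned long dropped;  /* pushes rejected because the ring was full */
} fwd_queue_t;

static int fwd_queue_init(fwd_queue_t *q)
{
    q->head = q->count = 0;
    q->dropped = 0;
    return pthread_mutex_init(&q->lock, NULL);
}

static void fwd_queue_destroy(fwd_queue_t *q) { pthread_mutex_destroy(&q->lock); }

/* Producer side (any worker's poll callback). Only ever waits on the mutex:
 * on a full ring it counts a drop and returns false, so a flood of forged
 * short-header packets cannot stall the sender. On true, the caller wakes
 * the owner (e.g. uv_async_send on the owner's handle). */
static bool fwd_queue_try_push(fwd_queue_t *q, void *item)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < FWD_QUEUE_CAP) {
        q->slots[(q->head + q->count) % FWD_QUEUE_CAP] = item;
        q->count++;
        ok = true;
    } else {
        q->dropped++;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Consumer side (owner's reactor thread only): immediate-or-false. */
static bool fwd_queue_try_pop(fwd_queue_t *q, void **out)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *out = q->slots[q->head];
        q->head = (q->head + 1) % FWD_QUEUE_CAP;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Wake handler: drain until empty, since one wake may cover many pushes.
 * `inject` stands in for dispatch + flush + free on the owner thread;
 * the lock is not held while it runs. */
static size_t fwd_queue_drain(fwd_queue_t *q, void (*inject)(void *item, void *ctx), void *ctx)
{
    size_t n = 0;
    void *item;
    while (fwd_queue_try_pop(q, &item)) {
        inject(item, ctx);
        n++;
    }
    return n;
}
```

Because the sender counts a drop instead of blocking, the forged-nibble flood from the blocker list cannot stall it. Each forged packet costs the sender one lock and one counter increment.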
Verified edit map (line numbers current as of branch 59-h3-finish-all)
- Worker index plumbing. pool_worker_ctx_t (server src/http_server_class.c:1620) holds only server_transit → add int worker_index. Fill ctxs[i].worker_index = i in the fill loop (1949). Stamp it onto the worker clone between LOAD and start() in pool_worker_handler (1639): Z_HTTP_SERVER_P(&server_zv)->worker_index = wctx->worker_index; (the clone is rebuilt fresh per worker, so a raw http_server_object field set before LOAD would be dropped).
- SCID encode (TWO sites). Initial SCID at src/http3/http3_connection.c:301-307 (after http3_fill_random, before the ngtcp2 copy/conn_map registration). And get_new_connection_id_cb at src/http3/http3_callbacks.c:89: stamp cid->data[0] before http3_packet_compute_sr_token (line ~103) so the SR token covers the final bytes. Missing the second site silently breaks steering after the client migrates to a server-issued CID. Decode helper http3_dcid_decode_worker_index(dcid, len) → dcid[0] >> 4 (or −1 if len 0), add to http3_internal.h next to HTTP3_SCID_LEN.
- Shared channel array. It must live in http_server_shared_config_t (src/http_server_config.c:50-115, config->frozen): the single pemalloc'd, atomically-refcounted snapshot whose pointer all workers share (TRANSFER config.c:2894 / LOAD config.c:2918). Not on http_server_object and not in http_server_view_t (those are per-worker copies and would each get a separate pointer). Populate from http_server_start_pool after freeze; free in the shared-config destructor (config.c:~2758). Access from H3 via http3_listener_server_obj(l) → config → frozen → worker_channels.
- Forward-hook. http3_connection_dispatch (src/http3/http3_connection.c:548): after ngtcp2_pkt_decode_version_cid (:563) and the conn_map lookup (:581), inside the conn == NULL && vc.version == 0 branch (:586/:592), before the stateless reset (:596): decode owner from vc.dcid[0] >> 4; if owner != my_worker && owner < worker_count, forward and return. peer is const struct sockaddr * backed by sockaddr_storage, so the forward struct must copy bytes+len+ecn+sockaddr (the vc.dcid/data pointers point into the recvmmsg buffer and are valid only synchronously). Forward payload is pemalloc'd and freed by the receiver after inject.
- Inject / drain. Owner pulls from its channel and calls http3_connection_dispatch(my_listener, …) + http3_listener_flush_dirty from its own reactor thread, then pefree. Drain entry near the end of http3_listener_poll_cb (src/http3/http3_listener.c:541, after flush_dirty) and/or a dedicated wake; which one depends on option A vs B above. The http3_listener_t struct is defined in http3_listener.c:66-176 (opaque in the header), so new fields go there.
- Test. workers=2 + h3client H3CLIENT_REQUEST_COUNT=2 H3CLIENT_MIGRATE_AFTER=1: both responses 200, owner worker quic_conn_accepted == 1, quic_path_migrations >= 1. Model on tests/phpt/server/h3/032-h3-connection-migration.phpt. Note: per-worker getHttp3Stats() visibility from the parent thread is uncertain with a pool, so the primary signal is both requests succeeding across the rebind. New counters quic_forwarded_out / quic_forwarded_in should be added to http3_packet_stats_t (src/http3/http3_packet.h).
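For the forward-hook, the forwarded record has to own copies of everything it carries, because vc.dcid and the datagram pointer alias the recvmmsg buffer. Below is a sketch of one possible layout, assuming a single allocation with a flexible array member. The names fwd_packet_t, fwd_packet_new and fwd_packet_free are hypothetical. malloc/free stand in for pemalloc(..., 1)/pefree(..., 1).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical self-contained copy of one misrouted datagram. */
typedef struct {
    struct sockaddr_storage peer;  /* real datagram source, copied by value */
    socklen_t peer_len;
    uint8_t ecn;                   /* ECN codepoint reported with the datagram */
    size_t len;
    uint8_t data[];                /* raw datagram bytes (flexible array member) */
} fwd_packet_t;

/* Sender side: one allocation for header + bytes. Returns NULL on an
 * oversized address or OOM; the caller then simply drops the datagram. */
static fwd_packet_t *fwd_packet_new(const struct sockaddr *peer, socklen_t peer_len,
                                    uint8_t ecn, const uint8_t *data, size_t len)
{
    if (peer_len > sizeof(struct sockaddr_storage))
        return NULL;
    fwd_packet_t *p = malloc(sizeof(*p) + len);  /* stands in for pemalloc(..., 1) */
    if (p == NULL)
        return NULL;
    memset(&p->peer, 0, sizeof(p->peer));
    memcpy(&p->peer, peer, peer_len);
    p->peer_len = peer_len;
    p->ecn = ecn;
    p->len = len;
    if (len > 0)
        memcpy(p->data, data, len);
    return p;
}

/* Receiver side, after dispatch + flush on the owner's reactor thread. */
static void fwd_packet_free(fwd_packet_t *p) { free(p); }
```

With a single allocation per forward, the receiver's cleanup after inject is one free call, which fits the "freed by the receiver after inject" rule.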
Status
- Paused. No server cross-worker code written (the map above came from a read-only analysis pass — nothing to revert).
- php-async IS_PTR passthrough in thread_transfer_zval_inner (thread.c) is committed under true-async/php-async and referenced from this issue; it lets a ThreadChannel carry an opaque Z_PTR to a persistent packet struct with no copy/refcount. Benign and generally useful regardless of which resume option we pick.
- The 6 commits on branch 59-h3-finish-all (single-worker migration + tests + HEAD-body fix + rejected-stream-leak fix + docs) are independent finished work, not part of this issue.
- Environment caveat: WSL, loopback only, no sch_netem on the default kernel (a netem-enabled bzImage is staged but needs a wsl --shutdown to take effect), so lossy-path testing is limited.