Skip to content

feat: Server offloading in dc-quic#3061

Open
maddeleine wants to merge 13 commits intoaws:mainfrom
maddeleine:offloading
Open

feat: Server offloading in dc-quic#3061
maddeleine wants to merge 13 commits intoaws:mainfrom
maddeleine:offloading

Conversation

@maddeleine
Copy link
Copy Markdown
Contributor

@maddeleine maddeleine commented Apr 29, 2026

Release Summary:

Integrated offloading feature into dc-quic server

Resolved issues:

N/A

Description of changes:

This code integrates the offloading feature into a dc-quic server. It's a pretty significant code change because the exporter secret is now only accessible in the Offload feature's on_exporter_secret callback. This means that I had to make the dc-quic Secret and Stateless Reset creation available in both the normal path and the offloaded path, which accounts for most of this code change.
I did have to change the public API of both the Offload trait and the DC trait to do this, which should be fine since they're both unstable.
Changes to the Offload trait:

  • Added a new API on_client_application_params. This is something we didn't consider when we wrote the feature. We need a callback if we want to provide mutable access to the server's transport parameters based on what the client sent. The existing code did not allow a user to actually mutate the server's params in time due to the async request/response message passing.

Changes to the DC trait:

  • Adds a callback on_secret. This is due to the fact that the dc-quic secret is now being passed to dc-quic wrapped in a Box rather than having access to the TlsSession trait.

Call-outs:

I could resurrect the packet storage feature first to get rid of all the caveats in our bach tests about not having it. I just didn't bother since just this change showed good perf numbers in the server stress test.

Testing:

Includes bach test for dc-quic handshake success and dc-quic handshake failure.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@maddeleine maddeleine marked this pull request as ready for review April 29, 2026 23:08
@maddeleine maddeleine requested review from a team as code owners April 29, 2026 23:08
Comment thread dc/s2n-quic-dc/src/path/secret/map/handshake.rs Outdated
path_secrets.application_data = application_data;
}
Err(err) => {
path_secrets.error = Some(err.inner);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a dead store since we're not actually doing anything with path_secrets after this line. I think in the previous code it was storing to self.error which probably did actually get persisted (and then logged in the internal code using this as DcError).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My newest revision fixes this. I think it's still not right though... Now I'm actually passing err.inner back to dc-quic, but not using err.msg. Would be nice to have a test for this path.

Comment thread dc/s2n-quic-dc/src/path/secret/map.rs Outdated
Comment thread dc/s2n-quic-dc/src/psk/server/builder.rs Outdated
Comment thread dc/s2n-quic-dc/src/psk/server/builder.rs
Comment thread dc/s2n-quic-dc/src/psk/io.rs Outdated
Comment thread dc/s2n-quic-dc/src/psk/io.rs Outdated
Comment thread quic/s2n-quic-core/src/dc/testing.rs
/// can be stored in the map.
///
/// The boxed type is a PathSecrets struct. This callback is only triggered
/// if offloading is enabled for this endpoint.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth explaining why we have the Any here? Are we confident we can't make it a type Secret = ... on the trait here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more explict what you're thinking here? Not sure what a new type would do on the trait specifically.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type would avoid needing to erase to dyn Any -- that isn't horrible, but it does mean we need runtime coverage of code to notice we're not quite passing the right type when downcasting, making bugs more likely. An associated type could let us avoid that in some cases (similar to the existing one on this trait).

Comment thread quic/s2n-quic-transport/src/space/mod.rs Outdated
@maddeleine maddeleine requested a review from Mark-Simulacrum May 6, 2026 05:35
_session: &impl s2n_quic_core::crypto::tls::TlsSession,
_e: &(dyn core::error::Error + Send + Sync + 'static),
) -> Option<Box<dyn std::any::Any + Send>> {
// TODO: Not sure if we need to be doing anything with these errors
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this event get emitted on the primary s2n-quic subscriber? We have code internally which uses that event (https://tiny.amazon.com/7zc7p9aa). In general I thought we had wired up offloading such that it still behaves like a 'normal' s2n-quic event emitter, albeit at the cost of extra forced allocations. So I'm a bit confused on why we need an ExporterHandler rather than just keeping the main events... maybe it's because of the ability to return from these with state?

}
Err(e) => {
self.error = Some(e);
Err(s2n_quic_core::transport::Error::INTERNAL_ERROR)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the old code returning as an error in this case? I think it probably wasn't an internal error, right? The error here is mostly going to be a 'normal' thing, not a bug inside s2n-quic (e.g., extra pass of certificate validation failed), so I don't think we want to bucket it as an internal error unless that's what we were doing before.

self
}

/// Controls the number of extra threads used to process incoming TLS handshakes (default: off)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"off" is not a valid value for usize. I'd suggest writing this comment with a set of examples:

  • 0 - no offloading, single thread per provider
  • 1+ - one extra thread performing offload from primary network I/O event loop

(Obviously assuming these are accurate)

let param_decoder = DecoderBuffer::new(client_params.transport_parameters);
let Ok((client_params, remaining)) =
<ClientTransportParameters as s2n_codec::DecoderValue>::decode(param_decoder)
else {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do the parsing here, rather than passing into this method ClientTransportParameters and parsing somewhere inside s2n-quic?


let server = if builder.thread_offload_count > 0 {
// Hs=handshake, o=offload, s=server
let thread_name = format!("hsos-{}", builder.thread_offload_count);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at this, I'm actually not sure if including the thread count is useful here. In most profiling contexts we should have a thread ID (or be able to get one), so distinguishing the threads by name is not so useful. Maybe let's expand this a bit to hs-s-offload, which should still fit (we have 15 characters).

_client_params: tls::ApplicationParameters,
server_params: &mut Vec<u8>,
) -> Option<std::result::Result<(), s2n_quic_core::transport::Error>> {
let dc_quic_params: [u8; 6] = [128, 220, 0, 0, 1, 0];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to properly encode something here? Seems like this is going to be a pain to debug failures in / tweak in the future if these params change/expand. I'm also a bit worried that it's easy for this to diverge from the production configuration so we might not be testing what we think we are.

match session_info.session.poll(&mut context)? {
Poll::Ready(_success) => {
if Config::DcEndpoint::ENABLED {
if let Some(context) = self.tls_context.take() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is taking the tls_context the right thing here? It's not obvious to me that it won't get used again -- I think there are subsequent events we might need to emit with it?

I'm also a bit confused, since while on_token calls its parameter context, it passes it to on_secret which seems to expect a PathSecrets value. Is that actually the same as the TLS context after your changes? If so, that seems a little odd to me (that we started using the TLS context where AFAICT we didn't before), maybe we can tweak naming or add some comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants