feat: Server offloading in dc-quic#3061
Conversation
| path_secrets.application_data = application_data; | ||
| } | ||
| Err(err) => { | ||
| path_secrets.error = Some(err.inner); |
There was a problem hiding this comment.
This looks like a dead store since we're not actually doing anything with path_secrets after this line. I think in the previous code it was storing to self.error which probably did actually get persisted (and then logged in the internal code using this as DcError).
There was a problem hiding this comment.
My newest revision fixes this. I think it's still not right though... Now I'm actually passing err.inner back to dc-quic, but not using err.msg. Would be nice to have a test for this path.
| /// can be stored in the map. | ||
| /// | ||
| /// The boxed type is a PathSecrets struct. This callback is only triggered | ||
| /// if offloading is enabled for this endpoint. |
There was a problem hiding this comment.
Maybe worth explaining why we have the Any here? Are we confident we can't make it a type Secret = ... on the trait here?
There was a problem hiding this comment.
Can you be more explict what you're thinking here? Not sure what a new type would do on the trait specifically.
There was a problem hiding this comment.
The type would avoid needing to erase to dyn Any -- that isn't horrible, but it does mean we need runtime coverage of code to notice we're not quite passing the right type when downcasting, making bugs more likely. An associated type could let us avoid that in some cases (similar to the existing one on this trait).
| _session: &impl s2n_quic_core::crypto::tls::TlsSession, | ||
| _e: &(dyn core::error::Error + Send + Sync + 'static), | ||
| ) -> Option<Box<dyn std::any::Any + Send>> { | ||
| // TODO: Not sure if we need to be doing anything with these errors |
There was a problem hiding this comment.
Does this event get emitted on the primary s2n-quic subscriber? We have code internally which uses that event (https://tiny.amazon.com/7zc7p9aa). In general I thought we had wired up offloading such that it still behaves like a 'normal' s2n-quic event emitter, albeit at the cost of extra forced allocations. So I'm a bit confused on why we need an ExporterHandler rather than just keeping the main events... maybe it's because of the ability to return from these with state?
| } | ||
| Err(e) => { | ||
| self.error = Some(e); | ||
| Err(s2n_quic_core::transport::Error::INTERNAL_ERROR) |
There was a problem hiding this comment.
What was the old code returning as an error in this case? I think it probably wasn't an internal error, right? The error here is mostly going to be a 'normal' thing, not a bug inside s2n-quic (e.g., extra pass of certificate validation failed), so I don't think we want to bucket it as an internal error unless that's what we were doing before.
| self | ||
| } | ||
|
|
||
| /// Controls the number of extra threads used to process incoming TLS handshakes (default: off) |
There was a problem hiding this comment.
"off" is not a valid value for usize. I'd suggest writing this comment with a set of examples:
0- no offloading, single thread per provider1+- one extra thread performing offload from primary network I/O event loop
(Obviously assuming these are accurate)
| let param_decoder = DecoderBuffer::new(client_params.transport_parameters); | ||
| let Ok((client_params, remaining)) = | ||
| <ClientTransportParameters as s2n_codec::DecoderValue>::decode(param_decoder) | ||
| else { |
There was a problem hiding this comment.
Why do we need to do the parsing here, rather than passing into this method ClientTransportParameters and parsing somewhere inside s2n-quic?
|
|
||
| let server = if builder.thread_offload_count > 0 { | ||
| // Hs=handshake, o=offload, s=server | ||
| let thread_name = format!("hsos-{}", builder.thread_offload_count); |
There was a problem hiding this comment.
Hm, looking at this, I'm actually not sure if including the thread count is useful here. In most profiling contexts we should have a thread ID (or be able to get one), so distinguishing the threads by name is not so useful. Maybe let's expand this a bit to hs-s-offload, which should still fit (we have 15 characters).
| _client_params: tls::ApplicationParameters, | ||
| server_params: &mut Vec<u8>, | ||
| ) -> Option<std::result::Result<(), s2n_quic_core::transport::Error>> { | ||
| let dc_quic_params: [u8; 6] = [128, 220, 0, 0, 1, 0]; |
There was a problem hiding this comment.
Any reason not to properly encode something here? Seems like this is going to be a pain to debug failures in / tweak in the future if these params change/expand. I'm also a bit worried that it's easy for this to diverge from the production configuration so we might not be testing what we think we are.
| match session_info.session.poll(&mut context)? { | ||
| Poll::Ready(_success) => { | ||
| if Config::DcEndpoint::ENABLED { | ||
| if let Some(context) = self.tls_context.take() { |
There was a problem hiding this comment.
Is taking the tls_context the right thing here? It's not obvious to me that it won't get used again -- I think there are subsequent events we might need to emit with it?
I'm also a bit confused, since while on_token calls its parameter context, it passes it to on_secret which seems to expect a PathSecrets value. Is that actually the same as the TLS context after your changes? If so, that seems a little odd to me (that we started using the TLS context where AFAICT we didn't before), maybe we can tweak naming or add some comments.
Release Summary:
Integrated offloading feature into dc-quic server
Resolved issues:
N/A
Description of changes:
This code integrates the offloading feature into a dc-quic server. It's a pretty significant code change because the exporter secret is now only accessible in the Offload feature's on_exporter_secret callback. This means that I had to make the dc-quic Secret and Stateless Reset creation available in both the normal path and the offloaded path, which accounts for most of this code change.
I did have to change the public API of both the Offload trait and the DC trait to do this, which should be fine since they're both unstable.
Changes to the Offload trait:
Changes to the DC trait:
Call-outs:
I could resurrect the packet storage feature first to get rid of all the caveats in our bach tests about not having it. I just didn't bother since just this change showed good perf numbers in the server stress test.
Testing:
Includes bach test for dc-quic handshake success and dc-quic handshake failure.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.