Skip to content

refactor signal infrastructure and support SIGINT (ctrl-c)#690

Open
CvvT wants to merge 2 commits intomainfrom
weiteng/ctrl_c
Open

refactor signal infrastructure and support SIGINT (ctrl-c)#690
CvvT wants to merge 2 commits intomainfrom
weiteng/ctrl_c

Conversation

@CvvT
Copy link
Contributor

@CvvT CvvT commented Feb 27, 2026

This PR supports async signal SIGINT.

On Linux, the workflow is the following:

  1. when boot, register SIGINT handler --> Ctrl-C arrives --> record signal in TLS and wake up a thread if it is being blocked
  2. about to switch to guest (i.e., prepare_to_run_guest) --> check TLS and queue signals --> process signal
  3. any blocking operations
    3.1. before commit to wait, check_for_interrupt would check TLS and queue signals --> process signal
    3.2. during wait, wake up and go to 3.1
    3.3. after waiting, got to 3.1

For shim to be able to retrieve pending signals from platform, add a new SignalProvider with the interface fn take_pending_signals(&self, mut f: impl FnMut(litebox::shim::Signal)).

One tricky problem is that when a signal arrives, the running thread may be about to be blocked or blocked already (i.e., case 3.1 and 3.2), and we didn't interrupt a thread if it is running inside LiteBox code. In which case, we never have the chance to process that signal. It might be possible to examine the RIP of the thread in the signal handler and divert the execution to skip mutex wait call it if RIP is close to it (e.g., use assembly code to build a critical region), but this may not work on Windows (where we don't call syscall instruction directly). The current solution I have is based on the observation that for all blocking operations (e.g., futex wait, sleep) boil down to RawMutex::block_or_timeout with the per-thread WaitState and the expected value ThreadState::WAITING. WaitState stores the thread status like RUNNING_IN_HOST, WAITING, WOKEN, and etc. Thus, in the signal handler, besides recording the signal, we can update the thread state from WAITING to WOKEN and then calls futex wake. If the thread is about to call futex wait, futex call would return immediately because of the change of the value. If the thread is being blocked, it would wake up and return. To access WaitState in the signal handler, I added two new interfaces on_interruptible_wait_start and on_interruptible_wait_end to RawMutexProvider so that platform can use Waker to wake up the blocking thread.

The signal handlers for SIGINT can be called on any threads. However, we don't want non-guest threads (e.g., background network worker, additional test thread spawned by cargo test) to be called with the signal. For background network worker, we can block those signals. For test thread, the signal handler would check if it has our custom TLS. If not, it blocks those signals and raise them again.

On Windows, the implementation is almost the same except that signal handler is called on a kernel thread (and thus cannot access litebox thread's TLS easily). I have to use a global variable to record all active threads.

I added some end-to-end C tests and Rust unit tests. To make Rust unit test work, I also introduced a new interface fn run_test_thread<R>(f: impl FnOnce() -> R) -> R to ThreadProvider. This is because spawning a guest thread (via sys_clone) and a regular thread in test (via std::thread::spawn) have different routes and require different setup. For example, while running in test, there is no guest code, and thus we don't need to do fs/gs swap (hence no setup for one of fs/gs). To make the code work for both guest thread and non-guest thread, we can set gs == fs before running the test thread.

@CvvT CvvT marked this pull request as ready for review February 27, 2026 02:14
Copy link
Member

@wdcui wdcui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the code change looks good to me. I left some comments below.

///
/// The default implementation simply calls `f()` with no additional setup.
/// Platforms that require explicit thread registration should override this.
fn run_test_thread<R>(f: impl FnOnce() -> R) -> R {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we scope this function to tests only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it. Using cfg(test) does not work here because the tests reside in a different crate (litebox_shim_linux). Test functions cannot be used by a different crate. One possible solution is adding a new feature, but that adds complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can use cfg(debug_assertions).

thread.interrupt();
}

fn run_test_thread<R>(f: impl FnOnce() -> R) -> R {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limit this to test only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cfg(debug_assertions)

/// Runs `f`, ensuring that [`CURRENT_THREAD_HANDLE`] is set while in the call to `f`.
fn run_with_handle<R>(tls: &TlsState, f: impl FnOnce() -> R) -> R {
// Safety: `tls_state` lives for the duration of this call.
unsafe { install_tls(tls) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we install tls on demand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All threads should install tls. Previously install_tls was called in the caller function, I moved it here because I want to reuse this function for test threads. So now threads including guest threads and test threads would install tls properly.

.unwrap()
.retain(|h| !Arc::ptr_eq(&h.0, &current.0));
*current.0.lock().unwrap() = None;
let tls_index = TLS_INDEX.load(Ordering::Relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a function to uninstall_tls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated.

// Safety: the TLS pointer is valid as long as the thread is
// alive, and we hold the thread handle lock.
let tls = unsafe { &*inner.tls.0 };
tls.pending_host_signals.fetch_or(bit, Ordering::SeqCst);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the effect of this fetch_or?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performs a bitwise “or” operation on the current value and the argument val, and sets the new value to the result.

Returns the previous value.

Copy link
Member

@wdcui wdcui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left some comments below.

@github-actions
Copy link

🤖 SemverChecks 🤖 No breaking API changes detected

Note: this does not mean API is unchanged, or even that there are no breaking changes; simply, none of the detections triggered.

Comment on lines +139 to +141
/// Signal numbers are 1-based and must be in the range 1–63.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub struct Signal(u32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: store as a u8?

Comment on lines +166 to +172
/// A set of [`Signal`]s, stored as a 64-bit bitmask.
///
/// Bit `(signum - 1)` is set when signal `signum` is present in the set.
/// Because signal numbers are 1-based and capped at 63, all 63 possible
/// signals fit in a single `u64`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct SigSet(u64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are explicitly defining the internal details of how things are stored, you can just expose the value inside it via pub struct SigSet(pub u64) to remove from_u64/as_u64. Alternatively, you should not be specifying how it is being internally stored in the docs.

Comment on lines +115 to +125
/// Provider for consuming platform-originating signals.
///
/// Platforms can record signals (e.g., `SIGINT`) and the shim should call
/// [`SignalProvider::take_pending_signals`] to consume them.
pub trait SignalProvider {
/// Atomically take all pending asynchronous signals (e.g., SIGINT and SIGALRM)
/// for the current thread, passing each one to `f`.
///
/// Platforms that support asynchronous signals should override this method.
#[allow(unused_variables, reason = "no-op by default")]
fn take_pending_signals(&self, f: impl FnMut(crate::shim::Signal)) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am quite surprised by this provider. There are some surprising inversions happening here, for example, it depends on crate::shim in its signature.

I don't think the Signal sits well in shim, since the shim module is meant for traits that the shim must implement but there isn't a trait being implemented there.

Also, "shim should call" implies that a shim that does not call stuff is incorrect. This sort of requirement is exactly what traits are for, and thus the way the design should be is to actually have the shim implement a trait that probably looks like the following:

pub trait ConsumeSignal {
  fn consume_signal(Signal);
}

Then the platform can always invoke this trait, and the shim can make a decision of it wants to buffer things or not. If you need/want the "all in a group" behavior, then you can make the trait (instead of the above):

pub trait ConsumeSignal {
  fn consume_signals(&[Signal]);
}

/// for the current thread, passing each one to `f`.
///
/// Platforms that support asynchronous signals should override this method.
#[allow(unused_variables, reason = "no-op by default")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer expect rather than allow

Comment on lines +254 to +255
/// The passed `waker` should live at least until the thread leaves the interruptible
/// wait (i.e., [`on_interruptible_wait_end`](Self::on_interruptible_wait_end) is called).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a somewhat surprising requirement; we can simplify this by updating the signature in this to take a Waker rather than a &Waker, right?

pub trait RawMutexProvider {
type RawMutex: RawMutex;

/// Called when a thread enters an interruptible wait.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Called by whom?

///
/// The passed `waker` should live at least until the thread leaves the interruptible
/// wait (i.e., [`on_interruptible_wait_end`](Self::on_interruptible_wait_end) is called).
/// The platform can use the `waker` to wake up the thread while it is in the interruptible wait.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"can use" -> "uses"?

///
/// This is a no-op by default.
#[allow(unused_variables)]
fn on_interruptible_wait_start(&self, waker: &crate::event::wait::Waker<Self>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function is somewhat weird. It is not actually beginning the wait, right? Maybe something like register_waker (and the other one should be unregister_waker)?

Also, what is the expected behavior if someone runs "start" multiple times in a row?

Comment on lines +266 to +269
/// Called when a thread leaves an interruptible wait.
///
/// This is a no-op by default.
fn on_interruptible_wait_end(&self) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can merge with the above one by taking in a Option<Waker> instead of a Waker; in that case, it is behaving more like update_interruptible_waker.

Comment on lines +105 to +112
impl TryFrom<Signal> for litebox::shim::Signal {
type Error = Signal;

fn try_from(value: Signal) -> Result<Self, Self::Error> {
match value {
Signal::SIGINT => Ok(Self::SIGINT),
Signal::SIGALRM => Ok(Self::SIGALRM),
_ => Err(value),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite surprising, haven't you designed the litebox::shim::Signal literally to support the full set of Linux signals? You've even hard-coded the "up to 64" and so on, so I see no reason why this does not support SIGILL etc converted over.

I do think that the long-term move is for litebox::shim::Signal to not have this 64 restriction, but if you are assuming that right now, then wouldn't the correct move just be Signal::from_raw(...).ok_or(Err(...))?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants