Use spin_lock_bh on recinf->lock to fix softirq deadlock #313

Closed
aversecat wants to merge 1 commit into main from auke/recov_bh_spinlock

@aversecat
Contributor

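timer_callback() runs in softirq context and acquires recinf->lock, but the process-context callers (scoutfs_recov_prepare, _begin, _finish, _is_pending, _next_pending, _shutdown) were taking the same lock with plain spin_lock(), leaving softirqs enabled while holding it. If the timer fires on the same CPU while one of those callers holds the lock, timer_callback() spins on a lock whose holder cannot run again until the softirq returns. Found by lockdep: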

```
	================================
	WARNING: inconsistent lock state
	5.14.0-427.35.1.el9_4.x86_64+debug #1 Tainted: G           OE     -------  ---
	--------------------------------
	inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
	swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
	ffff88813cdd9c20 (&recinf->lock){+.?.}-{2:2}, at: timer_callback+0x26/0x380 [scoutfs]
	{SOFTIRQ-ON-W} state was registered at:
	  __lock_acquire+0x7d0/0x1900
	  lock_acquire+0x1da/0x640
	  _raw_spin_lock+0x34/0x80
	  scoutfs_recov_finish+0x80/0x830 [scoutfs]
	  server_greeting+0x244/0xe60 [scoutfs]
	  scoutfs_net_proc_worker+0x28a/0xce0 [scoutfs]
	  recv_one_message+0x7e3/0xd10 [scoutfs]
	  scoutfs_net_recv_worker+0x441/0xe00 [scoutfs]
	  process_one_work+0x8e5/0x1530
	  worker_thread+0x598/0xf70
	  kthread+0x2a4/0x350
	  ret_from_fork+0x29/0x50
	irq event stamp: 549813370
	hardirqs last  enabled at (549813370): [<ffffffffabe25cb4>] _raw_spin_unlock_irq+0x24/0x50
	hardirqs last disabled at (549813369): [<ffffffffabe2594e>] _raw_spin_lock_irq+0x5e/0x90
	softirqs last  enabled at (549813356): [<ffffffffabe28c91>] __do_softirq+0x621/0x9c2
	softirqs last disabled at (549813363): [<ffffffffa9a44665>] __irq_exit_rcu+0x185/0x230

	other info that might help us debug this:
	 Possible unsafe locking scenario:
	       CPU0
	       ----
	  lock(&recinf->lock);
	  <Interrupt>
	    lock(&recinf->lock);

	 *** DEADLOCK ***
```

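The first line of the report names the rule that fired. As a rough mental model (a simplification: lockdep tracks many more usage states, and also flags a lock that is held at the moment softirqs get re-enabled), each lock carries two bits. One is set when the lock is acquired from softirq context (`IN-SOFTIRQ-W`), the other when process context holds it with softirqs enabled (`SOFTIRQ-ON-W`). Once both are set, lockdep reports the possible deadlock even though it has not actually happened yet. The userspace sketch below is a hypothetical toy model of that check; `toy_lock_usage`, `toy_record_acquire()` and `toy_scenario()` are invented names, not lockdep's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of lockdep's softirq usage check; not kernel code. */
struct toy_lock_usage {
	bool enabled_softirq;  /* held from process context with softirqs on (SOFTIRQ-ON-W) */
	bool used_in_softirq;  /* acquired from softirq context (IN-SOFTIRQ-W) */
};

/*
 * Record one acquisition. Returns true once the usage history is
 * inconsistent, i.e. the lock has been both taken in softirq context
 * and held with softirqs enabled.
 */
static bool toy_record_acquire(struct toy_lock_usage *u, bool in_softirq,
			       bool softirqs_enabled)
{
	/* A softirq handler runs with softirqs disabled (SE0 in the report). */
	assert(!(in_softirq && softirqs_enabled));

	if (in_softirq)
		u->used_in_softirq = true;
	else if (softirqs_enabled)
		u->enabled_softirq = true;

	return u->used_in_softirq && u->enabled_softirq;
}

/*
 * Process context takes the lock first (plain spin_lock() leaves softirqs
 * on, spin_lock_bh() turns them off), then the timer softirq takes it.
 * Returns true if the toy checker would report an inconsistent lock state.
 */
bool toy_scenario(bool process_uses_bh)
{
	struct toy_lock_usage u = { false, false };
	bool bad = false;

	bad |= toy_record_acquire(&u, false, !process_uses_bh); /* e.g. scoutfs_recov_finish() */
	bad |= toy_record_acquire(&u, true, false);             /* timer_callback() */
	return bad;
}
```

In this model `toy_scenario(false)` (plain spin_lock()) returns true, while `toy_scenario(true)` (spin_lock_bh()) returns false, because the process-context acquisition no longer happens with softirqs enabled.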
Convert the six process-context sites to spin_lock_bh()/spin_unlock_bh().
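The `_bh` variants are sufficient because spin_lock_bh() disables bottom halves on the local CPU before taking the lock, so a timer that expires during the critical section has its softirq deferred; spin_unlock_bh() drops the lock and then re-enables bottom halves, at which point the pending softirq can run and take the now-free lock. The following is a hypothetical single-CPU userspace model of that ordering, not the scoutfs change itself: every `toy_*` name is invented for illustration, and the real timer and softirq machinery is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model of a lock shared with a timer softirq. */
struct toy_cpu {
	bool lock_held;        /* stand-in for recinf->lock */
	int bh_disable_count;  /* > 0 means softirqs are deferred on this CPU */
	bool softirq_pending;  /* timer expired while softirqs were deferred */
	bool deadlocked;       /* softirq spun on a lock its own CPU holds */
	int timer_runs;        /* completed timer_callback() equivalents */
};

/* Timer softirq body: take the lock, do work, drop it. */
static void toy_timer_softirq(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		/* The holder was interrupted on this CPU and cannot resume
		 * until we return, so a real spinlock would spin forever. */
		cpu->deadlocked = true;
		return;
	}
	cpu->lock_held = true;
	cpu->timer_runs++;
	cpu->lock_held = false;
}

/* The timer expires: run the softirq now unless bottom halves are off. */
static void toy_timer_fires(struct toy_cpu *cpu)
{
	if (cpu->bh_disable_count > 0)
		cpu->softirq_pending = true;
	else
		toy_timer_softirq(cpu);
}

static void toy_spin_lock(struct toy_cpu *cpu)
{
	assert(!cpu->lock_held); /* only one process-context holder here */
	cpu->lock_held = true;
}

static void toy_spin_unlock(struct toy_cpu *cpu)
{
	cpu->lock_held = false;
}

/* Models spin_lock_bh(): disable bottom halves, then take the lock. */
static void toy_spin_lock_bh(struct toy_cpu *cpu)
{
	cpu->bh_disable_count++;
	toy_spin_lock(cpu);
}

/* Models spin_unlock_bh(): drop the lock, re-enable bottom halves and
 * run a softirq that was deferred in the meantime. */
static void toy_spin_unlock_bh(struct toy_cpu *cpu)
{
	toy_spin_unlock(cpu);
	if (--cpu->bh_disable_count == 0 && cpu->softirq_pending) {
		cpu->softirq_pending = false;
		toy_timer_softirq(cpu);
	}
}

/* A process-context caller (think scoutfs_recov_finish()) whose critical
 * section is interrupted by the timer on the same CPU. */
struct toy_cpu toy_run(bool use_bh)
{
	struct toy_cpu cpu = { 0 };

	if (use_bh)
		toy_spin_lock_bh(&cpu);
	else
		toy_spin_lock(&cpu);

	toy_timer_fires(&cpu); /* timer expires mid critical section */

	if (use_bh)
		toy_spin_unlock_bh(&cpu);
	else
		toy_spin_unlock(&cpu);

	return cpu;
}
```

With plain locking, `toy_run(false)` ends with `deadlocked` set and the timer never completing; with the `_bh` variants, `toy_run(true)` runs the deferred timer exactly once after the unlock. If the timer instead fires on a different CPU, it just spins until the holder releases the lock, so the deadlock requires both on the same CPU.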

Signed-off-by: Auke Kok <auke.kok@versity.com>

@aversecat added the Bugfix label ("Fixes a known bug") May 12, 2026
@aversecat
Contributor Author

This needs to be part of #314 - closing.

@aversecat closed this May 13, 2026
@aversecat deleted the auke/recov_bh_spinlock branch May 13, 2026 17:11