
⛱️ Stripe Mutex Lock Contention #12788

@masaori335

Description


Summary

We have observed a lock contention issue with the Stripe mutex. This is an umbrella issue to track related changes.

Problem

Cache::open_read() has severe lock contention: every read operation (including cache lookups) acquires an exclusive lock on stripe->mutex, which serializes all cache operations on that stripe and limits throughput.

```cpp
CACHE_TRY_LOCK(lock, stripe->mutex, mutex->thread_holding);
```
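CACHE_TRY_LOCK is a try-lock, so a lookup that finds stripe->mutex already held does not get in; as far as we understand, it has to back off and be retried later. The sketch below models only that effect with a plain std::mutex: while one reader holds the lock, a concurrent reader's try_lock() fails even though neither reader modifies anything. StripeModel and second_reader_gets_lock() are hypothetical names, not ATS code.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the shared Stripe; not the real ATS type.
struct StripeModel {
  std::mutex mutex; // one exclusive lock guards every access, reads included
};

// Models a second lookup arriving while a first lookup still holds the
// stripe lock. Returns true if the second reader acquired the lock.
bool second_reader_gets_lock(StripeModel &stripe) {
  std::lock_guard<std::mutex> first_reader(stripe.mutex); // reader 1 holds the lock
  bool acquired = false;
  // Reader 2 runs on another thread, as a concurrent cache lookup would.
  std::thread reader2([&] {
    if (stripe.mutex.try_lock()) {
      acquired = true;
      stripe.mutex.unlock();
    }
  });
  reader2.join();
  return acquired; // read after join(), so no data race on `acquired`
}
```

With an exclusive mutex the result is always false: two pure reads cannot overlap, which is the serialization described above.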

Difficulties

I attempted to use a reader-writer lock instead of a mutex lock, and some proof-of-concept tests showed significant performance improvements. However, I found that we cannot simply replace this mutex lock with a reader-writer lock or a lock-free data structure. The main reason is that StripeSM, as a Continuation, requires the mutex lock when called from the event system.

#12601 is another recent attempt by @bryancall.
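The reader-writer idea can be illustrated with std::shared_mutex (C++17): readers take a shared lock and may overlap, while a writer still needs exclusive access. This is a minimal sketch under that assumption, not the proof-of-concept code itself; RwStripeModel, readers_overlap() and writer_gets_lock_during_read() are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical Stripe guarded by a reader-writer lock instead of a mutex.
struct RwStripeModel {
  std::shared_mutex rw;
};

// Reader 2 takes a blocking shared lock while reader 1 still holds one.
// If the lock were exclusive this would deadlock; with shared_mutex it
// returns true because shared owners can coexist.
bool readers_overlap(RwStripeModel &stripe) {
  std::shared_lock<std::shared_mutex> reader1(stripe.rw);
  bool acquired = false;
  std::thread reader2([&] {
    std::shared_lock<std::shared_mutex> r(stripe.rw);
    acquired = r.owns_lock();
  });
  reader2.join();
  return acquired;
}

// A writer's try_lock fails while any reader holds a shared lock.
bool writer_gets_lock_during_read(RwStripeModel &stripe) {
  std::shared_lock<std::shared_mutex> reader(stripe.rw);
  bool acquired = false;
  std::thread writer([&] {
    std::unique_lock<std::shared_mutex> w(stripe.rw, std::try_to_lock);
    acquired = w.owns_lock();
  });
  writer.join();
  return acquired;
}
```

The catch described above is that StripeSM's event handlers still expect exclusive ownership of the same mutex, so the lock cannot be swapped out in place.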

Event Handlers

  • Event handlers of StripeSM

```cpp
int handle_dir_clear(int event, void *data);
int handle_dir_read(int event, void *data);
int handle_recover_from_data(int event, void *data);
int handle_recover_write_dir(int event, void *data);
int handle_header_read(int event, void *data);
```
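To see why these handlers tie StripeSM to an exclusive mutex, here is a simplified, hypothetical model of continuation dispatch: the event system locks the continuation's own mutex and then calls whichever handler is current, and the handler is free to change that handler pointer and other state. StripeSMModel, deliver() and the integer return codes are illustrative assumptions; the real ATS Continuation machinery differs in detail.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of a continuation-style state machine.
class StripeSMModel {
public:
  using Handler = int (StripeSMModel::*)(int event, void *data);

  StripeSMModel() : handler_(&StripeSMModel::handle_dir_read) {}

  // Models the event system delivering an event: it takes the
  // continuation's mutex, then dispatches to the current handler.
  int deliver(int event, void *data) {
    std::lock_guard<std::mutex> guard(mutex_);
    return (this->*handler_)(event, data);
  }

  std::vector<std::string> trace; // names of handlers that ran, in order

private:
  int handle_dir_read(int /*event*/, void * /*data*/) {
    trace.push_back("handle_dir_read");
    // Mutates state: the next event goes to a different handler.
    handler_ = &StripeSMModel::handle_header_read;
    return 0; // hypothetical "done" code
  }
  int handle_header_read(int /*event*/, void * /*data*/) {
    trace.push_back("handle_header_read");
    return 0;
  }

  std::mutex mutex_;
  Handler handler_;
};
```

Because handlers rewrite `handler_` and other fields, running two of them at once on the same instance would race, so a shared (read) lock is not enough for this part.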

Dir operations

Some Dir functions appear to be read-only, but they actually perform writes under certain conditions.

  • e.g. Directory::probe(), which deletes an invalid entry it encounters during a lookup:

```cpp
} else { // delete the invalid entry
  ts::Metrics::Gauge::decrement(cache_rsb.direntries_used);
  ts::Metrics::Gauge::decrement(stripe->cache_vol->vol_rsb.direntries_used);
  ATS_PROBE7(cache_dir_remove_invalid, stripe->fd, s, dir_to_offset(e, seg), dir_offset(e), dir_approx_size(e),
             key->slice64(0), key->slice64(1));
  e = dir_delete_entry(e, p, s, this);
  continue;
```
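One way to keep such a lookup on the shared-lock path is sketched below, assuming a std::shared_mutex guards the directory: the first pass never mutates, and only when it meets a stale entry does it drop the shared lock, take the exclusive lock, and redo the lookup while deleting stale entries. std::shared_mutex cannot upgrade a shared lock in place, which is why the slow path starts over. DirectoryModel, EntryModel and probe() here are hypothetical stand-ins, not the real Directory code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <vector>

// Hypothetical entry; `valid == false` stands in for whatever makes a real entry stale.
struct EntryModel {
  std::uint64_t key;
  std::uint64_t offset;
  bool valid;
};

struct DirectoryModel {
  std::shared_mutex rw;
  std::vector<EntryModel> entries;
};

std::optional<std::uint64_t> probe(DirectoryModel &dir, std::uint64_t key) {
  {
    // Fast path: shared lock, strictly read-only.
    std::shared_lock<std::shared_mutex> read_lock(dir.rw);
    bool saw_stale = false;
    for (const EntryModel &e : dir.entries) {
      if (!e.valid) {
        saw_stale = true; // needs a write; give up on the fast path
        break;
      }
      if (e.key == key) return e.offset;
    }
    if (!saw_stale) return std::nullopt;
  } // shared lock released here; shared_mutex has no in-place upgrade

  // Slow path: exclusive lock, re-scan from scratch because the directory
  // may have changed between releasing and re-acquiring the lock.
  std::unique_lock<std::shared_mutex> write_lock(dir.rw);
  std::optional<std::uint64_t> found;
  for (auto it = dir.entries.begin(); it != dir.entries.end();) {
    if (!it->valid) {
      it = dir.entries.erase(it); // the hidden write a "read" may need
      continue;
    }
    if (!found && it->key == key) found = it->offset;
    ++it;
  }
  return found;
}
```

The re-check on the slow path matters: between releasing the shared lock and acquiring the exclusive one, another thread may already have removed the stale entry or inserted a new one.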

Proposed Solution

Implement a two-tier locking architecture by decoupling StripeSM and Stripe:

  1. Separate StripeSM (Continuation) and Stripe (shared data)

StripeSM (a Continuation) contains event handlers, while Stripe contains shared data.

Half of this change has already been completed by #11565 and related PRs, but the separation between event handling and shared data access still needs to be made more explicit.

  2. Add Reader-Writer Lock to Stripe

Access to the shared data requires a reader-writer lock so that reads can proceed concurrently. Alternatively, Stripe could be made a lock-free data structure (using RCU or Hazard Pointers).

  3. Allocate StripeSM per Transaction

Each cache operation gets its own lightweight StripeSM instance, with its own mutex for event handling. That instance acquires a reader-writer lock on the shared Stripe whenever it accesses data.
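The three steps above can be put together in a small model, assuming std::shared_mutex for the reader-writer lock: SharedStripe plays the role of the shared Stripe data, and PerTxnStripeSM is a per-operation state machine whose own mutex only serializes events for that one transaction. All names here are hypothetical, and a real StripeSM would be scheduled by the event system rather than run on raw threads.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Tier 2 (hypothetical): shared data behind a reader-writer lock.
class SharedStripe {
public:
  std::optional<std::uint64_t> lookup(std::uint64_t key) const {
    std::shared_lock<std::shared_mutex> lock(rw_); // many readers at once
    auto it = dir_.find(key);
    if (it == dir_.end()) return std::nullopt;
    return it->second;
  }
  void insert(std::uint64_t key, std::uint64_t offset) {
    std::unique_lock<std::shared_mutex> lock(rw_); // writers are exclusive
    dir_[key] = offset;
  }

private:
  mutable std::shared_mutex rw_;
  std::unordered_map<std::uint64_t, std::uint64_t> dir_;
};

// Tier 1 (hypothetical): one lightweight state machine per cache operation.
class PerTxnStripeSM {
public:
  explicit PerTxnStripeSM(SharedStripe &stripe) : stripe_(stripe) {}

  // Handler entry point: its mutex serializes events for this transaction only.
  void handle_open_read(std::uint64_t key) {
    std::lock_guard<std::mutex> guard(mutex_);
    result_ = stripe_.lookup(key);
  }
  std::optional<std::uint64_t> result() {
    std::lock_guard<std::mutex> guard(mutex_);
    return result_;
  }

private:
  std::mutex mutex_;
  SharedStripe &stripe_;
  std::optional<std::uint64_t> result_;
};

// Runs n concurrent lookups, each through its own state machine; returns the hit count.
int concurrent_reads(SharedStripe &stripe, std::uint64_t key, int n) {
  std::vector<std::unique_ptr<PerTxnStripeSM>> sms;
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    sms.push_back(std::make_unique<PerTxnStripeSM>(stripe));
    threads.emplace_back(&PerTxnStripeSM::handle_open_read, sms.back().get(), key);
  }
  for (auto &t : threads) t.join();
  int hits = 0;
  for (auto &sm : sms)
    if (sm->result()) ++hits;
  return hits;
}
```

Because each PerTxnStripeSM has its own mutex, concurrent lookups contend only on the shared side of the reader-writer lock, while insert() still gets exclusive access.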

Architecture Diagram

