Stable Constant Qps #1144

Open
jcleezer wants to merge 8 commits into master from oyadav/stable-constant-qps

@jcleezer
Contributor

  1. Problem Statement
    When a target QPS is set for dark cluster forking, the observed dispatch rate does not match the configured target. The error was as high as +33% or -25%, depending on the target QPS value.

  2. Test Setup
    Source cluster: IRPS-irps-feedstorage-test17-0 (1 pod)
    Dark cluster: IRPS-irps-feedstorage-test9-0 (1 pod)
    Incoming traffic: ~350-380 req/s (consistent across all tests)
    Buffer: size=2000, TTL=10 seconds for ConstantQpsRateLimiter
    Cluster size ratio: 1:1

  3. Root Cause Analysis
    3.1 How the Rate Limiter Works
    The dispatch chain is:

IrpRcService.trafficRecord()
  → ConstantQpsForkingStrategy.handleRequest()            [~350/s incoming]
    → ConstantQPSDarkClusterStrategy.handleUnaryRequest()
      → rateLimiter.submit(callback)                      [adds to circular buffer]
        → EventLoop dispatches from buffer at rate        [observed QPS]
          → IrpBaseDarkClusterDispatcher.unaryCall()      [actual dark cluster call]
3.2 Key Finding: Buffer Replays Requests
EvictingCircularBuffer.get() does not remove items from the buffer. Items stay until they expire (TTL=10s) or are overwritten by newer items. The rate limiter's event loop cycles through the buffer and re-dispatches the same requests, so it can maintain the target rate even when incoming traffic is lower than the target.
From the SI source code comment:

"should only be used in cases where the user demands a constant rate of callback execution, and it's not important that all callbacks are executed, or executed only once."
This is by design for dark cluster testing.
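
To make the replay behavior concrete, here is a minimal sketch of a buffer whose get() leaves items in place. It is a hypothetical stand-in written for this description, not the SI library's EvictingCircularBuffer; the class name, the String payload, and the explicit nowMs parameter are all assumptions chosen to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; NOT the SI library's EvictingCircularBuffer. */
final class ReplayingBufferSketch {
    private final String[] items;
    private final long[] insertedAtMs;
    private final long ttlMs;
    private int writeIndex = 0;
    private int readIndex = 0;

    ReplayingBufferSketch(int capacity, long ttlMs) {
        this.items = new String[capacity];
        this.insertedAtMs = new long[capacity];
        this.ttlMs = ttlMs;
    }

    /** Stores an item; once the buffer is full, the oldest slot is overwritten. */
    void put(String item, long nowMs) {
        items[writeIndex] = item;
        insertedAtMs[writeIndex] = nowMs;
        writeIndex = (writeIndex + 1) % items.length;
    }

    /** Returns the next live item WITHOUT removing it, or null if nothing is live. */
    String get(long nowMs) {
        for (int scanned = 0; scanned < items.length; scanned++) {
            int i = readIndex;
            readIndex = (readIndex + 1) % items.length;
            if (items[i] != null && nowMs - insertedAtMs[i] < ttlMs) {
                return items[i]; // item stays in the buffer and can be returned again
            }
            items[i] = null; // empty or expired slot: clear it and keep scanning
        }
        return null;
    }

    /** Two incoming requests, five dispatches: the reader wraps around and replays. */
    static List<String> demo() {
        ReplayingBufferSketch buffer = new ReplayingBufferSketch(4, 10_000);
        buffer.put("req-A", 0);
        buffer.put("req-B", 0);
        List<String> dispatched = new ArrayList<>();
        for (int n = 0; n < 5; n++) {
            dispatched.add(buffer.get(1_000));
        }
        return dispatched;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [req-A, req-B, req-A, req-B, req-A]
    }
}
```

With only two requests buffered, five reads still return five callbacks, which is how a limiter built on such a buffer can hold its target rate when incoming traffic is below the target.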

  4. Proposed Fix for SI Library
    4.1 Requirement
    The rate limiter must dispatch at rates that are not constrained to integer-ms periods. For a target of 750 QPS, the ideal period is 1000/750 ≈ 1.333 ms, so the fix must achieve sub-millisecond timing precision without changing the SI library's public API.
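
The arithmetic behind this requirement can be sketched as follows. The model is an assumption made for illustration: one permit is issued per period, and the period is rounded to the nearest whole millisecond with Math.round. The real limiter may refill permits in bursts, so the numbers show the shape of the quantization error rather than the library's exact behavior. PeriodRoundingSketch and its methods are hypothetical names.

```java
/** Hypothetical model of integer-ms period rounding; not SI library code. */
final class PeriodRoundingSketch {

    /** Rate actually achieved if one permit is issued every Math.round(1000 / targetQps) ms. */
    static double effectiveQps(double targetQps) {
        long roundedPeriodMs = Math.max(1, Math.round(1000.0 / targetQps));
        return 1000.0 / roundedPeriodMs;
    }

    /** Signed error of the achieved rate relative to the target, in percent. */
    static double errorPercent(double targetQps) {
        return (effectiveQps(targetQps) - targetQps) / targetQps * 100.0;
    }

    public static void main(String[] args) {
        for (double target : new double[] {400, 500, 750, 1000}) {
            System.out.printf("target=%.0f QPS, effective=%.1f QPS, error=%+.1f%%%n",
                    target, effectiveQps(target), errorPercent(target));
        }
    }
}
```

Under this model a 750 QPS target rounds its 1.333 ms period down to 1 ms and dispatches at 1000 QPS, about 33% over target, while targets whose period is already a whole number of milliseconds (500, 1000) show no error.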
    4.2 Algorithm: Fractional Permit Accumulation
    Instead of refilling a fixed burst of permits every Math.round(period) ms, accumulate fractional permits each millisecond based on the exact rate.
    The key insight: run the event loop at a fixed 1ms tick, but track permits as a double and accumulate targetQps / 1000.0 permits per tick.

permitsPerMs = targetQps / 1000.0
Every 1ms tick:
    permitBalance += permitsPerMs
    while permitBalance >= 1.0:
        dispatch one request
        permitBalance -= 1.0
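
The pseudocode above could look roughly like this in Java. This is a minimal sketch, not the change made in this PR: FractionalPermitSketch, onTick, and dispatchedOver are hypothetical names, and ticks are simulated in a loop rather than driven by a real 1 ms timer so that the result is deterministic.

```java
/** Hypothetical sketch of fractional permit accumulation; not the SI library's rate limiter. */
final class FractionalPermitSketch {
    private final double permitsPerMs;
    private double permitBalance = 0.0;

    FractionalPermitSketch(double targetQps) {
        this.permitsPerMs = targetQps / 1000.0;
    }

    /** Advances one 1 ms tick and returns how many requests to dispatch on that tick. */
    int onTick() {
        permitBalance += permitsPerMs;
        int toDispatch = 0;
        while (permitBalance >= 1.0) {
            toDispatch++;
            permitBalance -= 1.0; // the fractional remainder carries over to the next tick
        }
        return toDispatch;
    }

    /** Simulates the given number of 1 ms ticks and returns the total dispatch count. */
    static int dispatchedOver(double targetQps, int ticks) {
        FractionalPermitSketch limiter = new FractionalPermitSketch(targetQps);
        int total = 0;
        for (int i = 0; i < ticks; i++) {
            total += limiter.onTick();
        }
        return total;
    }

    public static void main(String[] args) {
        // 1000 simulated ticks = 1 second at a 750 QPS target
        System.out.println(dispatchedOver(750, 1000)); // prints 750
    }
}
```

At 750 QPS the balance grows by 0.75 per tick, so three out of every four ticks dispatch one request and the long-run rate matches the target. A real event loop's ticks are not exactly 1 ms apart, so a production version would likely need to credit permits based on measured elapsed time; that part is not shown here.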
