Flashring externalize #371
Walkthrough

This PR introduces Flashring, a production-grade NVMe-backed cache system in Go featuring: append-only direct-I/O file storage with ring-buffer wrapping, io_uring-based async batch read/write operations, in-memory hash-indexed metadata with TTL/frequency tracking, double-buffered memtables for batched writes, adaptive rewrite prediction, and sharded cache with MGet batching. It includes extensive test coverage, metrics instrumentation, RESP protocol server support, and SSD setup automation.

Changes

- Flashring Cache System Core
- File I/O Benchmark Tool
- SSD Preparation Utility
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (17)
flashring/pkg/cache/freecache.go-14-18 (1)
14-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid global side effects in constructor.

`debug.SetGCPercent(20)` is a process-wide setting that affects the entire Go runtime. Embedding this in a cache constructor is problematic:
- Unexpected side effect for callers
- Affects all goroutines/components, not just this cache
- Multiple instantiations redundantly set the same value
- Removes caller's control over GC tuning
Move this to application initialization code or document it prominently if intentional for benchmark purposes.
Proposed fix
```diff
 func NewFreecache(sizeBytes int) (*Freecache, error) {
 	cache := freecache.NewCache(sizeBytes)
-	debug.SetGCPercent(20)
 	return &Freecache{cache: cache}, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/pkg/cache/freecache.go` around lines 14 - 18: the constructor NewFreecache currently calls debug.SetGCPercent(20), which creates a process-wide side effect; remove the debug.SetGCPercent(20) call from NewFreecache (and any direct runtime/debug manipulation) so the constructor only creates and returns &Freecache{cache: cache}; if you want GC tuning, provide guidance in documentation or expose a separate application-level initializer (e.g., SetGlobalGCPercent) that callers can opt into during program startup instead of in the Freecache constructor.

flashring/internal/fs/fs.go-87-97 (1)
87-97: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

File descriptor leak if `fdToFile` fails.

If `fdToFile` fails at line 93, the successfully opened `fd` is never closed. The same pattern exists in `createPreAllocatedWriteFileDescriptor` and `createReadFileDescriptor`.

🐛 Proposed fix for createAppendOnlyWriteFileDescriptor
```diff
 func createAppendOnlyWriteFileDescriptor(filename string) (int, *os.File, bool, error) {
 	fd, directIO, err := openWithDirectIO(filename, O_WRONLY|O_CREAT|O_DSYNC)
 	if err != nil {
 		return 0, nil, false, err
 	}
 	file, err := fdToFile(fd, filename)
 	if err != nil {
+		syscall.Close(fd)
 		return 0, nil, false, err
 	}
 	return fd, file, directIO, nil
 }
```

Apply the same fix to `createPreAllocatedWriteFileDescriptor` (lines 111-114) and `createReadFileDescriptor` (lines 124-127).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/fs.go` around lines 87 - 97: the open file descriptor returned by openWithDirectIO is leaked if fdToFile fails; update createAppendOnlyWriteFileDescriptor to close(fd) before returning the error when fdToFile(filename) returns an error, and mirror this fix in createPreAllocatedWriteFileDescriptor and createReadFileDescriptor so any opened fd is closed on fdToFile failure (refer to functions createAppendOnlyWriteFileDescriptor, createPreAllocatedWriteFileDescriptor, createReadFileDescriptor and the fd variable returned by openWithDirectIO).

flashring/pkg/metrics/metric.go-117-139 (1)
117-139: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Race condition in `Init()` due to check outside `sync.Once`.

The `initialized` flag is checked at line 118 before entering `sync.Once.Do`, but set at line 137 inside it. Concurrent callers can pass the early return check simultaneously before `once.Do` executes, leading to unnecessary log spam. More critically, `initialized` is read and written without synchronization.

Move the initialization check inside `sync.Once` or use an atomic flag.

🐛 Proposed fix
```diff
 func Init() {
-	if initialized {
-		log.Debug().Msgf("Metrics already initialized!")
-		return
-	}
 	once.Do(func() {
+		if initialized {
+			return
+		}
 		var err error
 		samplingRate = viper.GetFloat64("APP_METRIC_SAMPLING_RATE")
 		globalTags := getGlobalTags()
 		statsDClient, err = statsd.New(
 			telegrafAddress,
 			statsd.WithTags(globalTags),
 		)
 		if err != nil {
 			log.Panic().AnErr("StatsD client initialization failed", err)
 		}
 		log.Info().Msgf("Metrics client initialized with telegraf address - %s, global tags - %v, and "+
 			"sampling rate - %f, flashring metrics enabled - %v",
 			telegrafAddress, globalTags, samplingRate, metricsEnabled)
 		initialized = true
 	})
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/pkg/metrics/metric.go` around lines 117 - 139: the current Init() reads the package-level initialized flag before once.Do, causing a race; remove the outer if initialized check and rely solely on once.Do to run initialization, then set initialized = true inside the once.Do closure (or replace initialized with an atomic.Bool and use atomic operations) so all reads/writes are synchronized; update Init() to call once.Do(...) without the pre-check and ensure the initialized assignment happens inside the closure after statsd.New succeeds (symbols: Init, initialized, once.Do, statsd.New).

flashring/internal/maths/predictor.go-102-115 (1)
102-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same unsigned underflow issue as in estimator.go.

The `ringZone` function has the same unsigned integer underflow problem when `keyMemId > activeMemId`. See the review comment on `estimator.go` lines 61-68 for details and fix.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/maths/predictor.go` around lines 102 - 115: the ringZone function currently subtracts keyMemId from activeMemId causing unsigned underflow when keyMemId > activeMemId; change the distance calculation to use modular-safe arithmetic (e.g., compute distance as (activeMemId + maxMemTableCount - keyMemId) % maxMemTableCount) so risk is always the forward distance around the ring, then compute pct := float64(risk) / float64(maxMemTableCount) and keep the existing switch; update the ringZone function signature usage accordingly to avoid negative wraparound when keyMemId > activeMemId.

flashring/internal/maths/estimator.go-61-68 (1)
61-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unsigned integer underflow in overwrite risk calculation.

When `keyMemId > activeMemId`, the subtraction `activeMemId - keyMemId` underflows (wraps to a large uint32 value). Adding `maxMemTableCount` only cancels the wrap-around while `keyMemId - activeMemId <= maxMemTableCount`; once the gap is larger, the result is wrong.

For example: `activeMemId=2`, `keyMemId=15`, `maxMemTableCount=10`:

- Expected "risk" might be 7 (positions until overwrite, with IDs taken mod 10)
- Actual: `(2 - 15 + 10) % 10` wraps to `4294967293 % 10` = `3`

Consider casting to signed integers or reordering the arithmetic:
🔧 Proposed fix using signed arithmetic
```diff
 func (e *Estimator) CalculateRewriteScore(freq uint64, lastAccess uint64, keyMemId, activeMemId, maxMemTableCount uint32) float32 {
-	overWriteRisk := (activeMemId - keyMemId + maxMemTableCount) % maxMemTableCount
+	// Use signed arithmetic to handle wrap-around correctly
+	diff := int32(activeMemId) - int32(keyMemId)
+	if diff < 0 {
+		diff += int32(maxMemTableCount)
+	}
+	overWriteRisk := uint32(diff) % maxMemTableCount
 	overWriteRiskScore := float32(overWriteRisk) / float32(maxMemTableCount)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/maths/estimator.go` around lines 61 - 68: the subtraction in CalculateRewriteScore underflows when keyMemId > activeMemId; change the overwrite-risk computation to use signed arithmetic or a conditional modular-safe formula: convert activeMemId, keyMemId, and maxMemTableCount to a signed type (e.g., int32), compute delta := int32(activeMemId) - int32(keyMemId); if delta < 0 { delta += int32(maxMemTableCount) }, then set overWriteRisk = uint32(delta) and compute overWriteRiskScore := float32(overWriteRisk) / float32(maxMemTableCount); leave the frequency and lastAccess score math unchanged in function CalculateRewriteScore.

flashring/internal/maths/predictor.go-77-82 (1)
77-82: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Goroutine leak: no way to stop the background processor.

The goroutine started at line 77 reads from `hitRateCh` indefinitely. If the `Predictor` is discarded without closing the channel, this goroutine leaks. Consider adding a `Close()` method that closes `hitRateCh`, or use a `context.Context` for cancellation.

🔧 Proposed fix: Add Close method
```diff
 type Predictor struct {
 	Estimator             *Estimator
 	GridSearchEstimator   *GridSearchEstimator
 	ReWriteScoreThreshold float32
 	MaxMemTableCount      uint32
 	freqBands             FreqBands
 	recencyBands          RecencyBands
 	hitRateCh             chan float64
+	closeCh               chan struct{}
 }

 // ... in NewPredictor:
+	closeCh := make(chan struct{})
 	p := &Predictor{
 		// ... existing fields
 		hitRateCh: make(chan float64, 1024),
+		closeCh:   closeCh,
 	}
 	go func() {
-		for hitRate := range p.hitRateCh {
-			p.GridSearchEstimator.RecordHitRate(hitRate)
+		for {
+			select {
+			case hitRate := <-p.hitRateCh:
+				p.GridSearchEstimator.RecordHitRate(hitRate)
+			case <-closeCh:
+				return
+			}
 		}
 	}()

+func (p *Predictor) Close() {
+	close(p.closeCh)
+}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/maths/predictor.go` around lines 77 - 82: the background goroutine reading from p.hitRateCh in Predictor leaks because there's no stop mechanism; add a stop API (either a Close() that safely closes p.hitRateCh using a sync.Once or add a context.Context + cancel stored on the Predictor) and change the goroutine to select on the channel and cancellation so it can exit: update the Predictor struct to hold either a closeOnce and closed flag or a ctx and cancel func, implement Predictor.Close() to trigger cancellation or close the channel, and ensure the goroutine that calls GridSearchEstimator.RecordHitRate(hitRate) returns when closed to avoid leaking.

flashring/internal/server/resp.go-79-79 (1)
79-79: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SET command matching is too permissive.

The condition `len(cmd) >= 3` allows commands like `SETT`, `SETX`, or `SETTINGS` to match as SET. This should use `len(cmd) == 3` for exact matching, consistent with the GET command check on line 67.

🔧 Proposed fix
```diff
-	case len(cmd) >= 3 && (cmd[0]|0x20) == 's' && (cmd[1]|0x20) == 'e' && (cmd[2]|0x20) == 't':
+	case len(cmd) == 3 && (cmd[0]|0x20) == 's' && (cmd[1]|0x20) == 'e' && (cmd[2]|0x20) == 't':
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/server/resp.go` at line 79: the SET branch currently uses len(cmd) >= 3 causing tokens like "SETT" or "SETTINGS" to match; change the condition in the switch/case that tests (cmd[0]|0x20)=='s' && (cmd[1]|0x20)=='e' && (cmd[2]|0x20)=='t' from len(cmd) >= 3 to len(cmd) == 3 so it only matches exactly "SET", mirroring how the GET case is implemented; update the case condition in resp.go (the SET command matcher) accordingly.

flashring/internal/server/resp.go-104-105 (1)
104-105: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silently ignoring `cache.Put` error returns OK to client on failure.

When `cache.Put` fails, the error is discarded and the client receives `+OK`, which is misleading. Consider returning an error response when the write fails.

🔧 Proposed fix
```diff
-	_ = cache.Put(key, value, ttl)
-	writeSimpleString(w, "OK")
+	if err := cache.Put(key, value, ttl); err != nil {
+		writeError(w, "cache write failed")
+	} else {
+		writeSimpleString(w, "OK")
+	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/server/resp.go` around lines 104 - 105: the code currently discards the error from cache.Put and always calls writeSimpleString(w, "OK"), misleading clients on failure; update the handler that calls cache.Put(key, value, ttl) to check the returned error and, on non-nil error, send an error RESP reply (e.g., via writeError / writeSimpleError) and do not call writeSimpleString("OK"); only return "OK" when cache.Put succeeds. Reference cache.Put and writeSimpleString in your change.

flashring/internal/server/resp.go-34-37 (1)
34-37:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`net.Error.Temporary()` is deprecated since Go 1.18 and should not be used.

The `Temporary()` method was deprecated because its meaning is ill-defined and its behavior is unreliable across different error types. This makes the retry logic unreliable: the condition may not behave as expected across different platforms and Go versions.

Consider replacing with `net.Error.Timeout()` to handle timeout-specific errors, or simply remove the retry logic if transient errors are not expected:

🔧 Proposed fix: use Timeout() or remove retry
```diff
 for {
 	conn, err := ln.Accept()
 	if err != nil {
-		if ne, ok := err.(net.Error); ok && ne.Temporary() {
-			time.Sleep(50 * time.Millisecond)
-			continue
-		}
+		if ne, ok := err.(net.Error); ok && ne.Timeout() {
+			time.Sleep(50 * time.Millisecond)
+			continue
+		}
 		return err
 	}
```

Or remove retry entirely:
```diff
 for {
 	conn, err := ln.Accept()
 	if err != nil {
-		if ne, ok := err.(net.Error); ok && ne.Temporary() {
-			time.Sleep(50 * time.Millisecond)
-			continue
-		}
 		return err
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/server/resp.go` around lines 34 - 37: the retry branch that checks net.Error.Temporary() should be updated because Temporary() is deprecated; replace the condition "if ne, ok := err.(net.Error); ok && ne.Temporary() { time.Sleep(50 * time.Millisecond); continue }" with a timeout-specific check using Timeout(), i.e. "if ne, ok := err.(net.Error); ok && ne.Timeout() { time.Sleep(50 * time.Millisecond); continue }" (or remove this retry block entirely if transient timeouts are not expected); update any comments to reflect the change to net.Error.Timeout().

flashring/internal/iouring/iouring_writer.go-15-21 (1)
15-21: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't silently ignore `flags`.

`NewIoUringWriter` accepts `flags` but never applies them to the underlying ring setup, so callers can think SQPOLL or other ring options are enabled when this wrapper always uses the same config. Either plumb the value through or remove it from the API.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/iouring/iouring_writer.go` around lines 15 - 21: NewIoUringWriter currently accepts a flags parameter but never applies it; either plumb flags through into the underlying writer creation or remove the unused parameter. Update NewIoUringWriter to set the flags on the BatchIoUringConfig passed to NewBatchIoUringWriter (e.g., add a Flags/Options field to BatchIoUringConfig and assign the flags param) so NewBatchIoUringWriter / the created IoUringWriter honors SQPOLL or other ring options, or if the underlying BatchIoUringWriter cannot support flags, remove the flags parameter from NewIoUringWriter signature and callers.

flashring/internal/index/delete_manager.go-73-74 (1)
73-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate `TrimHead()` failures instead of always asking for a retry.

If `TrimHead()` fails, the caller only sees "trim needed retry this write" and will keep retrying against unchanged file state.

Suggested fix
```diff
-	dm.wrapFile.TrimHead()
+	if err := dm.wrapFile.TrimHead(); err != nil {
+		return err
+	}
 	return errors.New("trim needed retry this write")
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/index/delete_manager.go` around lines 73 - 74: the current code calls dm.wrapFile.TrimHead() but always returns errors.New("trim needed retry this write"), losing TrimHead() failures; change it to capture the error from dm.wrapFile.TrimHead() and propagate it (or wrap it with context) when non-nil instead of returning the generic retry error—i.e., call err := dm.wrapFile.TrimHead(); if err != nil return fmt.Errorf("trim head failed: %w", err) (or return err) else keep the existing errors.New("trim needed retry this write") to signal a successful trim that still requires a retry.

flashring/internal/fs/file_bench_test.go-12-39 (1)
12-39: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make `BenchmarkPwrite` scale its file size with `b.N`.

With 4 KiB writes and a fixed 1 GiB file, this benchmark starts returning `ErrFileSizeExceeded` once calibration pushes `b.N` past roughly 262k iterations. At that point it stops benchmarking throughput and just fails.

Suggested fix
```diff
 config := FileConfig{
 	Filename:          filename,
-	MaxFileSize:       1024 * 1024 * 1024, // 1GB
+	MaxFileSize:       int64(b.N+1) * 4096,
 	FilePunchHoleSize: 64 * 1024,
 	BlockSize:         4096,
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/file_bench_test.go` around lines 12 - 39: BenchmarkPwrite is failing once b.N grows because FileConfig.MaxFileSize is fixed to 1GiB; compute the required file size from the actual write size and b.N and set FileConfig.MaxFileSize accordingly before calling NewRollingAppendFile (e.g. requiredSize := int64(b.N) * int64(len(data)) and use a small safety margin), or otherwise cap b.N-based size to a minimum default; update the code that builds FileConfig (referencing FileConfig, MaxFileSize, NewRollingAppendFile, raf.Pwrite and ErrFileSizeExceeded) so the benchmark file size scales with b.N to avoid ErrFileSizeExceeded.

flashring/internal/fs/rolling_appendonly_file.go-34-40 (1)
34-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clean up the write side if opening the read descriptor fails.

If `createReadFileDescriptor` returns an error, the already-opened write fd/file are leaked.

Suggested fix
```diff
 readFd, readFile, rDirectIO, err := createReadFileDescriptor(filename)
 if err != nil {
+	_ = writeFile.Close()
+	_ = syscall.Close(writeFd)
 	return nil, err
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/rolling_appendonly_file.go` around lines 34 - 40: the code opens the write side via createAppendOnlyWriteFileDescriptor (producing writeFd, writeFile, wDirectIO) and then opens the read side via createReadFileDescriptor; if the latter fails the write descriptors are leaked—update the error path in the function that performs these calls to close/cleanup writeFd and writeFile (and any other resources like wDirectIO if needed) before returning the error from createReadFileDescriptor, ensuring no resource leak when read creation fails.

flashring/pkg/cache/badger.go-37-47 (1)
37-47: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map unexpected read failures to misses, not hits.

Line 47 turns every `View` failure except `ErrKeyNotFound` into `found=true`. A Badger read error would be reported as a cache hit with a nil/stale payload, which will skew callers and metrics.

Suggested fix
```diff
 func (b *Badger) Get(key string) ([]byte, bool, bool) {
 	var val []byte
 	err := b.cache.View(func(txn *badger.Txn) error {
 		item, err := txn.Get([]byte(key))
 		if err != nil {
 			return err
 		}
 		val, err = item.ValueCopy(val)
 		return err
 	})
-	return val, err != badger.ErrKeyNotFound, false
+	if err != nil {
+		// Key-not-found and unexpected read errors are both misses.
+		return nil, false, false
+	}
+	return val, true, false
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/pkg/cache/badger.go` around lines 37 - 47: Badger.Get currently treats any View error other than badger.ErrKeyNotFound as a hit; instead map unexpected read errors to misses. In the Get function (the b.cache.View call and its returned err), set the "found" boolean to true only when err == nil, and return found=false for any error (including unexpected Badger errors); preserve the existing handling for badger.ErrKeyNotFound by returning found=false as well. Update the final return from using err != badger.ErrKeyNotFound to checking err == nil so callers and metrics correctly see misses on read failures.

flashring/internal/fs/wrap_file.go-68-90 (1)
68-90: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Partial write not handled in `Pwrite`.

`syscall.Pwrite` may return `n < len(buf)` on a successful partial write (e.g., due to signals or disk pressure). The current code advances offsets by `n`, but the caller receives no indication that data was partially written, potentially leaving the file in an inconsistent state.

Consider either retrying until all bytes are written or returning an error on short writes.
Proposed fix to handle partial writes
```diff
 func (r *WrapAppendFile) Pwrite(buf []byte) (currentPhysicalOffset int64, err error) {
 	if r.WriteDirectIO {
 		if !isAlignedBuffer(buf, r.blockSize) {
 			return 0, ErrBufNoAlign
 		}
 	}
 	var startTime = time.Now()
-	n, err := syscall.Pwrite(r.WriteFd, buf, r.PhysicalWriteOffset)
+	written := 0
+	for written < len(buf) {
+		n, err := syscall.Pwrite(r.WriteFd, buf[written:], r.PhysicalWriteOffset+int64(written))
+		if err != nil {
+			return 0, err
+		}
+		if n == 0 {
+			return 0, fmt.Errorf("pwrite returned 0 bytes")
+		}
+		written += n
+	}
 	metrics.Timing(metrics.KEY_PWRITE_LATENCY, time.Since(startTime), []string{})
-	if err != nil {
-		return 0, err
-	}
-	r.PhysicalWriteOffset += int64(n)
+	r.PhysicalWriteOffset += int64(written)
 	if r.PhysicalWriteOffset >= r.MaxFileSize {
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/wrap_file.go` around lines 68 - 90: the Pwrite implementation in WrapAppendFile can return after a partial write (n < len(buf)) without signaling the caller; update WrapAppendFile.Pwrite to handle short writes by looping/retrying until the entire buf is written or a non-recoverable error occurs: call syscall.Pwrite repeatedly (advancing the buffer slice and PhysicalWriteOffset by the bytes written each iteration), update metrics/Timing appropriately, and only return success when total bytes == len(buf) (or return the underlying error if retries fail). Ensure you still perform the existing alignment check (isAlignedBuffer) and the wrapping logic (setting wrapped, resetting PhysicalWriteOffset, advancing LogicalCurrentOffset, incrementing Stat.WriteCount) only once per total successful write, and surface errors immediately when syscall.Pwrite returns an error.

flashring/internal/memtables/memtable.go-75-84 (1)
75-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential truncation when buffer length exceeds `uint16` max.

`Put` casts `len(buf)` to `uint16` on line 83, but Go slices can exceed 65535 bytes. If a caller passes a buffer larger than 65535 bytes, the returned `length` will be silently truncated, causing data corruption when the value is later used to read back the data.

Consider whether this is an intentional constraint (document/enforce it) or whether `length` should be a larger type.

Proposed fix if constraint is intentional
```diff
 func (m *Memtable) Put(buf []byte) (offset int, length uint16, readyForFlush bool) {
+	if len(buf) > math.MaxUint16 {
+		// Reject oversized buffers or handle appropriately
+		return -1, 0, false
+	}
 	offset = m.currentOffset
 	if offset+len(buf) > m.capacity {
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/memtables/memtable.go` around lines 75 - 84: the Put method on Memtable silently truncates len(buf) to uint16 which can corrupt data; update the API and logic: either enforce/document a max-value size and check if len(buf) > math.MaxUint16 (set readyForFlush/return error) or change the returned length type and its callers from uint16 to a larger integer (e.g., int or uint32) so you return the full len(buf); locate Memtable.Put, its return signature (offset int, length uint16, readyForFlush bool), uses of length in currentOffset/page.Buf, and update all call sites to match the chosen approach.

flashring/internal/memtables/memtable_bench_test.go-28-43 (1)
28-43: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hardcoded absolute path will fail on other systems.

The path `/media/a0d00kc/freedom/tmp/bench_memtable.dat` is developer-specific and will cause benchmark failures on CI/CD environments or other developer machines.

🐛 Proposed fix using temp directory
```diff
 func createBenchmarkFile(b *testing.B) *fs.WrapAppendFile {
-	filename := filepath.Join("/media/a0d00kc/freedom/tmp/bench_memtable.dat")
+	tmpDir := b.TempDir()
+	filename := filepath.Join(tmpDir, "bench_memtable.dat")
 	config := fs.FileConfig{
 		Filename: filename,
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/memtables/memtable_bench_test.go` around lines 28 - 43, The benchmark uses a hardcoded absolute path in createBenchmarkFile which will break on other machines/CI; change the filename to use a temporary directory provided by the test (use b.TempDir() and join "bench_memtable.dat") so createBenchmarkFile constructs filename := filepath.Join(b.TempDir(), "bench_memtable.dat"); keep the existing fs.FileConfig and fs.NewWrapAppendFile usage so only the filename construction is modified.
🟡 Minor comments (14)
flashring/pkg/cache/freecache.go-20-23 (1)
20-23: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Propagate or log errors from `Set`.

`freecache.Cache.Set` can return errors (e.g., when the value exceeds the maximum size). Silently ignoring this error may hide data loss issues from callers.

Proposed fix
```diff
 func (c *Freecache) Put(key string, value []byte, ttl time.Duration) error {
-	c.cache.Set([]byte(key), value, int(ttl.Seconds()))
-	return nil
+	return c.cache.Set([]byte(key), value, int(ttl.Seconds()))
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/pkg/cache/freecache.go` around lines 20 - 23: the Put method currently ignores the error returned by c.cache.Set; update Freecache.Put to capture the returned error from c.cache.Set([]byte(key), value, int(ttl.Seconds())), and either return that error directly (since Put already has an error return) or wrap it with context (e.g., using fmt.Errorf) before returning so callers are notified when Set fails (e.g., value too large).

flashring/README.md-86-89 (1)
86-89: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language specifier and verify package reference.

The code block at line 86 is missing a language specifier. Additionally, line 89 references `pkg: github.com/Meesho/BharatMLStack/ssd-cache` but the module is now `flashring`. Consider updating the benchmark output to reflect the current module name, or note that this is historical output.

Proposed fix for language specifier
````diff
-```
+```text
 goos: linux
````

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/README.md` around lines 86 - 89: add a language specifier to the fenced code block containing the benchmark metadata (the block starting with "goos: linux" / "goarch: amd64")—use "text" to prevent syntax highlighting—and update the pkg line that currently reads "pkg: github.com/Meesho/BharatMLStack/ssd-cache" to the current module name (e.g., replace with "pkg: github.com/Meesho/flashring" or add a note that this is historical output) so the README's benchmark output reflects the project's current module.

flashring/internal/fs/fs.go-118-129 (1)
118-129: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

No fallback if `O_DIRECT` fails for read file descriptor.

Unlike `createAppendOnlyWriteFileDescriptor` and `createPreAllocatedWriteFileDescriptor`, which use `openWithDirectIO` for graceful fallback, this function directly opens with `O_DIRECT` and fails if unsupported. This inconsistency may cause read operations to fail on filesystems that don't support direct I/O.

🔧 Proposed fix
```diff
 func createReadFileDescriptor(filename string) (int, *os.File, bool, error) {
-	flags := O_DIRECT | O_RDONLY
-	fd, err := syscall.Open(filename, flags, 0)
+	fd, directIO, err := openWithDirectIO(filename, O_RDONLY)
 	if err != nil {
 		return 0, nil, false, err
 	}
 	file, err := fdToFile(fd, filename)
 	if err != nil {
+		syscall.Close(fd)
 		return 0, nil, false, err
 	}
-	return fd, file, true, nil
+	return fd, file, directIO, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/fs.go` around lines 118 - 129: createReadFileDescriptor currently opens with O_DIRECT unconditionally and fails on filesystems that don't support direct I/O; change it to use the same graceful openWithDirectIO pattern as the write helpers. Call openWithDirectIO(filename, O_DIRECT|O_RDONLY, 0) (referencing O_DIRECT, O_RDONLY and openWithDirectIO), capture the fd and a boolean indicating whether direct IO was used, fall back to a regular syscall.Open without O_DIRECT if openWithDirectIO reports unsupported or returns an error due to direct IO, then convert the fd to *os.File with fdToFile and return the fd, file, the actual direct-IO-used bool, and any error (preserving existing error propagation semantics).

flashring/pkg/metrics/metric.go-207-209 (1)
207-209: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential panic in `GetShardTag` if `shardIdx` exceeds initialized shard count.

If `BuildShardTags` was never called or `shardIdx >= len(shardTags)`, this will panic with an index out of range error. Consider adding bounds validation or documenting the precondition.

🛡️ Proposed defensive fix
```diff
 func GetShardTag(shardIdx uint32) []string {
+	if int(shardIdx) >= len(shardTags) {
+		return BuildTag(NewTag(TAG_SHARD_IDX, strconv.Itoa(int(shardIdx))))
+	}
 	return shardTags[shardIdx : shardIdx+1]
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/pkg/metrics/metric.go` around lines 207 - 209: GetShardTag currently slices shardTags without validation and can panic if BuildShardTags wasn't called or shardIdx >= len(shardTags); update GetShardTag to validate bounds by comparing shardIdx to uint32(len(shardTags)) and return a safe empty slice (or nil) when out of range instead of slicing, or optionally log/propagate an error; reference the GetShardTag function and the shardTags variable and ensure callers are compatible with the chosen safe return value.

flashring/internal/memtables/manager_bench_test.go-12-27 (1)
12-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded absolute path will fail on other machines.

The path `/media/a0d00kc/freedom/tmp/` is a user-specific local path. This benchmark will fail on CI or other developer machines. Use `os.TempDir()` or `b.TempDir()` instead.

🔧 Proposed fix

```diff
 func createManagerBenchmarkFile(b *testing.B) *fs.WrapAppendFile {
-	filename := fmt.Sprintf("/media/a0d00kc/freedom/tmp/bench_memtable_%d.dat", time.Now().UnixNano())
+	filename := fmt.Sprintf("%s/bench_memtable_%d.dat", b.TempDir(), time.Now().UnixNano())
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/memtables/manager_bench_test.go` around lines 12 - 27, The test helper createManagerBenchmarkFile currently hardcodes a user-specific absolute path in filename which will fail on other machines; change it to use a temporary directory (preferably b.TempDir() for per-benchmark isolation or os.TempDir() if b not available) when constructing filename so the file is created in a portable temp location, update the fs.FileConfig.Filename assignment accordingly (refer to createManagerBenchmarkFile and the filename variable) and ensure error handling remains the same.flashring/internal/iouring/iouring.go-520-542 (1)
520-542: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Metrics timing measures cumulative time, not per-operation latency.

`startTime` is captured once before the completion loop, but `metrics.Timing` is called for each CQE. This means:
- First CQE reports time for ~1 completion
- Last CQE reports time for all N completions
This produces misleading latency metrics. Consider either measuring the full batch latency once after the loop, or capturing per-operation start times.
🔧 Proposed fix: Report batch latency once
```diff
 	var startTime = time.Now()

 	// Drain all CQEs (order may differ from submission)
 	results := make([]int, n)
 	for i := 0; i < n; i++ {
 		cqe, err := r.waitCqe()
 		if err != nil {
 			return results, fmt.Errorf("io_uring waitCqe: %w", err)
 		}
 		idx := int(cqe.UserData)
 		res := cqe.Res
 		r.seenCqe()
 		if res < 0 {
 			return results, fmt.Errorf("io_uring pwrite errno %d (%s), fd=%d off=%d len=%d",
 				-res, syscall.Errno(-res), fd, offsets[idx], len(bufs[idx]))
 		}
 		if idx >= 0 && idx < n {
 			results[idx] = int(res)
 		}
-
-		metrics.Timing(metrics.KEY_PWRITE_LATENCY, time.Since(startTime), []string{})
 	}
+	metrics.Timing(metrics.KEY_PWRITE_LATENCY, time.Since(startTime), []string{})

 	return results, nil
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/iouring/iouring.go` around lines 520 - 542, The current code captures startTime once, then calls metrics.Timing inside the CQE loop, causing cumulative latency to be reported per completion; move the metrics.Timing call out of the loop so you record batch latency once after draining CQEs: keep startTime as currently set before the loop, remove the per-iteration metrics.Timing(...) invocation inside the for loop that processes waitCqe()/seenCqe(), and add a single metrics.Timing(metrics.KEY_PWRITE_LATENCY, time.Since(startTime), []string{}) immediately after the loop completes (ensuring it runs on success and/or on error paths as appropriate).flashring/main.go-44-45 (1)
44-45: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Buffer is not actually page-aligned.

The comment says "Align buffer to page boundary" but `make([]byte, bufferSize)` does not guarantee alignment. The Go runtime may allocate at any address.

🔧 Proposed fix using unsafe pointer arithmetic

```diff
-	// Align buffer to page boundary
-	buffer := make([]byte, bufferSize)
+	// Allocate extra space to allow page alignment
+	raw := make([]byte, bufferSize+PageSize4K)
+	addr := uintptr(unsafe.Pointer(&raw[0]))
+	offset := (PageSize4K - int(addr%PageSize4K)) % PageSize4K
+	buffer := raw[offset : offset+bufferSize]
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/main.go` around lines 44 - 45, The comment is wrong: make([]byte, bufferSize) does not guarantee page alignment. Fix the allocation used for buffer (the variable named buffer created with bufferSize) by allocating a slightly larger byte slice (bufferSize + pageSize) and then computing a page-aligned start offset (using the runtime page size) to take a subslice that begins on a page boundary, or alternatively replace the Go heap allocation with a page-aligned mmap (syscall.Mmap) and use that region; update uses of buffer throughout to reference the aligned subslice or mmap region accordingly (use the buffer and bufferSize identifiers to locate and replace the allocation).flashring/main.go-201-208 (1)
201-208: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing build constraint for Linux-only code.

`syscall.SYS_MSYNC` and the direct syscall approach are Linux-specific. This file has no build tags, so it will fail to compile on other platforms (Windows, macOS).

🔧 Add build constraint at the top of the file

```diff
+//go:build linux
+// +build linux
+
 package main
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/main.go` around lines 201 - 208, The Sync method uses Linux-specific syscall.SYS_MSYNC and must be limited to Linux builds; move or isolate this implementation into a Linux-only file (e.g., create a file containing the MemoryMappedWriter.Sync implementation) and add a build constraint like a //go:build linux (and corresponding +build line for older toolchains) at the top so non-Linux platforms don't try to compile syscall.SYS_MSYNC; ensure the symbol MemoryMappedWriter.Sync and syscall.SYS_MSYNC are implemented only in that Linux-constrained file and provide a fallback/no-op or alternative implementation for other platforms if needed.flashring/internal/maths/estimator.go-132-151 (1)
132-151: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Premature convergence detection in grid refinement.

The convergence check at lines 142-144 compares each candidate point's distance from the base against `epsilon`. However, this distance equals `|i| * delta` or `|j| * delta`, not `delta` itself.

For `i=-1, j=-1`: the distance is `delta`, so if `delta < epsilon`, it returns `false` immediately, which is correct. But the condition `math.Abs(wf-base.WFreq) < g.epsilon && math.Abs(la-base.WLA) < g.epsilon` checks whether both distances are less than epsilon simultaneously, which only happens at the center point that's already skipped.

This logic may not behave as intended. Consider checking `delta < epsilon` once before the loop instead.

🔧 Proposed fix: Check delta once before loop

```diff
 func (g *GridSearchEstimator) GenerateRefinedGrid(base WeightTuple, steps int, delta float64) ([]WeightTuple, bool) {
+	// If delta is smaller than epsilon, we've converged
+	if delta < g.epsilon {
+		return nil, false
+	}
 	refined := make([]WeightTuple, 0, (2*steps+1)*(2*steps+1))
 	for i := -steps; i <= steps; i++ {
 		for j := -steps; j <= steps; j++ {
 			if i == 0 && j == 0 {
 				continue
 			}
 			wf := base.WFreq + float64(i)*delta
 			la := base.WLA + float64(j)*delta
-			if math.Abs(wf-base.WFreq) < g.epsilon && math.Abs(la-base.WLA) < g.epsilon {
-				return refined, false
-			}
 			if wf > 0 && la > 0 {
 				refined = append(refined, WeightTuple{wf, la})
 			}
 		}
 	}
 	return refined, true
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/maths/estimator.go` around lines 132 - 151, The convergence check inside GenerateRefinedGrid is wrong: remove the per-candidate check "math.Abs(wf-base.WFreq) < g.epsilon && math.Abs(la-base.WLA) < g.epsilon" and instead perform a single check at the start of GenerateRefinedGrid that returns (nil/empty, false) when delta < g.epsilon (or math.Abs(delta) < g.epsilon) to short-circuit refinement; keep the rest of the loop that skips the center (i==0 && j==0) and appends only positive wf/la WeightTuple values so behavior of GenerateRefinedGrid and returned refined slice stays correct.flashring/internal/memtables/manager_bench_test.go-51-53 (1)
51-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Metric calculation is incorrect for "MB/s".

The reported metric divides total bytes by 1024² to get MB, but does not divide by time, so it's reporting total MB, not throughput in MB/s. Use `b.SetBytes()` for automatic throughput calculation, or manually compute the rate.

🔧 Proposed fix using b.SetBytes

```diff
+	b.SetBytes(16 * 1024) // Let Go's benchmark framework compute throughput
 	b.ResetTimer()
 	for i := 0; i < b.N; i++ {
 		// ...
 	}
-	// b.ReportMetric(float64(manager.stats.Flushes), "flushes")
-	b.ReportMetric(float64(b.N*16*1024)/1024/1024, "MB/s")
 	b.ReportAllocs()
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/memtables/manager_bench_test.go` around lines 51 - 53, The benchmark is reporting total MB instead of throughput because b.ReportMetric(float64(b.N*16*1024)/1024/1024, "MB/s") does not divide by elapsed time; replace this with a proper bytes-per-op declaration using b.SetBytes(int64(16*1024)) (or the total bytes per operation value) so Go's testing framework can compute MB/s automatically, and remove the incorrect b.ReportMetric call (keep b.ReportAllocs()); alternatively, compute throughput manually by dividing total bytes (b.N * 16*1024) by b.Result().NsPerOp()/1e9 to produce MB/s if you prefer an explicit rate.flashring/internal/fs/README.md-85-89 (1)
85-89: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Package path in benchmark output doesn't match current location.

The raw benchmark output references `github.com/Meesho/BharatMLStack/ssd-cache/internal/memtable`, but this README is located in `flashring/internal/fs/`. This appears to be a remnant from a previous package structure or a copy-paste from another location.

Consider updating the benchmark output to reflect the current package path, or add a note clarifying the source.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/README.md` around lines 85 - 89, The README snippet contains stale benchmark output referencing github.com/Meesho/BharatMLStack/ssd-cache/internal/memtable; update that string in flashring/internal/fs/README.md to the current package path (e.g., flashring/internal/fs or the module's real import path) or add a brief note above the output clarifying it is from a different package copy; locate the block showing "go test -benchmem ..." and replace the package path text or append a one-line explanation so the README accurately reflects the current location.flashring/internal/fs/rolling_appendonly_file_test.go-491-502 (1)
491-502: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cleanup helper has same nil-safety concern as wrap_file_test.go.

Same issue as the other test file: consider adding a nil guard for the `raf` parameter.

Proposed fix

```diff
 func cleanup(raf *RollingAppendFile) {
+	if raf == nil {
+		return
+	}
 	if raf.WriteFile != nil {
 		raf.WriteFile.Close()
+		os.Remove(raf.WriteFile.Name())
 	}
 	if raf.ReadFile != nil {
 		raf.ReadFile.Close()
 	}
-	if raf.WriteFile != nil {
-		os.Remove(raf.WriteFile.Name())
-	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/rolling_appendonly_file_test.go` around lines 491 - 502, The cleanup helper can panic if raf is nil; add a nil guard at the start of cleanup to return immediately when raf == nil, then keep the existing nil checks before closing ReadFile/WriteFile and removing the write file; reference the cleanup function and the RollingAppendFile fields (WriteFile, ReadFile) to locate and update the helper to be nil-safe.flashring/internal/fs/wrap_file_test.go-829-840 (1)
829-840: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cleanup helper doesn't check for nil before calling methods.

If `waf.WriteFile` or `waf.ReadFile` is nil (e.g., partial initialization failure), calling `Close()` or `Name()` will panic. Consider adding nil checks.

Proposed fix

```diff
 func cleanupWrapFile(waf *WrapAppendFile) {
+	if waf == nil {
+		return
+	}
 	if waf.WriteFile != nil {
 		waf.WriteFile.Close()
+		os.Remove(waf.WriteFile.Name())
 	}
 	if waf.ReadFile != nil {
 		waf.ReadFile.Close()
 	}
-	if waf.WriteFile != nil {
-		os.Remove(waf.WriteFile.Name())
-	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/internal/fs/wrap_file_test.go` around lines 829 - 840, The cleanupWrapFile helper should guard against nil file pointers to avoid panics: in the function cleanupWrapFile (operating on *WrapAppendFile), check that waf.WriteFile != nil before calling Close() and before calling waf.WriteFile.Name() for os.Remove, and check waf.ReadFile != nil before calling Close(); only invoke Close() or Name()/os.Remove when the respective pointer is non-nil so partial initialization won't panic.flashring/pkg/cache/cache.go-405-413 (1)
405-413: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Failed coalesced read submissions are silently ignored.

When `SubmitCoalescedReadAsync` fails (line 409), the loop continues without logging or marking affected keys as failed. This could make debugging difficult when MGet returns incomplete results.

🔧 Proposed fix to log failures

```diff
 	for g := range groups {
 		size := int(groups[g].alignedEnd - groups[g].alignedStart)
 		pr, err := wc.shards[groups[g].shardIdx].SubmitCoalescedReadAsync(
 			groups[g].alignedStart, size)
 		if err != nil {
+			log.Debug().Err(err).
+				Int64("offset", groups[g].alignedStart).
+				Int("size", size).
+				Msg("MGet coalesced read submission failed")
 			continue
 		}
 		groups[g].pending = pr
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashring/pkg/cache/cache.go` around lines 405 - 413, The loop currently swallows errors from wc.shards[...].SubmitCoalescedReadAsync, so update the loop to (1) log the error when SubmitCoalescedReadAsync returns err (use the cache's existing logger instance, e.g., wc.logger or wc.log) including the shard index and alignedStart/size for context, and (2) mark the affected group so MGet can treat it as failed (for example set groups[g].pending to a failed/errored future or set a groups[g].failed flag and store the error on the group struct) instead of silently continuing; change the block around SubmitCoalescedReadAsync in the loop that references groups[g], wc.shards and groups[g].pending accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 370ca5a4-cdc9-4273-b7ae-cf3c581f9b1e
⛔ Files ignored due to path filters (2)
- `flashring/go.sum` is excluded by `!**/*.sum`
- `flashring/internal/fs/profile.png` is excluded by `!**/*.png`
📒 Files selected for processing (54)
- `.gitignore`
- `flashring/README.md`
- `flashring/cmd/flashringtest/main.go`
- `flashring/cmd/flashringtest/mem.prof`
- `flashring/cmd/flashringtest/plan_badger.go`
- `flashring/cmd/flashringtest/plan_freecache.go`
- `flashring/cmd/flashringtest/plan_random_gausian.go`
- `flashring/cmd/flashringtest/plan_readthrough_gausian.go`
- `flashring/cmd/flashringtest/plan_readthrough_gausian_batched.go`
- `flashring/go.mod`
- `flashring/internal/allocators/allocators.go`
- `flashring/internal/allocators/slab_aligned_page_allocator.go`
- `flashring/internal/allocators/slab_aligned_page_allocator_test.go`
- `flashring/internal/fs/README.md`
- `flashring/internal/fs/aligned_page.go`
- `flashring/internal/fs/file_bench_test.go`
- `flashring/internal/fs/fs.go`
- `flashring/internal/fs/rolling_appendonly_file.go`
- `flashring/internal/fs/rolling_appendonly_file_test.go`
- `flashring/internal/fs/wrap_file.go`
- `flashring/internal/fs/wrap_file_test.go`
- `flashring/internal/index/constant.go`
- `flashring/internal/index/delete_manager.go`
- `flashring/internal/index/encoder.go`
- `flashring/internal/index/index.go`
- `flashring/internal/index/ringbuffer.go`
- `flashring/internal/index/system.go`
- `flashring/internal/iouring/batch_writer.go`
- `flashring/internal/iouring/iouring.go`
- `flashring/internal/iouring/iouring_reader.go`
- `flashring/internal/iouring/iouring_test.go`
- `flashring/internal/iouring/iouring_writer.go`
- `flashring/internal/maths/estimator.go`
- `flashring/internal/maths/freq.go`
- `flashring/internal/maths/freq_test.go`
- `flashring/internal/maths/predictor.go`
- `flashring/internal/maths/predictor_test.go`
- `flashring/internal/memtables/manager.go`
- `flashring/internal/memtables/manager_bench_test.go`
- `flashring/internal/memtables/manager_test.go`
- `flashring/internal/memtables/memtable.go`
- `flashring/internal/memtables/memtable_bench_test.go`
- `flashring/internal/memtables/memtable_test.go`
- `flashring/internal/pools/leaky_pool.go`
- `flashring/internal/pools/pool.go`
- `flashring/internal/server/resp.go`
- `flashring/internal/shard/shard_cache.go`
- `flashring/main.go`
- `flashring/pkg/cache/badger.go`
- `flashring/pkg/cache/cache.go`
- `flashring/pkg/cache/freecache.go`
- `flashring/pkg/metrics/metric.go`
- `flashring/pkg/metrics/tag.go`
- `flashring/prep_ssd.sh`