Motivation
The current cache eviction mechanism relies primarily on traditional recency-based strategies. While effective in many scenarios, these approaches do not consider the actual value of cached entries in terms of retrieval cost and access patterns.
In Retrieval-Augmented Generation (RAG) systems, some cached entries are significantly more expensive to regenerate than others. Evicting these entries too early can increase latency and reduce overall cache efficiency.
Proposed Enhancement
Implement an adaptive cost-aware eviction policy that combines:
-Access frequency
-Recency of access
-Retrieval/generation cost
-Chunk size
Each cache entry will receive a dynamic score:
score = α * recency
+ β * frequency
+ γ * retrieval_cost
- δ * size_penalty
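Here α, β, γ, and δ are tunable, non-negative weights. The four terms should be normalized to a comparable range so that no single factor dominates the score. When the cache is full, the eviction mechanism will remove the entries with the lowest overall score first.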
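A minimal sketch of how the score and victim selection could look, not a definitive implementation. The names `CacheEntry`, `Weights`, `score`, and `pick_victim` are hypothetical, and so are the normalization choices (exponential decay with a half-life for recency, a saturating hit count for frequency, and `max_cost` / `max_size` as caller-supplied reference values):

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    last_access: float     # timestamp (seconds) of the most recent access
    hits: int              # number of accesses so far
    retrieval_cost: float  # seconds needed to regenerate the entry
    size_bytes: int        # chunk size


@dataclass
class Weights:
    alpha: float = 1.0  # recency
    beta: float = 1.0   # frequency
    gamma: float = 1.0  # retrieval cost
    delta: float = 1.0  # size penalty


def score(entry, now, w, half_life=60.0, max_cost=1.0, max_size=1):
    """Return the eviction score; every term is scaled into [0, 1] (assumed scaling)."""
    # 1.0 for an entry touched right now, halving every `half_life` seconds.
    recency = 0.5 ** ((now - entry.last_access) / half_life)
    # Grows with the hit count and saturates toward 1.0.
    frequency = 1.0 - 1.0 / (1.0 + entry.hits)
    cost = min(entry.retrieval_cost / max_cost, 1.0)
    size_penalty = min(entry.size_bytes / max_size, 1.0)
    return (w.alpha * recency + w.beta * frequency
            + w.gamma * cost - w.delta * size_penalty)


def pick_victim(entries, now, w, **kwargs):
    """Return the entry with the lowest score, i.e. the next one to evict."""
    return min(entries, key=lambda e: score(e, now, w, **kwargs))


cheap = CacheEntry("cheap", last_access=0.0, hits=1, retrieval_cost=0.1, size_bytes=1000)
costly = CacheEntry("costly", last_access=0.0, hits=1, retrieval_cost=2.0, size_bytes=1000)
victim = pick_victim([cheap, costly], now=120.0, w=Weights(), max_cost=2.0, max_size=4000)
print(victim.key)  # cheap
```

With equal recency, frequency, and size, the entry that is cheaper to regenerate is evicted first, which is the behavior the proposal is after. A linear scan is O(n) per eviction; a heap or periodic re-scoring may be needed for large caches, since recency changes every entry's score over time.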
Expected Benefits
-Higher cache hit rate
-Lower average retrieval latency
-Better utilization of limited cache capacity
-Improved throughput under realistic workloads
Evaluation Plan
Compare the proposed policy against the current baseline using:
-Cache hit rate
-Mean latency
-P95 latency
-Throughput
-Memory utilization
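One possible way to compute these metrics from a benchmark run is sketched below. The function name `summarize` and its inputs (per-request latencies in milliseconds, per-request hit flags, wall-clock duration, and cache byte counts) are assumptions made for illustration; P95 uses the nearest-rank method, which is only one of several common percentile definitions.

```python
from statistics import mean


def summarize(latencies_ms, hits, duration_s, bytes_used, capacity_bytes):
    """Summarize one run; both the baseline and the new policy go through this."""
    n = len(latencies_ms)
    if n == 0 or n != len(hits):
        raise ValueError("need one latency and one hit flag per request")
    ordered = sorted(latencies_ms)
    # Nearest-rank P95: rank = ceil(0.95 * n), computed in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "hit_rate": sum(hits) / n,
        "mean_latency_ms": mean(latencies_ms),
        "p95_latency_ms": ordered[rank - 1],
        "throughput_rps": n / duration_s,
        "memory_utilization": bytes_used / capacity_bytes,
    }


report = summarize(
    latencies_ms=list(range(1, 101)),
    hits=[True] * 80 + [False] * 20,
    duration_s=10.0,
    bytes_used=512,
    capacity_bytes=1024,
)
print(report)
```

Running the same workload trace through both policies and comparing the two reports keeps the comparison like for like.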