Optimize the implementation of ReplayBuffer #1512

Open
YanhuiDua wants to merge 1 commit into InternLM:rl_design from YanhuiDua:dev/optim_buffer

Conversation

@YanhuiDua
Collaborator

  1. Use StorageItem and QueryItem to replace StorageIndices and add DSLRule for QueryItem
  2. Split ReplayBuffer to StorageBackend and ReplayPolicy
  3. Add SyncReplayBufferConfig and AsyncReplayBufferConfig
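
The split described in item 2 might look roughly like the sketch below. It is a synchronous, in-memory illustration only: the fields of `StorageItem` and `QueryItem`, the `InMemoryBackend` and `FifoReplayPolicy` classes, and every method name are assumptions made for this example, not the PR's actual code (which is async and returns `list[list[RolloutState]]`).

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class StorageItem:
    # Hypothetical fields; the PR's real StorageItem may differ.
    task_name: str
    group_status: str
    payload: Any
    tags: dict[str, Any] = field(default_factory=dict)


@dataclass
class QueryItem:
    # Hypothetical: only equality filters on two fields.
    task_name: str
    group_status: str


class StorageBackend(Protocol):
    """Where items live: knows how to store, look up, and delete."""

    def put(self, item: StorageItem) -> None: ...
    def query(self, query: QueryItem) -> list[StorageItem]: ...
    def remove(self, items: list[StorageItem]) -> None: ...


class InMemoryBackend:
    def __init__(self) -> None:
        self._items: list[StorageItem] = []

    def put(self, item: StorageItem) -> None:
        self._items.append(item)

    def query(self, query: QueryItem) -> list[StorageItem]:
        return [
            it for it in self._items
            if it.task_name == query.task_name and it.group_status == query.group_status
        ]

    def remove(self, items: list[StorageItem]) -> None:
        # Compare by identity so equal-looking items are not removed by accident.
        drop = {id(it) for it in items}
        self._items = [it for it in self._items if id(it) not in drop]


class FifoReplayPolicy:
    """Which items to hand out: here, simply the oldest matches first."""

    def select(self, candidates: list[StorageItem], batch_size: int) -> list[StorageItem]:
        return candidates[:batch_size]


class ReplayBuffer:
    def __init__(self, backend: StorageBackend, policy: FifoReplayPolicy) -> None:
        self._backend = backend
        self._policy = policy

    def put(self, item: StorageItem) -> None:
        self._backend.put(item)

    def get(self, batch_size: int, task_name: str, group_status: str) -> list[StorageItem]:
        # The public signature stays simple; a QueryItem is built internally.
        candidates = self._backend.query(QueryItem(task_name, group_status))
        chosen = self._policy.select(candidates, batch_size)
        self._backend.remove(chosen)
        return chosen
```

With this shape, swapping the storage layer or the sampling strategy would only touch one of the two collaborators, which appears to be the motivation for the split.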

…LRule for QueryItem; 2. split ReplayBuffer to StorageBackend and ReplayPolicy

- async def get(self, batch_size: int, task_name: str, group_status: Status, **kwargs) -> list[list[RolloutState]]:
-     indices = StorageIndices(task_name=task_name, group_status=group_status, tags=kwargs)
+ async def get(self, batch_size: int, task_name: str, group_status: Status) -> list[list[RolloutState]]:
Collaborator Author

For now, I would like ReplayBuffer's get interface to keep using task_name and group_status. Once other query requirements come up, we can switch to QueryItem.

from xtuner.v1.utils.logger import get_logger


DSLOp: TypeAlias = Literal["$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$not_in", "$between"]
Collaborator

@jayhenry jayhenry Mar 2, 2026

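The design at https://www.notion.so/AST-Spike-16c9d117bcaf80cc8c28dff1a2853910?source=copy_link could serve as a reference for some adjustments:

  1. Define the supported ops as more complete classes, following the AST node classes in the "Abstract Syntax Tree" section of that document.
  2. The actual query process should differ from one backend to another, so the DSLRule.match logic below cannot be reused across backends. This step should follow the NodeVisitor in the "Syntax-Directed Translation" section of that document, with each storage backend defining its own NodeVisitor.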
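
One way to read the two suggestions above is sketched below. The linked design document is not reproduced here, so this is a guess at its shape based on the dispatch pattern of Python's standard `ast.NodeVisitor`; the node classes (`Compare`, `And`, `Or`), both visitor classes, and the `?` placeholder style are assumptions for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any


class Node:
    """Base class for query AST nodes (hypothetical)."""


@dataclass(frozen=True)
class Compare(Node):
    key: str
    op: str  # "$eq", "$ne", "$gt", "$gte", "$lt" or "$lte"
    value: Any


@dataclass(frozen=True)
class And(Node):
    left: Node
    right: Node


@dataclass(frozen=True)
class Or(Node):
    left: Node
    right: Node


class NodeVisitor:
    """Dispatches visit(node) to visit_<ClassName>, like ast.NodeVisitor."""

    def visit(self, node: Node) -> Any:
        method = getattr(self, f"visit_{type(node).__name__}", None)
        if method is None:
            raise NotImplementedError(f"no visitor for {type(node).__name__}")
        return method(node)


_PY_OPS = {"$eq": operator.eq, "$ne": operator.ne, "$gt": operator.gt,
           "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}
_SQL_OPS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


class PredicateVisitor(NodeVisitor):
    """In-memory backend: translates the AST into a predicate over dict records."""

    def visit_Compare(self, node: Compare):
        fn = _PY_OPS[node.op]
        return lambda rec: node.key in rec and fn(rec[node.key], node.value)

    def visit_And(self, node: And):
        left, right = self.visit(node.left), self.visit(node.right)
        return lambda rec: left(rec) and right(rec)

    def visit_Or(self, node: Or):
        left, right = self.visit(node.left), self.visit(node.right)
        return lambda rec: left(rec) or right(rec)


class SqlWhereVisitor(NodeVisitor):
    """SQL-style backend: translates the AST into (where_clause, params)."""

    def visit_Compare(self, node: Compare):
        # Values go through placeholders; column names would still need
        # validating against a known schema before being interpolated.
        return f"{node.key} {_SQL_OPS[node.op]} ?", [node.value]

    def visit_And(self, node: And):
        (ls, lp), (rs, rp) = self.visit(node.left), self.visit(node.right)
        return f"({ls} AND {rs})", lp + rp

    def visit_Or(self, node: Or):
        (ls, lp), (rs, rp) = self.visit(node.left), self.visit(node.right)
        return f"({ls} OR {rs})", lp + rp
```

The same query tree is built once and each backend supplies its own translation, so no backend has to reuse another's matching logic.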

2 participants