refactor(sunjx): refactor loss-filter implementation by Jiaxuan-Sun · Pull Request #17 · opendilab/LightRFT

Jiaxuan-Sun · 2026-01-01T17:00:57Z

Add new lightrft/trainer/filter_weight/ module with:

metrics.py - Metrics computation layer (entropy, difficulty, staleness, etc.)
filters.py - Sample filtering layer (length, reward value, entropy, difficulty filters, etc.)
weights.py - Loss weighting layer (length, entropy, difficulty, staleness weightings, etc.)
manager.py - Unified management layer (FilterWeightManager)

Note: The dynamic sampling feature has been tested. Other components are reserved for future extension.

puyuan1996 · 2026-01-20T04:58:27Z

+                ret = {}
+                for k in all_keys:
+                    ret[k] = self.all_reduce(data.get(k, 0.0), op)
+                return ret


Why was this added? Does it cause an error without it?

This is to prevent deadlock in distributed all-reduce operations.
After dynamic sampling, the set of keys in the status dictionary may differ across ranks (some ranks have keys like kl and ptx_loss, while others do not). The all_reduce(dict) operation calls dist.all_reduce for each key individually. If the keys or their order differ between ranks, the collective operations will be inconsistent, causing the process to hang.
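To make the failure mode concrete, here is a minimal single-process sketch of why differing key sets lead to mismatched collectives. It does not touch `torch.distributed`. The per-rank dictionaries, `collective_sequence`, and `align_keys` are hypothetical names used only for illustration, and the key union is computed locally here, whereas real ranks would need a communication step to agree on it.

```python
from typing import Dict, List


def collective_sequence(status: Dict[str, float]) -> List[str]:
    """Order in which a rank would issue one all-reduce per key (dict insertion order)."""
    return list(status.keys())


def align_keys(status: Dict[str, float], all_keys: List[str], fill: float = 0.0) -> Dict[str, float]:
    """Pad missing keys with `fill` and rebuild the dict in one agreed key order."""
    return {k: status.get(k, fill) for k in all_keys}


# Hypothetical per-rank status dicts after dynamic sampling (assumption, not from the PR).
rank_status = [
    {"policy_loss": 0.3, "kl": 1.0, "ptx_loss": 0.2},  # rank 0 has all keys
    {"policy_loss": 0.4},                              # rank 1 is missing kl and ptx_loss
]

# Without alignment, the ranks would issue different sequences of collectives.
unaligned = [collective_sequence(s) for s in rank_status]
sequences_match_before = all(seq == unaligned[0] for seq in unaligned)

# With a sorted union of keys, every rank issues the same calls in the same order.
all_keys = sorted(set().union(*rank_status))
aligned = [collective_sequence(align_keys(s, all_keys)) for s in rank_status]
sequences_match_after = all(seq == aligned[0] for seq in aligned)

print(sequences_match_before, sequences_match_after)
```

In this sketch the two ranks disagree on the call sequence before alignment and agree after it, which is the property a collective needs in order not to hang.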

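This change does prevent the NCCL deadlock, but the implementation has two fairly serious hidden problems:

Math problem: with op="mean", a rank that is missing a key is filled with 0.0 by default, so that 0.0 is counted in the numerator and the sum is divided by world_size. This badly drags down the true mean of the metric (for example, if only one GPU has KL=1.0, the average over 4 GPUs becomes 0.25).
Architecture and performance problem: all_reduce is a low-level communication primitive, and calling all_gather_object (which relies on pickle) inside it at high frequency incurs a performance cost. Forcibly filling in 0.0 at the low level also hides the fact that state is misaligned upstream.

Suggested direction:

It is best not to align keys inside the low-level all_reduce. Instead, we should explicitly initialize all possible keys in the business layer before calling all_reduce (for example, where metrics are logged). For ranks that were filtered out, pass 0.0 together with a valid_count mask, and finally compute the accurate mean as sum(values) / sum(valid_counts).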
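The masked-mean idea can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's implementation: `METRIC_KEYS`, `pack`, `sum_reduce`, and `masked_mean` are hypothetical names, and `sum_reduce` merely stands in for a SUM all-reduce by adding the per-rank dictionaries in one process.

```python
from typing import Dict, List, Optional

# Hypothetical fixed key set that every rank initializes before reducing.
METRIC_KEYS = ["kl", "ptx_loss"]


def pack(status: Dict[str, float]) -> Dict[str, float]:
    """Give every rank the same keys: the value (0.0 if absent) plus a validity count."""
    packed: Dict[str, float] = {}
    for k in METRIC_KEYS:
        present = k in status
        packed[k] = status[k] if present else 0.0
        packed[k + "_valid"] = 1.0 if present else 0.0
    return packed


def sum_reduce(per_rank: List[Dict[str, float]]) -> Dict[str, float]:
    """Single-process stand-in for a SUM all-reduce across ranks."""
    return {k: sum(r[k] for r in per_rank) for k in per_rank[0]}


def masked_mean(reduced: Dict[str, float]) -> Dict[str, Optional[float]]:
    """sum(values) / sum(valid_counts); None when no rank reported the metric."""
    out: Dict[str, Optional[float]] = {}
    for k in METRIC_KEYS:
        n = reduced[k + "_valid"]
        out[k] = reduced[k] / n if n > 0 else None
    return out


# Four ranks, only rank 0 has a KL value (the example from the review comment).
ranks = [{"kl": 1.0}, {}, {}, {}]
reduced = sum_reduce([pack(s) for s in ranks])

naive_mean = reduced["kl"] / len(ranks)  # padded zeros counted: 0.25
result = masked_mean(reduced)            # kl -> 1.0, ptx_loss -> None
print(naive_mean, result)
```

Because every rank packs the same keys in the same order, the reduce step sees identical inputs on every rank, and the validity counts keep padded zeros out of the denominator.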

puyuan1996 · 2026-01-20T05:09:10Z

+                # If no valid actions or base log-probs are empty, skip KL safely.
+                if ((experience.action_mask is not None and experience.action_mask.sum().item() == 0)
+                        or (base_action_log_probs is not None and base_action_log_probs.numel() == 0)):
+                    kl = torch.zeros_like(


Have these null-check branches actually been hit during testing? If it's null, we should probably just throw an error directly.

Yes. A dimension-mismatch error was reported when the action_mask was all zeros or the baseline log-probs were empty (compute_approx_kl was entered with an empty base_action_log_probs), so these branches are actually hit in dynamic sampling and filtering scenarios.

If that's the case, we should probably figure out a way to avoid such issues during the upstream filter/weight stages (e.g., by filtering out these invalid batches early on), rather than just forcing the KL to 0 here. Setting it to 0 is more of a workaround and might mask underlying issues with the data flow or sampling logic.
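One way to picture the upstream filtering suggested here is the sketch below. It is a simplified illustration only: `Sample`, `is_valid_for_kl`, and `drop_invalid` are hypothetical names, and plain Python lists stand in for the tensors used in the real trainer. The two validity conditions mirror the checks in the diff above (an all-zero `action_mask`, or empty `base_action_log_probs`).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Sample:
    """Hypothetical stand-in for an experience; lists replace tensors."""
    action_mask: Optional[Sequence[int]]
    base_action_log_probs: Optional[Sequence[float]]


def is_valid_for_kl(s: Sample) -> bool:
    """Reject samples that would otherwise reach the KL computation in a degenerate state."""
    if s.action_mask is not None and sum(s.action_mask) == 0:
        return False  # no valid actions
    if s.base_action_log_probs is not None and len(s.base_action_log_probs) == 0:
        return False  # empty baseline log-probs
    return True


def drop_invalid(samples: List[Sample]) -> Tuple[List[Sample], int]:
    """Filter early and report how many samples were dropped, so the issue stays visible."""
    kept = [s for s in samples if is_valid_for_kl(s)]
    return kept, len(samples) - len(kept)


samples = [
    Sample([1, 1, 0], [-0.1, -0.2, -0.3]),  # valid
    Sample([0, 0, 0], [-0.1, -0.2, -0.3]),  # all actions masked out
    Sample([1, 0], []),                     # empty baseline log-probs
    Sample(None, None),                     # nothing to check, kept
]
kept, dropped = drop_invalid(samples)
print(len(kept), dropped)
```

Returning the dropped count lets the caller log or assert on it, which keeps the underlying data-flow problem observable instead of silently replacing the KL with zeros.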


Jiaxuan-Sun · 2026-05-30T16:57:41Z

This still needs to be tested.

Jiaxuan-Sun added 2 commits December 31, 2025 20:07

refactor(sunjx): refactor loss-filter for sample filtering and loss w…

0ebcdbf

…eighting

refactor(sunjx): refactor loss-filter implementation

11e81ac

puyuan1996 requested changes Jan 4, 2026

Comment thread lightrft/trainer/filter_weight/__init__.py Outdated

Comment thread lightrft/trainer/filter_weight/filters.py

Comment thread lightrft/trainer/filter_weight/__init__.py Outdated

puyuan1996 reviewed Jan 4, 2026

Comment thread lightrft/trainer/filter_weight/__init__.py Outdated

puyuan1996 added the enhancement (New feature or request) and refactor (Cleanup, formatting, or restructuring of existing code) labels Jan 4, 2026

puyuan1996 mentioned this pull request Jan 5, 2026

Roadmap for LightRFT v0.1.1 #19

Closed

Jiaxuan-Sun added 7 commits January 8, 2026 15:56

refactor(sunjx): Unify the comment style

008c90a

Merge remote-tracking branch 'opendilab/main' into refactor/loss-filter

ab61fef

refactor(sunjx): fix format/fcheck bugs

4d04e1d

feature(sunjx): fix dynamic_sampling bugs

a43ae21

Merge branch 'main' into refactor/loss-filter

d0346d0

refactor(sunjx): pass formt and fcheck

7d8dea4

refactor(sunjx): pass format and fcheck check

a659c00

puyuan1996 requested changes Jan 20, 2026

refactor(sunjx): Organize the code

97f8a92

puyuan1996 mentioned this pull request Jan 21, 2026

Roadmap for LightRFT v0.1.2 #28

Open

1 task

Jiaxuan-Sun wants to merge 10 commits into opendilab:main from Jiaxuan-Sun:refactor/loss-filter

puyuan1996 Jan 20, 2026

Jiaxuan-Sun Jan 20, 2026

puyuan1996 Mar 18, 2026

puyuan1996 Jan 20, 2026

Jiaxuan-Sun Jan 20, 2026

puyuan1996 Mar 18, 2026

Jiaxuan-Sun commented May 30, 2026

2 participants
