refactor(nyz): audio language model RL pipeline by PaParaZz1 · Pull Request #58 · opendilab/LightRFT

PaParaZz1 · 2026-04-14T10:43:31Z

sglang pipeline
vllm pipeline
update doc
Qwen2-Audio/Qwen2.5-Omni adaptation

puyuan1996 · 2026-04-24T09:27:47Z


    # Build question template (matches R1-AQA)
    choice_str = f"Please choose the answer from the following options: {multi_choice}."
+    # There should be a space between <answer> and </answer>


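Does this have a big impact on the results? Would it be better for the comment to explain how adding or omitting the space affects performance?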

It has a large impact on the initial convergence speed of qwen2-audio. "should be" here means that the space needs to be added.
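The point about the space can be illustrated with a minimal sketch. The helper name `build_question_template` and the list-of-strings `choices` argument are hypothetical and not taken from the PR; the sketch only shows a template that ends with the spaced form `<answer> </answer>` that the added comment calls for.

```python
def build_question_template(question: str, choices: list[str]) -> str:
    """Hypothetical helper: render an R1-AQA style multiple-choice prompt."""
    choice_str = f"Please choose the answer from the following options: {choices}."
    # Single space between <answer> and </answer>, as the added comment asks for.
    return f"{question} {choice_str} Output the final answer in <answer> </answer>."


prompt = build_question_template("What is making the sound?", ["dog", "cat"])
```

Keeping the literal in one place like this makes it harder for a reformatting pass to drop the space unnoticed.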

puyuan1996 · 2026-04-24T09:29:34Z

-    parser = argparse.ArgumentParser(
-        description="Preprocess AVQA dataset (R1-AQA format) for LightRFT training"
-    )
+    parser = argparse.ArgumentParser(description="Preprocess AVQA dataset (R1-AQA format) for LightRFT training")


Could you add a link to AVQA?

The comment at the top of this file already contains the AVQA link.

puyuan1996 · 2026-04-24T09:31:14Z

+    Each item returns ``(prompt_text, audio_payload, reference, label)`` where:
+    - ``prompt_text`` is rendered through the Qwen2-Audio chat template
+    - ``audio_payload`` is kept as raw waveform + sampling rate for rollout-side processing
+    - ``reference`` and ``label`` are passed through to reward computation


Why were the `:param dataset:` entries removed?
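As a rough illustration of the item layout described in the quoted docstring, here is a minimal sketch. `AudioItem`, `make_item`, and the `waveform` / `sampling_rate` dictionary keys are assumptions made for this example; only the four tuple field names come from the PR. The docstring keeps `:param` entries to show that they can coexist with the new summary.

```python
from typing import Any, NamedTuple


class AudioItem(NamedTuple):
    """Hypothetical container mirroring the 4-tuple described in the PR."""
    prompt_text: str
    audio_payload: dict[str, Any]
    reference: str
    label: str


def make_item(prompt_text: str, waveform: list[float], sampling_rate: int,
              reference: str, label: str) -> AudioItem:
    """Build one dataset item.

    :param prompt_text: prompt already rendered through the chat template
    :param waveform: raw audio samples, left unprocessed for the rollout side
    :param sampling_rate: sampling rate of ``waveform`` in Hz
    :param reference: ground-truth answer forwarded to reward computation
    :param label: task label forwarded to reward computation
    """
    # The payload keeps the raw waveform; no feature extraction happens here.
    payload = {"waveform": waveform, "sampling_rate": sampling_rate}
    return AudioItem(prompt_text, payload, reference, label)


item = make_item("What is the sound?", [0.0, 0.1, -0.1], 16000, "dog", "avqa")
prompt_text, audio_payload, reference, label = item
```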

puyuan1996 · 2026-04-24T09:32:17Z

-        all_videos: Optional[List] = None,
-        videos_num: Optional[List[int]] = None,
-    ) -> EasyDict:
+    @staticmethod


Is AudioMultimodalProcessor no longer used?

Right, it was removed in the refactor.

puyuan1996 · 2026-04-24T09:32:56Z

-        "Output the final answer in <answer></answer>."
-    )
+    question_template = (f"{obj_dict['question']} {choice_str} "
+                         "Output the final answer in <answer></answer>.")


Is there no space in between here?

puyuan1996 · 2026-04-24T09:35:43Z

+        log_probs: Optional[torch.Tensor] = None,
+        old_log_probs: Optional[torch.Tensor] = None,
+        ratio: Optional[torch.Tensor] = None,
+    ) -> None:


Could you add a docstring?

puyuan1996 · 2026-04-24T09:40:18Z

-                            experience.sequences[0].unsqueeze(0), skip_special_tokens=True
-                        )
-                        self.strategy.print("collect phase: experience.sequences w skip_special_tokens: ", output)
-                        self.strategy.print(


How about adding a debug option to the launch script that prints this information when enabled? It would make debugging and analysis easier.

That is out of scope for this PR; it should be handled in a separate polish PR.
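For reference, the reviewer's suggestion could look roughly like the sketch below. The `--debug` flag name and the `debug_print` helper are hypothetical; nothing like this is part of the PR, and the real code prints through `self.strategy.print` rather than the built-in `print`.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical launch-script parser with an opt-in debug switch."""
    parser = argparse.ArgumentParser(description="Launch script sketch")
    parser.add_argument("--debug", action="store_true",
                        help="Print decoded rollout samples during the collect phase")
    return parser.parse_args(argv)


def debug_print(enabled: bool, *message: object) -> bool:
    """Print only when debugging is enabled; return whether anything was printed."""
    if enabled:
        print(*message)
    return enabled


args = parse_args(["--debug"])
debug_print(args.debug, "collect phase: experience.sequences w skip_special_tokens:", "<decoded text>")
```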

puyuan1996 · 2026-04-24T09:41:50Z

        else:
            sequences = experience.sequences
-
-            pixel_values = experience.pixel_values


Why was everything related to pixel_values removed?

It is now implemented in one place, in the `_build_model_kwargs` method, so that every modality can be handled.
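The idea of collecting modality inputs in a single place can be sketched as follows. This is not the PR's `_build_model_kwargs`; the function name `build_model_kwargs`, the `MODALITY_FIELDS` tuple, and the use of `SimpleNamespace` as a stand-in for an experience object are assumptions for illustration. The three field names are the ones that appear elsewhere in this PR.

```python
from types import SimpleNamespace
from typing import Any

# Assumed list of optional modality fields; a real implementation may track more.
MODALITY_FIELDS = ("pixel_values", "audio_values", "feature_attention_mask")


def build_model_kwargs(experience: Any) -> dict[str, Any]:
    """Collect whichever modality tensors the experience carries."""
    kwargs: dict[str, Any] = {}
    for name in MODALITY_FIELDS:
        value = getattr(experience, name, None)
        if value is not None:
            # Forward only the fields that are present, so text-only,
            # vision, and audio experiences all go through the same path.
            kwargs[name] = value
    return kwargs


audio_experience = SimpleNamespace(
    sequences=[1, 2, 3],
    audio_values=[[0.1, 0.2]],
    feature_attention_mask=[[1, 1]],
)
model_kwargs = build_model_kwargs(audio_experience)
```

With this shape, call sites no longer need per-modality branches such as the removed `pixel_values = experience.pixel_values` line.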

puyuan1996 · 2026-04-24T09:42:19Z

-
-                # [Protection measure 2] Per-token KL Clamping
-                # NOTE: Adding this causes svkng training to not converge
-                # kl = torch.clamp(kl, min=0.0, max=20.0)


Could we keep this? It is key information we found earlier.

Added back.

puyuan1996 · 2026-04-24T09:43:32Z

-                    # Use wandb_log_counter to ensure eval has a unique system step
-                    # This prevents eval metrics from being overwritten by train metrics
-                    # The plots will still use eval/global_step as X-axis due to define_metric
                    self.wandb_log_counter += 1


Could we keep the key comments?

Added back.

puyuan1996 · 2026-04-24T10:16:42Z

+Audio RL now uses a dedicated rollout path in core LightRFT code:
+- raw audio payloads stay on the generation side and are passed to SGLang as `audio_data`
+- processed mel features are stored explicitly as `audio_values`
+- Qwen2-Audio feature masking is stored explicitly as `feature_attention_mask`


Could a Chinese README be added?
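The separation described in the quoted doc (raw audio for generation, processed features for training) can be sketched as a small container. The class `AudioRolloutSample`, its method names, and the exact shape of the request dictionary are assumptions for this example; only the key names `audio_data`, `audio_values`, and `feature_attention_mask` come from the PR text.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AudioRolloutSample:
    """Hypothetical sample that keeps generation-side and training-side data apart."""
    prompt_text: str
    audio_data: Any                                 # raw audio payload, generation side only
    audio_values: Optional[Any] = None              # processed mel features
    feature_attention_mask: Optional[Any] = None    # Qwen2-Audio feature mask

    def generation_request(self) -> dict[str, Any]:
        # Assumed request shape: the raw payload travels under `audio_data`.
        return {"text": self.prompt_text, "audio_data": self.audio_data}

    def training_features(self) -> dict[str, Any]:
        # Only the explicitly stored processed features reach the training side.
        return {
            "audio_values": self.audio_values,
            "feature_attention_mask": self.feature_attention_mask,
        }


sample = AudioRolloutSample(
    prompt_text="What is the sound?",
    audio_data={"waveform": [0.0, 0.1], "sampling_rate": 16000},
    audio_values=[[0.5, 0.6]],
    feature_attention_mask=[[1, 1]],
)
request = sample.generation_request()
features = sample.training_features()
```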

niuyazhe added 4 commits April 13, 2026 16:37

polish(nyz): simplify r1-aqa script

991eac7

fix(nyz): clean data and add sglang audio data input

006b2e2

refactor(nyz): audio language rl pipeline

999a144

Merge remote-tracking branch 'origin/main' into dev/refactor-audio-rl

e451f5a

PaParaZz1 added the `refactor` label (Cleanup, formatting, or restructuring of existing code.) Apr 14, 2026

niuyazhe and others added 7 commits April 15, 2026 18:12

fix(nyz): fix sglang output reward bug

9e6d2d7

feature(nyz): add Qwen2.5 Omni audio support

54d41b8

fix(nyz): add training convergence version

d6d527b

style(nyz): correct format

9cf3360

fix(nyz): fix qwen2 audio compatibility bugs

3b93d57

polish(nyz): add qwen2-audio sglang pipeline

f17fac7

Merge branch 'main' into dev/refactor-audio-rl

afb4b9d

puyuan1996 requested changes Apr 24, 2026

puyuan1996 reviewed Apr 24, 2026

PaParaZz1 added 2 commits April 25, 2026 12:29

polish(nyz): docs and details

7ef597a

refactor(nyz): polish data layout materialize

f90b893


2 participants
