Skip to content

refactor(nyz): audio language model RL pipeline#58

Open
PaParaZz1 wants to merge 13 commits into
mainfrom
dev/refactor-audio-rl
Open

refactor(nyz): audio language model RL pipeline#58
PaParaZz1 wants to merge 13 commits into
mainfrom
dev/refactor-audio-rl

Conversation

@PaParaZz1

@PaParaZz1 PaParaZz1 commented Apr 14, 2026

Copy link
Copy Markdown
Member
  • sglang pipeline
  • vllm pipeline
  • update doc
  • Qwen2-Audio/Qwen2.5-Omni adaption

@PaParaZz1 PaParaZz1 added the refactor Cleanup, formatting, or restructuring of existing code. label Apr 14, 2026

# Build question template (matches R1-AQA)
choice_str = f"Please choose the answer from the following options: {multi_choice}."
# There should be a space between <answer> and </answer>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个对结果影响很大吗?注释里面讲加不加空格对性能的影响是否更好呢

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对 qwen2-audio 的初始收敛速度影响很大,should be 就是说明要加这个空格的意思

parser = argparse.ArgumentParser(
description="Preprocess AVQA dataset (R1-AQA format) for LightRFT training"
)
parser = argparse.ArgumentParser(description="Preprocess AVQA dataset (R1-AQA format) for LightRFT training")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是否可以加下AVQA的链接呢

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个文件一开头的注释有 AVQA 的链接

Each item returns ``(prompt_text, audio_payload, reference, label)`` where:
- ``prompt_text`` is rendered through the Qwen2-Audio chat template
- ``audio_payload`` is kept as raw waveform + sampling rate for rollout-side processing
- ``reference`` and ``label`` are passed through to reward computation

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么删掉:param dataset:这些呢

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

补上了

all_videos: Optional[List] = None,
videos_num: Optional[List[int]] = None,
) -> EasyDict:
@staticmethod

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AudioMultimodalProcessor是没有用到了吗

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对,重构删掉了

Comment thread examples/r1_aqa/eval.py Outdated
"Output the final answer in <answer></answer>."
)
question_template = (f"{obj_dict['question']} {choice_str} "
"Output the final answer in <answer></answer>.")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的 中间没有空格?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改了

Comment thread lightrft/models/loss.py
log_probs: Optional[torch.Tensor] = None,
old_log_probs: Optional[torch.Tensor] = None,
ratio: Optional[torch.Tensor] = None,
) -> None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

增加下docstring?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

添加了

experience.sequences[0].unsqueeze(0), skip_special_tokens=True
)
self.strategy.print("collect phase: experience.sequences w skip_special_tokens: ", output)
self.strategy.print(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

启动脚本中加个debug的option 如果打开就print这里的信息吧?方便debug分析

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这不是这个 PR 的功能,应该在其他 polish PR 弄

else:
sequences = experience.sequences

pixel_values = experience.pixel_values

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么 pixel_values相关都删除了呀

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是放在 _build_model_kwargs 方法里统一实现了,这样对各种模态都能处理


# [Protection measure 2] Per-token KL Clamping
# NOTE: Adding this causes svkng training to not converge
# kl = torch.clamp(kl, min=0.0, max=20.0)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个保留吧?之前发现的关键信息

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加回去了

# Use wandb_log_counter to ensure eval has a unique system step
# This prevents eval metrics from being overwritten by train metrics
# The plots will still use eval/global_step as X-axis due to define_metric
self.wandb_log_counter += 1

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

关键的注释保留吧?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加回去了

Comment thread examples/r1_aqa/README.md
Audio RL now uses a dedicated rollout path in core LightRFT code:
- raw audio payloads stay on the generation side and are passed to SGLang as `audio_data`
- processed mel features are stored explicitly as `audio_values`
- Qwen2-Audio feature masking is stored explicitly as `feature_attention_mask`

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是否可以增加中文readme

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加上了

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor Cleanup, formatting, or restructuring of existing code.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants