From 1611e6d01519708f8b603f25dddd8974f1a5ff5c Mon Sep 17 00:00:00 2001
From: HaD0Yun <HaD0Yun@users.noreply.github.com>
Date: Sun, 22 Mar 2026 19:25:46 +0900
Subject: [PATCH] Capture the paper clarifications from issue #13

Adds a small paper-clarification note based on the maintainer answers already given in issue #13.

Constraint: Keep this branch scoped to issue #13 only
Rejected: Folding this into a multi-issue combined PR | the requested delivery shape is one PR per issue
Confidence: high
Scope-risk: narrow
Directive: Keep this note fact-limited to what the public repository and issue thread currently support
Tested: git diff --check --cached
Not-tested: Runtime feature execution for the requested capability
---
 docs/issues/issue-13-paper-questions.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 docs/issues/issue-13-paper-questions.md

diff --git a/docs/issues/issue-13-paper-questions.md b/docs/issues/issue-13-paper-questions.md
new file mode 100644
index 0000000..b070922
--- /dev/null
+++ b/docs/issues/issue-13-paper-questions.md
@@ -0,0 +1,23 @@
+# Issue #13 — Paper clarification note
+
+## Summary
+This note turns the maintainer answers from issue #13 into a short reference for readers of the paper.
+
+## Clarifications already provided in the issue thread
+
+### 1) Lookahead shift `K`
+The maintainer response states that acoustic features are shifted by **5 positions**. In practice, that means TADA uses a **lookahead of 5 text tokens** during TTS generation.
+
+### 2) Evaluation setup for Tables 5 and 6
+The maintainer response states that the evaluation is the **voice cloning setup from Table 2**, using **PPL as in Table 4**.
+
+### 3) Cross-entropy / KD losses for the TTS setting
+The maintainer response states that removing the CE and KD losses did **not significantly improve TTS performance**. The ablations in Table 6 were run at a smaller scale before the final main model, so the team does not currently report a separate "base" number for the setting with both losses fully removed.
+
+## Follow-up questions that remain open in the thread
+The issue still contains follow-up questions that are not yet answered in the repository docs:
+- how the text/audio pair construction handles the final lookahead positions during training
+- whether text prediction remains active under the hood during inference in the TTS path
+
+## Why this note exists
+The GitHub issue already contains useful maintainer answers, but they are easy to miss if a reader only consults the repository files.