feat: add integration tests for ASR functionality using sample audio files#120
feat: add integration tests for ASR functionality using sample audio files#120
Conversation
📝 WalkthroughWalkthroughThis pull request adds integration tests for ASR functionality that work with actual audio files. It introduces a new test suite with synchronous and asynchronous variants, including fixtures and test cases that validate transcription accuracy, timestamp handling, and text content verification. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds new ASR integration coverage that transcribes a known, committed sample audio file (samples/jfk.wav) to validate end-to-end ASR behavior (sync + async) against expected English content.
Changes:
- Introduce
SAMPLES_DIRand ajfk.wav-backed fixture for integration testing. - Add sync ASR integration tests validating transcription text and segment timing.
- Add an async ASR integration test validating transcription text.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| def test_asr_from_file_with_timestamps(self, client, jfk_audio): | ||
| """Test transcription with timestamps from a known audio file.""" | ||
| result = client.asr.transcribe(audio=jfk_audio, language="en") |
There was a problem hiding this comment.
test_asr_from_file_with_timestamps relies on the default include_timestamps=True. Making this explicit in the call would prevent the test from silently changing meaning if the client default ever changes.
| result = client.asr.transcribe(audio=jfk_audio, language="en") | |
| result = client.asr.transcribe(audio=jfk_audio, language="en", include_timestamps=True) |
| def test_asr_from_file_without_timestamps(self, client, jfk_audio): | ||
| """Test transcription without timestamps from a known audio file.""" | ||
| result = client.asr.transcribe(audio=jfk_audio, include_timestamps=False) | ||
|
|
||
| assert isinstance(result, ASRResponse) | ||
| assert result.text |
There was a problem hiding this comment.
test_asr_from_file_without_timestamps doesn’t currently assert anything about timestamps/segments (it only checks result.text). To validate include_timestamps=False, assert that result.segments is empty (or that segment timing fields are absent, depending on the API contract).
| @pytest.fixture | ||
| def jfk_audio(self): | ||
| """Load the JFK sample audio file.""" | ||
| return (SAMPLES_DIR / "jfk.wav").read_bytes() |
There was a problem hiding this comment.
The jfk_audio fixture is duplicated in both sync and async test classes. Consider defining it once (e.g., at module level, optionally with a broader scope like module/session) to avoid repetition and repeated disk I/O.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/test_asr_integration.py (1)
55-58: Extractjfk_audiointo one module-level fixture.The fixture is duplicated verbatim in both classes. Keeping the file read in one place avoids drift and makes future sample-path changes a single edit.
♻️ Proposed refactor
SAMPLES_DIR = Path(__file__).resolve().parents[2] / "samples" + + +@pytest.fixture +def jfk_audio(): + """Load the JFK sample audio file.""" + return (SAMPLES_DIR / "jfk.wav").read_bytes() class TestASRFromFileIntegration: """Test ASR using a sample audio file with known content.""" - - `@pytest.fixture` - def jfk_audio(self): - """Load the JFK sample audio file.""" - return (SAMPLES_DIR / "jfk.wav").read_bytes() def test_asr_from_file(self, client, jfk_audio): """Test transcription of a known audio file.""" result = client.asr.transcribe(audio=jfk_audio, language="en") @@ class TestAsyncASRFromFileIntegration: """Test async ASR using a sample audio file.""" - - `@pytest.fixture` - def jfk_audio(self): - """Load the JFK sample audio file.""" - return (SAMPLES_DIR / "jfk.wav").read_bytes() `@pytest.mark.asyncio` async def test_async_asr_from_file(self, async_client, jfk_audio):Also applies to: 91-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_asr_integration.py` around lines 55 - 58, Move the duplicated pytest fixture jfk_audio out of the test classes and define it once at module scope (a single top-level fixture that reads SAMPLES_DIR / "jfk.wav"). Remove the class-level jfk_audio methods in both test classes so tests simply depend on the module-level jfk_audio fixture; keep the fixture name unchanged so no test signatures need updating.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/test_asr_integration.py`:
- Around line 80-85: Update the test_asr_from_file_without_timestamps test to
also assert that the ASRResponse contains no segments when
include_timestamps=False and to pass language="en" to transcribe; specifically,
call client.asr.transcribe(audio=jfk_audio, include_timestamps=False,
language="en") and then assert isinstance(result, ASRResponse), assert
result.text, and assert that result.segments is empty (e.g.,
len(result.segments) == 0 or result.segments == []), referencing the ASRResponse
and client.asr.transcribe usage.
- Around line 70-72: The test function test_asr_from_file_with_timestamps should
explicitly pass include_timestamps=True to the SDK call so it exercises the
timestamped-transcription path regardless of SDK defaults; update the call to
client.asr.transcribe(audio=jfk_audio, language="en", include_timestamps=True)
to ensure the test asserts timestamp behavior deterministically.
---
Nitpick comments:
In `@tests/integration/test_asr_integration.py`:
- Around line 55-58: Move the duplicated pytest fixture jfk_audio out of the
test classes and define it once at module scope (a single top-level fixture that
reads SAMPLES_DIR / "jfk.wav"). Remove the class-level jfk_audio methods in both
test classes so tests simply depend on the module-level jfk_audio fixture; keep
the fixture name unchanged so no test signatures need updating.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 45c8bbea-b008-4ed7-b91b-9ac8f0d76f00
⛔ Files ignored due to path filters (1)
samples/jfk.wavis excluded by!**/*.wav
📒 Files selected for processing (1)
tests/integration/test_asr_integration.py
| def test_asr_from_file_with_timestamps(self, client, jfk_audio): | ||
| """Test transcription with timestamps from a known audio file.""" | ||
| result = client.asr.transcribe(audio=jfk_audio, language="en") |
There was a problem hiding this comment.
Pass include_timestamps=True explicitly.
Line 72 currently depends on the SDK default. If that default changes, this test stops exercising the timestamped path and becomes a false positive.
🛠️ Suggested fix
- result = client.asr.transcribe(audio=jfk_audio, language="en")
+ result = client.asr.transcribe(
+ audio=jfk_audio,
+ language="en",
+ include_timestamps=True,
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/test_asr_integration.py` around lines 70 - 72, The test
function test_asr_from_file_with_timestamps should explicitly pass
include_timestamps=True to the SDK call so it exercises the
timestamped-transcription path regardless of SDK defaults; update the call to
client.asr.transcribe(audio=jfk_audio, language="en", include_timestamps=True)
to ensure the test asserts timestamp behavior deterministically.
| def test_asr_from_file_without_timestamps(self, client, jfk_audio): | ||
| """Test transcription without timestamps from a known audio file.""" | ||
| result = client.asr.transcribe(audio=jfk_audio, include_timestamps=False) | ||
|
|
||
| assert isinstance(result, ASRResponse) | ||
| assert result.text |
There was a problem hiding this comment.
Assert that timestamps are actually omitted.
This test still passes if include_timestamps=False is ignored and segments are returned, so it doesn't verify the behavior its name promises. src/fishaudio/types/asr.py:20-31 documents segments as empty when timestamps are disabled. While touching this, pass language="en" here too so the test only varies the timestamp flag.
🛠️ Suggested fix
- result = client.asr.transcribe(audio=jfk_audio, include_timestamps=False)
+ result = client.asr.transcribe(
+ audio=jfk_audio,
+ language="en",
+ include_timestamps=False,
+ )
assert isinstance(result, ASRResponse)
assert result.text
+ assert not result.segments🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/test_asr_integration.py` around lines 80 - 85, Update the
test_asr_from_file_without_timestamps test to also assert that the ASRResponse
contains no segments when include_timestamps=False and to pass language="en" to
transcribe; specifically, call client.asr.transcribe(audio=jfk_audio,
include_timestamps=False, language="en") and then assert isinstance(result,
ASRResponse), assert result.text, and assert that result.segments is empty
(e.g., len(result.segments) == 0 or result.segments == []), referencing the
ASRResponse and client.asr.transcribe usage.
Summary by CodeRabbit