
Commit 1310310

docs: add comprehensive spectrogram API documentation
- Create docs/api/spectrogram.rst with SpectrogramData class documentation
- Add usage examples for visualization and matplotlib overlays
- Include technical details (CQT, frequency range, bins per octave)
- Update docs/api/index.rst to include spectrogram in toctree and quick reference
- Update docs/index.rst to list spectrogram feature
1 parent 3b30cfd commit 1310310

3 files changed

Lines changed: 99 additions & 1 deletion


docs/api/index.rst

Lines changed: 8 additions & 1 deletion
@@ -22,6 +22,7 @@ Musical transcription data models:
 
 transcription-models
 audio-models
+spectrogram
 
 Utilities
 ---------
@@ -54,4 +55,10 @@ Audio Management
 
 * :class:`idtap.AudioMetadata` - Audio file metadata
 * :class:`idtap.AudioUploadResult` - Upload response
-* :class:`idtap.Musician` - Performer information
+* :class:`idtap.Musician` - Performer information
+
+Spectrogram Analysis
+~~~~~~~~~~~~~~~~~~~~
+
+* :class:`idtap.SpectrogramData` - CQT spectrogram data and visualization
+* :data:`idtap.SUPPORTED_COLORMAPS` - Available colormap names

docs/api/spectrogram.rst

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
Spectrogram Analysis
====================

Spectrogram data access and visualization for audio analysis.

.. currentmodule:: idtap

SpectrogramData
---------------

The :class:`SpectrogramData` class provides access to Constant-Q Transform (CQT)
spectrograms for computational musicology and audio analysis.

.. autoclass:: SpectrogramData
   :members:
   :undoc-members:
   :show-inheritance:

Key Features
~~~~~~~~~~~~

* **Constant-Q Transform (CQT)** - Log-spaced frequency bins for musical analysis
* **Intensity Transformation** - Power-law contrast enhancement (``power`` from 1.0 to 5.0)
* **Colormap Support** - 35+ matplotlib colormaps
* **Frequency/Time Cropping** - Extract specific frequency ranges or time segments
* **Matplotlib Integration** - Plot on existing axes, e.g. to overlay pitch contours
* **Image Export** - Save as PNG, JPEG, WebP, and other image formats
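
The exact intensity formula is not spelled out above. As a minimal sketch, assuming ``power`` acts as an exponent applied to intensities normalized to [0, 1] and then rescaled to 0-255, a power-law contrast stretch on the uint8 data could look like this. The helper name ``apply_power`` is hypothetical, and the 1.0-5.0 check simply mirrors the range listed in the feature list.

```python
import numpy as np


def apply_power(data: np.ndarray, power: float = 2.0) -> np.ndarray:
    """Hypothetical power-law contrast stretch for uint8 spectrogram data.

    Assumption: intensities are normalized to [0, 1], raised to `power`,
    and rescaled to 0-255. Exponents above 1 push weak energy toward 0
    while strong peaks stay near 255, which increases visual contrast.
    """
    if not 1.0 <= power <= 5.0:
        raise ValueError("power must be between 1.0 and 5.0")
    normalized = data.astype(np.float64) / 255.0
    stretched = normalized ** power
    return np.round(stretched * 255.0).astype(np.uint8)


if __name__ == "__main__":
    ramp = np.array([0, 64, 128, 192, 255], dtype=np.uint8)
    print(apply_power(ramp, 2.0))
```

With ``power=2.0`` the ramp above becomes ``[0, 16, 64, 145, 255]``: the midpoint drops to a quarter of full scale while the endpoints are unchanged, which is why larger exponents make faint background energy recede.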

Quick Examples
~~~~~~~~~~~~~~

Load a spectrogram and save it as an image::

    from idtap import SwaraClient, SpectrogramData

    client = SwaraClient()
    spec = SpectrogramData.from_audio_id("audio_id_here", client)

    # Save basic visualization
    spec.save("output.png", power=2.0, cmap='viridis')

Create a matplotlib overlay with a pitch contour::

    import matplotlib.pyplot as plt

    # Assumes `client` from the previous example and an already-loaded
    # transcription `piece`
    spec = SpectrogramData.from_piece(piece, client)

    # Create figure
    fig, ax = plt.subplots(figsize=(12, 6))

    # Plot spectrogram as underlay with transparency
    im = spec.plot_on_axis(ax, power=2.0, cmap='viridis', alpha=0.7, zorder=0)

    # Overlay pitch contour (one point, the first pitch, per trajectory)
    times = [traj.start_time for traj in piece.trajectories]
    pitches = [traj.pitch_contour[0] for traj in piece.trajectories]
    ax.plot(times, pitches, 'r-', linewidth=2, zorder=1)

    # Configure axes
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Frequency (Hz)')
    plt.colorbar(im, ax=ax, label='Intensity')

    plt.savefig('overlay.png', dpi=150, bbox_inches='tight')
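
How ``plot_on_axis`` positions the image is not documented here. Because CQT rows are log-spaced, a pitch contour given in Hz only lines up if each row is drawn at its own frequency. The sketch below shows one way to do that with plain matplotlib ``pcolormesh`` on synthetic data; the helper ``plot_cqt``, the (bins, frames) array layout, and the 0.0116 s frame duration are assumptions taken from the defaults listed under Technical Details, not the idtap implementation.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

F_MIN = 75.0           # assumed lowest bin center (default range, Technical Details)
BINS_PER_OCTAVE = 72   # assumed default resolution (Technical Details)
FRAME_SEC = 0.0116     # assumed frame duration ("typical" value)


def plot_cqt(ax, data, alpha=0.7, cmap="viridis"):
    """Hypothetical stand-in for an overlay-friendly spectrogram plot.

    `data` is assumed to be (n_bins, n_frames) with row 0 centered at F_MIN.
    Cell *edges* are passed to pcolormesh so every row sits at its true
    frequency and every column at its true time.
    """
    n_bins, n_frames = data.shape
    # Edges lie half a bin below/above each log-spaced center
    freq_edges = F_MIN * 2.0 ** ((np.arange(n_bins + 1) - 0.5) / BINS_PER_OCTAVE)
    time_edges = np.arange(n_frames + 1) * FRAME_SEC
    mesh = ax.pcolormesh(time_edges, freq_edges, data, cmap=cmap,
                         alpha=alpha, shading="flat", zorder=0)
    ax.set_yscale("log")  # octaves become evenly spaced on screen
    return mesh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 256, size=(360, 200), dtype=np.uint8)
    fig, ax = plt.subplots(figsize=(12, 6))
    mesh = plot_cqt(ax, data)
    # Synthetic pitch contour in Hz, drawn above the spectrogram
    times = np.linspace(0, 200 * FRAME_SEC, 50)
    ax.plot(times, 220 + 20 * np.sin(4 * times), "r-", linewidth=2, zorder=1)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Frequency (Hz)")
    fig.colorbar(mesh, ax=ax, label="Intensity")
    fig.savefig("overlay_sketch.png", dpi=100)
    plt.close(fig)
```

Passing bin edges rather than relying on a linear image extent is the design choice that matters here: with a linear extent, a 220 Hz contour would drift away from the 220 Hz row.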

Crop to a specific region::

    # Extract the 200-800 Hz range from the first 10 seconds
    cropped = spec.crop_frequency(200, 800).crop_time(0, 10)
    cropped.save("cropped.png", power=2.5, cmap='magma')
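
Cropping in Hz and seconds ultimately means slicing rows and columns of the intensity array. A minimal sketch, assuming rows are log-spaced bins starting at 75 Hz with 72 bins per octave and columns are frames of about 0.0116 s (the defaults listed under Technical Details); ``hz_to_bin``, ``sec_to_frame``, ``crop_array`` and their nearest-index rounding are hypothetical, not the ``crop_frequency`` / ``crop_time`` internals.

```python
import math

import numpy as np

F_MIN = 75.0          # assumed frequency of row 0
BINS_PER_OCTAVE = 72  # assumed bins per octave
FRAME_SEC = 0.0116    # assumed seconds per frame


def hz_to_bin(freq_hz: float) -> int:
    """Nearest log-spaced bin index for a frequency (row 0 = F_MIN)."""
    return round(BINS_PER_OCTAVE * math.log2(freq_hz / F_MIN))


def sec_to_frame(t_sec: float) -> int:
    """Nearest frame index for a time in seconds."""
    return round(t_sec / FRAME_SEC)


def crop_array(data: np.ndarray, f_lo: float, f_hi: float,
               t_start: float, t_end: float) -> np.ndarray:
    """Slice a (bins, frames) array to a frequency band and time window.

    Both ends are inclusive; Python slicing clamps indices past the end.
    """
    rows = slice(max(hz_to_bin(f_lo), 0), hz_to_bin(f_hi) + 1)
    cols = slice(max(sec_to_frame(t_start), 0), sec_to_frame(t_end) + 1)
    return data[rows, cols]


if __name__ == "__main__":
    full = np.zeros((360, 2000), dtype=np.uint8)  # synthetic spectrogram
    print(hz_to_bin(200), hz_to_bin(800))
    print(crop_array(full, 200, 800, 0, 10).shape)
```

Under these assumptions, 200 Hz and 800 Hz map to bins 102 and 246, exactly two octaves (144 bins) apart, so the crop in the example above keeps 145 rows and 863 frames.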

Supported Colormaps
~~~~~~~~~~~~~~~~~~~

.. autodata:: SUPPORTED_COLORMAPS
   :annotation:

Available colormaps include viridis, plasma, magma, inferno, hot, cool, and gray,
among others. See the matplotlib colormap documentation for visual examples.
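
Since the listed names are matplotlib colormaps, coloring the uint8 grayscale data amounts to a lookup through the named colormap. A minimal sketch using matplotlib's colormap registry (``matplotlib.colormaps``, available since matplotlib 3.5); ``colorize`` is a hypothetical helper, and checking names against matplotlib's full registry is a stand-in for ``SUPPORTED_COLORMAPS``, whose exact contents are not reproduced here.

```python
import matplotlib
import numpy as np


def colorize(data: np.ndarray, cmap_name: str = "viridis") -> np.ndarray:
    """Map uint8 intensities (0-255) to an RGB uint8 image via a colormap.

    Validation uses matplotlib's registry (an assumption standing in for
    SUPPORTED_COLORMAPS).
    """
    if cmap_name not in matplotlib.colormaps:
        raise ValueError(f"unknown colormap: {cmap_name!r}")
    cmap = matplotlib.colormaps[cmap_name]
    # Normalize to [0, 1]; bytes=True makes the colormap return uint8 RGBA
    rgba = cmap(data.astype(np.float64) / 255.0, bytes=True)
    return rgba[..., :3]  # drop the alpha channel


if __name__ == "__main__":
    gradient = np.arange(256, dtype=np.uint8).reshape(1, -1)
    img = colorize(gradient, "magma")
    print(img.shape, img.dtype)
```

For saving directly to a file, ``matplotlib.pyplot.imsave`` also accepts a ``cmap`` argument.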

Technical Details
~~~~~~~~~~~~~~~~~

* **Algorithm**: Essentia NSGConstantQ (Non-Stationary Gabor Constant-Q Transform)
* **Default Frequency Range**: 75-2400 Hz (five octaves)
* **Default Bins Per Octave**: 72 (six bins per semitone, about 16.7 cents apart, for microtonal analysis)
* **Data Format**: uint8 grayscale (0-255), gzip-compressed
* **Time Resolution**: ~0.0116 seconds per frame (typical)
* **Frequency Scale**: Logarithmic (perceptually uniform for music)
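
The defaults above translate into simple arithmetic: 2400 / 75 = 32 = 2^5, so the range spans five octaves, and at 72 bins per octave the bin centers follow f_k = 75 * 2^(k/72), giving 360 bin steps from 75 Hz to 2400 Hz. The sketch below computes those axes and decodes a gzip-compressed uint8 payload. It assumes the payload is raw row-major bytes of shape (bins, frames) with the shape known separately; that layout, and the helpers ``bin_frequencies``, ``frame_times`` and ``decode_payload``, are hypothetical.

```python
import gzip
import math

import numpy as np

F_MIN, F_MAX = 75.0, 2400.0  # default frequency range
BINS_PER_OCTAVE = 72         # default bins per octave
FRAME_SEC = 0.0116           # "typical" seconds per frame


def bin_frequencies(n_bins: int) -> np.ndarray:
    """Center frequency of each log-spaced bin: F_MIN * 2**(k / B)."""
    return F_MIN * 2.0 ** (np.arange(n_bins) / BINS_PER_OCTAVE)


def frame_times(n_frames: int) -> np.ndarray:
    """Start time in seconds of each frame, assuming a fixed hop."""
    return np.arange(n_frames) * FRAME_SEC


def decode_payload(blob: bytes, n_bins: int, n_frames: int) -> np.ndarray:
    """Decompress gzip bytes into a (bins, frames) uint8 array.

    Assumption: the payload is raw row-major uint8 with no header.
    """
    raw = gzip.decompress(blob)
    if len(raw) != n_bins * n_frames:
        raise ValueError("payload size does not match the expected shape")
    return np.frombuffer(raw, dtype=np.uint8).reshape(n_bins, n_frames)


if __name__ == "__main__":
    octaves = math.log2(F_MAX / F_MIN)         # five octaves
    steps = round(octaves * BINS_PER_OCTAVE)   # 360 bin steps
    freqs = bin_frequencies(steps + 1)         # include the top endpoint
    print(octaves, steps, freqs[0], freqs[-1])

    # Round-trip a synthetic payload to check the decoder
    original = np.arange(24, dtype=np.uint8).reshape(6, 4)
    decoded = decode_payload(gzip.compress(original.tobytes()), 6, 4)
    print(np.array_equal(decoded, original), frame_times(3))
```

Because every octave holds the same number of bins, doubling a frequency always moves it 72 rows up, which is what "logarithmic" means for the frequency axis.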

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ Features
 * **OAuth Authentication** - Secure Google OAuth integration with token storage
 * **Rich Data Models** - Comprehensive classes for musical transcription data
 * **Audio Management** - Upload, download, and manage audio files
+* **Spectrogram Analysis** - CQT spectrogram visualization with matplotlib integration
 * **Export Capabilities** - Export transcriptions to JSON and Excel formats
 * **Permissions System** - Manage public/private visibility and sharing
 
