Out-of-band host memory logger for forge workloads, with optional device DRAM sampling.
- Get quick PNG snapshots to review memory behavior after a run, even if the process OOMs.
- Correlate RSS, swap, and device DRAM trends over time to identify memory pressure and instability caused by specific runtime behaviours.
- Get a live dashboard of memory dynamics in a local browser while the process is still running.
- Process RSS over time
- Swap usage over time
- OOM score over time
- Optional device DRAM usage over time (from tt-metal SHM regions)
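
As a rough illustration of where the per-process numbers can come from on Linux, here is a minimal sketch that reads them from procfs. It is not taken from memory_logger.py (which lists psutil as a dependency and may gather them differently); the function names `parse_status` and `sample` are hypothetical, and it assumes per-process swap is what gets plotted.

```python
from pathlib import Path


def parse_status(text):
    """Extract VmRSS and VmSwap (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmRSS", "VmSwap"):
            # The value looks like "\t   2048 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    # Kernel threads have no VmRSS/VmSwap lines, so fall back to 0.
    return {"rss_kb": fields.get("VmRSS", 0), "swap_kb": fields.get("VmSwap", 0)}


def sample(pid):
    """Take one RSS / swap / OOM-score sample for a PID (Linux only)."""
    proc = Path("/proc") / str(pid)
    row = parse_status((proc / "status").read_text())
    row["oom_score"] = int((proc / "oom_score").read_text())
    return row
```

Calling `sample(12345)` once per interval and appending the result to a CSV would give the kind of time series the plots are built from.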
- PNG (--png): static plots for quick capture/sharing
- HTML (--html): interactive Plotly figure
- Live dashboard (--live): browser-based live Plotly view served from the remote machine
    pip install psutil matplotlib plotly

    python3 memory_logger.py --name worker

    python3 memory_logger.py --pid 12345

    python3 memory_logger.py --name worker --device-dram

Warning: device DRAM SHM files can be stale if a previous process exited uncleanly. If DRAM numbers look wrong (e.g. in SPMD mode all devices should show lockstep allocation, so min/max/avg should be the same), clean up stale regions before re-running:
    rm -rf /dev/shm/tt_device_*_memory

    python3 memory_logger.py --from-csv pid_memory.csv --png
    python3 memory_logger.py --from-csv pid_memory.csv --html

    python3 memory_logger.py --from-csv run_a.csv --from-csv run_b.csv

Live mode monitors a single process and serves a local web dashboard from the remote machine:
    python3 memory_logger.py --name worker --live

Then access it from your local browser via an SSH tunnel.
Optional advanced flags (only if overriding defaults): --live-host, --live-port, --live-refresh-ms.
    ssh -L 8765:localhost:8765 <user>@<remote-host>

Open:
http://localhost:8765
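
If the page does not load, it can help to confirm that the forwarded port is actually accepting connections before debugging the dashboard itself. The helper below is a small sketch for that; `port_open` is a hypothetical name and is not part of memory_logger.py, and the default port simply mirrors the 8765 used in the tunnel commands.

```python
import socket


def port_open(host="localhost", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("tunnel reachable" if port_open() else "nothing listening on localhost:8765")
```

A `False` result on the local machine usually means either the SSH tunnel is not up or the logger was not started with --live on the remote side.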
    ssh -J <jump-user>@<jump-host> -L 8765:localhost:8765 <user>@<remote-host>

    ssh -J exabox -L 8765:localhost:8765 jameszianxu@bh-glx-110-c01u02.exabox.tenstorrent.com

- --live is for active monitoring and cannot be combined with --from-csv.
- Multi-CSV render currently supports PNG output.
- Legacy CSVs with extra columns are tolerated during parsing.
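
To illustrate what tolerating extra columns can look like, here is a minimal sketch using the standard library's `csv.DictReader`. The column names in `KNOWN_COLUMNS` and the function `read_rows` are assumptions made for the example; the actual CSV schema and parser in memory_logger.py are not shown in this README.

```python
import csv
import io

# Hypothetical column names, chosen only for this example.
KNOWN_COLUMNS = ("timestamp", "rss_kb", "swap_kb", "oom_score")


def read_rows(handle, known=KNOWN_COLUMNS):
    """Read CSV rows, keeping known columns and silently dropping the rest."""
    rows = []
    for record in csv.DictReader(handle):
        # Unknown header names and missing values are skipped instead of raising.
        rows.append({name: record[name] for name in known
                     if record.get(name) is not None})
    return rows


if __name__ == "__main__":
    legacy = "timestamp,rss_kb,legacy_col,swap_kb\n1,100,x,5\n"
    print(read_rows(io.StringIO(legacy)))
```

Selecting columns by name rather than by position is what makes older files with additional or reordered columns load without special handling.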
Plotly HTML:
PNG (and example analysis)