jameszianxuTT/memory_logger

memory_logger

Out-of-band host memory logger for forge workloads, with optional device DRAM sampling.

Use Cases

  • Get quick PNG snapshots to review memory behavior after a run, even if the process is OOM-killed
  • Correlate RSS, swap, and device DRAM trends over time to identify memory pressure and instability caused by specific runtime behaviors
  • View a live dashboard of memory dynamics in a local browser while the process is still running

Metrics

  • Process RSS over time
  • Swap usage over time
  • OOM score over time
  • Optional device DRAM usage over time (from tt-metal SHM regions)
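
As a rough illustration of where the first three metrics come from on Linux, the sketch below reads them straight from procfs using only the standard library. It is not taken from memory_logger.py (the install line suggests the tool uses psutil); the function names `parse_status_kib` and `sample_process` and the returned keys are hypothetical.

```python
from pathlib import Path


def parse_status_kib(status_text):
    """Extract VmRSS and VmSwap (in KiB) from the text of /proc/<pid>/status."""
    wanted = {"VmRSS": "rss_kib", "VmSwap": "swap_kib"}
    # Kernel threads have no VmRSS/VmSwap lines, so default both to 0.
    sample = {"rss_kib": 0, "swap_kib": 0}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:\t  123456 kB"; the first token is the value.
            sample[wanted[key]] = int(rest.split()[0])
    return sample


def sample_process(pid, proc_root="/proc"):
    """Take one sample of RSS, swap, and OOM score for a PID (Linux only)."""
    base = Path(proc_root) / str(pid)
    sample = parse_status_kib((base / "status").read_text())
    sample["oom_score"] = int((base / "oom_score").read_text().strip())
    return sample
```

Calling `sample_process(os.getpid())` in a loop with a sleep, and appending each result to a CSV, gives the kind of time series the plots are built from.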

Output modes

  • PNG (--png): static plots for quick capture/sharing
  • HTML (--html): interactive Plotly figure
  • Live dashboard (--live): browser-based live Plotly view served from the remote machine

Install

pip install psutil matplotlib plotly

Common usage

1) Monitor by exact process name (default PNG output)

python3 memory_logger.py --name worker
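
To make "exact process name" concrete, here is a minimal sketch of one way a name could be resolved to PIDs on Linux, by comparing against `/proc/<pid>/comm`. Whether memory_logger.py matches on this field is an assumption; `find_pids_by_name` is a hypothetical helper.

```python
from pathlib import Path


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose command name equals `name` exactly."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited between listing and reading
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)
```

Note that the kernel truncates `comm` to 15 characters, so a long executable name will not match in full. If several processes share the name, using --pid avoids the ambiguity.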

2) Monitor by PID (when needed)

python3 memory_logger.py --pid 12345

3) Enable device DRAM sampling

python3 memory_logger.py --name worker --device-dram

Warning: device DRAM SHM files can be stale if a previous process exited uncleanly. One sign is diverging statistics: in SPMD mode all devices should allocate in lockstep, so min, max, and avg should be the same. If the DRAM numbers look wrong, clean up stale regions before re-running:

rm -rf /dev/shm/tt_device_*_memory
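
Before deleting anything, it can help to see which regions exist and how old they are. The sketch below lists entries matching the same glob as the `rm` command, oldest first. `list_device_shm` is a hypothetical helper, and modification time is only a rough hint of staleness, not a definitive test.

```python
import glob
import os
import time


def list_device_shm(shm_dir="/dev/shm"):
    """Return (path, age_seconds) for each tt_device_*_memory entry, oldest first."""
    now = time.time()
    pattern = os.path.join(shm_dir, "tt_device_*_memory")
    entries = [(path, now - os.path.getmtime(path)) for path in glob.glob(pattern)]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    for path, age in list_device_shm():
        print(f"{path}  last modified {age:.0f}s ago")
```

Only remove these regions when no tt-metal process is still running against the devices.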

4) Render from an existing CSV (e.g. useful if the process being profiled crashes from OOM or the host dies)

python3 memory_logger.py --from-csv pid_memory.csv --png
python3 memory_logger.py --from-csv pid_memory.csv --html

5) Render multiple CSVs in one PNG (stacked subplots, unified x-axis)

python3 memory_logger.py --from-csv run_a.csv --from-csv run_b.csv

Live mode

Live mode monitors a single process and serves a local web dashboard from the remote machine:

python3 memory_logger.py --name worker --live

Then open the dashboard from your local browser through an SSH tunnel.

Optional advanced flags (only if overriding defaults): --live-host, --live-port, --live-refresh-ms.

SSH tunnel (direct)

ssh -L 8765:localhost:8765 <user>@<remote-host>

Open:

http://localhost:8765
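
If the page does not load, a quick check is whether anything is listening on the forwarded port at all. The sketch below is a generic TCP probe, not part of memory_logger.py; `port_is_open` is a hypothetical helper and 8765 is simply the port used in the tunnel command above.

```python
import socket


def port_is_open(host="localhost", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("tunnel port reachable:", port_is_open())
```

A `True` result on the local machine only shows that ssh is listening on the forwarded port. The dashboard on the remote side may still be down, so run the same check on the remote host if the browser shows an error.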

SSH tunnel via jump host

ssh -J <jump-user>@<jump-host> -L 8765:localhost:8765 <user>@<remote-host>

Exabox example (with extra jump)

ssh -J exabox -L 8765:localhost:8765 jameszianxu@bh-glx-110-c01u02.exabox.tenstorrent.com

Notes

  • --live is for active monitoring mode and cannot be combined with --from-csv.
  • Multi-CSV render currently supports PNG output only.
  • Legacy CSVs with extra columns are tolerated during parsing.
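
One common way to tolerate extra columns is to read rows by header name and keep only the fields of interest. The sketch below shows that pattern with the standard `csv` module. The column names in `KNOWN_COLUMNS` are hypothetical, since the real header written by memory_logger.py is not documented here, and this is not necessarily how the tool parses its files.

```python
import csv
import io

# Hypothetical column names; the real CSV header may differ.
KNOWN_COLUMNS = ("timestamp", "rss_mb", "swap_mb", "oom_score")


def read_known_columns(handle, columns=KNOWN_COLUMNS):
    """Parse CSV rows, keeping only `columns` and ignoring any extras."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for col in columns:
            value = raw.get(col)
            # Missing or empty cells become None instead of raising.
            row[col] = float(value) if value not in (None, "") else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    legacy = "timestamp,rss_mb,legacy_col,swap_mb,oom_score\n0.0,100.5,x,0,12\n"
    print(read_known_columns(io.StringIO(legacy)))
```

Because lookup is by name, column order and unknown columns such as `legacy_col` do not affect the result.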

Example output

Plotly HTML:

image

PNG (and example analysis)

image

About

Python tool for monitoring process memory usage (RSS, swap, OOM score) over time with CSV logging and plot generation
