30 commits
90338d5
Fixed strace
A-Tarraf Apr 25, 2025
b6e8c70
Added docs
A-Tarraf Apr 25, 2025
c39a662
Merge branch 'main' into development
A-Tarraf Apr 25, 2025
fe643b6
Added FTIO page and FTIO checkbox
Tim-Dieringer Jun 27, 2025
0ac4d16
GET stored Ftio output
Tim-Dieringer Jul 17, 2025
8e26e71
Changed Seconds to Date Format
Tim-Dieringer Jul 17, 2025
34f3c01
Fixed README spelling
Tim-Dieringer Jul 17, 2025
e948d75
Implemented basic ftio output display
Tim-Dieringer Jul 21, 2025
62c1b73
Added Zoom/Pan, FTIO model transfer optimization and increased FTIO d…
Tim-Dieringer Aug 25, 2025
ea88ecb
Implemented FTIO parameter storage and webserver support to modify them
Tim-Dieringer Sep 24, 2025
d386db1
Implemented initial zmq connection to ftio and increased parameter su…
Tim-Dieringer Sep 30, 2025
6bbe0d2
Fixed amplitude scaling for webserver trace gui and adjusted zmq comm…
Tim-Dieringer Oct 9, 2025
438c094
Implemented display of multiple ftio frequencies and refactored some …
Tim-Dieringer Oct 13, 2025
9f20507
Improved display of multiple ftio frequencies and added trace size di…
Tim-Dieringer Oct 16, 2025
ac0e47a
Changed serialization to msgpack for zmq communication
Tim-Dieringer Oct 20, 2025
cf07724
Changed amount of time labels displayed in trace graph
Tim-Dieringer Oct 27, 2025
88b9fb8
Added support for custom ftio args, added tooltips for args and fixed…
Tim-Dieringer Oct 28, 2025
fc7bbae
Added ability to override predefined args with custom ones, and impro…
Tim-Dieringer Nov 4, 2025
603ef50
Added signal reconstruction, added analyzing a single metric with fti…
Tim-Dieringer Nov 18, 2025
fedd8e4
Implemented option to change branch amount of TBON and self repairing…
Tim-Dieringer Dec 1, 2025
c184a7b
Added Ftio logs to webserver
Tim-Dieringer Dec 1, 2025
350a5d1
Added dynamic port changing for FTIO
Tim-Dieringer Dec 2, 2025
e0f42e7
Added relative time toggle, completely removed JSON reliance during z…
Tim-Dieringer Dec 6, 2025
0451491
Implemented slight gui tweaks and slightly modified ftio communication
Tim-Dieringer Dec 18, 2025
ec605e8
Initial Instrumentation added
Tim-Dieringer Dec 18, 2025
83aca3b
Added binomial tree and instrumentation addition
Tim-Dieringer Dec 22, 2025
ee57a16
Added information to contributing.md
Tim-Dieringer Dec 22, 2025
f8fc520
Merge pull request #1 from A-Tarraf/feature_ftio
A-Tarraf Jan 8, 2026
0542b93
Add malleability support: graceful leave, TBON self-repair, auto-join…
A-Tarraf Jun 26, 2026
7de4da5
Instalation fix:
A-Tarraf Jun 26, 2026
337 changes: 328 additions & 9 deletions Cargo.lock

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions Cargo.toml
@@ -44,6 +44,9 @@ which = "6.0.2"
lazy_static = "1.5.0"
proc-maps = "0.4.0"
elf = "0.7.4"
zmq = "0.10.0"
rmp-serde = "1.3.0"
ctrlc = { version = "3.4", features = ["termination"] }

[lib]
name = "proxyclient"
65 changes: 64 additions & 1 deletion README.md
@@ -48,7 +48,7 @@ cd proxy_v2
# Add the prefix to your path
export PATH=$HOME/metric-proxy/bin:$PATH
# Run the server (and keep it running)
-proxy-v2
+proxy_v2
# Run the client in another shell
proxy_run -j testls -- ls -R /
```
@@ -410,6 +410,69 @@ It consists in such JSON:
]
```

## Malleability Support (TBON Expand / Shrink / Graceful Leave)

The proxy supports dynamic changes to the tree-based overlay network (TBON) at runtime. This is useful for malleable HPC jobs where nodes are added to or removed from an allocation while the proxy tree is live.

### How the TBON Works

Each node runs one `proxy_v2` process. One node is the **root** (no `--root-proxy`); the others are **children** (`--root-proxy <root-addr>`). Children register with the root at startup via `/join`, and the root periodically scrapes them.
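
The root/child rule above can be sketched as a small type. This is an illustrative sketch only: the `Role` enum and `role_from_args` helper are hypothetical names and are not taken from the proxy's source; only the `--root-proxy` option comes from the text.

```rust
/// Role of a proxy in the tree, derived from the `--root-proxy` option.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Role {
    /// No `--root-proxy` given: this proxy is the root of the tree.
    Root,
    /// `--root-proxy <root-addr>` given: this proxy registers with that root.
    Child { root_addr: String },
}

/// Hypothetical helper: choose the role from the optional `--root-proxy` value.
fn role_from_args(root_proxy: Option<&str>) -> Role {
    match root_proxy {
        None => Role::Root,
        Some(addr) => Role::Child {
            root_addr: addr.to_string(),
        },
    }
}
```

Calling `role_from_args(None)` yields `Role::Root`, while `role_from_args(Some("node0:1337"))` yields a `Role::Child` that remembers which root to join.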

### Auto-Discovery (`--auto-root` / `--root-url-dir`)

On a shared filesystem, the root proxy writes its URL to `<target-prefix>/root.url` at startup. Child proxies launched with `--auto-root` read this file instead of requiring a hardcoded `--root-proxy` address:

```bash
# Root node — writes root.url into its profile directory
proxy_v2 --port 1337 --target-prefix /shared/proxy/root

# Child node — discovers root from the shared file
proxy_v2 --port 1337 \
--target-prefix /shared/proxy/$(hostname) \
--auto-root \
--root-url-dir /shared/proxy/root
```

`--root-url-dir` overrides the directory to search for `root.url`. Without it, `--auto-root` reads from `<target-prefix>/root.url`. The root URL can also be injected via the `PROXY_ROOT_URL` environment variable (takes precedence over `--auto-root`).
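
The lookup order described above can be sketched as follows. This is a minimal sketch, not the proxy's actual code: `resolve_root_url` and its parameters are hypothetical, the environment value is passed in as an argument rather than read from the process, and treating an empty `PROXY_ROOT_URL` or an empty `root.url` as "not set" is an assumption.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper mirroring the precedence described above:
/// 1. `PROXY_ROOT_URL` (passed in as `env_url`) wins when set;
/// 2. otherwise, with `--auto-root`, read `root.url` from `--root-url-dir`,
///    falling back to the target prefix;
/// 3. otherwise no root URL is discovered.
fn resolve_root_url(
    env_url: Option<&str>,
    auto_root: bool,
    root_url_dir: Option<&Path>,
    target_prefix: &Path,
) -> Option<String> {
    // Assumption: an empty or whitespace-only variable counts as unset.
    if let Some(url) = env_url.map(str::trim).filter(|u| !u.is_empty()) {
        return Some(url.to_string());
    }
    if !auto_root {
        return None;
    }
    let dir = root_url_dir.unwrap_or(target_prefix);
    // A missing or unreadable file simply means "no root found".
    let contents = fs::read_to_string(dir.join("root.url")).ok()?;
    let url = contents.trim();
    if url.is_empty() {
        None
    } else {
        Some(url.to_string())
    }
}
```

A caller would typically pass `std::env::var("PROXY_ROOT_URL").ok().as_deref()` as the first argument. Keeping the environment lookup outside the function makes the precedence rule easy to exercise in isolation.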

### Graceful Leave (`/leave` endpoint)

When a child proxy receives SIGTERM, it sends a `/leave?from=<my-url>` request to the root before exiting. The root immediately removes the departing node from the TBON — no waiting for a missed scrape.

```
GET http://<root>:<port>/leave?from=<child-url>
```

This is handled automatically by the signal handler; no user action is needed. In the worst case (SIGKILL / crash), the existing self-repair mechanism takes over (see below).
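
For illustration, the request URL shown above could be assembled like this. The `leave_url` helper is hypothetical, and inserting the child URL into the query string without escaping is an assumption; the text does not say whether the proxy encodes it. How the SIGTERM callback is wired up is not shown here.

```rust
/// Build the URL a departing child would request on the root.
/// `root` is the root's base URL, `my_url` is the child's own URL.
/// Assumption: `my_url` is inserted as-is, so it must not contain characters
/// such as `&` or `#` that would need escaping inside a query string.
fn leave_url(root: &str, my_url: &str) -> String {
    // Drop any trailing slash so the path does not start with "//".
    format!("{}/leave?from={}", root.trim_end_matches('/'), my_url)
}
```

With `root = "http://root:1337"` and `my_url = "http://node2:1337"`, the helper produces a URL of the same shape as the `GET` line above.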

### TBON Self-Repair (Shrink)

If the root fails to scrape a child proxy (connection refused / timeout), it removes the dead node from the topology. The repair happens within one sampling period (`--sampling-period`, default 1000 ms).
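
One scrape round with this kind of repair can be sketched as below. This is a simplified model under stated assumptions: `scrape_and_repair` is a hypothetical name, the child list is a plain `Vec<String>`, and the `scrape` closure stands in for the real HTTP scrape (returning `false` on connection refused or timeout).

```rust
/// Run one scrape round over the root's children.
/// `scrape` is a stand-in for the real scrape and returns `true` when the
/// child answered. Children that fail are dropped from `children` and
/// returned so the caller can log them.
fn scrape_and_repair<F>(children: &mut Vec<String>, mut scrape: F) -> Vec<String>
where
    F: FnMut(&str) -> bool,
{
    let mut removed = Vec::new();
    children.retain(|child| {
        if scrape(child.as_str()) {
            true // child is alive, keep it in the topology
        } else {
            removed.push(child.clone());
            false // child did not answer, drop it
        }
    });
    removed
}
```

Running this once per sampling period matches the statement that a dead node disappears within one period: the first failed scrape is enough to remove it.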

### Using the Proxy with DMR (Dynamic Resource Manager)

Each node in the DMR allocation should run one proxy:

| Role | Command |
|------|---------|
| Root node | `proxy_v2 --port 1337 --target-prefix <shared-fs>/root` |
| Worker nodes (static) | `proxy_v2 --port 1337 --root-proxy <root-addr>:1337` |
| Worker nodes (malleable) | `proxy_v2 --port 1337 --auto-root --root-url-dir <shared-fs>/root` |

The instrumented application communicates with the local proxy via the UNIX socket (`proxy_run` or the `libproxyclient.so` LD_PRELOAD). No special environment variable is needed beyond `PROXY_JOB_ID` (or the SLURM / MPI job ID picked up automatically by `proxy_run`).

> **Note on `libproxyclient.so` LD_PRELOAD**: The ELF constructor that auto-connects the client library on load may not fire in all build configurations. If metrics are not appearing, call `proxy_init()` explicitly early in your application or use `proxy_run` as the launcher wrapper.

### Malleability Experiment

An automated end-to-end test lives in `experiment/run_malleability_test.sh`. It uses a 4-node Docker cluster (`dmr01`–`dmr04`) to exercise all three scenarios in sequence:

1. **Graceful leave** — SIGTERM a child; `/leave` triggers immediate TBON repair
2. **Shrink** — SIGKILL a child; root detects the missing scrape and self-repairs
3. **Expand** — restart the killed child with `--auto-root`; it reads `root.url` and self-registers

See [`experiment/`](experiment/) for setup instructions and expected output.

## Acknowledgments

This project has received funding from the European Union’s Horizon 2020 JTI-EuroHPC research and innovation programme under grant agreement No. 956748.
94 changes: 94 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,94 @@
# Contributing to Proxy
Thank you for considering contributing to Proxy.
We welcome contributions of all kinds, including bug fixes, new features, documentation improvements, and more.

## Getting Started
> [!note]
> If you are a student from TU Darmstadt, kindly see these [instructions](/docs/students_contribute.md).

### Step 1: Fork the Repository
1. Visit the [Proxy GitHub repository](https://github.com/A-Tarraf/proxy_v2).
2. Click the **Fork** button in the top-right corner to create a copy of the repository under your GitHub account.

### Step 2: Clone Your Fork
Clone the forked repository to your local machine:
```bash
git clone https://github.com/<your-username>/proxy_v2.git
```

Replace `<your-username>` with your GitHub username.

### Step 3: Navigate to the Project Directory
```bash
cd proxy_v2
```

### Step 4: Build the Project in Debug Mode
Compile the project using the `make debug` command:
```bash
# Debug build: lets you test your changes directly
make debug
```

This will generate a debug build of the project, useful for development and troubleshooting.

### Step 5: Sync with the Original Repository (Optional)
To stay up-to-date with the latest changes from the main repository:
```bash
git remote add upstream https://github.com/A-Tarraf/proxy_v2.git
git fetch upstream
git merge upstream/main
```

### Step 6: Create an Issue for Your Contribution
Before starting your work, create an issue on the repository to describe the feature, bug fix, or enhancement you plan to implement. This helps us track contributions and avoids duplicate work.

1. Go to the **Issues** tab in the [Proxy repository](https://github.com/A-Tarraf/proxy_v2).
2. Click **New Issue** and provide a clear title and description.
3. Label the issue appropriately (e.g., `enhancement`, `bug`, or `question`).

### Step 7: Make Your Changes
1. Create a new branch for your changes:
   ```bash
   git checkout -b <your-feature-branch>
   ```
   Replace `<your-feature-branch>` with a descriptive name for your branch.

2. Make your desired changes and commit them:
   ```bash
   git add .
   git commit -m "Description of your changes"
   ```

### Step 8: Push Your Changes
Push your changes to your forked repository:
```bash
git push origin <your-feature-branch>
```


### Step 9: Create a Pull Request to the `development` Branch
1. Navigate to the original Proxy repository on GitHub.
2. Click the **Pull Requests** tab, then click **New Pull Request**.
3. Set the target branch to `development`:
   - **Base Repository:** `A-Tarraf/proxy_v2`
   - **Base Branch:** `development`
   - **Compare Branch:** `<your-feature-branch>`
4. Provide a detailed description of your changes, referencing the issue you created earlier (e.g., `Fixes #123`).
5. Submit your pull request and wait for feedback from the maintainers.

We look forward to your contributions! 🎉

<p align="right"><a href="#Proxy">⬆</a></p>


## License

By contributing, you agree that your contributions will be licensed under the same license as this project.

# List Of Contributors

We sincerely thank the following contributors for their valuable contributions:
- [Jean-Baptiste Besnard](https://github.com/besnardjb)
- [Ahmad Tarraf](https://github.com/a-tarraf)
- [Tim Dieringer](https://github.com/Tim-Dieringer): Bachelor thesis on topology expansion and FTIO integration