
Phase 26: Adaptive Runtime Agent & Hybrid Execution #39

Merged
t81dev merged 1 commit into main from phase-26-adaptive-runtime-agent-8366195443512061802
Jan 29, 2026

@t81dev t81dev commented Jan 29, 2026

This PR implements Phase 26 of the TFMBS roadmap: the Adaptive Runtime Agent and Hybrid Execution paths.

Key features:

  1. Adaptive Decision Engine: Located in src/libtfmbs_device.c, the agent tracks kernel sparsity in real time with an exponential moving average (EMA) and decides whether to offload each kernel to the Ternary Fabric or fall back to the host CPU.
  2. Hysteresis & Recovery: To avoid getting stuck in the CPU-fallback state after a temporary drop in sparsity, the agent periodically probes the Fabric, forcing a Fabric execution every N kernels to re-evaluate performance.
  3. Realistic Driver API: Matured the Mock Driver early (Phase 22 prep) by adding IOCTLs for residency querying and memory pinning, aligning the software stack with the requirements of the upcoming physical hardware.
  4. Telemetry & Benchmarking: Added metrics that track offload/fallback ratios, plus a dedicated stress test (test_adaptive.c) that verifies the switching logic.
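
The decision and probe logic from items 1 and 2 could look roughly like the sketch below. It is not taken from `src/libtfmbs_device.c`: every type and function name is hypothetical, and it assumes that high sparsity favors the Fabric and that the probe counter only advances while the agent is falling back.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; they are not the libtfmbs_device.c API. */
typedef enum { EXEC_FABRIC, EXEC_CPU } exec_path_t;

typedef struct {
    float    alpha;           /* EMA smoothing factor, 0 < alpha <= 1 */
    float    threshold;       /* offload when the EMA is at or above this */
    uint32_t probe_interval;  /* every N-th low-sparsity kernel probes the Fabric; 0 disables */
    float    ema;             /* smoothed sparsity estimate */
    bool     primed;          /* false until the first sample seeds the EMA */
    uint32_t low_streak;      /* consecutive kernels seen below the threshold */
    uint64_t offload_count;   /* kernels sent to the Fabric (probes included) */
    uint64_t fallback_count;  /* kernels run on the host CPU */
} adaptive_agent_t;

void agent_init(adaptive_agent_t *a, float alpha, float threshold,
                uint32_t probe_interval)
{
    a->alpha = alpha;
    a->threshold = threshold;
    a->probe_interval = probe_interval;
    a->ema = 0.0f;
    a->primed = false;
    a->low_streak = 0;
    a->offload_count = 0;
    a->fallback_count = 0;
}

/* Feed one kernel's measured sparsity (0..1) and get the path to use. */
exec_path_t agent_decide(adaptive_agent_t *a, float sparsity)
{
    if (!a->primed) {
        a->ema = sparsity;  /* seed the average with the first sample */
        a->primed = true;
    } else {
        a->ema = a->alpha * sparsity + (1.0f - a->alpha) * a->ema;
    }

    if (a->ema >= a->threshold) {
        a->low_streak = 0;
        a->offload_count++;
        return EXEC_FABRIC;
    }

    /* Below the threshold: fall back, except on every N-th kernel (probe). */
    if (a->probe_interval != 0 && ++a->low_streak >= a->probe_interval) {
        a->low_streak = 0;
        a->offload_count++;
        return EXEC_FABRIC;
    }

    a->fallback_count++;
    return EXEC_CPU;
}
```

In this sketch a probe is only a forced Fabric run. An agent that re-evaluates performance, as the description says, would presumably also feed the probe's measured runtime back into its decision, which is left out here.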

Configuration is handled via environment variables:

  • TFMBS_ADAPTIVE_POLICY: offload, fallback, or sparsity.
  • TFMBS_SPARSITY_THRESHOLD: Float value (default 0.3).
  • TFMBS_EMA_ALPHA: Smoothing factor for sparsity tracking.
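
As an illustration of how these three variables might be read, here is a minimal sketch. The variable names and the policy strings come from the list above; the struct, the function names, and the accepted range of 0 to 1 for both float values are assumptions. The loader only overrides fields whose variable is set and valid, so the caller supplies the defaults (0.3 for the threshold, per the description).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and helpers; only the variable names are from the PR. */
typedef enum { POLICY_OFFLOAD, POLICY_FALLBACK, POLICY_SPARSITY } tfmbs_policy_t;

typedef struct {
    tfmbs_policy_t policy;
    float sparsity_threshold;
    float ema_alpha;
} tfmbs_config_t;

/* Returns 1 if s is set but not a known policy, 0 otherwise.
 * *out is left unchanged when s is NULL or invalid. */
int parse_policy(const char *s, tfmbs_policy_t *out)
{
    if (s == NULL) return 0;
    if (strcmp(s, "offload") == 0)  { *out = POLICY_OFFLOAD;  return 0; }
    if (strcmp(s, "fallback") == 0) { *out = POLICY_FALLBACK; return 0; }
    if (strcmp(s, "sparsity") == 0) { *out = POLICY_SPARSITY; return 0; }
    return 1;
}

/* Returns 1 if s is set but not a float in [0, 1] (assumed range), else 0.
 * *out is left unchanged when s is NULL or invalid. */
int parse_unit_float(const char *s, float *out)
{
    if (s == NULL) return 0;
    char *end = NULL;
    errno = 0;
    float v = strtof(s, &end);
    if (errno != 0 || end == s || *end != '\0' || !(v >= 0.0f && v <= 1.0f))
        return 1;
    *out = v;
    return 0;
}

/* Overrides the caller's defaults from the environment.
 * Returns the number of variables that were set but rejected. */
int tfmbs_config_load(tfmbs_config_t *cfg)
{
    int rejected = 0;
    rejected += parse_policy(getenv("TFMBS_ADAPTIVE_POLICY"), &cfg->policy);
    rejected += parse_unit_float(getenv("TFMBS_SPARSITY_THRESHOLD"),
                                 &cfg->sparsity_threshold);
    rejected += parse_unit_float(getenv("TFMBS_EMA_ALPHA"), &cfg->ema_alpha);
    return rejected;
}
```

Rejecting a malformed value while keeping the previous setting is one possible design choice; the actual runtime may handle bad input differently.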

Verified with test_adaptive and existing test_device suites.


PR created automatically by Jules for task 8366195443512061802 started by @t81dev

- Introduced Adaptive Runtime Agent in `libtfmbs_device.c` to dynamically switch between Fabric and CPU.
- Implemented sparsity-based decision logic using Exponential Moving Average (EMA).
- Added CPU fallback path for GEMV.
- Implemented hysteresis/probing to prevent stuck fallback states.
- Refactored Mock Driver (Phase 22 Prep) with new IOCTLs (Residency Query, Pinning).
- Extended telemetry with `fallback_count` and `offload_count`.
- Added comprehensive test case `tests/test_adaptive.c`.
- Updated `ROADMAP.md` and `BENCHMARKS.md`.
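
For readers unfamiliar with the fallback path, a CPU GEMV over ternary weights can be sketched as follows. This is an illustration only: it assumes weights are stored one per `int8_t` with values -1, 0, or +1 in row-major order, which may differ from the storage format the library uses, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fraction of zero weights; the kind of sparsity measure an agent could
 * feed into its EMA. Returns 0 for an empty buffer. */
float ternary_sparsity(const int8_t *w, size_t n)
{
    if (n == 0) return 0.0f;
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (w[i] == 0) zeros++;
    return (float)zeros / (float)n;
}

/* y = W * x for a rows-by-cols ternary matrix W stored row-major.
 * Weights are assumed to be -1, 0, or +1, so no multiply is needed. */
void ternary_gemv_cpu(const int8_t *w, const float *x, float *y,
                      size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        float acc = 0.0f;
        const int8_t *row = w + r * cols;
        for (size_t c = 0; c < cols; c++) {
            if (row[c] > 0)      acc += x[c];  /* weight +1 */
            else if (row[c] < 0) acc -= x[c];  /* weight -1 */
            /* weight 0 contributes nothing and is skipped */
        }
        y[r] = acc;
    }
}
```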

Co-authored-by: t81dev <207451414+t81dev@users.noreply.github.com>

@t81dev t81dev merged commit 327548d into main Jan 29, 2026
1 check failed