Language: English | 한국어
Release: v0.1.2
InferEdgeOrchestrator is a post-deployment runtime operation-control layer and lightweight scheduler for constrained edge devices. It controls multiple inference tasks after deployment, using per-task priority, latency budgets, bounded queues, load shedding, and telemetry so high-priority workloads stay responsive when backlog and latency spikes appear.
It is not a Triton or DeepStream replacement. The project is a runtime operation-control layer that makes overload-control decisions explicit, testable, and explainable.
The goal is not maximum-throughput serving. The goal is controllable inference behavior under constrained edge workloads.
Portfolio positioning: post-deployment runtime operation control, not a Triton/DeepStream replacement or a throughput-serving stack.
Portfolio brief: PORTFOLIO.md (Korean)
- Solves the post-deployment operation problem: what runs first, what gets dropped, and why, when edge inference tasks contend for limited resources.
- Protects high-priority workloads with priority/deadline-aware scheduling, bounded queues, and adaptive load shedding.
- Does not silently drop work: overload decisions, drop reasons, and protected tasks are recorded as structured telemetry evidence.
- Connects Forge `agent_manifest.json` and Runtime `result.agent` metadata to an `inferedge-orchestration-summary-v1` scheduling evidence contract.
- Validated with local pytest, GitHub Actions package/CLI smoke, synthetic overload comparison, Jetson dummy/ONNX smoke, and Jetson TensorRT-backed contention evidence.
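For orientation only, that agent-contract bridge can be pictured as a per-run summary document. Every field name in the sketch below is hypothetical; only the schema identifier and the referenced artifacts (`agent_manifest.json`, `result.agent`) come from the project, and the authoritative schema lives in `docs/agent_orchestration_summary_contract.md`:

```python
# Hypothetical sketch only: these keys are invented for illustration and are
# not the actual inferedge-orchestration-summary-v1 field names.
orchestration_summary = {
    "schema": "inferedge-orchestration-summary-v1",
    "tasks": [
        {
            "name": "vision",
            "forge_agent_manifest": "agent_manifest.json",  # Forge-side reference
            "runtime_result_agent": "result.agent",         # Runtime-side reference
            "executed": 20,
            "dropped": 0,
        },
    ],
}
```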
| Runtime concern | Implementation |
|---|---|
| Multi-task inference | Config-driven task registration for detector/classifier/OCR-style workloads |
| Priority control | Priority- and deadline-aware scheduling based on `priority` and `latency_budget_ms` |
| Backlog control | Bounded per-task queues with `drop_oldest`, `drop_newest`, and low-priority shedding behavior |
| Overload stability | Adaptive load shedding limits low-priority work to protect high-priority latency |
| Worker abstraction | Shared worker interface with dummy, onnxruntime, and TensorRT-backed workers |
| Runtime evidence | Telemetry JSON records executed/dropped counts, latency, backlog, result events, resource snapshots, and policy decisions |
| Agent contract bridge | Optional task references to Forge agent manifests and Runtime agent results, exported as orchestration summary evidence |
| Jetson smoke coverage | Jetson Orin Nano smoke scripts exercise CLI, telemetry, tegrastats parsing, ONNX Runtime execution, and TensorRT-backed contention |
```
Input Source
  -> Frame Router
  -> Bounded Task Queues
  -> Priority + Deadline-Aware Scheduler
  -> Inference Worker
  -> Result Aggregator
  -> Telemetry Logger
```
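The bounded task queues in this flow apply a per-task drop policy before anything reaches the scheduler. A minimal sketch of `drop_oldest` / `drop_newest` behavior, assuming a simple deque-backed queue (illustrative only, not the repository's implementation); the policy names match the `drop_policy` config field shown below:

```python
from collections import deque

class BoundedTaskQueue:
    """Illustrative bounded per-task queue (not the repository's implementation):
    when full, a frame is shed instead of blocking the pipeline."""

    def __init__(self, queue_size: int, drop_policy: str = "drop_oldest"):
        self.frames: deque = deque()
        self.queue_size = queue_size
        self.drop_policy = drop_policy
        self.dropped = 0        # surfaced later as telemetry evidence

    def put(self, frame) -> bool:
        """Enqueue a frame. Returns False whenever a frame had to be shed."""
        if len(self.frames) < self.queue_size:
            self.frames.append(frame)
            return True
        self.dropped += 1
        if self.drop_policy == "drop_oldest":
            self.frames.popleft()           # shed the stalest queued frame
            self.frames.append(frame)       # keep the fresh one
        # "drop_newest": the incoming frame itself is discarded
        return False
```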
Each task is defined by an operational policy:
```json
{
  "name": "detector",
  "model_path": "models/detector.onnx",
  "priority": 100,
  "target_fps": 15,
  "latency_budget_ms": 80,
  "queue_size": 4,
  "drop_policy": "drop_oldest",
  "worker": "dummy"
}
```

The scheduler's job is not to run every frame. It decides which task should run next, which frames are stale enough to drop, and when low-priority work should be limited so high-priority latency remains within budget.
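As a rough sketch of that decision (illustrative only; names and structure are assumptions, not the repository's scheduler code), a priority- and deadline-aware pick step can drop stale frames first and then favor the highest-priority task whose oldest frame is closest to exhausting its `latency_budget_ms`:

```python
import time
from dataclasses import dataclass, field

@dataclass
class TaskQueueView:
    """Illustrative per-task state (hypothetical names, not the repo's classes)."""
    name: str
    priority: int
    latency_budget_ms: float
    frames: list = field(default_factory=list)   # enqueue timestamps, oldest first

def pick_next(tasks: list[TaskQueueView]) -> str | None:
    """Drop frames that can no longer meet their latency budget, then pick the
    highest-priority task with pending work, breaking ties toward the task
    whose oldest frame has the least remaining slack."""
    now = time.monotonic()
    best_name, best_key = None, None
    for task in tasks:
        budget_s = task.latency_budget_ms / 1000.0
        # Shed stale frames: they would miss the budget even if run immediately.
        task.frames = [t for t in task.frames if now - t <= budget_s]
        if not task.frames:
            continue
        slack_s = budget_s - (now - task.frames[0])   # time left on the oldest frame
        key = (task.priority, -slack_s)               # higher priority, then least slack
        if best_key is None or key > best_key:
            best_name, best_key = task.name, key
    return best_name
```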
InferEdge validates deployability. InferEdgeEnv records whether benchmark evidence can be trusted and compared. InferEdgeOrchestrator controls deployed workloads under load.
```mermaid
flowchart LR
    subgraph Validation["Validation Layer"]
        Forge["InferEdgeForge\nmodel conversion\nbuild provenance"]
        Runtime["InferEdge-Runtime\ndevice execution\nresult.json"]
        Lab["InferEdgeLab\ncomparison\ndeployment decision"]
        AIGuard["InferEdgeAIGuard\noptional anomaly/risk\nrecommendation"]
    end
    subgraph Comparability["Experiment Hygiene / Comparability Layer"]
        Env["InferEdgeEnv\nrun evidence registry\ncomparability judgement"]
    end
    subgraph Operation["Operation Layer"]
        Orchestrator["InferEdgeOrchestrator\npriority scheduling\nload shedding\nruntime telemetry"]
    end
    Forge --> Runtime --> Lab
    Lab -. optional guard analysis .-> AIGuard
    Runtime -. benchmark evidence .-> Env
    Lab -->|"deployable model + result.json"| Orchestrator
    AIGuard -. risk signals .-> Lab
```
The boundary is intentional:
- InferEdge answers whether a model is safe and reasonable to deploy.
- InferEdgeEnv answers whether benchmark evidence can be trusted and compared.
- InferEdgeOrchestrator controls how deployed inference tasks behave together.
- Orchestrator integration is file-based through `result.json`, not direct imports.
| Phase | Delivered capability | Evidence |
|---|---|---|
| Phase 1: Scheduler Core | Config schema, dummy frame source, bounded queues, priority/deadline scheduler, dummy worker, load shedding, telemetry export | Pytest coverage for scheduler, queue, shedding, and telemetry |
| Phase 2: ONNX Runtime Worker | Config-selectable ONNX Runtime worker, identity ONNX smoke model, image/video input path support | `configs/phase2_onnx_demo.json`, `scripts/create_identity_onnx.py` |
| Phase 3: Overload Scenario | FIFO baseline vs scheduler/load-shedding comparison | `python3 -m inferedge_orchestrator compare-overload ...` |
| Phase 4: Jetson Smoke | Jetson CLI smoke, telemetry generation, resource snapshots, optional tegrastats parsing | `scripts/smoke_jetson_dummy.sh`, `scripts/smoke_jetson_onnx.sh` |
| Phase 5: InferEdge Handoff | `result.json` latency signal converted into Orchestrator task config | `python3 -m inferedge_orchestrator from-inferedge ...` |
| Agent Runtime Contract | Vision / Voice-Command / Safety-Monitor dummy workload with Forge agent manifest and Runtime `result.agent` references | `configs/agent_3_workload_demo.json`, `docs/agent_orchestration_summary_contract.md` |
These results are lifecycle evidence, not benchmark claims. Smoke runs show that the runtime paths execute on edge hardware; the synthetic overload run demonstrates the scheduler and shedding policy; the InferEdge handoff exercises the validation-to-operation file boundary.
| Evidence | Key result | Artifact |
|---|---|---|
| Jetson dummy smoke | nano01 generated telemetry, resource snapshots, and low-priority drops: detector 20/0, classifier 2/18 executed/dropped | `examples/telemetry/jetson_smoke_dummy_sample.json` |
| Jetson ONNX Runtime smoke | `onnxruntime` worker executed identity ONNX on Jetson with CPUExecutionProvider, output shape [1, 2], 13 tegrastats samples | `examples/telemetry/jetson_onnx_smoke_sample.json` |
| Jetson TensorRT inference smoke | Built `models/identity_fp16.plan` from identity ONNX on Jetson, executed one TensorRT identity frame, and confirmed runtime telemetry metadata: PASS_TENSORRT_INFERENCE, PASS_TENSORRT_TELEMETRY | `docs/validation_evidence.md` |
| Jetson TensorRT contention smoke | Ran high-priority and low-priority TensorRT tasks through scheduler/load-shedding contention: PASS_TENSORRT_CONTENTION | `examples/telemetry/jetson_tensorrt_contention_sample.json` |
| Jetson TensorRT diverse contention smoke | Ran distinct generated detector/classifier TensorRT engines through scheduler/load-shedding contention: detector 6/0, classifier 1/5 executed/dropped, 5 overload events, PASS_TENSORRT_DIVERSE_CONTENTION | `examples/telemetry/jetson_tensorrt_diverse_contention_sample.json` |
| Synthetic overload comparison | Detector p95 end-to-end latency improved from 782.0ms (FIFO baseline) to 8.0ms with scheduler + shedding; classifier dropped 16 low-priority frames | `examples/telemetry/phase3_overload_sample.json` |
| InferEdge result handoff | Sample `expected_latency_ms=42.2` produced recommended `latency_budget_ms=64.0` without importing InferEdge internals | `configs/from_inferedge.json` |
Versioned sample telemetry artifacts are available in `examples/telemetry/`. For the full evidence index, see `docs/validation_evidence.md`.
Run the Jetson dummy smoke:

```bash
CAPTURE_TEGRASTATS=1 scripts/smoke_jetson_dummy.sh
```

Run the Jetson ONNX Runtime smoke:

```bash
PYTHON_BIN=$HOME/miniconda3/envs/yolo_env/bin/python \
CAPTURE_TEGRASTATS=1 \
scripts/smoke_jetson_onnx.sh
```

Latest device records:
| Smoke | Device | OS / L4T | Python | Result | Note |
|---|---|---|---|---|---|
| Dummy scheduler smoke | nano01 | Ubuntu 22.04.5 LTS, L4T R36.4.7 | 3.10.12 | PASS | CLI, telemetry, resource snapshots, low-priority drops |
| ONNX Runtime smoke | nano01 | Ubuntu 22.04.5 LTS, L4T R36.4.7 | 3.10.12 | PASS | ONNX Runtime 1.23.2, CPUExecutionProvider, output metadata recorded |
These smoke records validate worker, scheduler, telemetry, and Jetson execution paths. They are not TensorRT/GPU throughput benchmarks.
```bash
python3 -m inferedge_orchestrator compare-overload \
  --config configs/phase3_overload.json \
  --output reports/phase3_overload.json \
  --frames 20
```

| Mode | Detector executed | Detector dropped | Detector p95 end-to-end latency | Classifier executed | Classifier dropped | Overload events |
|---|---|---|---|---|---|---|
| FIFO baseline | 20 | 0 | 782.0ms | 20 | 0 | 0 |
| Scheduler + load shedding | 20 | 0 | 8.0ms | 4 | 16 | 16 |
This is the core runtime operation-control story: low-priority classifier work is intentionally dropped under overload so the high-priority detector stays within latency budget, and the reason is visible in telemetry.
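One way to express such a rule (a sketch under assumed names, not the repository's exact shedding policy): watch the high-priority task's recent end-to-end latency and, while it exceeds its budget, refuse low-priority frames and record the decision as an overload event.

```python
def shed_low_priority(task_priority: int,
                      hi_recent_latency_ms: list[float],
                      hi_latency_budget_ms: float,
                      hi_priority_threshold: int = 100) -> bool:
    """Illustrative adaptive-shedding check (names are assumptions, not the
    repository's API): return True when a low-priority frame should be dropped
    because the high-priority task is currently running over its budget."""
    if task_priority >= hi_priority_threshold:
        return False                       # never shed high-priority work
    if not hi_recent_latency_ms:
        return False                       # no recent evidence of overload
    ordered = sorted(hi_recent_latency_ms)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]   # simple p95 estimate
    return p95 > hi_latency_budget_ms      # True => drop frame, log overload event
```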
```bash
python3 -m inferedge_orchestrator from-inferedge \
  --result examples/inferedge_result_sample.json \
  --output configs/from_inferedge.json \
  --task-name detector \
  --model-path models/detector.onnx \
  --priority 100 \
  --target-fps 15 \
  --queue-size 4
```

The helper reads InferEdge `result.json` latency signals and recommends an initial `latency_budget_ms` for the Orchestrator task policy. This keeps validation and operation control connected by artifacts while keeping the repositories separate.
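The exact mapping is implemented inside the CLI; the sample numbers above happen to be consistent with a simple headroom-and-round-up rule, sketched here as an assumption rather than the actual formula:

```python
import math

def recommend_latency_budget_ms(expected_latency_ms: float,
                                headroom: float = 1.5) -> float:
    """Hypothetical mapping from a measured InferEdge latency signal to an
    initial scheduling budget: apply headroom, then round up to a whole ms."""
    return float(math.ceil(expected_latency_ms * headroom))

# The sample above: 42.2 ms expected latency -> 64.0 ms recommended budget.
print(recommend_latency_budget_ms(42.2))   # 64.0
```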
Install the local package with test dependencies:

```bash
python3 -m pip install -e '.[dev]'
```

Run the tests:

```bash
python3 -m pytest
```

Run the scheduler demo:

```bash
python3 -m inferedge_orchestrator run \
  --config configs/phase1_demo.json \
  --output reports/phase1_demo.json \
  --frames 12
```

Run the ONNX Runtime demo:

```bash
python3 -m pip install -e '.[onnx,dev]'
python3 scripts/create_identity_onnx.py --output models/identity.onnx
python3 -m inferedge_orchestrator run \
  --config configs/phase2_onnx_demo.json \
  --output reports/phase2_onnx_demo.json \
  --frames 1
```

Print a telemetry summary:

```bash
python3 -m inferedge_orchestrator report --input reports/phase1_demo.json
```

For more detail, see: