Merged
8 changes: 5 additions & 3 deletions tools/scx_forge_agent/README.md
@@ -200,9 +200,11 @@ before Phase 2 spends rounds on writing new code.
context, but the model is free to choose the next self-contained experiment
on its own technical merits rather than exploiting kept directions or cooling
down regressed ones.
-When `trace-cmd` is available, the harness also records a small curated sched
-profile while the workload runs and passes only a compact trace summary in
-the verdict JSON to the planner model.
+When `perf` is available, the harness also records the configured perf events
+while the workload runs and passes only a compact sample summary in the
+verdict JSON to the planner model, including observed event counts, global
+retained-window sample rates, and a compact CPU-rate summary. For hardware PMU
+events, these are perf sample records rather than raw hardware event totals.
5. The crate is edited in place on the current branch, no branch is created and
no commits are made. Accepted edits accumulate in the working tree; the final
report (markdown or `--json`) shows the per-round history and the winner, and
22 changes: 12 additions & 10 deletions tools/scx_forge_agent/resources/SKILL.md
@@ -443,10 +443,9 @@ process; there is no separate script):
scheduler is already attached, launches it on the host as root, drives a
workload, verifies `/sys/kernel/sched_ext/state` is still `enabled`, extracts
one metric, always tears the scheduler down (verifying the state returns to
-`disabled`), optionally records a small `trace-cmd` sched profile
-(`sched:sched_wakeup`, `sched:sched_wakeup_new`, `sched:sched_switch`,
-`sched:sched_migrate_task`) while the workload runs, and produces a single
-JSON verdict. It needs root: run as root, with passwordless sudo, or set
+`disabled`), optionally records the configured `perf` events while the
+workload runs, and produces a single JSON verdict. It needs root: run as root,
+with passwordless sudo, or set
`SCX_SUDO_PASSWORD_FILE` to a file containing the sudo password.
- `tools/scx_forge_agent/spec.toml` documents the spec: `[scheduler]` (package -
which `scheds/rust/<name>` crate to optimize - plus profile), `[system]`
@@ -456,8 +455,8 @@ process; there is no separate script):
claude/codex/opencode/cursor-agent,
and the planner/coder can use different backends. API keys come from
`$SCX_FORGE_API_KEY` / `$SCX_FORGE_CODING_API_KEY`),
-`[tracing]` (`enable_tracing`, `trace_events` passed to trace-cmd record, plus
-`max_trace_size` capping the recorded trace.dat, e.g. `256M`),
+`[tracing]` (`enable_tracing`, `trace_events` passed to perf record with `-e`,
+plus `max_trace_size` passed to `perf record --max-size`, e.g. `256M`),
`[workload]` (`command`, `duration` in seconds, repeated measurement
count `runs`), and `[goal]` (a plain-language `prompt`, `direction`, and
`accept_threshold_stddev`; the reported metric name is always `score`).
@@ -484,10 +483,13 @@ Agent loop (driven by the controller, not by you):
(`stage = "metric"`), because that often means the candidate disrupted the
workload. Read `metric.value` (vs the previous best value), normalized
`improvement`, the `scheduler_log_tail` (the scheduler stats deltas), and
-`sched_trace` if present (summary counts plus top switch/wakeup/migration
-tasks and CPUs from trace-cmd, not raw trace data) to understand *why* the
-number moved. Improvement is always measured against the starting (round 0)
-scheduler, never the default kernel scheduler.
+`sched_trace` if present (observed perf event/sample counts, global
+retained-window sample rates, and a compact CPU-rate summary from the
+configured perf events, not raw perf script data) to understand *why* the
+number moved. Tracepoint events are sampled per occurrence; hardware PMU events
+are sampled according to perf's configured/default sampling period, so they are
+not raw hardware event totals. Improvement is always measured against the
+starting (round 0) scheduler, never the default kernel scheduler.
6. Keep a short running log of (change -> metric value) so the search is
auditable. The controller can append a completed run to a state file with
`--save <path>` and load that memory into future prompts with
18 changes: 10 additions & 8 deletions tools/scx_forge_agent/spec.toml
@@ -69,21 +69,23 @@ skip_knobs = false
cross_scheduler_refs = false

[tracing]
-# Enable optional trace-cmd tracing during workloads (only used when trace-cmd
-# is available).
+# Enable optional perf recording during workloads (only used when perf is
+# available).
enable_tracing = true
-# Trace events passed to trace-cmd record with -e.
+# Events passed to perf record with -e. Tracepoints such as sched:sched_switch are
+# recorded per occurrence; hardware PMU events such as cache-misses are sampled
+# according to perf's configured/default sampling period.
trace_events = [
"sched:sched_wakeup",
"sched:sched_wakeup_new",
"sched:sched_switch",
"sched:sched_migrate_task",
+"cache-references",
+"cache-misses",
]
-# Cap on the combined trace.dat size per recording. Accepts a plain byte count or
-# a human-readable size with a binary suffix (K, M, G; an optional trailing B is
-# allowed), e.g., "256M" or "1G". trace-cmd records into a circular file once
-# the cap is reached, keeping the most recent data so the file stays bounded
-# even for long or busy workloads.
+# Cap passed to perf record --max-size for perf.data. Accepts a plain byte count
+# or a human-readable size with a binary suffix (K, M, G; an optional trailing B
+# is allowed), e.g., "256M" or "1G".
max_trace_size = "256M"

[scheduler]
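
For readers following the `[tracing]` change, here is a minimal sketch of how `trace_events` and `max_trace_size` might be turned into a `perf record` argument list. The helper name `perf_record_args`, the system-wide `-a` flag, and the explicit `-o` output path are assumptions for illustration; the diff only states that events are passed with `-e` and the cap with `--max-size`.

```rust
/// Hypothetical helper (not taken from the PR): builds the arguments that
/// would follow `perf` on the command line for one recording.
fn perf_record_args(events: &[&str], max_size: &str, output: &str) -> Vec<String> {
    // Assumption: record system-wide (-a) into an explicit output file (-o).
    let mut args = vec![
        "record".to_string(),
        "-a".to_string(),
        "-o".to_string(),
        output.to_string(),
    ];
    // Each configured event gets its own `-e`, matching the spec.toml comment.
    for event in events {
        args.push("-e".to_string());
        args.push((*event).to_string());
    }
    // The size cap is forwarded unchanged to perf's own --max-size option.
    args.push(format!("--max-size={max_size}"));
    args
}
```

Calling `perf_record_args(&["sched:sched_switch", "cache-misses"], "256M", "perf.data")` would yield the arguments for `perf record -a -o perf.data -e sched:sched_switch -e cache-misses --max-size=256M`; how the harness actually scopes and launches the recording is not shown in this diff.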
2 changes: 1 addition & 1 deletion tools/scx_forge_agent/src/main.rs
@@ -1767,7 +1767,7 @@ async fn optimize(args: OptimizeArgs) -> Result<()> {
println!(
"tracing : {}",
if spec.tracing.enable_tracing {
-"enabled when trace-cmd is available"
+"enabled when perf is available"
} else {
"disabled"
}
16 changes: 7 additions & 9 deletions tools/scx_forge_agent/src/spec.rs
@@ -4,7 +4,7 @@
//!
//! Mirrors `tools/scx_forge_agent/spec.toml`: `[scheduler]` (what to build and run),
//! `[system]` (host/runtime settings), `[ai]` (model selection), `[tracing]`
-//! (optional trace-cmd profiling, event list, and size cap), `[workload]`
+//! (optional perf profiling, event list, and size cap), `[workload]`
//! (the load to apply, the numeric metric to emit, and how many times to repeat
//! the measurement), and `[goal]` (what the number means, which direction is
//! better, and the accept threshold).
@@ -140,18 +140,16 @@ impl Ai {
#[derive(Debug, Clone, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Tracing {
-/// Enable optional trace-cmd tracing during workloads (only used when
-/// trace-cmd is available). On by default.
+/// Enable optional perf recording during workloads (only used when perf is
+/// available). On by default.
#[serde(default = "default_true")]
pub enable_tracing: bool,
-/// Trace events passed to `trace-cmd record` with `-e`.
+/// Events passed to `perf record` with `-e`.
#[serde(default = "default_trace_events")]
pub trace_events: Vec<String>,
-/// Cap on the combined `trace.dat` size for a recording. Accepts a plain
-/// byte count or a human-readable size with a binary suffix (`K`, `M`, `G`,
-/// optionally followed by `B`), e.g. `256M`, `1G`. trace-cmd records into a
-/// circular file once the cap is hit, so the recording keeps the most recent
-/// data and the file stays bounded even for long or busy workloads.
+/// Cap passed to `perf record --max-size` for perf.data. Accepts a plain byte
+/// count or a human-readable size with a binary suffix (`K`, `M`, `G`,
+/// optionally followed by `B`), e.g. `256M`, `1G`.
#[serde(default = "default_max_trace_size")]
pub max_trace_size: String,
}
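
The `max_trace_size` doc comment describes the accepted grammar (a plain byte count, or a number with a binary `K`/`M`/`G` suffix and an optional trailing `B`). A minimal sketch of a validator for that grammar follows; `parse_size` is a hypothetical name, not the crate's actual function, and treating suffixes as case-insensitive and accepting a bare trailing `B` (as in `512B`) are assumptions.

```rust
/// Hypothetical parser for sizes such as "4096", "256M", or "1GB".
/// Returns the size in bytes, or None if the input does not match.
fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim();
    // Drop one optional trailing B/b ("256MB" -> "256M").
    let s = s
        .strip_suffix('B')
        .or_else(|| s.strip_suffix('b'))
        .unwrap_or(s);
    // Binary suffixes: K = 2^10, M = 2^20, G = 2^30. An empty string ends here.
    let (digits, shift) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 10),
        'M' | 'm' => (&s[..s.len() - 1], 20),
        'G' | 'g' => (&s[..s.len() - 1], 30),
        _ => (s, 0),
    };
    let count: u64 = digits.parse().ok()?;
    // checked_mul rejects values that would overflow u64.
    count.checked_mul(1u64 << shift)
}
```

Validating the string up front like this would let a malformed cap fail at spec load time rather than when `perf record` is launched; whether the crate does so is not visible in this diff.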