Skip to content

ai-summary: detect GitHub timeout-minutes kills via log completion marker#134

Open
ppetrovicTT wants to merge 1 commit into
ppetrovic/ai-summary-detection-patterns-yamlfrom
ppetrovic/ai-summary-log-finish-line
Open

ai-summary: detect GitHub timeout-minutes kills via log completion marker#134
ppetrovicTT wants to merge 1 commit into
ppetrovic/ai-summary-detection-patterns-yamlfrom
ppetrovic/ai-summary-log-finish-line

Conversation

@ppetrovicTT

@ppetrovicTT ppetrovicTT commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Problem

A step killed by GitHub timeout-minutes leaves no trace in the tee'd log — the ##[error]... has timed out line exists only in the runner log, which the tool never sees. The log just truncates mid-stream: no error pattern, no failed tests, no exit code. Result: the job classifies as a false 🟢 SUCCESS.

Real example: tt-metal run 27315488300 / job 80696095660bh_qb_DeepSeek_PREFILL hit the 10-min step timeout; its AI job summary reported SUCCESS.

In-container signal delivery is unreliable (the runner kills the host-side docker exec shim; pytest never receives SIGINT — verified: no KeyboardInterrupt banner in the final 14s of the log), and API reconciliation is race-prone. So the signal is made in-band: absence of a caller-written final log line.

Mechanism

The caller's test wrapper appends a completion marker as the log's final line:

[==log-finish-line==] exit_code=0

Token presence is the contract; the exit_code=N payload is optional (group 1) and is the only exit-code source for tee'd logs, which lack the GitHub Process completed with exit code line. The regex lives in analysis.yaml as log_complete_marker (default on); a caller can override it in the config it already passes to ai_summary/job.

Classification (marker absent ⇒ shell hard-killed):

Situation Final status
clean log, marker present as today; exit code from marker
clean log, marker absent, all dirs present TIMEOUT (no LLM) — the false-green case
log shows crash / tests failed, marker absent unchanged status; truncation reported as a possibly independent issue (_job.log_complete: false, report note, INCOMPLETE LOG section in LLM context)
a log dir missing, marker absent infra:partial_logs — missing dir wins over marker absence

⚠️ Rollout note (default on)

With the marker default on, any wired workflow whose wrapper does not yet emit the line will see its clean jobs flip green→TIMEOUT. Adoption is staged in tt-metal (blackhole-e2e / t3000-e2e / galaxy-deepseek-prefill / demo-sp-release); other callers must pass "log_complete_marker": null until their wrappers emit it.

Testing

  • New test_log_complete.py (extraction, anchoring, exit-code parsing, status table, report notes, config) + TestTimeoutMarkerAbsent in test_cli.py (no-LLM TIMEOUT short-circuit, missing-dir precedence, marker-disabled).
  • Full suite: 234 passed.

Follow-ups: patterns→analysis.yaml refactor (next PR), tt-metal wrapper adoption.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 11, 2026 17:03
@ppetrovicTT ppetrovicTT requested a review from a team as a code owner June 11, 2026 17:03

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an optional “log completion marker” mechanism to detect cases where a step is hard-killed by GitHub timeout-minutes and the tee’d log truncates without any explicit timeout/error line, preventing false “SUCCESS” classifications.

Changes:

  • Introduces log_complete_marker config (default null/disabled) and propagates it through config loading and CLI into log extraction.
  • Extends extraction/status logic to treat a clean log missing the completion marker as TIMEOUT, and surfaces “INCOMPLETE LOG” / summary warnings when the marker is absent.
  • Adds a dedicated test suite covering marker extraction, status outcomes, report notes, and config default/overlay behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
.github/actions/ai_summary/tool/ai_job_summary/extract.py Adds log_complete field, parses optional completion marker, and uses marker absence to classify clean logs as TIMEOUT; adds “INCOMPLETE LOG” context section.
.github/actions/ai_summary/tool/ai_job_summary/summarize.py Adds a warning line to the markdown summary when logs are incomplete (marker missing).
.github/actions/ai_summary/tool/ai_job_summary/config/analysis.yaml Documents and defaults log_complete_marker: null in bundled config.
.github/actions/ai_summary/tool/ai_job_summary/config.py Ensures log_complete_marker defaults to None when loading config.
.github/actions/ai_summary/tool/ai_job_summary/cli.py Includes log_complete_marker in test_patterns passed to extraction and emits log_complete in JSON output.
.github/actions/ai_summary/job/README.md Documents the new optional log_complete_marker input and its behavioral implications.
.github/actions/ai_summary/tool/ai_job_summary/tests/test_log_complete.py Adds tests for marker extraction, status classification, report notes, and config behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +441 to +452
# Completion marker: the caller's wrapper appends it as the log's final
# line; absence means the shell was hard-killed (GitHub timeout-minutes),
# which leaves no trace in the tee'd log itself.
marker_pattern = (test_patterns or {}).get("log_complete_marker")
if marker_pattern:
if match := re.search(marker_pattern, full_text, re.MULTILINE):
result.log_complete = True
# group 1 (optional) = wrapped command's exit code
if match.groups() and match.group(1) is not None and match.group(1).isdigit():
result.exit_code = int(match.group(1))
else:
result.log_complete = False
Comment on lines +62 to +66
def test_marker_must_anchor_line_start(self, tmp_path):
# embedded in other output (e.g. echoed cmd) must not count
lines = CLEAN_LINES + ["echo [==log-finish-line==] exit_code=0"]
e = _extract(tmp_path, lines)
assert e.log_complete is False
vmilosevic
vmilosevic previously approved these changes Jun 15, 2026

@vmilosevic vmilosevic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppetrovicTT you can adjust codeowners for ai_summery path so you can merge faster

@ppetrovicTT ppetrovicTT force-pushed the ppetrovic/ai-summary-log-finish-line branch from 2f64ba2 to 22213ed Compare June 15, 2026 15:07
Copilot AI review requested due to automatic review settings June 17, 2026 08:36
@ppetrovicTT ppetrovicTT force-pushed the ppetrovic/ai-summary-log-finish-line branch from 22213ed to 88d708e Compare June 17, 2026 08:36
@ppetrovicTT ppetrovicTT changed the base branch from main to ppetrovic/ai-summary-detection-patterns-yaml June 17, 2026 08:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines +427 to +435
marker_pattern = (test_patterns or {}).get("log_complete_marker")
if marker_pattern:
if match := re.search(marker_pattern, full_text, re.MULTILINE):
result.log_complete = True
# group 1 (optional) = wrapped command's exit code
if match.groups() and match.group(1) is not None and match.group(1).isdigit():
result.exit_code = int(match.group(1))
else:
result.log_complete = False
Comment on lines +188 to +192
# Marker absent on an otherwise clean log = GitHub timeout-minutes kill.
# Crash/failure branches above win: truncation is then reported as a
# possibly separate issue, not as the status.
if extracted_log.log_complete is False:
return JobStatus(False, "RED", "TIMEOUT")
Comment on lines +389 to +393
if extracted_log and extracted_log.log_complete is False:
if job_status.status_text == "TIMEOUT":
md += "⚠️ Log ended without its completion marker — step killed by GitHub `timeout-minutes` (test ran too long or hung).\n"
else:
md += "⚠️ Log is incomplete — step killed by GitHub `timeout-minutes` after the failure below; possibly an independent hang in a later test.\n"
@ppetrovicTT ppetrovicTT force-pushed the ppetrovic/ai-summary-detection-patterns-yaml branch from b881057 to 2cd0ec4 Compare June 17, 2026 08:42
…rker

A step killed by GitHub timeout-minutes leaves no trace in the tee'd log
(the ##[error] line lives only in the runner log), so a timed-out job
classified as a false SUCCESS — e.g. tt-metal run 27315488300/job/80696095660.

The caller's test wrapper appends a completion marker as the final log line,
'[==log-finish-line==] exit_code=N'. Token presence is the contract; the
exit_code payload is optional (and the only exit-code source for tee'd logs,
which lack the GitHub "Process completed" line). Marker absent => the shell
was hard-killed:
  - clean log, all dirs present  => TIMEOUT (no LLM)
  - log already shows crash/failure => that status wins, truncation surfaced
    as a possibly independent issue (report note + _job.log_complete=false +
    INCOMPLETE LOG section for the LLM)
  - a missing log dir still wins over marker absence (infra:partial_logs)

The marker regex lives in analysis.yaml (log_complete_marker), default on;
callers whose wrapper does not write it must override with null.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ppetrovicTT ppetrovicTT force-pushed the ppetrovic/ai-summary-log-finish-line branch from 88d708e to d072889 Compare June 17, 2026 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants