Skip to content

Latest commit

 

History

History
82 lines (67 loc) · 5.18 KB

File metadata and controls

82 lines (67 loc) · 5.18 KB

AGENTS.md

Project purpose

This repository contains customer-facing support scripts and binaries for collecting diagnostics or applying narrow workarounds on customer VMs. Assume the operator is a customer or support engineer running the tool locally on a machine we do not control and cannot access directly.

Monorepo model

  • First-class projects live under customers/<name>/. Each project has its own AGENTS.md with verification commands, stack-specific rules, and links to CODEMAP.md / ARCHITECTURE.md as needed.
  • When changing code under a project directory, follow that project's AGENTS.md for how to build, test, and review.
  • New projects may use any stack (Go, Node, Python, shell, etc.). The root file stays policy and discovery—not a catalog of every toolchain's commands.

Cross-project contracts

  • customers/vm-troubleshooting/ (gather-info) produces diagnostic archives (manifest, report stream, triage data, schemas).
  • customers/vm-troubleshooting-dashboard/ consumes those archives for ingest and UI.
  • Authoritative compatibility and versioning rules: customers/vm-troubleshooting/SCHEMA-COMPATIBILITY.md. Producer or consumer changes that affect archive shape, schema majors, or mirrored JSON files usually belong in a coordinated change (both sides + that doc when applicable).
  • Dashboard schemas/ mirrors collector schemas/; keep them aligned per each project's AGENTS.md checklists.

Operating constraints

  • Prefer self-contained tooling with minimal external dependencies.
  • Assume common Linux distributions first: Ubuntu 20.04/22.04/24.04. Treat other distros as best-effort unless explicitly supported.
  • Tools must continue to provide useful output when some commands, drivers, services, or files are missing.
  • Never assume outbound internet access.
  • Never assume package managers are usable.
  • Never assume GPUs, kernel modules, systemd, sudo, or container runtimes are present.

Safety and privacy

  • Do not collect secrets or high-risk data.
  • Never collect private keys, tokens, passwords, shell history, environment variables, cloud credentials, SSH material, or application secrets.
  • Prefer explicit allowlists of collected files/commands over broad filesystem collection.
  • Redact sensitive values if collection is unavoidable.
  • Any remediation script must be narrowly scoped, reversible where possible, and clearly logged.

UX requirements

  • Output must be understandable by a customer and useful to support.
  • Print what is being collected and why when practical.
  • Fail per section, not all-or-nothing.
  • Exit codes must be meaningful.
  • Write deterministic output paths and filenames.
  • Prefer a single archive or directory that the customer can send back.

Implementation rules

  • Prefer read-only diagnostics.
  • For any mutating action, require explicit user intent and document risk.
  • Use timeouts for subprocesses and slow probes.
  • Handle missing commands gracefully.
  • Avoid distro-specific parsing when a portable interface exists.
  • Avoid brittle text parsing of commands whose output changes across versions.
  • Prefer machine-readable sources when available.
  • Do not silently ignore errors that affect support value; record them in output.

Verification (repository root only)

Before considering work complete, run the narrowest relevant checks for what you changed.

Root-level assets only (e.g. scripts in the repo root):

  • Bash lint: shellcheck nvidia-drm-disable-modeset.sh
  • Bash syntax: bash -n nvidia-drm-disable-modeset.sh

Anything under customers/: use that project's AGENTS.md verification section (Go, frontend, etc.).

Repo structure

  • customers/: shipped or support-facing tools and assets; each subfolder is a project with its own AGENTS.md.
  • Root scripts: focused one-off support or remediation utilities (verify with root-only commands above).
  • customers/vm-troubleshooting/: diagnostics collector (gather-info). Maps: CODEMAP.md, ARCHITECTURE.md.
  • customers/vm-troubleshooting-dashboard/: dashboard (Go API + UI). Map: CODEMAP.md.
  • docs/: planning notes and indexes (docs/README.md, docs/architecture.md point to project-local maps).

Doc maintenance (cross-cutting)

When architecture or boundaries change materially, update the relevant project orientation docs in the same change:

  • Collector: customers/vm-troubleshooting/CODEMAP.md; ARCHITECTURE.md when pipeline/types/modes narratives change.
  • Dashboard: customers/vm-troubleshooting-dashboard/CODEMAP.md when package layout or ingest/API flow changes.

For Go-based diagnostics behavior (modular collectors, timeouts, static builds, graceful skips), follow the collector project's AGENTS.md; do not duplicate those rules here.

Optional: CI at scale

If you add continuous integration, path-scoped jobs (build/test only projects whose paths changed) are a practical way to keep feedback fast as customers/ grows. This is optional operational practice, not a requirement of this repository.

Done means

A change is not done until:

  • the relevant checks pass,
  • the tool behavior is explained in customer-safe terms,
  • output remains useful on partially broken systems,
  • and no new sensitive data is collected.