leanprover-community · jstoobysmith · Jun 5, 2026 · Jun 5, 2026 · Jun 14, 2026 · jfindlay
diff --git a/AI-POLICY.md b/AI-POLICY.md
@@ -0,0 +1,65 @@
+# AI policy
+
+## 1. Introduction
+
+1.1. Physlib generally welcomes AI-assisted contributions.
+
+1.2. If you use an AI tool to help author a pull request to Physlib, you must follow the guidelines in this file.
+
+1.3. Failure to follow these guidelines may result in your PR being closed without comment after a brief triage. Repeated failures may result in being banned from contributing to the project. This is to protect the project and the time of the maintainers.
+
+1.4. Throughout this document, "must" denotes a hard requirement and "should" denotes a strong expectation that is not strictly enforced.
+
+1.5. You must also follow any explicit instructions in [docs/ReviewGuidelines.md](docs/ReviewGuidelines.md).
+
+1.6. These guidelines apply to any contribution where an AI tool produced more than trivial assistance.
+
+1.7. The human author is fully responsible for every line of the PR.
-1.7. The human author is fully responsible for every line of the PR.
+1.7. The human author(s) are fully responsible for every line of the PR, however it was produced.  A human must vouch that each definition, theorem statement, and proof step means what it claims.  A clean Lean build proved what was written, but only the human can certify what was written is what they meant.
-1.7. The human author is fully responsible for every line of the PR.
+1.7. The human author(s) are fully responsible for every line of the PR, however it was produced.  A human must vouch that each definition, theorem statement, and proof step means what it claims.  A clean Lean build proved what was written, but only the human can certify what was written is what they meant.
+
+## 2. Rules for the content of PRs
+
+2.1. `theorem` must only be used for well-known results in the physics literature; otherwise `lemma` must be used.
+
+2.2. Added results must be placed in sections with appropriate titles. Sections must be numbered using the scheme `# A. ...`, `## A.1. ...`, etc. See [Physlib/ClassicalMechanics/HarmonicOscillator/Basic.lean](Physlib/ClassicalMechanics/HarmonicOscillator/Basic.lean) for an example.
+
+2.3. Results must be placed in the appropriate file, with the existing library structure in mind. Do not introduce new files without good reason.
+
+2.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.
-2.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.
+2.4. Proof structuring
+
+These are reviewer-judgment questions, not mechanical rules. An agent **should surface a recommendation** when one of the conditions below is met and let a human make the structural call; an agent should **not** split or relocate lemmas unilaterally.
+
+A proof can mix concerns in three distinct ways — **depth**, **breadth**, and **computation/conceptual**. These can co-occur in a single proof; flag whichever discriminator fires and let a human resolve any overlap.
+
+2.4.1. Topic and layer separation
+
+**Type 1 — Depth mixing** (general plumbing inlined under a physics statement). A self-contained, lower-abstraction sub-argument doing work inside a domain proof.
+
+**Type 2 — Breadth mixing** (two physics topics at the same level). Two distinct physics topics reasoned about in one proof, where each half stands alone.
+
+**Type 3 — Computation vs. conceptual mixing.** A long mechanical calculation welded onto a structural (existence / uniqueness / limit) argument.
+
+**Discriminator table.**
+
+| Type | Discriminator | What the extracted lemma's *statement* looks like |
+|---|---|---|
+| **Depth** | Contiguous block cites only general (`Finset.*`, `ring`, Mathlib) lemmas, untouched by physics constants | Contains **zero** physics names |
+| **Breadth** | Proof cites lemmas from **two disjoint physics topic namespaces** | Each piece sits in **one** topic namespace |
+| **Computation/conceptual** | Large contiguous `calc`/`ring_nf`/`field_simp`/`positivity` slab under a structural top-level tactic | A **calculation** lemma (may keep physics constants) |
+| **None — leave inline** | Candidate isn't a contiguous subgoal-closing block; or it's one indivisible argument | n/a |
+
+> General rule: if the candidate block cannot be cleanly cut as a contiguous run that closes its own subgoal, do not recommend splitting it.
+
+2.4.2. Independent sub-results
+
+Distinct from 2.4.1 (which is about *mixing kinds* of reasoning), this asks whether the proof contains a self-contained sub-result that is *independently meaningful* — one another lemma might cite. If so, recommend promoting it to its own lemma. Weigh this against the cost: each new lemma adds a node to the dependency graph and a name to maintain. Promote when the sub-result earns its name through reuse or conceptual significance; leave it inline when it is merely an intermediate step of this one proof.
+
+2.4.3. Length as a smell
+
+Length is a prompt to ask 2.4.1 and 2.4.2, not an independent reason to split. If a proof is long *and* cannot be decomposed by the principles above flag it for a reviewer but do not manufacture artificial sub-lemmas to shorten it, especially for foundational or core-dependency results where added inter-lemma structure can introduce more fragility than the length costs in readability.
+
-2.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.
+2.4. Proof structuring
+
+These are reviewer-judgment questions, not mechanical rules. An agent **should surface a recommendation** when one of the conditions below is met and let a human make the structural call; an agent should **not** split or relocate lemmas unilaterally.
+
+A proof can mix concerns in three distinct ways — **depth**, **breadth**, and **computation/conceptual**. These can co-occur in a single proof; flag whichever discriminator fires and let a human resolve any overlap.
+
+2.4.1. Topic and layer separation
+
+**Type 1 — Depth mixing** (general plumbing inlined under a physics statement). A self-contained, lower-abstraction sub-argument doing work inside a domain proof.
+
+**Type 2 — Breadth mixing** (two physics topics at the same level). Two distinct physics topics reasoned about in one proof, where each half stands alone.
+
+**Type 3 — Computation vs. conceptual mixing.** A long mechanical calculation welded onto a structural (existence / uniqueness / limit) argument.
+
+**Discriminator table.**
+
+| Type | Discriminator | What the extracted lemma's *statement* looks like |
+|---|---|---|
+| **Depth** | Contiguous block cites only general (`Finset.*`, `ring`, Mathlib) lemmas, untouched by physics constants | Contains **zero** physics names |
+| **Breadth** | Proof cites lemmas from **two disjoint physics topic namespaces** | Each piece sits in **one** topic namespace |
+| **Computation/conceptual** | Large contiguous `calc`/`ring_nf`/`field_simp`/`positivity` slab under a structural top-level tactic | A **calculation** lemma (may keep physics constants) |
+| **None — leave inline** | Candidate isn't a contiguous subgoal-closing block; or it's one indivisible argument | n/a |
+
+> General rule: if the candidate block cannot be cleanly cut as a contiguous run that closes its own subgoal, do not recommend splitting it.
+
+2.4.2. Independent sub-results
+
+Distinct from 2.4.1 (which is about *mixing kinds* of reasoning), this asks whether the proof contains a self-contained sub-result that is *independently meaningful* — one another lemma might cite. If so, recommend promoting it to its own lemma. Weigh this against the cost: each new lemma adds a node to the dependency graph and a name to maintain. Promote when the sub-result earns its name through reuse or conceptual significance; leave it inline when it is merely an intermediate step of this one proof.
+
+2.4.3. Length as a smell
+
+Length is a prompt to ask 2.4.1 and 2.4.2, not an independent reason to split. If a proof is long *and* cannot be decomposed by the principles above flag it for a reviewer but do not manufacture artificial sub-lemmas to shorten it, especially for foundational or core-dependency results where added inter-lemma structure can introduce more fragility than the length costs in readability.
+
+
+2.5. New lemmas must not be trivial rewrites of existing lemmas in Mathlib or Physlib, unless they add new or different physics context.
+
+2.6. All content must pass the linters described in [scripts/README.md](scripts/README.md).
+
+2.7. Contributions must not contain `axiom` declarations. ch.
-2.7. Contributions must not contain `axiom` declarations. ch.
+2.7. Contributions must not contain `axiom` declarations.
-2.7. Contributions must not contain `axiom` declarations. ch.
+2.7. Contributions must not contain `axiom` declarations.
+
+2.8. Any bibliographic references included must be verified by a human for correctness (existence of the work, accuracy of the cited statement, page numbers, etc.).
+
+## 3. Rules for the structure of the PR
+
+3.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.
-3.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.
+3.1. Pull request scope
+
+A pull request should add a **single coherent concept**. Cohesion is the governing criterion, **not** line count.
+
+Prefer small PRs whenever possible. A large PR that develops one concept to completion is preferable to a small PR that mixes two unrelated concerns. The relevant questions are:
+
+- **Is this one concept?** Every definition and lemma should either *be* that concept or supply the minimal API needed to state and prove it.
+- **Does it match the surrounding house pattern?** Where a directory has an established granularity (e.g. one concept per file, developed with its full API), conform to it.
+- **Is the code internally navigable?** A long file that has a module docstring with an overview, key results, and a table of contents provides the reviewability that splitting would otherwise provide without fragmenting one concept across multiple review cycles.
+
+**Line count is a smell, not a gate.** Physlib's per-PR length rules live in the [Review Guidelines](docs/ReviewGuidelines.md). Treat those thresholds as a prompt to *ask the cohesion questions above*, not as a hard limit:
+
+- When a PR passes the "very large" band, ask whether it is really one concept or several, and split **only along conceptual seams**.
+- The thresholds are calibrated for **new content**. Pure refactors, file reorganizations, and documentation PRs may safely run larger, since reviewer load per line is far lower.
+- An agent **should surface** an oversized-PR concern for a human to weigh; it should not split a PR unilaterally to satisfy a line count.
-3.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.
+3.1. Pull request scope
+
+A pull request should add a **single coherent concept**. Cohesion is the governing criterion, **not** line count.
+
+Prefer small PRs whenever possible. A large PR that develops one concept to completion is preferable to a small PR that mixes two unrelated concerns. The relevant questions are:
+
+- **Is this one concept?** Every definition and lemma should either *be* that concept or supply the minimal API needed to state and prove it.
+- **Does it match the surrounding house pattern?** Where a directory has an established granularity (e.g. one concept per file, developed with its full API), conform to it.
+- **Is the code internally navigable?** A long file that has a module docstring with an overview, key results, and a table of contents provides the reviewability that splitting would otherwise provide without fragmenting one concept across multiple review cycles.
+
+**Line count is a smell, not a gate.** Physlib's per-PR length rules live in the [Review Guidelines](docs/ReviewGuidelines.md). Treat those thresholds as a prompt to *ask the cohesion questions above*, not as a hard limit:
+
+- When a PR passes the "very large" band, ask whether it is really one concept or several, and split **only along conceptual seams**.
+- The thresholds are calibrated for **new content**. Pure refactors, file reorganizations, and documentation PRs may safely run larger, since reviewer load per line is far lower.
+- An agent **should surface** an oversized-PR concern for a human to weigh; it should not split a PR unilaterally to satisfy a line count.
+
+3.2. PRs should be split into atomic commits where it makes sense.
+
+3.3. Commit titles must describe the lemmas or definitions added or changed.
+
-
+
+3.4. Commit messages should include [a conventional](https://git-scm.com/docs/SubmittingPatches.html#sign-off) `Co-authored-by: [agent-name] [agent-version] <no-reply+[agent-name]@[agent-provider].[tld]>` trailer.
-
+
+3.4. Commit messages should include [a conventional](https://git-scm.com/docs/SubmittingPatches.html#sign-off) `Co-authored-by: [agent-name] [agent-version] <no-reply+[agent-name]@[agent-provider].[tld]>` trailer.
+3.4. PRs must be focused and have well-defined scope.
-3.4. PRs must be focused and have well-defined scope.
+3.4. PRs must be focused and have well-defined scope.  For new work, try to include only a single main result and only the intermediate lemmas strictly needed for it.  For revisions and refactors, keep the PR focused on a single main result.
-3.4. PRs must be focused and have well-defined scope.
+3.4. PRs must be focused and have well-defined scope.  For new work, try to include only a single main result and only the intermediate lemmas strictly needed for it.  For revisions and refactors, keep the PR focused on a single main result.
+
+## 4. Rules for the PR description
+
+4.1. The PR description must list all lemmas and definitions added or removed, the file in which each appears, and a brief explanation of each.
+
+4.2. The PR description should include a reviewer map: a brief guide indicating the order in which the reviewer should look at the changes.
+
+4.3. The PR description should state the author's expertise in the area of the PR.
+
+4.4. The PR description must disclose that an AI was used and describe the extent of its involvement.
+
+4.5. The PR description must state whether a human has checked the correctness of any included references.
+
+4.6. The PR must be concise.
+
+## 5. Rules for the review process
+
+5.1. All communication with human reviewers must be conducted by humans, not by an AI agent.
+
+5.2. If an AI agent is used to implement reviewer feedback, the author must independently verify that the feedback has been addressed correctly before requesting re-review. This must not be left to the reviewer to check.