feat: AI-policy by jstoobysmith · Pull Request #1149 · leanprover-community/physlib

jstoobysmith · 2026-06-05T12:51:14Z

Added an AI-policy.

This was human generated, with an LLM used to improve grammar and wording.

jfindlay

Sorry for all the comments.

jfindlay · 2026-06-15T06:06:30Z

@@ -0,0 +1,65 @@
+# AI policy


I still strongly suggest splitting the guidance into human/agent files. The human guidance can go into an existing file somewhere or stay here, but the agent instructions should go into ./AGENTS.md. For example, GPT 5 and Opus 4.8 have this to say:

how likely for a code agent to automatically find AI-POLICY.md in the root of a project without hints, guidance, or prompting?

[GPT 5] Short answer: unlikely in general — roughly 5–30% chance without hints, depending on the agent design.

[Opus 4.8] Short answer: low and highly conditional — without any hint, guidance, or prompting, a code agent is unlikely to read AI-POLICY.md as a matter of course. The probability is dominated by a few structural factors, not by the file's mere existence.

jfindlay · 2026-06-15T06:13:29Z

+3.2. PRs should be split into atomic commits where it makes sense.
+
+3.3. Commit titles must describe the lemmas or definitions added or changed.
+


Suggested change

3.4. Commit messages should include [a conventional](https://git-scm.com/docs/SubmittingPatches.html#sign-off) `Co-authored-by: [agent-name] [agent-version] <no-reply+[agent-name]@[agent-provider].[tld]>` trailer.

jfindlay · 2026-06-15T06:17:14Z

+
+3.3. Commit titles must describe the lemmas or definitions added or changed.
+
+3.4. PRs must be focused and have well-defined scope.


Suggested change

3.4. PRs must be focused and have well-defined scope.

3.4. PRs must be focused and have well-defined scope. For new work, try to include only a single main result and only the intermediate lemmas strictly needed for it. For revisions and refactors, keep the PR focused on a single main result.

jfindlay · 2026-06-15T06:23:27Z

+
+1.6. These guidelines apply to any contribution where an AI tool produced more than trivial assistance.
+
+1.7. The human author is fully responsible for every line of the PR.


[Human language]

Suggested change

1.7. The human author is fully responsible for every line of the PR.

1.7. The human author(s) are fully responsible for every line of the PR, however it was produced. A human must vouch that each definition, theorem statement, and proof step means what it claims. A clean Lean build proved what was written, but only the human can certify what was written is what they meant.

jfindlay · 2026-06-15T06:28:21Z

+
+2.2. Added results must be placed in sections with appropriate titles. Sections must be numbered using the scheme `# A. ...`, `## A.1. ...`, etc. See [Physlib/ClassicalMechanics/HarmonicOscillator/Basic.lean](Physlib/ClassicalMechanics/HarmonicOscillator/Basic.lean) for an example.
+
+2.3. Results must be placed in the appropriate file, with the existing library structure in mind. Do not introduce new files without good reason.


"appropriate" is doing a lot of work here that I feel like could be further unpacked. Physics is attractive for it's self-evident organization, but collaboration invariably finds competing taxonomies. Perhaps there is some established reference we could point to for precedent, like PDG or similar?

jfindlay · 2026-06-15T07:28:51Z

+
+2.3. Results must be placed in the appropriate file, with the existing library structure in mind. Do not introduce new files without good reason.
+
+2.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.


[Agent language]

Suggested change

2.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.

2.4. Proof structuring

These are reviewer-judgment questions, not mechanical rules. An agent **should surface a recommendation** when one of the conditions below is met and let a human make the structural call; an agent should **not** split or relocate lemmas unilaterally.

A proof can mix concerns in three distinct ways — **depth**, **breadth**, and **computation/conceptual**. These can co-occur in a single proof; flag whichever discriminator fires and let a human resolve any overlap.

2.4.1. Topic and layer separation

**Type 1 — Depth mixing** (general plumbing inlined under a physics statement). A self-contained, lower-abstraction sub-argument doing work inside a domain proof.

**Type 2 — Breadth mixing** (two physics topics at the same level). Two distinct physics topics reasoned about in one proof, where each half stands alone.

**Type 3 — Computation vs. conceptual mixing.** A long mechanical calculation welded onto a structural (existence / uniqueness / limit) argument.

**Discriminator table.**

| Type | Discriminator | What the extracted lemma's *statement* looks like |

|---|---|---|

| **Depth** | Contiguous block cites only general (`Finset.*`, `ring`, Mathlib) lemmas, untouched by physics constants | Contains **zero** physics names |

| **Breadth** | Proof cites lemmas from **two disjoint physics topic namespaces** | Each piece sits in **one** topic namespace |

| **Computation/conceptual** | Large contiguous `calc`/`ring_nf`/`field_simp`/`positivity` slab under a structural top-level tactic | A **calculation** lemma (may keep physics constants) |

| **None — leave inline** | Candidate isn't a contiguous subgoal-closing block; or it's one indivisible argument | n/a |

> General rule: if the candidate block cannot be cleanly cut as a contiguous run that closes its own subgoal, do not recommend splitting it.

2.4.2. Independent sub-results

Distinct from 2.4.1 (which is about *mixing kinds* of reasoning), this asks whether the proof contains a self-contained sub-result that is *independently meaningful* — one another lemma might cite. If so, recommend promoting it to its own lemma. Weigh this against the cost: each new lemma adds a node to the dependency graph and a name to maintain. Promote when the sub-result earns its name through reuse or conceptual significance; leave it inline when it is merely an intermediate step of this one proof.

2.4.3. Length as a smell

Length is a prompt to ask 2.4.1 and 2.4.2, not an independent reason to split. If a proof is long *and* cannot be decomposed by the principles above flag it for a reviewer but do not manufacture artificial sub-lemmas to shorten it, especially for foundational or core-dependency results where added inter-lemma structure can introduce more fragility than the length costs in readability.

jfindlay · 2026-06-15T07:29:22Z

+
+2.6. All content must pass the linters described in [scripts/README.md](scripts/README.md).
+
+2.7. Contributions must not contain `axiom` declarations. ch.


typo?

Suggested change

2.7. Contributions must not contain `axiom` declarations. ch.

2.7. Contributions must not contain `axiom` declarations.

jfindlay · 2026-06-15T07:49:53Z

+
+## 3. Rules for the structure of the PR
+
+3.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.


Suggested change

3.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.

3.1. Pull request scope

A pull request should add a **single coherent concept**. Cohesion is the governing criterion, **not** line count.

Prefer small PRs whenever possible. A large PR that develops one concept to completion is preferable to a small PR that mixes two unrelated concerns. The relevant questions are:

- **Is this one concept?** Every definition and lemma should either *be* that concept or supply the minimal API needed to state and prove it.

- **Does it match the surrounding house pattern?** Where a directory has an established granularity (e.g. one concept per file, developed with its full API), conform to it.

- **Is the code internally navigable?** A long file that has a module docstring with an overview, key results, and a table of contents provides the reviewability that splitting would otherwise provide without fragmenting one concept across multiple review cycles.

**Line count is a smell, not a gate.** Physlib's per-PR length rules live in the [Review Guidelines](docs/ReviewGuidelines.md). Treat those thresholds as a prompt to *ask the cohesion questions above*, not as a hard limit:

- When a PR passes the "very large" band, ask whether it is really one concept or several, and split **only along conceptual seams**.

- The thresholds are calibrated for **new content**. Pure refactors, file reorganizations, and documentation PRs may safely run larger, since reviewer load per line is far lower.

- An agent **should surface** an oversized-PR concern for a human to weigh; it should not split a PR unilaterally to satisfy a line count.

jfindlay · 2026-06-15T08:27:06Z

or, all together, this is what I would recommend: #1187.

feat: AI-policy

ceec406

jstoobysmith marked this pull request as draft June 5, 2026 12:51

typo

67bccc7

jstoobysmith added the awaiting-author A reviewer has asked the author a question or requested changes label Jun 11, 2026

Update AI-POLICY.md

8b4e9ae

jstoobysmith removed the awaiting-author A reviewer has asked the author a question or requested changes label Jun 14, 2026

jfindlay reviewed Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: AI-policy#1149

feat: AI-policy#1149
jstoobysmith wants to merge 3 commits into
leanprover-community:masterfrom
jstoobysmith:AI-Policy

jstoobysmith commented Jun 5, 2026

Uh oh!

jfindlay left a comment

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay Jun 15, 2026

Uh oh!

jfindlay commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		3.2. PRs should be split into atomic commits where it makes sense.

		3.3. Commit titles must describe the lemmas or definitions added or changed.



	3.4. Commit messages should include [a conventional](https://git-scm.com/docs/SubmittingPatches.html#sign-off) `Co-authored-by: [agent-name] [agent-version] <no-reply+[agent-name]@[agent-provider].[tld]>` trailer.


		3.3. Commit titles must describe the lemmas or definitions added or changed.

		3.4. PRs must be focused and have well-defined scope.


		1.6. These guidelines apply to any contribution where an AI tool produced more than trivial assistance.

		1.7. The human author is fully responsible for every line of the PR.

	1.7. The human author is fully responsible for every line of the PR.
	1.7. The human author(s) are fully responsible for every line of the PR, however it was produced. A human must vouch that each definition, theorem statement, and proof step means what it claims. A clean Lean build proved what was written, but only the human can certify what was written is what they meant.


		2.2. Added results must be placed in sections with appropriate titles. Sections must be numbered using the scheme `# A. ...`, `## A.1. ...`, etc. See [Physlib/ClassicalMechanics/HarmonicOscillator/Basic.lean](Physlib/ClassicalMechanics/HarmonicOscillator/Basic.lean) for an example.

		2.3. Results must be placed in the appropriate file, with the existing library structure in mind. Do not introduce new files without good reason.


		2.3. Results must be placed in the appropriate file, with the existing library structure in mind. Do not introduce new files without good reason.

		2.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.

-.4. Long proofs (over 50 LoC) should, where sensible, be split into smaller lemmas of general applicability.
+.4. Proof structuring
+These are reviewer-judgment questions, not mechanical rules. An agent **should surface a recommendation** when one of the conditions below is met and let a human make the structural call; an agent should **not** split or relocate lemmas unilaterally.
+A proof can mix concerns in three distinct ways — **depth**, **breadth**, and **computation/conceptual**. These can co-occur in a single proof; flag whichever discriminator fires and let a human resolve any overlap.
+.4.1. Topic and layer separation
+**Type 1 — Depth mixing** (general plumbing inlined under a physics statement). A self-contained, lower-abstraction sub-argument doing work inside a domain proof.
+**Type 2 — Breadth mixing** (two physics topics at the same level). Two distinct physics topics reasoned about in one proof, where each half stands alone.
+**Type 3 — Computation vs. conceptual mixing.** A long mechanical calculation welded onto a structural (existence / uniqueness / limit) argument.
+**Discriminator table.**
+| Type | Discriminator | What the extracted lemma's *statement* looks like |
+|---|---|---|
+| **Depth** | Contiguous block cites only general (`Finset.*`, `ring`, Mathlib) lemmas, untouched by physics constants | Contains **zero** physics names |
+| **Breadth** | Proof cites lemmas from **two disjoint physics topic namespaces** | Each piece sits in **one** topic namespace |
+| **Computation/conceptual** | Large contiguous `calc`/`ring_nf`/`field_simp`/`positivity` slab under a structural top-level tactic | A **calculation** lemma (may keep physics constants) |
+| **None — leave inline** | Candidate isn't a contiguous subgoal-closing block; or it's one indivisible argument | n/a |
+> General rule: if the candidate block cannot be cleanly cut as a contiguous run that closes its own subgoal, do not recommend splitting it.
+.4.2. Independent sub-results
+Distinct from 2.4.1 (which is about *mixing kinds* of reasoning), this asks whether the proof contains a self-contained sub-result that is *independently meaningful* — one another lemma might cite. If so, recommend promoting it to its own lemma. Weigh this against the cost: each new lemma adds a node to the dependency graph and a name to maintain. Promote when the sub-result earns its name through reuse or conceptual significance; leave it inline when it is merely an intermediate step of this one proof.
+.4.3. Length as a smell
+Length is a prompt to ask 2.4.1 and 2.4.2, not an independent reason to split. If a proof is long *and* cannot be decomposed by the principles above flag it for a reviewer but do not manufacture artificial sub-lemmas to shorten it, especially for foundational or core-dependency results where added inter-lemma structure can introduce more fragility than the length costs in readability.


		2.6. All content must pass the linters described in [scripts/README.md](scripts/README.md).

		2.7. Contributions must not contain `axiom` declarations. ch.


		## 3. Rules for the structure of the PR

		3.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.

-.1. A pull request that adds new content (as opposed to refactoring existing content) should contain less than 250 lines of diff. Split larger contributions into multiple PRs.
+.1. Pull request scope
+A pull request should add a **single coherent concept**. Cohesion is the governing criterion, **not** line count.
+Prefer small PRs whenever possible. A large PR that develops one concept to completion is preferable to a small PR that mixes two unrelated concerns. The relevant questions are:
+- **Is this one concept?** Every definition and lemma should either *be* that concept or supply the minimal API needed to state and prove it.
+- **Does it match the surrounding house pattern?** Where a directory has an established granularity (e.g. one concept per file, developed with its full API), conform to it.
+- **Is the code internally navigable?** A long file that has a module docstring with an overview, key results, and a table of contents provides the reviewability that splitting would otherwise provide without fragmenting one concept across multiple review cycles.
+**Line count is a smell, not a gate.** Physlib's per-PR length rules live in the [Review Guidelines](docs/ReviewGuidelines.md). Treat those thresholds as a prompt to *ask the cohesion questions above*, not as a hard limit:
+- When a PR passes the "very large" band, ask whether it is really one concept or several, and split **only along conceptual seams**.
+- The thresholds are calibrated for **new content**. Pure refactors, file reorganizations, and documentation PRs may safely run larger, since reviewer load per line is far lower.
+- An agent **should surface** an oversized-PR concern for a human to weigh; it should not split a PR unilaterally to satisfy a line count.

Conversation

jstoobysmith commented Jun 5, 2026

Uh oh!

jfindlay left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jfindlay commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants