Merged
42 changes: 24 additions & 18 deletions docs/developer/architecture.md
@@ -17,7 +17,7 @@ Technical documentation for developers working on the QuantEcon Style Guide Chec
┌─────────────────────────────────────────────────────┐
│ action.yml │
Sets up Python, installs deps, invokes action.py │
Installs uv, syncs deps, invokes action.py via uv
└──────────────────────────┬──────────────────────────┘
┌────────────┴────────────┐
@@ -32,18 +32,16 @@ Technical documentation for developers working on the QuantEcon Style Guide Chec
┌───────────────────┼──────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌─────────────┐
│prompt_loader │ │ fix_applier │ │Anthropic API│
│Load prompts │ │Apply fixes │ │(Claude) │
│Load rules │ │Validate │ └─────────────┘
└──────┬───────┘ └──────────────┘
┌──────┴───────┐
▼ ▼
┌──────────┐ ┌──────────┐
│prompts/ │ │ rules/ │
│(8 files) │ │(8 files) │
└──────────┘ └──────────┘
│ prompts/ │ │ fix_applier │ │Anthropic API│
│ prompt.md │ │Apply fixes │ │(Claude) │
│ + rules/*.md │ │Validate │ └─────────────┘
└──────────────┘ └──────────────┘
```

`categories.py` is the single source of truth for the 8 category names
(`writing`, `math`, `code`, `jax`, `figures`, `references`, `links`,
`admonitions`); every other module imports `VALID_CATEGORIES` from it.

## Two Entry Points, One Engine

- **`action.py`**: GitHub Action entry point. Reads files via GitHub API, creates PRs with fixes.
@@ -87,15 +85,21 @@ Key classes:
- `AnthropicProvider` — Claude API wrapper with extended thinking and streaming fallback
- `StyleReviewer` — Main review orchestrator

### Prompt Loader (`prompt_loader.py`)
### Prompt Construction (`reviewer.create_single_rule_prompt`)

Loads and combines category-specific prompts and rules:
For each rule, the reviewer builds an LLM prompt as:

```
[Category Prompt] + [Style Guide Rules] + [Lecture Content] → LLM
[Shared base prompt (prompts/prompt.md)]
+ [Single rule definition from rules/{category}-rules.md]
+ [Lecture content]
→ LLM
```

The prompt is rule-agnostic — all 8 category prompts are identical. Scope and analysis context come from the rule definitions themselves. This prevents signal dilution from category-specific instructions.
The base prompt is rule-agnostic — a single `prompts/prompt.md` file is
shared across all 8 categories. Scope and analysis context come from the
rule definitions themselves, which prevents signal dilution from
category-specific instructions.
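
A minimal sketch of the assembly described above. It is not the actual body of `reviewer.create_single_rule_prompt`: the function name `build_rule_prompt`, the section headings, and the sample strings are hypothetical and only illustrate the ordering of base prompt, single rule, and lecture content.

```python
def build_rule_prompt(base_prompt: str, rule_definition: str, lecture_content: str) -> str:
    """Join the shared base prompt, one rule definition, and the lecture text.

    Hypothetical helper: the "## Rule" / "## Lecture" headings are assumptions,
    not the format used by the real reviewer.
    """
    sections = [
        base_prompt.strip(),
        "## Rule\n\n" + rule_definition.strip(),
        "## Lecture\n\n" + lecture_content.strip(),
    ]
    # Blank lines between sections keep the three parts visually distinct.
    return "\n\n".join(sections)


# Illustrative inputs only; real text would come from prompts/prompt.md,
# rules/{category}-rules.md, and the lecture file.
prompt = build_rule_prompt(
    "Review the lecture against the rule below.",
    "Use one sentence per paragraph.",
    "# Intro\n\nSome lecture text.",
)
```

Because the base prompt carries no category-specific wording, the only part that changes between rule evaluations in this sketch is the `rule_definition` argument.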

### Fix Applier (`fix_applier.py`)

@@ -228,16 +232,18 @@ Depends on lecture length and violations found.
```
action-style-guide/
├── action.yml # GitHub Action definition
├── pyproject.toml # Package + dep manifest (uv-managed)
├── uv.lock # Reproducible dep lockfile
├── style_checker/ # Main package
│ ├── __init__.py # Version (__version__)
│ ├── categories.py # Single source of truth for VALID_CATEGORIES
│ ├── cli.py # Local CLI entry point (qestyle)
│ ├── action.py # GitHub Action entry point
│ ├── reviewer.py # LLM review engine (shared)
│ ├── fix_applier.py # Apply fixes to files (shared)
│ ├── github_handler.py # GitHub API (action only)
│ ├── prompt_loader.py # Load prompts + rules (shared)
│ ├── prompts/ # Minimal rule-agnostic prompts
│ └── rules/ # Category-specific rule definitions
│ ├── prompts/ # Single shared prompt.md (+ v0.6.1 archive)
│ └── rules/ # Per-category rule definitions
├── tests/ # Test suite
├── docs/ # Documentation (this site)
└── examples/ # Example workflows
11 changes: 5 additions & 6 deletions docs/developer/contributing.md
@@ -88,16 +88,15 @@ Rules are in `style_checker/rules/` and are read directly by the LLM — **no co
[Good and bad examples]
```

3. Update corresponding prompt file in `style_checker/prompts/` if needed
3. The base prompt (`prompts/prompt.md`) is shared across all categories, so it usually needs no edit.
4. Test with real lecture files

### Adding a New Category

1. Create `prompts/category-prompt.md`
2. Create `rules/category-rules.md`
3. Add category to `VALID_CATEGORIES` in `github_handler.py` and `prompt_loader.py`
4. Add to category list in `review_lecture_smart()`
5. Test end-to-end
1. Create `style_checker/rules/{category}-rules.md`
2. Add the new name to `VALID_CATEGORIES` in `style_checker/categories.py`
3. Add an entry for it in `RULE_EVALUATION_ORDER` in `style_checker/reviewer.py` (the test suite will fail loudly if the keys drift)
4. Test end-to-end
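
The drift check mentioned in step 3 could look roughly like the sketch below. This is not the test in `tests/test_reviewer.py`: the helper name `find_category_drift` is hypothetical, and `RULE_EVALUATION_ORDER` is assumed here to be a dict keyed by category name (the empty lists are placeholders for its real values).

```python
VALID_CATEGORIES = (
    "writing", "math", "code", "jax",
    "figures", "references", "links", "admonitions",
)

# Placeholder shape: the real mapping lives in style_checker/reviewer.py.
RULE_EVALUATION_ORDER = {name: [] for name in VALID_CATEGORIES}


def find_category_drift(valid_categories, evaluation_order):
    """Return (missing_from_order, unknown_in_order) as sorted lists."""
    valid = set(valid_categories)
    ordered = set(evaluation_order)
    return sorted(valid - ordered), sorted(ordered - valid)


def test_categories_match_evaluation_order():
    missing, unknown = find_category_drift(VALID_CATEGORIES, RULE_EVALUATION_ORDER)
    assert not missing, f"No evaluation order for: {missing}"
    assert not unknown, f"Evaluation order has unknown categories: {unknown}"
```

Reporting both directions of the mismatch makes it clear whether step 2 or step 3 was skipped when a new category is added.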

## Pull Request Process

2 changes: 1 addition & 1 deletion docs/developer/extended-thinking.md
@@ -117,5 +117,5 @@ This is 40 lines vs the previous 120-line category-specific prompts, and it prod
|----------|-----------|
| `thinking_budget=10000` | Enough for careful analysis, not excessive cost |
| `temperature=1.0` | Required by Anthropic for extended thinking |
| 8 identical prompt files (for now) | Consolidation to single file planned (validated on writing, pending other categories) |
| Single shared `prompts/prompt.md` | Consolidated from 8 byte-identical files (validated on writing, then rolled out across all categories) |
| Archive v0.6.1 prompts | Reference for regression testing and comparison |
4 changes: 2 additions & 2 deletions docs/developer/roadmap.md
@@ -28,7 +28,7 @@ The project is in **active development**. Breaking changes are acceptable — th
### Phase 3: Test Suite Improvements ✅

- Fixed `test_parsing.py` to test real methods
- Added tests for `fix_applier.py`, `prompt_loader.py`, `reviewer.py`
- Added tests for `fix_applier.py`, `reviewer.py` (incl. RULE_EVALUATION_ORDER drift detection and prompt file existence)
- Set up CI pipeline (GitHub Actions, ruff linting, Python 3.11/3.12/3.13)

## In Progress
@@ -44,7 +44,7 @@ Focus: reduce LLM hallucinations, improve fix accuracy, move mechanical rules to
| 4.3 Deterministic Checkers | ~13 mechanical rules via regex (zero hallucination risk) | Planned |
| 4.4 Rule Clarity | Improve 12 rule descriptions to reduce misinterpretation | Planned |
| 4.5 Scope Reduction | Reduce noise from overly subjective rules | Planned |
| 4.6 Prompt Consolidation | Merge 8 identical prompt files into single `prompt.md` | Planned |
| 4.6 Prompt Consolidation | Merge 8 identical prompt files into single `prompt.md` | **Done** (PR #17) |
| 4.7 Extended Thinking | Claude reasons internally → 0% false positives | **Done** (v0.7.0) |

### Phase 5: Style Suggestion UX
17 changes: 7 additions & 10 deletions docs/developer/testing.md
@@ -27,8 +27,7 @@ tests/
├── test_github_handler.py # GitHub API interaction, comment parsing
├── test_markdown_parser.py # LLM response parsing
├── test_parsing.py # Comment trigger pattern matching
├── test_prompt_loader.py # Prompt/rules file loading
├── test_reviewer.py # Rule extraction and evaluation order
├── test_reviewer.py # Rule extraction, RULE_EVALUATION_ORDER, prompt file
├── test_llm_integration.py # Real LLM API calls (@integration)
└── test_cli.py # CLI argument parsing
```
@@ -43,8 +42,7 @@ Run automatically with `pytest`:
| `test_github_handler.py` | GitHub API interaction, comment parsing |
| `test_markdown_parser.py` | LLM response parsing |
| `test_parsing.py` | Comment trigger pattern matching (real method) |
| `test_prompt_loader.py` | Prompt/rules file loading |
| `test_reviewer.py` | Rule extraction and evaluation order |
| `test_reviewer.py` | Rule extraction, RULE_EVALUATION_ORDER consistency, prompt file existence |
| `test_cli.py` | CLI argument parsing |

### Integration Tests (Slow, Costs Money)
@@ -69,15 +67,14 @@ pytest --cov=style_checker --cov-report=html
open htmlcov/index.html
```

Current coverage:
Current coverage (approximate — re-measure with `pytest --cov`):

| File | Coverage |
|------|----------|
| `fix_applier.py` | 92% |
| `prompt_loader.py` | 86% |
| `github_handler.py` | 55% |
| `reviewer.py` | 47% |
| `action.py` | 0% (needs integration mocking) |
| `fix_applier.py` | high |
| `github_handler.py` | medium |
| `reviewer.py` | medium |
| `action.py` | 0% (needs integration mocking — tracked in TECHNICAL-REVIEW §6.1) |

## CI Pipeline

21 changes: 21 additions & 0 deletions style_checker/categories.py
@@ -0,0 +1,21 @@
"""
Single source of truth for the style-rule category names.

Every category here must have a corresponding `{name}-rules.md` file in
`style_checker/rules/` and a matching entry in
`reviewer.RULE_EVALUATION_ORDER`. The consistency between this tuple and
`RULE_EVALUATION_ORDER.keys()` is enforced by a test in `tests/test_reviewer.py`.
"""

# Ordered — index is the default category processing order used by
# `StyleReviewer.review_lecture_smart`.
VALID_CATEGORIES = (
"writing",
"math",
"code",
"jax",
"figures",
"references",
"links",
"admonitions",
)
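
As an illustration of how a caller might consume this tuple, here is a small sketch of comma-separated category parsing. The helper `parse_categories` is hypothetical and not part of the package; it raises `ValueError` rather than printing and exiting as `cli.py` does, and the tuple is repeated inline only to keep the example self-contained.

```python
from typing import List, Optional

VALID_CATEGORIES = (
    "writing", "math", "code", "jax",
    "figures", "references", "links", "admonitions",
)


def parse_categories(raw: Optional[str]) -> List[str]:
    """Return the requested categories, or all of them when none are given."""
    if not raw:
        # A fresh list so callers can mutate it without touching the tuple.
        return list(VALID_CATEGORIES)
    requested = [c.strip() for c in raw.split(",") if c.strip()]
    invalid = [c for c in requested if c not in VALID_CATEGORIES]
    if invalid:
        raise ValueError(
            f"invalid categories: {', '.join(invalid)}; "
            f"valid categories: {', '.join(VALID_CATEGORIES)}"
        )
    return requested
```

Keeping `VALID_CATEGORIES` as a tuple preserves the default processing order, which a set would not guarantee.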
14 changes: 4 additions & 10 deletions style_checker/cli.py
@@ -22,16 +22,10 @@
from datetime import datetime

from style_checker import __version__
from style_checker.categories import VALID_CATEGORIES
from style_checker.reviewer import StyleReviewer


# All available categories (matches reviewer.RULE_EVALUATION_ORDER keys)
ALL_CATEGORIES = [
"writing", "math", "code", "jax",
"figures", "references", "links", "admonitions",
]


def display_width(s: str) -> int:
"""Calculate terminal display width, accounting for wide/emoji characters."""
w = 0
@@ -278,13 +272,13 @@ def main():
# Parse categories
if args.categories:
categories = [c.strip() for c in args.categories.split(",")]
invalid = [c for c in categories if c not in ALL_CATEGORIES]
invalid = [c for c in categories if c not in VALID_CATEGORIES]
if invalid:
print(f"Error: invalid categories: {', '.join(invalid)}", file=sys.stderr)
print(f"Valid categories: {', '.join(ALL_CATEGORIES)}", file=sys.stderr)
print(f"Valid categories: {', '.join(VALID_CATEGORIES)}", file=sys.stderr)
sys.exit(1)
else:
categories = list(ALL_CATEGORIES)
categories = list(VALID_CATEGORIES)

# API key
api_key = args.api_key or os.environ.get("ANTHROPIC_API_KEY")
12 changes: 4 additions & 8 deletions style_checker/github_handler.py
@@ -11,16 +11,12 @@
from github import Github, GithubException
from datetime import datetime

from .categories import VALID_CATEGORIES


class GitHubHandler:
"""Handles GitHub API interactions for PR and issue management"""

# Valid category names (must match files in style_checker/rules/)
VALID_CATEGORIES = {
'writing', 'math', 'code', 'jax',
'figures', 'references', 'links', 'admonitions'
}

def __init__(self, token: str, repository: str):
"""
Initialize GitHub handler
@@ -79,10 +75,10 @@ def extract_lecture_from_comment(self, comment_body: str) -> Optional[Tuple[str,

# Validate categories
if categories != ['all']:
invalid = [c for c in categories if c not in self.VALID_CATEGORIES]
invalid = [c for c in categories if c not in VALID_CATEGORIES]
if invalid:
print(f"⚠️ Invalid categories: {', '.join(invalid)}")
print(f" Valid categories: {', '.join(sorted(self.VALID_CATEGORIES))}")
print(f" Valid categories: {', '.join(sorted(VALID_CATEGORIES))}")
return None
else:
categories = ['all'] # Default to all categories