Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
1735edc
Update CI workflows and Taskfile configurations to standardize port u…
stewartshea May 26, 2026
895a5cb
Enhance workspace artifact management and documentation
stewartshea May 27, 2026
3f9bf76
Refactor Azure resource-type specs and enhance selective indexing
stewartshea May 28, 2026
7aa81af
Remove outdated documentation and assets
stewartshea May 28, 2026
c9c16e7
Enhance Azure resource indexing and documentation
stewartshea May 28, 2026
0ed9cf5
Revise README for clarity and feature updates
stewartshea May 28, 2026
39ca6e5
Merge remote-tracking branch 'origin/main' into feat/RW-1123
stewartshea May 28, 2026
3b21328
test(azure): switch standard fixtures to azureapi indexer + add no-AK…
stewartshea May 28, 2026
51f5477
fix(azureapi): use list_by_subscription for Redis + correct terraform…
stewartshea May 28, 2026
85f09f2
fix(azureapi): resolve typed-spec selection through registry aliases
stewartshea May 28, 2026
7d61f23
fix(azureapi-test): drop tee /dev/tty in cleanup wrapper
stewartshea May 29, 2026
e10baa9
remove uplaod test until it is refactored after crdless
stewartshea May 29, 2026
e845126
Enhance README and documentation for built-in MCP server
stewartshea May 29, 2026
84f2646
Add GCP indexer documentation and update component initialization
stewartshea May 29, 2026
d12fdca
Add AWS indexer support and documentation
stewartshea May 29, 2026
d5bec7c
Enhance GCP indexer functionality and documentation
stewartshea May 29, 2026
f8c3ba2
Refactor Azure DevOps SDK imports for lazy loading
stewartshea May 29, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions .cursor/rules/discovery-indexers.mdc
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
---
description: Contract for RunWhen Local discovery indexers (native SDK + CloudQuery). Enforces registry parity, collector registration, normalizer shape, and docs/test/generation-rule definition of done.
globs: src/indexers/**,src/enrichers/azure.py,src/enrichers/gcp.py,src/enrichers/aws.py,scripts/azure/**,scripts/gcp/**
alwaysApply: false
---

# Discovery indexer contract

RunWhen Local discovers resources via native SDK indexers (`azureapi`,
`gcpapi`, future `awsapi`) plus the legacy `cloudquery` indexer. The native
indexers exist to remove the CloudQuery dependency entirely. When editing any
indexer, enforce every rule below.

For full workflows use the `extend-discovery-type` and `add-discovery-type`
skills. Worked examples: `docs/architecture/{azure,gcp}-indexer-internals.md`.

## CloudQuery parity is the public contract

- The **CloudQuery table name** (e.g. `gcp_compute_instances`) is the
`resource_type` used in generation rules. Native indexers MUST normalize
payloads into the same CloudQuery-shaped dict so rules don't change when the
backend flips.
- Every CloudQuery table for a platform MUST resolve via the registry by its
canonical name or a registered alias. Don't drop parity to simplify.

## Generated registry — never hand-edit

`src/indexers/*_resource_type_registry.yaml` is GENERATED. To change a mapping:

1. Edit the overrides YAML (`scripts/<platform>/<platform>_resource_type_overrides.yaml`).
2. Re-run `scripts/<platform>/sync_<platform>_resource_type_registry.py`.
3. Commit both the overrides and the regenerated YAML; the sync `--dry-run`
must report 0 drift.

Use `null` for the native type of tables with no API equivalent so the generic
pass skips them.

## Typed vs generic collectors

- The **generic collector** (Azure `resources.list()`, GCP Cloud Asset
Inventory, AWS Cloud Control) provides broad parity.
- **Typed collectors** are enrichment for high-value types. Register them in
`_TYPED_COLLECTORS` keyed by the canonical CloudQuery table name. A typed type
is automatically excluded from the generic pass — write each resource once.
- Lazy-import SDK clients inside the collector so importing the module never
hard-requires the SDK.

## Discovery is selective + anchored

- Collect only types referenced by loaded generation rules, plus the mandatory
anchor (GCP `gcp_projects`, Azure `azure_resources_resource_groups`).
- Emit the anchor first so child resources link to it at parse time.
- Respect Level-of-Detail: scopes resolving to `none` are skipped entirely.

## Normalizer shape

Every resource flows: `normalize -> tag filter -> handler.parse_resource_data
-> writer.add_resource`. Normalizers MUST map cloud labels/tags into the
canonical `tags` field and preserve the qualifier fields the tag/hierarchy
templates expect. Persist only via the `ResourceWriter`, never directly.

## Definition of done (every indexer change)

A change is incomplete until ALL of these are updated together:

- [ ] Code + registry (regenerated, not hand-edited)
- [ ] Unit tests: registry loader, normalizer+handler round-trip, selective dispatch
- [ ] Docs: `docs/authoring/indexed-resources/<platform>.md` and the indexer-internals doc
- [ ] Generation-rule contract verified: the type resolves by CloudQuery table
name and the normalized dict carries the fields the templates need
- [ ] `cd src && python -m unittest discover -s indexers -p 'test_*.py'` passes
38 changes: 38 additions & 0 deletions .cursor/rules/resource-type-registry.mdc
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
---
description: Guardrail for generated resource-type registry YAMLs and their sync scripts. Edit overrides + re-run sync; never hand-edit the generated file.
globs: src/indexers/*_resource_type_registry.yaml,scripts/azure/*_overrides.yaml,scripts/gcp/*_overrides.yaml,scripts/azure/sync_*.py,scripts/gcp/sync_*.py,scripts/azure/*_cloudquery_tables.txt,scripts/gcp/*_cloudquery_tables.txt
alwaysApply: false
---

# Resource-type registry is generated

`src/indexers/*_resource_type_registry.yaml` is produced by
`scripts/<platform>/sync_<platform>_resource_type_registry.py` from three
inputs. **Do not hand-edit the generated YAML.**

## To change a mapping

1. Edit the right input:
- **Native-type wrong / missing** -> `*_overrides.yaml`
(`arm_type_overrides` for Azure, `cai_type_overrides` for GCP; use `null`
when there is no native API equivalent).
- **Alternate name** -> `aliases` in the overrides.
- **Rich payload wanted** -> add the table to `typed_collectors`.
- **Service -> API host remap** -> `service_api_hosts` (GCP).
- **New parity tables** -> refresh the `*_cloudquery_tables.txt` snapshot
from the CloudQuery hub (record source + version in its header).
2. Regenerate:

```bash
python scripts/gcp/sync_gcp_resource_type_registry.py # or azure/...
```

3. Review the `added / removed / changed` summary and the YAML diff; commit the
overrides AND the regenerated YAML together.

## Parity invariant

Every CloudQuery table for the platform must resolve via the registry by
canonical name or alias. The sync `--dry-run` must report 0 drift after your
change. Removing a table from parity requires an explicit reason, not
convenience.
158 changes: 158 additions & 0 deletions .cursor/skills/add-discovery-type/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
---
name: add-discovery-type
description: >-
Stand up a brand-new RunWhen Local discovery platform / indexer from scratch
(a new cloud or resource source) following the native SDK pattern used by
azureapi and gcpapi. Use when asked to add a new platform, new cloud provider,
or new top-level discovery source. Covers the parity registry, indexer
modules, platform handler, tag/hierarchy templates, pipeline wiring, tests,
and docs.
---

# Add a new discovery type

Use this to add a **new platform** (a new cloud provider or discovery source).
To add a single resource type to an existing indexer, use the
`extend-discovery-type` skill instead.

`gcpapi` is the most recent end-to-end example — read
`docs/architecture/gcp-indexer-internals.md` first; it documents every piece you
are about to replicate. `azureapi` is the second reference. The mandate is to
build native SDK indexers so CloudQuery can eventually be removed entirely.

## Architecture (what you are building)

```
scripts/<platform>/
├── <platform>_cloudquery_tables.txt # parity source (CloudQuery hub table list)
├── <platform>_resource_type_overrides.yaml # hand-curated overrides
└── sync_<platform>_resource_type_registry.py # registry generator

src/indexers/
├── <platform>_resource_type_registry.yaml # GENERATED catalog (never hand-edit)
├── <platform>_resource_type_registry.py # read-only loader
├── <platform>_common.py # credentials + scope resolution + tag filters
├── <platform>api_normalizers.py # raw SDK/API -> CloudQuery-shaped dict
├── <platform>api_resource_types.py # generic collector + typed collectors + specs
├── <platform>api.py # orchestration loop
└── test_<platform>*.py # unit tests

src/enrichers/<platform>.py # PlatformHandler.parse_resource_data
src/templates/<platform>-tags.yaml # tag template
src/templates/<platform>-hierarchy.yaml # hierarchy template
docs/architecture/<platform>-indexer-internals.md
docs/authoring/indexed-resources/<platform>.md
```

## Core principles

1. **Parity first.** Seed the registry from the upstream CloudQuery plugin's
table list (use the CloudQuery hub, not the old in-container plugin version).
Every CloudQuery table must resolve via the registry by canonical name/alias.
2. **CloudQuery table name is the public contract.** Generation rules reference
it as `resource_type`. The native indexer normalizes payloads into the same
shape so rules don't change when the backend flips.
3. **Generic collector is the workhorse**, typed collectors are enrichment. Pick
the broadest first-party API for the generic pass (Azure
`resources.list()`, GCP Cloud Asset Inventory, AWS Cloud Control API).
4. **Selective, generation-rule-driven discovery.** Only collect types
referenced by loaded generation rules (plus the mandatory anchor), and
respect Level-of-Detail scoping.
5. **Never hand-edit the generated registry YAML.**

## Workflow

```
- [ ] 1. Obtain upstream CloudQuery table list -> scripts/<p>/<p>_cloudquery_tables.txt
- [ ] 2. Author the overrides YAML (type remaps, aliases, typed flags, anchor, nulls)
- [ ] 3. Write the sync script (heuristic table-name -> native-type mapping)
- [ ] 4. Generate the registry YAML + write the read-only loader
- [ ] 5. Add SDK deps to pyproject + regenerate lock in python:3.14-slim
- [ ] 6. Write <p>_common.py (credentials, scope/LOD, tag filters)
- [ ] 7. Write <p>api_normalizers.py (raw -> CloudQuery-shaped dict)
- [ ] 8. Write <p>api_resource_types.py (generic collector + typed + specs)
- [ ] 9. Write <p>api.py orchestrator (phases: bootstrap -> anchors -> typed -> generic)
- [ ] 10. Add/confirm the enricher PlatformHandler.parse_resource_data
- [ ] 11. Add src/templates/<p>-tags.yaml + <p>-hierarchy.yaml
- [ ] 12. Wire pipeline: component.py, run.sh, run.py setting, cloudquery skip
- [ ] 13. Tests: registry + normalizers + selective
- [ ] 14. Docs: indexer-internals + indexed-resources + READMEs
- [ ] 15. Run the full indexer test suite
```

### 1–4. Registry pipeline

Mirror `scripts/gcp/`:

- **Table list**: scrape the CloudQuery hub plugin table reference into a
`.txt` with a header noting source + version. This is the parity source.
- **Overrides**: a YAML with native-type remaps, `aliases`, `typed_collectors`,
the **mandatory anchor** type (e.g. GCP `gcp_projects`, Azure
`azure_resources_resource_groups`), and `null` native types for tables with
no API equivalent.
- **Sync script**: heuristic `<plugin>_<service>_<entity_plural>` ->
`<host>/<EntitySingularPascal>`, plus override application. Copy
`sync_gcp_resource_type_registry.py` and adapt the heuristic.
- **Loader** (`<platform>_resource_type_registry.py`): copy the GCP loader;
provide `load_registry`, `find`, and a `find_by_<native>_type` lookup.

### 6–9. Indexer modules

- **`<platform>_common.py`**: resolve credentials (K8s secret, inline key, ADC /
env) and the account/project/subscription scope; provide `has_included_tags` /
`has_excluded_tags` label filters.
- **`<platform>api_normalizers.py`**: convert raw SDK/API objects to the
CloudQuery-shaped dict the handler expects; hoist the full payload, map cloud
labels/tags to `tags`, normalize names/IDs/regions. Include a
`make_<anchor>_resource_data` helper for the synthesized anchor.
- **`<platform>api_resource_types.py`**: a `*ResourceTypeSpec` dataclass, the
generic collector over the broad API, typed collectors (lazy-import SDK
clients), `_TYPED_COLLECTORS` keyed by canonical table name, and
`find_spec` / `find_spec_by_<native>_type`.
- **`<platform>api.py`** `index(ctx)` phases:
1. Bootstrap: check `<platform>IndexerBackend`, resolve creds + scope, mirror
into the enricher, resolve LOD (skip scopes with LOD `none`).
2. Phase 0: emit the synthesized **anchor** resource first so children link to
it at parse time.
3. Phase 1: typed collectors for accessed types that have one.
4. Phase 2: one generic pass per scope, filtered to accessed native types not
already covered by a typed collector.
Every resource flows: normalize -> tag filter -> `handler.parse_resource_data`
-> `writer.add_resource`.

### 10–11. Handler + templates

- Confirm/extend `src/enrichers/<platform>.py` `parse_resource_data` consumes
the normalized dict and links children to the anchor.
- Add `src/templates/<platform>-tags.yaml` and `-hierarchy.yaml` following the
**`template-tags-hierarchy` rule** exactly (hierarchy ends with
`resource_name`; emit `platform`, `resource_type`, `resource_name`,
`child_resource`; specificity-ordered `resource_name`; dedup loop).

### 12. Pipeline wiring

- `src/component.py`: add `"<platform>api"` to the `INDEXER` stage.
- `src/run.sh`: include `<platform>api` in `COMPONENTS`.
- `src/run.py`: coalesce `<platform>IndexerBackend` from `workspaceInfo` or
`WB_<PLATFORM>_INDEXER_BACKEND` into request data.
- `src/indexers/cloudquery.py`: skip this platform when the native backend is
selected (mutual exclusion).

### 13–14. Tests + docs (required)

- Tests mirroring `test_gcp*.py`: registry loader contract, normalizer +
handler round-trip, selective discovery + generic-pass filter dispatch.
- `docs/architecture/<platform>-indexer-internals.md` (link it from
`docs/architecture/README.md`).
- `docs/authoring/indexed-resources/<platform>.md` documenting both backends,
the matchable types, and the stable generation-rule contract.
- Update top-level `README.md` / docs indexes if the platform is newly listed.

### 15. Verify

```bash
cd src && python -m unittest discover -s indexers -p 'test_*.py'
python scripts/<platform>/sync_<platform>_resource_type_registry.py --dry-run
```

The sync dry-run must report 0 drift (registry matches overrides + table list).
Loading
Loading