Feat/RW-1123 #800

Draft
stewartshea wants to merge 15 commits into main from feat/RW-1123

stewartshea commented on May 28, 2026

  • Updated the Azure resource type registry to include new resource types such as Azure Cosmos SQL databases, MySQL flexible servers, PostgreSQL databases, and Redis caches, improving the breadth of resource discovery.
  • Refactored the Azure API indexer to support selective indexing, ensuring that only relevant resources are processed based on workspace configuration.
  • Improved error handling for Azure Service Principal configuration, providing clearer feedback when required fields are missing.
  • Enhanced documentation to clarify the new resource types and indexing logic, ensuring developers have up-to-date information on the indexing process and available resources.
  • Updated Dockerfile and Python dependencies to support the new Azure SDK features, ensuring compatibility and stability in resource management.

Note

Low Risk
Changes are limited to workflows, test harnesses, and Helm values; no production application logic in this diff.

Overview
This PR aligns RunWhen Local integration tests and CI with the Workspace Builder on port 8000 (replacing 8081), drops MkDocs cheat-sheet stamping from image build workflows, and adds live Azure azureapi indexer validation.

Port and packaging: Docker/Helm/Taskfile fixtures map host ports to 8000; chart comments describe the REST service instead of a separate cheat sheet. EKS Helm values drop cheatSheet.disabled.

CI: ado-ci-test path filters include azureapi, azure_common, and resource_writer sources. merge_to_main and pr_open no longer run sed on mkdocs.yml for version/date.

Azure backend tests: Adds diff_resource_dump.py to compare resource-dump.yaml from cloudquery vs azureapi (ignoring _cq_* metadata). multi-subscription-aks gains run-backend-equivalence-test. New .test/azure/no-aks-resources fixture (cheap Terraform RGs/storage/KV) with Taskfile CI for baseline discovery, selective per-RG LOD, excludeTags, and cross-backend equivalence via resources.sqlite and indexer log counters.
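The cross-backend comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the contents of diff_resource_dump.py: it assumes each resource-dump.yaml has already been loaded (for example with PyYAML's `yaml.safe_load`) into a mapping from resource ID to an attribute dict, and the names `strip_cq_metadata` and `diff_dumps` are hypothetical. Differences are collected into a list rather than raised.

```python
from typing import Any


def strip_cq_metadata(value: Any) -> Any:
    """Recursively drop dict keys starting with "_cq_" (CloudQuery bookkeeping columns)."""
    if isinstance(value, dict):
        return {k: strip_cq_metadata(v) for k, v in value.items() if not str(k).startswith("_cq_")}
    if isinstance(value, list):
        return [strip_cq_metadata(v) for v in value]
    return value


def diff_dumps(cloudquery: dict[str, Any], azureapi: dict[str, Any]) -> list[str]:
    """Compare two dumps keyed by resource ID, collecting differences instead of raising."""
    cq = {rid: strip_cq_metadata(attrs) for rid, attrs in cloudquery.items()}
    api = {rid: strip_cq_metadata(attrs) for rid, attrs in azureapi.items()}
    differences: list[str] = []
    for rid in sorted(cq.keys() - api.keys()):
        differences.append(f"only in cloudquery: {rid}")
    for rid in sorted(api.keys() - cq.keys()):
        differences.append(f"only in azureapi: {rid}")
    for rid in sorted(cq.keys() & api.keys()):
        if cq[rid] != api[rid]:
            differences.append(f"attributes differ: {rid}")
    return differences


if __name__ == "__main__":
    rg_id = "/subscriptions/s1/resourceGroups/rg1"
    cq_dump = {rg_id: {"name": "rg1", "_cq_id": "abc"}}
    api_dump = {rg_id: {"name": "rg1"}}
    print(diff_dumps(cq_dump, api_dump))  # → []
```

Azure resource IDs are case-insensitive, so a real comparison may also need to normalize ID casing before matching keys; the sketch compares them verbatim.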

Reviewed by Cursor Bugbot for commit 39ca6e5. Bugbot is set up for automated code reviews on this repo.

…sage and enhance documentation

- Updated CI workflows to include additional indexers for Azure API resources.
- Changed port mappings from 8081 to 8000 across various Taskfiles and documentation for consistency.
- Revised documentation to reflect the new architecture and clarify the role of the Workspace Builder and REST API.
- Removed references to the Cheat Sheet in favor of Discovery Output in documentation for improved clarity.
- Added functionality to persist workspace artifacts in the SQLite database, including a new `workspace_artifacts` table to store various rendered outputs.
- Implemented checks for the presence of workspace artifacts and SLX files during validation, improving error handling and user feedback.
- Updated documentation to include details on the new `ResourceWriter` and `Resource store query API`, enhancing clarity for developers.
- Enhanced the Workspace Explorer UI to better display indexed resources and artifacts, improving user experience.
- Refactored related code to streamline the handling of Skill overlays and artifact rendering, ensuring consistency across the application.
- Updated the `AzureResourceTypeSpec` class to support two collector methods: `collector_all` for subscription-wide listings and `collector_in_rg` for resource-group scoped listings.
- Enhanced documentation to clarify the purpose and usage of the new collector methods and the process for adding new Azure resource types.
- Implemented selective indexing logic to drop resources with an effective Level of Detail (LOD) of NONE before reaching the writer, improving efficiency and clarity in resource management.
- Added functions to extract resource-group and subscription IDs from ARM IDs, facilitating better resource scoping and management.
- Improved overall code structure and readability, ensuring compatibility with existing callers while introducing new functionality.
- Deleted the GitHub Issues documentation for requesting commands, reporting bugs, and requesting features.
- Removed the roadmap documentation that outlined project plans.
- Cleared the SUMMARY file which contained the table of contents for the documentation.
- Eliminated the introduction documentation for the User Guide.
- Deleted various image assets and diagrams that were no longer in use.
- Added new documentation for container development and high-level architecture, enhancing clarity on the project's structure and usage.
- Introduced workspace generation statistics documentation to provide insights into the resource discovery and generation process.
- Implemented support for namespace-level LODs in AKS clusters, allowing for more granular control over resource management.
- Updated the Azure resource type registry to include new resource types such as Azure Cosmos SQL databases, MySQL flexible servers, PostgreSQL databases, and Redis caches, improving the breadth of resource discovery.
- Refactored the Azure API indexer to support selective indexing, ensuring that only relevant resources are processed based on workspace configuration.
- Improved error handling for Azure Service Principal configuration, providing clearer feedback when required fields are missing.
- Enhanced documentation to clarify the new resource types and indexing logic, ensuring developers have up-to-date information on the indexing process and available resources.
- Updated Dockerfile and Python dependencies to support the new Azure SDK features, ensuring compatibility and stability in resource management.
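The collector split and the selective-indexing steps listed above might fit together roughly as follows. This is a hypothetical sketch: `ResourceTypeSpec` only mirrors the two hooks named for `AzureResourceTypeSpec` (`collector_all` and `collector_in_rg`), and the `LOD` levels other than NONE, the `collect` and `drop_lod_none` helpers, and the per-RG LOD lookup are all assumptions. The ARM ID helpers rely only on the standard `/subscriptions/{id}/resourceGroups/{name}/providers/...` layout and match segment names case-insensitively, since ARM IDs are case-insensitive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class LOD(Enum):
    # Hypothetical levels; only NONE is named in this PR.
    NONE = 0
    BASIC = 1
    DETAILED = 2


@dataclass
class ResourceTypeSpec:
    """Hypothetical stand-in for AzureResourceTypeSpec and its two collector hooks."""
    resource_type_name: str
    collector_all: Callable[[str], Iterable[dict]]  # (subscription_id) -> resources
    collector_in_rg: Optional[Callable[[str, str], Iterable[dict]]] = None  # (subscription_id, rg)


def _arm_segment(arm_id: str, key: str) -> Optional[str]:
    """Return the path segment that follows `key` (case-insensitive) in an ARM ID."""
    parts = [p for p in arm_id.split("/") if p]
    for i in range(len(parts) - 1):
        if parts[i].lower() == key:
            return parts[i + 1]
    return None


def subscription_id_from_arm_id(arm_id: str) -> Optional[str]:
    return _arm_segment(arm_id, "subscriptions")


def resource_group_from_arm_id(arm_id: str) -> Optional[str]:
    return _arm_segment(arm_id, "resourcegroups")


def collect(spec: ResourceTypeSpec, subscription_id: str, in_scope_rgs: Optional[set[str]]) -> list[dict]:
    """Use the RG-scoped collector when selective discovery names specific RGs."""
    if in_scope_rgs is not None and spec.collector_in_rg is not None:
        resources: list[dict] = []
        for rg in sorted(in_scope_rgs):
            resources.extend(spec.collector_in_rg(subscription_id, rg))
        return resources
    return list(spec.collector_all(subscription_id))


def drop_lod_none(resources: Iterable[dict], lod_for_rg: Callable[[Optional[str]], LOD]) -> list[dict]:
    """Filter out resources whose effective LOD is NONE before they reach the writer."""
    return [r for r in resources if lod_for_rg(resource_group_from_arm_id(r["id"])) is not LOD.NONE]


if __name__ == "__main__":
    vaults = [
        {"id": "/subscriptions/s1/resourceGroups/keep-rg/providers/Microsoft.KeyVault/vaults/kv1"},
        {"id": "/subscriptions/s1/resourceGroups/drop-rg/providers/Microsoft.KeyVault/vaults/kv2"},
    ]
    spec = ResourceTypeSpec("azure_keyvault_vaults", collector_all=lambda sub: vaults)
    lods = {"keep-rg": LOD.DETAILED, "drop-rg": LOD.NONE}
    kept = drop_lod_none(collect(spec, "s1", None), lambda rg: lods.get((rg or "").lower(), LOD.BASIC))
    print([r["id"].rsplit("/", 1)[-1] for r in kept])  # → ['kv1']
```

Calling the RG-scoped collector once per in-scope group avoids listing the whole subscription when selective discovery is active, while falling back to `collector_all` keeps specs without an RG-scoped collector working.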
stewartshea marked this pull request as ready for review on May 28, 2026 at 18:49
- Updated Python version badge from 3.10 to 3.14.
- Reworked the project description to emphasize its role as a discovery tool for cloud and Kubernetes infrastructure, introducing the concept of "Skills."
- Restructured the Table of Contents for better navigation and clarity.
- Added detailed sections on discovery, skill tailoring, and the local explorer UI, enhancing user understanding of functionality.
- Removed outdated sections and streamlined content to focus on current features and usage.


class DiffError(Exception):
    pass

Unused DiffError exception class never raised

Low Severity

DiffError is defined but never raised or referenced anywhere in diff_resource_dump.py. It appears to be leftover scaffolding from an earlier design where errors were raised rather than collected into the differences list.


Reviewed by Cursor Bugbot for commit 0ed9cf5.

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	src/VERSION
stewartshea marked this pull request as draft on May 28, 2026 at 19:12

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 39ca6e5.

fi
done
# Verify the indexer logged that selective discovery actually ran.
if grep -qE "selective discovery, in-scope RGs" run_sh_output.log container_logs.log 2>/dev/null; then

Assertions grep missing container_logs.log file

Medium Severity

The assert-selective and assert-tag-filter tasks grep both run_sh_output.log and container_logs.log for required indexer log patterns (e.g. selective discovery, skipped_tag_filter, skipped_lod_filter). However, the run-rwl-discovery task never creates container_logs.log — unlike the ADO fixture which explicitly runs docker logs "$CONTAINER_NAME" > container_logs.log 2>&1. If the indexer summary line is emitted by the container's main process rather than the exec'd run.sh, it won't appear in run_sh_output.log, and the assertion will incorrectly fail because container_logs.log doesn't exist.

Reviewed by Cursor Bugbot for commit 39ca6e5.

stewartshea and others added 8 commits May 28, 2026 19:37
…S smoke workflow

- Have every workspaceInfo.yaml emitted by .test/azure/aks-and-k8s/,
  .test/azure/aks-helm-installed-mi/, and .test/azure/aks-helm-installed-sp/
  set 'azureIndexerBackend: azureapi' so the existing AKS/helm matrices
  exercise the native indexer instead of the legacy CloudQuery path.
- Make .test/azure/no-aks-resources/'s build-rwl tasks use a relative
  Dockerfile path so they work in CI runners as well as the dev container.
- Add .github/workflows/test-azure-indexer.yaml: a single-job, AKS-free
  smoke test that provisions the no-aks-resources terraform fixture,
  runs the existing ci-test-azureapi-baseline / -selective / -tag-filter
  tasks, then tears the infra down. Triggers on src/** + fixture changes
  and is also workflow_dispatch-able for ad-hoc verification.

Co-authored-by: Cursor <cursoragent@cursor.com>
… invocation in no-aks asserts

- src/indexers/azureapi_resource_types.py: the subscription-wide
  `_collect_redis_caches_all` collector was calling
  `RedisManagementClient.redis.list()`, which doesn't exist on
  `RedisOperations`. Switch to `list_by_subscription()`, the actual
  pager exposed by current azure-mgmt-redis. This was surfacing as a
  non-fatal "no attribute 'list'" warning during full-coverage runs.
- .test/azure/no-aks-resources/Taskfile.yaml: the `assert-baseline`,
  `assert-selective`, `assert-tag-filter`, and `generate-selective-config`
  tasks were running `terraform show -json terraform/terraform.tfstate`
  from the test root, but `terraform init` only ran inside ./terraform/.
  Without the provider plugin cache in the working dir, terraform aborts
  with "Failed to load plugin schemas" and jq errors out parsing the
  empty stdout. Switch to `terraform -chdir=terraform show -json
  terraform.tfstate` and capture the JSON once per task instead of
  re-shelling out per output. Discovery itself was already passing
  end-to-end (76 SLXs against the new azureapi indexer); this fixes the
  assertion step in the new Discovery Azure Indexer Tests workflow.
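The Redis fix above comes down to calling the pager that azure-mgmt-redis actually exposes. The sketch below is not the indexer's `_collect_redis_caches_all`; passing the client in as an argument is an assumption made so the functions can be exercised offline with stand-in objects. `list_by_subscription()` and `list_by_resource_group()` are the `RedisOperations` methods, and the returned `RedisResource` models support `as_dict()`.

```python
from types import SimpleNamespace
from typing import Any


def collect_redis_caches_all(redis_client: Any) -> list[dict]:
    """Subscription-wide Redis listing.

    `redis_client` is assumed to be an azure.mgmt.redis.RedisManagementClient,
    e.g. RedisManagementClient(credential, subscription_id). Its `redis`
    attribute is a RedisOperations instance, which pages results through
    list_by_subscription(); it has no plain list() method.
    """
    return [cache.as_dict() for cache in redis_client.redis.list_by_subscription()]


def collect_redis_caches_in_rg(redis_client: Any, resource_group_name: str) -> list[dict]:
    """Resource-group scoped listing via RedisOperations.list_by_resource_group()."""
    return [cache.as_dict() for cache in redis_client.redis.list_by_resource_group(resource_group_name)]


if __name__ == "__main__":
    # Offline stand-ins for the SDK client and RedisResource models.
    def cache(name: str) -> SimpleNamespace:
        return SimpleNamespace(as_dict=lambda: {"name": name})

    fake_client = SimpleNamespace(redis=SimpleNamespace(
        list_by_subscription=lambda: [cache("cache-a"), cache("cache-b")],
        list_by_resource_group=lambda resource_group_name: [cache("cache-a")],
    ))
    print(collect_redis_caches_all(fake_client))  # → [{'name': 'cache-a'}, {'name': 'cache-b'}]
```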

Co-authored-by: Cursor <cursoragent@cursor.com>
The Azure SDK indexer was deciding which typed (rich-payload) collectors
to invoke by walking AZURE_RESOURCE_TYPE_SPECS and checking whether each
spec's resource_type_name or cloudquery_table_name appeared in the set
of names referenced by loaded gen rules. That misses any gen rule that
references a registered *alias* of a typed spec.

Concretely: contrib gen rules reference `azure_keyvault_keyvault`, but
the Key Vault typed spec is canonicalized as `azure_keyvault_vaults`
(rwl alias) with `azure_keyvault_keyvaults` as its CQ table. Neither
matches, so the typed Key Vault collector was being skipped, and Key
Vault resources never landed in the resource store. The same pattern
silently affected any other aliased typed type.

Refactor the selection loop to dispatch every accessed name through
``find_spec`` (which already canonicalizes aliases) and bucket the
result as either typed or generic. Mandatory typed specs (today just
resource_group) are still seeded up-front. Behaviour for non-aliased
gen-rule references is unchanged.
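The alias-aware selection described in this commit can be sketched as below. The registry is a trimmed-down, illustrative stand-in: the `Spec` fields `aliases`, `typed`, and `mandatory`, the `select_collectors` helper, and the `resource_group` spec names are hypothetical, while `find_spec`, `resource_type_name`, `cloudquery_table_name`, and the three Key Vault names come from the commit message. Every referenced name goes through `find_spec` first, so `azure_keyvault_keyvault` resolves to the typed `azure_keyvault_vaults` spec instead of being missed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    """Illustrative, trimmed-down entry of a registry like AZURE_RESOURCE_TYPE_SPECS."""
    resource_type_name: str
    cloudquery_table_name: str
    aliases: tuple[str, ...] = ()
    typed: bool = True       # has a rich-payload (typed) collector
    mandatory: bool = False  # always collected, regardless of gen rules


SPECS = [
    Spec("azure_keyvault_vaults", "azure_keyvault_keyvaults", aliases=("azure_keyvault_keyvault",)),
    Spec("resource_group", "resource_group", mandatory=True),  # hypothetical names
]


def find_spec(name: str) -> Optional[Spec]:
    """Canonicalize a gen-rule name: match the canonical name, the CQ table, or any alias."""
    for spec in SPECS:
        if name in (spec.resource_type_name, spec.cloudquery_table_name, *spec.aliases):
            return spec
    return None


def select_collectors(accessed_names: set[str]) -> tuple[set[str], set[str]]:
    """Bucket every name referenced by gen rules into typed or generic collection."""
    typed = {s.resource_type_name for s in SPECS if s.mandatory}  # seeded up-front
    generic: set[str] = set()
    for name in accessed_names:
        spec = find_spec(name)
        if spec is not None and spec.typed:
            typed.add(spec.resource_type_name)
        else:
            # Unknown names (and specs without a typed collector) fall back to generic listing.
            generic.add(spec.resource_type_name if spec else name)
    return typed, generic


if __name__ == "__main__":
    typed, generic = select_collectors({"azure_keyvault_keyvault"})
    print(sorted(typed), sorted(generic))  # → ['azure_keyvault_vaults', 'resource_group'] []
```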

Also captures docker-side indexer stdout into container_logs.log in
.test/azure/no-aks-resources/Taskfile.yaml so the assert-* tasks have
a fresh, deterministic file to grep for "selective discovery" /
"skipped_*_filter" markers (the FastAPI process emits those to docker
stdout, not to run_sh_output.log).
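The assertion side of this log capture can be sketched as follows. The Taskfile tasks use `grep` in shell; this Python version only illustrates the same check, and `check_log_markers` and its return shape are hypothetical. It searches every log for each required marker and also reports log files that were never written, the situation the Bugbot comment about container_logs.log describes.

```python
import tempfile
from pathlib import Path

# Marker quoted in this PR's Taskfile assertions.
SELECTIVE_MARKERS = ("selective discovery, in-scope RGs",)


def check_log_markers(log_paths: list[Path], markers: tuple[str, ...]) -> tuple[list[str], list[Path]]:
    """Return (markers found in no log, log files that do not exist).

    Missing files are reported rather than raising, so a run where
    container_logs.log was never written fails with an explicit reason.
    """
    texts: list[str] = []
    missing_files: list[Path] = []
    for path in log_paths:
        if path.is_file():
            texts.append(path.read_text(encoding="utf-8", errors="replace"))
        else:
            missing_files.append(path)
    combined = "\n".join(texts)
    missing_markers = [m for m in markers if m not in combined]
    return missing_markers, missing_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_sh = Path(tmp, "run_sh_output.log")
        run_sh.write_text("run.sh finished\n")
        container = Path(tmp, "container_logs.log")
        # Made-up log line; only the marker text comes from the PR.
        container.write_text("indexer: selective discovery, in-scope RGs: rg-keep\n")
        print(check_log_markers([run_sh, container], SELECTIVE_MARKERS))  # → ([], [])
```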

Locally, all three end-to-end scenarios now pass against live infra:
  task ci-test-azureapi-baseline    -> 5/5 resources (incl. KeyVault)
  task ci-test-azureapi-selective   -> keep present, drop absent, mode logged
  task ci-test-azureapi-tag-filter  -> keep present, drop absent, skipped>0

Co-authored-by: Cursor <cursoragent@cursor.com>
The check-and-cleanup-terraform wrapper used `tee /dev/tty` to mirror the
infra-check output to the human running the task. GitHub Actions runners
have no controlling tty, so the pipe failed with `tee: /dev/tty: No such
device or address`, the wrapper exited non-zero, and `terraform destroy`
never ran. The job was left red despite all three azureapi assertions
passing, and Azure resources from the run were leaked.

Replace `| tee /dev/tty` with `echo "$out"` after capturing the check
output, which works in both interactive shells and CI.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Added a section in the README to introduce the built-in Model Context Protocol (MCP) server, detailing its functionality and usage for AI agents.
- Updated the documentation structure to include references to the MCP server in relevant sections, improving discoverability.
- Introduced lifespan management for the MCP server in the FastAPI application, tying the server's startup and shutdown to the application's lifecycle.
- Added new dependencies related to the MCP server in the poetry.lock and pyproject.toml files, ensuring compatibility with the latest features.
- Introduced documentation for the GCP indexer, detailing its functionality and configuration options, including the new `gcpIndexerBackend` settings.
- Updated the `README.md` to include a link to the GCP indexer internals.
- Modified the component initialization in `component.py` to include the `gcpapi` indexer.
- Enhanced the `run.py` and `run.sh` scripts to support the new GCP indexer backend configuration.
- Updated dependencies in `pyproject.toml` and `poetry.lock` to include necessary Google Cloud libraries.
- Introduced the native AWS indexer (`awsapi`) alongside the existing CloudQuery-backed indexer, allowing for more direct resource discovery using the AWS Cloud Control API and `boto3` SDK.
- Updated the `aws.md` documentation to detail the new indexing options and configuration in `workspaceInfo.yaml`.
- Enhanced the `Taskfile.yaml` with new tasks for generating baseline configurations and asserting AWS resource discovery.
- Modified component initialization to include the `awsapi` indexer and updated relevant scripts to support the new backend configuration.
- Added links to AWS indexer internals in the architecture documentation for better discoverability.
- Improved handling of AWS credentials in the indexing process to support both file paths and inline content.
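The PR does not show how the credentials value is resolved, so the following is only a sketch of one way to accept either form. `load_aws_credentials` and `_is_existing_file` are hypothetical names; the sketch relies on AWS shared credentials files being INI-formatted, so both a path and inline content can be parsed with `configparser`. Interpolation is disabled so that `%` characters in secrets are returned verbatim.

```python
import configparser
import tempfile
from pathlib import Path


def _is_existing_file(value: str) -> bool:
    """Treat only single-line values that name an existing file as paths."""
    if "\n" in value:
        return False
    try:
        return Path(value).is_file()
    except (OSError, ValueError):  # e.g. over-long names or embedded NUL bytes
        return False


def load_aws_credentials(value: str) -> configparser.ConfigParser:
    """Parse AWS shared-credentials data given as a file path or as inline content."""
    parser = configparser.ConfigParser(interpolation=None)  # keep '%' in secrets literal
    try:
        if _is_existing_file(value):
            parser.read(value, encoding="utf-8")
        else:
            parser.read_string(value)
    except configparser.Error as exc:
        raise ValueError("expected a credentials file path or INI-style credentials content") from exc
    return parser


if __name__ == "__main__":
    inline = "[default]\naws_access_key_id = AKIAEXAMPLE\naws_secret_access_key = example-secret\n"
    print(load_aws_credentials(inline)["default"]["aws_access_key_id"])  # → AKIAEXAMPLE

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "credentials")
        path.write_text(inline)
        print(load_aws_credentials(str(path)).sections())  # → ['default']
```

The parsed values could then be passed to `boto3.Session(aws_access_key_id=..., aws_secret_access_key=..., aws_session_token=...)`; whether the `awsapi` indexer does it this way is not stated in the PR.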