feat(aws): add serverless filters for DynamoDB, CloudWatch, S3 transfer, Secrets Manager#644

Closed
JamieCressey wants to merge 6 commits into rtk-ai:develop from JamieCressey:feat/aws-serverless-filters

Conversation


@JamieCressey commented Mar 16, 2026

Summary

Adds specialized RTK filters for high-frequency AWS serverless commands, plus scoped pipe-safety to prevent breaking downstream consumers.

New filters

  • DynamoDB scan/query/get-item: Recursive type flattening strips {S/N/BOOL/NULL/L/M/SS/NS/BS} wrappers. Preserves LastEvaluatedKey (pagination token) and ConsumedCapacity (RCU cost) so LLMs know when results are truncated (~40%+ savings)
  • CloudWatch Logs filter-log-events: Timestamps include date (MM-DD HH:MM:SS), logStreamName shown per event, consecutive duplicates collapsed with [xN] counts, metadata stripped (~40%+ savings)
  • CloudWatch Logs get-query-results: Compact field=value format per row, internal @ptr field filtered out
  • S3 sync/cp: Summarize upload/download/delete/copy counts from text output, preserve error and warning lines verbatim. Pass through short output (<10 lines) unchanged (~60%+ savings)
  • Secrets Manager get-secret-value: Extract Name + SecretString only, compact-print JSON secrets, strip ARN, VersionId, VersionStages, CreatedDate (~60%+ savings)
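As a rough sketch (not the PR's actual code), the recursive type flattening can be illustrated with a std-only stand-in for the parsed JSON tree:

```rust
// Hypothetical sketch of recursive DynamoDB type flattening.
// `Attr` stands in for the serde_json tree the real filter walks;
// names and the output format are illustrative only.
enum Attr {
    S(String),              // {"S": "..."} string wrapper
    N(String),              // {"N": "..."} number wrapper (the API keeps numbers as text)
    Bool(bool),             // {"BOOL": ...}
    Null,                   // {"NULL": true}
    L(Vec<Attr>),           // {"L": [...]} list of wrapped values
    M(Vec<(String, Attr)>), // {"M": {...}} map of wrapped values
}

// Strip the single-letter type wrappers, recursing into lists and maps.
fn flatten(a: &Attr) -> String {
    match a {
        Attr::S(s) => format!("\"{}\"", s),
        Attr::N(n) => n.clone(),
        Attr::Bool(b) => b.to_string(),
        Attr::Null => "null".into(),
        Attr::L(items) => {
            let inner: Vec<String> = items.iter().map(flatten).collect();
            format!("[{}]", inner.join(","))
        }
        Attr::M(entries) => {
            let inner: Vec<String> = entries
                .iter()
                .map(|(k, v)| format!("{}:{}", k, flatten(v)))
                .collect();
            format!("{{{}}}", inner.join(","))
        }
    }
}
```

The savings come from dropping the wrapper objects while keeping every value; pagination and capacity metadata are passed through separately.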

Removed

  • Lambda invoke: Removed — no meaningful token savings possible without stripping essential debugging info (LogResult contains base64-encoded execution logs, memory/duration stats). Falls through to the generic JSON schema compressor instead.

Pipe safety

  • Scoped pipe rewriting via PIPE_UNSAFE_PREFIXES: commands like aws dynamodb scan | python are not rewritten (RTK's compressed output would break json.load()), while text-based commands like git log | grep are still rewritten for savings
  • This replaces the previous global "skip all pipe rewrites" approach with targeted per-command scoping
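A minimal illustration of the PIPE_UNSAFE_PREFIXES idea (names and matching are simplified from the real registry code):

```rust
// Hypothetical sketch of scoped pipe safety. Commands whose output RTK
// rewrites into a non-original format must not be rewritten when piped,
// because downstream consumers (python, jq, ...) expect the raw format.
const PIPE_UNSAFE_PREFIXES: &[&str] = &["aws "];

fn rewrite_before_pipe(cmd: &str) -> bool {
    // Only relevant when the command actually pipes somewhere.
    if !cmd.contains('|') {
        return true;
    }
    let first_segment = cmd.split('|').next().unwrap_or("").trim_start();
    // Skip rewriting when the first segment matches an unsafe prefix.
    !PIPE_UNSAFE_PREFIXES
        .iter()
        .any(|p| first_segment.starts_with(p))
}
```

With this check, `aws dynamodb scan | python` passes through raw JSON while `git log | grep` still gets rewritten.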

All new filters are added as match arms in aws_cmd::run(), following the existing pattern for STS/S3/EC2/ECS/RDS/CloudFormation. Unmatched subcommands continue to fall through to the generic JSON schema compressor.
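The dispatch pattern can be sketched like this (illustrative arm names and return values, not the real handlers):

```rust
// Hypothetical sketch of the match-arm dispatch pattern in aws_cmd::run().
// Arm bodies and the return type are placeholders for the real filters.
fn dispatch(args: &[&str]) -> &'static str {
    match args {
        ["dynamodb", "scan" | "query" | "get-item", ..] => "dynamodb-filter",
        ["logs", "filter-log-events", ..] => "logs-filter",
        ["logs", "get-query-results", ..] => "query-results-filter",
        ["s3", "sync" | "cp", ..] => "s3-transfer-filter",
        ["secretsmanager", "get-secret-value", ..] => "secret-filter",
        // Unmatched subcommands fall through to the generic compressor.
        _ => "generic-json-compressor",
    }
}
```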

Test plan

  • 46 unit tests for AWS filters (type flattening, filter output, edge cases, empty results, token savings)
  • 158 registry tests including pipe-safety assertions for AWS
  • Token savings assertions verified per filter
  • cargo fmt --all --check passes
  • cargo clippy --all-targets — 0 errors
  • cargo test --all — 963 tests pass, 0 failures
  • Manual test: rtk aws dynamodb scan --table-name <table>
  • Manual test: rtk aws logs filter-log-events --log-group-name <group>
  • Manual test: rtk aws s3 sync <src> <dst>
  • Manual test: aws dynamodb scan ... | python passes raw JSON through (pipe-safe)

pszymkowiak and others added 3 commits March 16, 2026 14:58
* fix: P1 exit codes, grep regex perf, SQLite concurrency

Exit code propagation (same pattern as existing modules):
- wget_cmd: run() and run_stdout() now exit on failure
- container: docker_logs, kubectl_pods/services/logs now check
  status before parsing JSON (was showing "No pods found" on error)
- pnpm_cmd: replace bail!() with eprint + process::exit in
  run_list and run_install

Performance:
- grep_cmd: compile context regex once before loop instead of
  per-line in clean_line() (was N compilations per grep call)

Data integrity:
- tracking: add PRAGMA journal_mode=WAL and busy_timeout=5000
  to prevent SQLite corruption with concurrent Claude Code instances

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>

* fix: address review findings on P1 fixes

- tracking: WAL pragma non-fatal (NFS/read-only compat)
- wget: forward raw stderr on failure, track raw==raw (no fake savings)
- container: remove stderr shadow in docker_logs, add empty-stderr
  guard on all 4 new exit code paths for consistency with prisma pattern

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>

---------

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
… (rtk-ai#630)

* fix: raise output caps for grep, git status, and parser fallback (rtk-ai#617, rtk-ai#618, rtk-ai#620)

- grep: per-file match cap 10 → 25, global max 50 → 200
- git status: file list caps 5/5/3 → 15/15/10
- parser fallback: truncate 500 → 2000 chars across all modules

These P0 bugs caused LLM retry loops when RTK returned less signal
than the raw command, making RTK worse than not using it.

Fixes rtk-ai#617, rtk-ai#618, rtk-ai#620

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>

* fix: update README example and add truncation tests for modified/untracked

- parser/README.md: update example from 500 → 2000 to match code
- git.rs: add test_format_status_modified_truncation (cap 15)
- git.rs: add test_format_status_untracked_truncation (cap 10)

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>

* refactor: extract output caps into [limits] config section

Move hardcoded caps into config.toml so users can tune them:

  [limits]
  grep_max_results = 200      # global grep match limit
  grep_max_per_file = 25      # per-file match limit
  status_max_files = 15       # staged/modified file list cap
  status_max_untracked = 10   # untracked file list cap
  passthrough_max_chars = 2000 # parser fallback truncation

All 8 modules now read from config::limits() instead of hardcoded
values. Defaults unchanged from previous commit.
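A plausible shape for those defaults, using the field names from the config keys above (the actual struct in the PR may differ):

```rust
// Hypothetical sketch of the [limits] defaults described above.
// Field names mirror the config.toml keys; the real config::limits()
// would overlay user values from config.toml on top of these.
struct Limits {
    grep_max_results: usize,
    grep_max_per_file: usize,
    status_max_files: usize,
    status_max_untracked: usize,
    passthrough_max_chars: usize,
}

impl Default for Limits {
    fn default() -> Self {
        Limits {
            grep_max_results: 200,
            grep_max_per_file: 25,
            status_max_files: 15,
            status_max_untracked: 10,
            passthrough_max_chars: 2000,
        }
    }
}
```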

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>

---------

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
…3 transfer, Secrets Manager

Add specialized filters for high-frequency AWS CLI commands:

- DynamoDB scan/query/get-item: recursive type flattening strips {S/N/BOOL/NULL/L/M/SS/NS/BS}
  wrappers, preserving all data with ~40%+ token savings
- CloudWatch Logs filter-log-events: timestamp truncation to HH:MM:SS, message deduplication
  with [xN] counts, metadata stripping (~50%+ savings)
- CloudWatch Logs get-query-results: compact field=value format, @ptr filtering
- Lambda invoke: extract StatusCode + FunctionError only (~60%+ savings)
- S3 sync/cp: summarize upload/download/delete counts, preserve errors verbatim (~60%+ savings)
- Secrets Manager get-secret-value: extract Name + SecretString only, compact JSON (~60%+ savings)

27 new tests covering type flattening, filter output, edge cases, and token savings.
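The duplicate-collapsing described for filter-log-events can be sketched as (illustrative, not the commit's actual code):

```rust
// Hypothetical sketch: collapse consecutive duplicate log lines into a
// single line with an [xN] count, as the CloudWatch filter does.
fn collapse_duplicates(lines: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        // Count how many times this line repeats consecutively.
        let mut n = 1;
        while i + n < lines.len() && lines[i + n] == lines[i] {
            n += 1;
        }
        if n > 1 {
            out.push(format!("{} [x{}]", lines[i], n));
        } else {
            out.push(lines[i].to_string());
        }
        i += n;
    }
    out
}
```

Only consecutive repeats collapse, so interleaved messages keep their ordering.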
@JamieCressey force-pushed the feat/aws-serverless-filters branch from 767a772 to 9a2860b on March 16, 2026 22:14
When a command is piped (e.g., `aws dynamodb scan | python -c 'json.load()'`),
RTK was rewriting the first segment, causing its compressed/filtered output to
break downstream programs expecting the original format.

Now piped segments are passed through unchanged. Non-piped compound segments
(&&, ||, ;) are still rewritten normally.

Fixes: aws dynamodb scan | python JSON parse error
@JamieCressey force-pushed the feat/aws-serverless-filters branch from 61e29eb to 3dda84c on March 16, 2026 22:24
@aeppling self-assigned this Mar 17, 2026
@aeppling added the enhancement label Mar 17, 2026
@aeppling
Contributor

Hello,

Thanks for contributing; new command filters are always welcome!

There are a few things to resolve before this can be merged:

Main concern

This PR includes pipe-rewrite changes.

The second commit changes pipe-rewriting behavior in discover/registry.rs for all commands globally, not just AWS. This is not an AWS change but a core modification that could introduce a regression. Per the CONTRIBUTING.md single-focus rule, this trade-off deserves its own PR with targeted handling. The same goes for the tests for this feature, which live in registry.rs but are AWS-focused.

Filters review

DynamoDB scan/query

The filter drops LastEvaluatedKey. This is the pagination token. An LLM using this output has no way to know the scan was truncated at the DynamoDB page boundary. It could assume it has all the data.

Also drops ConsumedCapacity, which is minor but useful for cost debugging.

CloudWatch filter-log-events

Timestamps are truncated to HH:MM:SS, losing the date entirely. If searching across multiple days (common when debugging production incidents), you can't tell which day a log entry belongs to. At minimum keep the date: 01-15 10:30:00 or ISO short.

Also drops logStreamName, which matters when querying across multiple streams — you can't tell which Lambda invocation or container produced a given log line.

Lambda invoke

Only outputs Lambda: 200. Strips LogResult which contains the base64-encoded execution logs — START/END/REPORT lines, memory usage, billed duration, and all console.log/print output from the function. This is the primary debugging information when invoking a Lambda.

A developer running aws lambda invoke and getting back just a status code has lost the most useful part of the response. At minimum, decode and include the LogResult content (which is where the real token savings should come from — filter the decoded logs, not discard them).
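The suggestion above, sketched with the base64 decode elided (a real implementation would decode LogResult first, e.g. with the base64 crate):

```rust
// Hypothetical sketch: keep the high-signal lines from an already-decoded
// Lambda LogResult (the REPORT stats and the function's own output),
// dropping the START/END framing lines. Not the PR's actual code.
fn filter_lambda_logs(decoded: &str) -> String {
    decoded
        .lines()
        .filter(|l| !l.starts_with("START ") && !l.starts_with("END "))
        .collect::<Vec<_>>()
        .join("\n")
}
```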

Thanks again for contributing to RTK !

@aeppling
Contributor

I'm discussing with the maintainers whether we should keep the filters I mentioned in the review.

We are considering different levels of filtering options; for now I need to review this with the maintainers.

- DynamoDB scan/query: preserve LastEvaluatedKey (pagination token) and
  ConsumedCapacity (RCU cost) so LLMs know when results are truncated
- CloudWatch filter-log-events: include date in timestamps (MM-DD HH:MM:SS
  instead of just HH:MM:SS) and show logStreamName per event
- Lambda invoke: remove filter entirely — no real token savings possible
  without stripping essential debugging info (LogResult). Falls through
  to generic AWS JSON compressor instead.
- Remove pipe rewrite changes from this PR (registry.rs reverted) —
  core behavior change belongs in its own PR per reviewer feedback
Instead of globally rewriting all piped commands (which breaks
`aws ... | jq`) or skipping all pipe rewrites (which misses savings
on `git log | grep`), introduce a PIPE_UNSAFE_PREFIXES list.

Commands like `aws` that transform JSON into compressed text are
not rewritten before pipes, preserving downstream consumer
compatibility. Text-based commands (git, cargo, grep, etc.) are
still rewritten for token savings.
@JamieCressey JamieCressey changed the title feat(aws): add serverless filters for DynamoDB, CloudWatch, Lambda, S3 transfer, Secrets Manager feat(aws): add serverless filters for DynamoDB, CloudWatch, S3 transfer, Secrets Manager Mar 17, 2026
@aeppling
Contributor

This PR contains many unrelated commits; can you please resolve this? Once that's done, I'm OK with this feature.

You could cherry-pick your commits into another PR. In that case, please tag the new PR here; I'll close this one and accept the other if it correctly introduces those commands, as well as your PIPE_UNSAFE_PREFIXES once it's tested with other commands.

@CLAassistant

CLAassistant commented Mar 20, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ pszymkowiak
❌ JamieCressey
You have signed the CLA already but the status is still pending? Let us recheck it.

@aeppling
Contributor

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when receiving this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

jbronssin pushed a commit to jbronssin/rtk that referenced this pull request Mar 28, 2026
…filters

Inspired by rtk-ai#644 — cherry-picks the best ideas and integrates them
into the shared runner architecture:

- DynamoDB get-item: single-item unwrapping with ConsumedCapacity
- DynamoDB scan/query: now shows ConsumedCapacity (RCU) and pagination status
- DynamoDB N-type: try i64 first, then f64 (better precision)
- CloudWatch Logs get-query-results: field=value format, strips @ptr
- S3 sync/cp: text-based transfer summary (upload/download/delete counts)
- Secrets Manager get-secret-value: extracts Name + SecretString only

Total: 25 specialized AWS filters + generic fallback.
@jbronssin
Contributor

Hey @JamieCressey — really solid work here. The DynamoDB type flattening, the S3 transfer summarization, and especially the pipe safety logic are all well thought out.

I opened #885 which covers a broader AWS expansion (25 filters total), and I cherry-picked several of your ideas into it:

  • The DynamoDB i64-first-then-f64 parsing for N types — much better than just f64
  • DynamoDB get-item as a separate filter
  • S3 sync/cp text summarization
  • Secrets Manager get-secret-value
  • CloudWatch get-query-results
  • ConsumedCapacity and LastEvaluatedKey display in scan/query
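The i64-first-then-f64 point can be sketched as (illustrative, not #885's actual code):

```rust
// Hypothetical sketch: parse DynamoDB "N" values as i64 first so large
// integers keep exact precision, falling back to f64 for non-integers.
fn parse_n(raw: &str) -> String {
    if let Ok(i) = raw.parse::<i64>() {
        i.to_string()
    } else if let Ok(f) = raw.parse::<f64>() {
        f.to_string()
    } else {
        // Not numeric at all; pass through unchanged.
        raw.to_string()
    }
}
```

Going straight to f64 would silently round integers above 2^53, which is why trying i64 first matters.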

The main architectural difference is that #885 uses a shared runner (run_aws_filtered()) with tee-on-truncation (so truncated lists always have a [full output: ...] recovery path), whereas this PR uses the original per-handler boilerplate.

I think #885 supersedes this one given the overlap, but your pipe safety feature (PIPE_UNSAFE_PREFIXES) is something we don't have yet — that's a great idea that should be a follow-up PR on its own since it touches the registry, not just aws_cmd.

Thanks for the inspiration on this — credit where it's due.

@aeppling
Contributor

Hello,

We are going to use PR #885 instead, which is up to date and implements more filters.

Thanks for your contribution @JamieCressey!

@aeppling closed this Mar 28, 2026

Labels

awaiting-changes, enhancement

6 participants