feat(aws): add serverless filters for DynamoDB, CloudWatch, S3 transfer, Secrets Manager #644

JamieCressey wants to merge 6 commits into rtk-ai:develop
Conversation
* fix: P1 exit codes, grep regex perf, SQLite concurrency

  Exit code propagation (same pattern as existing modules):
  - wget_cmd: run() and run_stdout() now exit on failure
  - container: docker_logs, kubectl_pods/services/logs now check status before parsing JSON (was showing "No pods found" on error)
  - pnpm_cmd: replace bail!() with eprint + process::exit in run_list and run_install

  Performance:
  - grep_cmd: compile the context regex once before the loop instead of per-line in clean_line() (was N compilations per grep call)

  Data integrity:
  - tracking: add PRAGMA journal_mode=WAL and busy_timeout=5000 to prevent SQLite corruption with concurrent Claude Code instances

* fix: address review findings on P1 fixes

  - tracking: WAL pragma non-fatal (NFS/read-only compat)
  - wget: forward raw stderr on failure, track raw==raw (no fake savings)
  - container: remove stderr shadow in docker_logs, add empty-stderr guard on all 4 new exit code paths for consistency with the prisma pattern

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick Szymkowiak <patrick.szymkowiak@innovtech.eu>
… (rtk-ai#630)

* fix: raise output caps for grep, git status, and parser fallback (rtk-ai#617, rtk-ai#618, rtk-ai#620)

  - grep: per-file match cap 10 → 25, global max 50 → 200
  - git status: file list caps 5/5/3 → 15/15/10
  - parser fallback: truncate 500 → 2000 chars across all modules

  These P0 bugs caused LLM retry loops when RTK returned less signal than the raw command, making RTK worse than not using it.

  Fixes rtk-ai#617, rtk-ai#618, rtk-ai#620

* fix: update README example and add truncation tests for modified/untracked

  - parser/README.md: update example from 500 → 2000 to match code
  - git.rs: add test_format_status_modified_truncation (cap 15)
  - git.rs: add test_format_status_untracked_truncation (cap 10)

* refactor: extract output caps into [limits] config section

  Move hardcoded caps into config.toml so users can tune them:

  [limits]
  grep_max_results = 200       # global grep match limit
  grep_max_per_file = 25       # per-file match limit
  status_max_files = 15        # staged/modified file list cap
  status_max_untracked = 10    # untracked file list cap
  passthrough_max_chars = 2000 # parser fallback truncation

  All 8 modules now read from config::limits() instead of hardcoded values. Defaults unchanged from the previous commit.

Signed-off-by: Patrick <patrick@rtk.ai>
Signed-off-by: Patrick Szymkowiak <patrick.szymkowiak@innovtech.eu>
…3 transfer, Secrets Manager
Add specialized filters for high-frequency AWS CLI commands:
- DynamoDB scan/query/get-item: recursive type flattening strips {S/N/BOOL/NULL/L/M/SS/NS/BS}
wrappers, preserving all data with ~40%+ token savings
- CloudWatch Logs filter-log-events: timestamp truncation to HH:MM:SS, message deduplication
with [xN] counts, metadata stripping (~50%+ savings)
- CloudWatch Logs get-query-results: compact field=value format, @ptr filtering
- Lambda invoke: extract StatusCode + FunctionError only (~60%+ savings)
- S3 sync/cp: summarize upload/download/delete counts, preserve errors verbatim (~60%+ savings)
- Secrets Manager get-secret-value: extract Name + SecretString only, compact JSON (~60%+ savings)
27 new tests covering type flattening, filter output, edge cases, and token savings.
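The recursive type flattening described above can be sketched in stdlib-only Rust. This is an illustrative model, not the rtk implementation: the hand-rolled `Attr`/`Plain` enums stand in for the JSON values the real filter operates on, and the i64-then-f64 handling of `N` values mirrors the precision note from the follow-up PR.

```rust
// Minimal sketch of DynamoDB attribute-value flattening (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Attr {
    S(String),              // {"S": "..."}
    N(String),              // {"N": "..."}; DynamoDB numbers arrive as strings
    Bool(bool),             // {"BOOL": ...}
    Null,                   // {"NULL": true}
    L(Vec<Attr>),           // {"L": [...]}
    M(Vec<(String, Attr)>), // {"M": {...}}
}

#[derive(Debug, Clone, PartialEq)]
enum Plain {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Null,
    List(Vec<Plain>),
    Map(Vec<(String, Plain)>),
}

fn flatten(a: &Attr) -> Plain {
    match a {
        Attr::S(s) => Plain::Str(s.clone()),
        // N-type: try i64 first, then f64, falling back to the raw string.
        Attr::N(n) => n
            .parse::<i64>()
            .map(Plain::Int)
            .or_else(|_| n.parse::<f64>().map(Plain::Float))
            .unwrap_or_else(|_| Plain::Str(n.clone())),
        Attr::Bool(b) => Plain::Bool(*b),
        Attr::Null => Plain::Null,
        Attr::L(items) => Plain::List(items.iter().map(flatten).collect()),
        Attr::M(fields) => {
            Plain::Map(fields.iter().map(|(k, v)| (k.clone(), flatten(v))).collect())
        }
    }
}
```

The savings come from dropping the single-key `{tag: value}` wrapper around every attribute while keeping every value intact.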
When a command is piped (e.g., `aws dynamodb scan | python -c 'json.load()'`), RTK was rewriting the first segment, causing its compressed/filtered output to break downstream programs expecting the original format. Now piped segments are passed through unchanged. Non-piped compound segments (&&, ||, ;) are still rewritten normally. Fixes: aws dynamodb scan | python JSON parse error
Hello,

Thanks for contributing, new command filters are always welcome! There are a few things to be solved before this can be merged.

**Main concern: this PR includes pipe rewrite changes**

The second commit changes pipe rewriting behavior in discover/registry.rs for all commands globally, not just AWS. This is not an AWS change but a core modification, and could be a regression. Per the CONTRIBUTING.md single-focus rule, this trade-off deserves its own PR with targeted handling. The same goes for the tests for this feature, which are located in registry.rs but are AWS-focused.

**Filters review**

**DynamoDB scan/query**

The filter drops LastEvaluatedKey. This is the pagination token: an LLM using this output has no way to know the scan was truncated at the DynamoDB page boundary, and could assume it has all the data. It also drops ConsumedCapacity, which is minor but useful for cost debugging.

**CloudWatch filter-log-events**

Timestamps are truncated to HH:MM:SS, losing the date entirely. If searching across multiple days (common when debugging production incidents), you can't tell which day a log entry belongs to. At minimum keep the date: 01-15 10:30:00 or ISO short. It also drops logStreamName, which matters when querying across multiple streams — you can't tell which Lambda invocation or container produced a given log line.

**Lambda invoke**

Only outputs Lambda: 200. This strips LogResult, which contains the base64-encoded execution logs: START/END/REPORT lines, memory usage, billed duration, and all console.log/print output from the function. This is the primary debugging information when invoking a Lambda. A developer running aws lambda invoke and getting back just a status code has lost the most useful part of the response. At minimum, decode and include the LogResult content (which is where the real token savings should come from — filter the decoded logs, don't discard them).

Thanks again for contributing to RTK!
I'm discussing with the maintainers whether we should keep the filters I mentioned in the review. We are thinking about different levels of filtering options; for now I need to review this with the maintainers.
- DynamoDB scan/query: preserve LastEvaluatedKey (pagination token) and ConsumedCapacity (RCU cost) so LLMs know when results are truncated
- CloudWatch filter-log-events: include the date in timestamps (MM-DD HH:MM:SS instead of just HH:MM:SS) and show logStreamName per event
- Lambda invoke: remove filter entirely — no real token savings possible without stripping essential debugging info (LogResult). Falls through to the generic AWS JSON compressor instead.
- Remove pipe rewrite changes from this PR (registry.rs reverted) — core behavior change belongs in its own PR per reviewer feedback
Instead of globally rewriting all piped commands (which breaks `aws ... | jq`) or skipping all pipe rewrites (which misses savings on `git log | grep`), introduce a PIPE_UNSAFE_PREFIXES list. Commands like `aws` that transform JSON into compressed text are not rewritten before pipes, preserving downstream consumer compatibility. Text-based commands (git, cargo, grep, etc.) are still rewritten for token savings.
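The idea behind `PIPE_UNSAFE_PREFIXES` can be sketched as a prefix check on the first pipeline segment. This is an illustrative stdlib-only sketch; the actual list and lookup live in `discover/registry.rs`, and the list contents here are assumed:

```rust
// Commands whose rtk-rewritten output changes format (JSON -> compressed
// text), making them unsafe to rewrite when piped into another program.
// Assumed contents for illustration; the real list is in registry.rs.
const PIPE_UNSAFE_PREFIXES: &[&str] = &["aws"];

/// Decide whether the first segment of a pipeline may be rewritten.
/// `git log | grep foo` stays rewritable; `aws dynamodb scan | python` does not.
fn rewrite_before_pipe(first_segment: &str) -> bool {
    let cmd = first_segment.split_whitespace().next().unwrap_or("");
    !PIPE_UNSAFE_PREFIXES.contains(&cmd)
}
```

The design point is that safety is decided per command prefix rather than globally, so format-preserving text commands keep their token savings while format-changing JSON commands pass through untouched.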
This PR has many unrelated commits; can you please resolve this? Then I'm OK with this feature. You could just cherry-pick your commits into another PR. In that case, please tag the new PR here; I'll close this one and accept the other if it correctly introduces those commands, as well as your PIPE_UNSAFE_PREFIXES once it's tested with other commands.
Hey,

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes the source tree. No logic changes — only file moves and import path updates.

**What you need to do**

Rebase your branch on origin/develop:

```
git fetch origin && git rebase origin/develop
```

Git detects renames automatically. If you get import conflicts, update the paths:

```rust
use crate::git;      // now: use crate::cmds::git::git;
use crate::tracking; // now: use crate::core::tracking;
use crate::config;   // now: use crate::core::config;
use crate::init;     // now: use crate::hooks::init;
use crate::gain;     // now: use crate::analytics::gain;
```

Need help rebasing? Tag @aeppling
…filters

Inspired by rtk-ai#644 — cherry-picks the best ideas and integrates them into the shared runner architecture:

- DynamoDB get-item: single-item unwrapping with ConsumedCapacity
- DynamoDB scan/query: now shows ConsumedCapacity (RCU) and pagination status
- DynamoDB N-type: try i64 first, then f64 (better precision)
- CloudWatch Logs get-query-results: field=value format, strips @ptr
- S3 sync/cp: text-based transfer summary (upload/download/delete counts)
- Secrets Manager get-secret-value: extracts Name + SecretString only

Total: 25 specialized AWS filters + generic fallback.
Hey @JamieCressey — really solid work here. The DynamoDB type flattening, the S3 transfer summarization, and especially the pipe safety logic are all well thought out. I opened #885, which covers a broader AWS expansion (25 filters total), and I cherry-picked several of your ideas into it.

The main architectural difference is that #885 uses a shared runner. I think #885 supersedes this one given the overlap, but your pipe safety feature (`PIPE_UNSAFE_PREFIXES`) is worth landing separately. Thanks for the inspiration on this — credit where it's due.
Hello,

We are going to use this PR: #885, which is up to date and implements more filters.

Thanks for your contribution @JamieCressey!
Summary
Adds specialized RTK filters for high-frequency AWS serverless commands, plus scoped pipe-safety to prevent breaking downstream consumers.
New filters
- DynamoDB scan/query/get-item: recursive type flattening strips `{S/N/BOOL/NULL/L/M/SS/NS/BS}` wrappers. Preserves `LastEvaluatedKey` (pagination token) and `ConsumedCapacity` (RCU cost) so LLMs know when results are truncated (~40%+ savings)
- CloudWatch Logs filter-log-events: timestamps include the date (`MM-DD HH:MM:SS`), `logStreamName` shown per event, consecutive duplicates collapsed with `[xN]` counts, metadata stripped (~40%+ savings)
- CloudWatch Logs get-query-results: compact `field=value` format per row, internal `@ptr` field filtered out
- Secrets Manager get-secret-value: extracts `Name` + `SecretString` only, compact-prints JSON secrets, strips `ARN`, `VersionId`, `VersionStages`, `CreatedDate` (~60%+ savings)

Removed
- Lambda invoke filter removed: `LogResult` contains the base64-encoded execution logs and memory/duration stats, so no real token savings are possible without stripping essential debugging info. Falls through to the generic JSON schema compressor instead.

Pipe safety
- `PIPE_UNSAFE_PREFIXES`: commands like `aws dynamodb scan | python` are not rewritten (RTK's compressed output would break `json.load()`), while text-based commands like `git log | grep` are still rewritten for savings

All new filters are added as match arms in `aws_cmd::run()`, following the existing pattern for STS/S3/EC2/ECS/RDS/CloudFormation. Unmatched subcommands continue to fall through to the generic JSON schema compressor.

Test plan
- `cargo fmt --all --check` passes
- `cargo clippy --all-targets` — 0 errors
- `cargo test --all` — 963 tests pass, 0 failures
- `rtk aws dynamodb scan --table-name <table>`
- `rtk aws logs filter-log-events --log-group-name <group>`
- `rtk aws s3 sync <src> <dst>`
- `aws dynamodb scan ... | python` passes raw JSON through (pipe-safe)
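As a footnote on the filter list above, the consecutive-duplicate collapsing with `[xN]` counts can be sketched like this (illustrative stdlib-only Rust, not the rtk code):

```rust
// Collapse runs of identical log lines into "line [xN]".
// Only *consecutive* duplicates are merged, preserving log ordering.
fn dedup_with_counts(lines: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        // Count how many times lines[i] repeats consecutively.
        let mut n = 1;
        while i + n < lines.len() && lines[i + n] == lines[i] {
            n += 1;
        }
        if n > 1 {
            out.push(format!("{} [x{}]", lines[i], n));
        } else {
            out.push(lines[i].to_string());
        }
        i += n;
    }
    out
}
```

Deduplicating only adjacent lines keeps the transformation lossless in ordering: a reader can still tell which bursts of identical messages occurred and where.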