v1.4.0: Smart Extract, tutorial, and path language #17
Merged
kostas-jakeliunas-sb merged 4 commits into main on Apr 2, 2026
New features:
- --smart-extract: client-side extraction with auto-format detection (JSON, HTML, XML, CSV, Markdown, plain text) and a full path language supporting recursive search (...key), glob patterns (*), value/key filters ([=pattern], [key=pattern]), regex ([=/pattern/]), context expansion (~N), OR (|), AND (&), multi-index ([0,3,7]), slicing, [keys]/[values], escaped literals ((key.name)), and JSON schema output
- Interactive tutorial command (25 steps, 11 chapters) with syntax-highlighted command box, prereq chain resolution, and ScrapingBee brand colors
- --smart-extract works on all commands: scrape, google, amazon, walmart, youtube, chatgpt, fast-search
- Deprecation warnings for --extract-field and --fields (use --smart-extract instead, removal in v2.0.0)

Improvements:
- Enhanced --extract-field and --fields with full path language support
- CSV header row auto-detection (skips headers without --input-column)
- Recursive prereq resolution in tutorial runner
- API key check before prereq execution in tutorial
- File-based usage cache with locking
- Batch resume discovery (scrapingbee --resume)
- Scraping pipeline agent added to all 5 AI platform skill trees
- Skills synced across .agents/, .github/, .kiro/, .opencode/, plugins/
- LLM-focused marketing in AGENTS.md and SKILL.md frontmatter
- scripts/sync-skills.sh for cross-platform skill synchronization

Bug fixes:
- Fixed regex compilation crash in value filter (_build_matcher)
- Fixed ancestry list corruption in recursive context search
- Fixed CSV serialization using str() instead of json.dumps() for dicts
- Fixed retries/backoff type annotations across all commands
- Suppressed harmless coroutine-never-awaited test warnings

Tests:
- 600 unit tests (67 new for path language + smart-extract)
- 196 e2e tests passing
- Full backward compatibility with v1.3.1 --extract-field and --fields
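The CSV serialization fix listed above is easy to see in isolation. The following is a minimal sketch, not the project's code: `to_csv_cell` and `is_valid_json` are hypothetical helpers introduced only for illustration. It shows that a dict written with `json.dumps()` survives a CSV round trip, while `str()` produces a Python repr that a JSON parser rejects.

```python
import csv
import io
import json


def to_csv_cell(value):
    """Serialize one value for a CSV cell (hypothetical helper)."""
    # Nested containers are written as JSON so they can be parsed back later.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return "" if value is None else str(value)


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


row = {"title": "Widget", "price": {"amount": 9.99, "currency": "USD"}}

# Write a header line and one data line into an in-memory CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(list(row.keys()))
writer.writerow([to_csv_cell(v) for v in row.values()])

# Read the data line back and parse the nested cell as JSON.
cells = list(csv.reader(io.StringIO(buffer.getvalue())))[1]
restored_price = json.loads(cells[1])

# str() yields a Python repr with single quotes, which is not JSON.
repr_cell = str(row["price"])
repr_cell_is_json = is_valid_json(repr_cell)
```

Here `restored_price` equals the original nested dict, whereas `repr_cell_is_json` is `False`, which is the kind of breakage a downstream JSON consumer would hit with the `str()` form.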
…ixes
- Chainable ~N context expansion: works anywhere in the path, not just after recursive search. Find a value, go up, keep chaining.
- [!=pattern] negation filter for excluding values/dicts
- [*=pattern] glob key filter for matching dicts by any key's value
- Scalar-only value matching: filters skip dicts/lists to prevent false positives from stringifying entire subtrees
- List flattening in recursive search: ...section[id=faq] now correctly finds individual elements
- Guard against None dict keys in recursive walk
- Pass root through recursive _resolve_path calls for correct ~N ancestry
- CSV header auto-detection in read_input_file
- Deprecation warnings for --extract-field and --fields
- Tutorial restructured: 25 steps, 11 chapters, prereq chain resolution
- 647 unit tests (114 smart-extract tests including 47 chaining tests)
- Updated CHANGELOG, AGENTS.md, SKILL.md, Amazon Q JSON skill
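The scalar-only matching rule above can be illustrated with a small sketch. This is an assumption about the general idea, not the project's `_build_matcher`: `value_matches` and `naive_matches` are hypothetical names, and glob matching is done here with the standard library's `fnmatch` purely for demonstration.

```python
from fnmatch import fnmatchcase


def naive_matches(value, pattern):
    """Match against str(value), even for containers (the false-positive case)."""
    return fnmatchcase(str(value), pattern)


def value_matches(value, pattern):
    """Match only scalar values against a glob pattern (hypothetical helper)."""
    # Containers are skipped: stringifying a whole subtree would let the
    # pattern match text that belongs to a nested child.
    if isinstance(value, (dict, list)):
        return False
    return fnmatchcase(str(value), pattern)


items = ["sale", {"label": "sale"}, "sold out", ["sale"]]

naive_hits = [v for v in items if naive_matches(v, "*sale*")]
scalar_hits = [v for v in items if value_matches(v, "*sale*")]
```

With the naive matcher, the dict and the list both match because their string form contains `sale`; with the scalar-only rule, only the plain string `"sale"` is returned.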
Cap _recursive_walk_simple() and _recursive_walk_ctx() at 100 levels of recursion depth. These functions power the ...key recursive search in --smart-extract and previously had no depth guard, so a pathologically nested JSON structure could exhaust Python's call stack and crash. They now return gracefully when the limit is exceeded.
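A depth guard of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `_recursive_walk_simple()`: the function name `find_key`, its signature, and the exact point at which depth is counted are hypothetical; only the limit of 100 comes from the commit message.

```python
MAX_DEPTH = 100  # limit taken from the commit message above


def find_key(node, key, depth=0, found=None):
    """Collect every value stored under `key`, descending at most MAX_DEPTH levels."""
    if found is None:
        found = []
    # Depth guard: stop descending instead of exhausting the call stack.
    if depth > MAX_DEPTH:
        return found
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            find_key(v, key, depth + 1, found)
    elif isinstance(node, list):
        for item in node:
            find_key(item, key, depth + 1, found)
    return found


# A shallow document: both occurrences of "target" are found.
shallow = {"a": {"target": 1, "b": [{"target": 2}]}}
shallow_hits = find_key(shallow, "target")

# A pathologically deep document, built iteratively to avoid recursion here.
deep = {"target": "buried"}
for _ in range(5000):
    deep = {"child": deep}
deep_hits = find_key(deep, "target")
```

The shallow search returns `[1, 2]`. The deep search returns an empty list rather than raising `RecursionError`, because the value sits far below the cap and the walk stops before reaching it.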
3a3198b to f9bf45a
kostas-jakeliunas-sb approved these changes on Apr 2, 2026
Summary
- --smart-extract: Client-side extraction with auto-format detection (JSON, HTML, XML, CSV, Markdown, text) and a full path language: recursive search (...key), glob patterns (*), value/key filters ([=pattern]), regex ([=/pattern/]), context expansion (~N), OR/AND operators, JSON schema output. Works on all commands.
- --extract-field and --fields: Replaced by --smart-extract. Deprecation warnings added, removal planned for v2.0.0.
- --smart-extract documented in AGENTS.md, SKILL.md (all 5 platforms), plugin.json. LLM-focused positioning: "extract just the data your LLM needs."

Test plan
- --extract-field and --fields still work