feat: major performance & reliability improvements + critical bug fixes #169
Open
huohua-dev wants to merge 10 commits into lc:master
added 10 commits
February 9, 2026 13:41
- Call os.Stdout.Sync() after each URL write in WriteURLs and WriteURLsJSON
- Ensure data is immediately flushed to disk in pipe/redirect scenarios
- Add atomic URL counter parameter for exit summary tracking

- Add StatusCodeError type to carry HTTP status codes through error chain
- Implement exponential backoff retry for network errors (capped at 30s)
- Skip retry for 429 rate-limit and 400 bad-request responses
- Add shouldRetry() to detect retryable network errors
- Replace manual case-insensitive search with strings.ToLower

- Implement dispatcher+worker pattern for parallel page fetching
- Use sync.Once to safely stop dispatcher on empty results
- Add structured logging with provider/domain/page fields
- Use StatusCodeError for proper 400 status handling
- Support configurable provider-threads parameter

- Implement dispatcher+worker pattern for parallel page fetching
- Use errors.As with StatusCodeError for proper 429 detection
- Stop pagination when has_next is false
- Add structured logging with provider/domain/page/status fields

…rawl
- Implement dispatcher+worker pattern using known page count
- Cap worker threads to actual page count
- Use errors.As with StatusCodeError for proper error classification
- Add structured logging for connection errors and API errors

- Add provider/domain/page/error fields to warning logs
- Add response body to rate-limit log for debugging

- Add ProviderThreads field to providers.Config
- Register --provider-threads CLI flag with default value 3
- Support provider-threads in .gau.toml config file

- Create timeout context for each provider work item
- Cap provider timeout at 5 minutes to prevent single provider blocking
- Add structured logging with provider/domain/timeout fields

- Track total URL count using atomic counter
- Log summary with total URLs and duration on exit
…sion dot mismatch

Bug 1: --fp (RemoveParameters) had inverted Contains check and Add() placed after continue, causing lastURL set to stay empty forever → 0 output.

Bug 2: path.Ext() returns '.png' but blacklist stores 'png' (no dot), so blacklistMap.Contains() never matched. Added TrimPrefix to strip leading dot.

Both fixes applied to WriteURLs and WriteURLsJSON.
Summary
This PR introduces significant performance and reliability improvements to gau, along with critical bug fixes that affect core functionality.
Changes
🐛 Bug Fixes
1. `--fp` flag drops ALL URLs (critical, zero output)

The `RemoveParameters` logic in `output.go` was inverted: it skipped URLs that had not been seen before, and since `lastURL.Add()` was placed after the `continue` statement, it was never executed. This caused the dedup set to remain permanently empty, resulting in every single URL being dropped when `--fp` was used.

Before (broken):
After (fixed):
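A minimal sketch of both variants, reconstructed from the description above rather than taken from `output.go`. The `stringSet` type, the `key` helper, and the `dedupBroken`/`dedupFixed` functions are hypothetical names, and keying on host plus path is an assumption about how parameters are stripped.

```go
package main

import (
	"fmt"
	"net/url"
)

// stringSet is a hypothetical stand-in for the dedup set (lastURL in the PR).
type stringSet map[string]struct{}

func (s stringSet) Contains(k string) bool { _, ok := s[k]; return ok }
func (s stringSet) Add(k string)           { s[k] = struct{}{} }

// key drops the query string so URLs differing only in parameters collide.
// Keying on host+path is an assumption made for this sketch.
func key(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false
	}
	return u.Host + u.Path, true
}

// dedupBroken mirrors the described bug: unseen URLs are skipped, and Add is
// only reached for keys already in the set, so the set never gains an entry.
func dedupBroken(urls []string) []string {
	seen := stringSet{}
	var out []string
	for _, raw := range urls {
		k, ok := key(raw)
		if !ok {
			continue
		}
		if !seen.Contains(k) { // inverted check
			continue
		}
		seen.Add(k)
		out = append(out, raw)
	}
	return out
}

// dedupFixed skips only keys that were already emitted.
func dedupFixed(urls []string) []string {
	seen := stringSet{}
	var out []string
	for _, raw := range urls {
		k, ok := key(raw)
		if !ok || seen.Contains(k) {
			continue
		}
		seen.Add(k)
		out = append(out, raw)
	}
	return out
}

func main() {
	in := []string{
		"https://a.com/x?id=1",
		"https://a.com/x?id=2",
		"https://a.com/y",
	}
	fmt.Println(len(dedupBroken(in)), len(dedupFixed(in))) // prints "0 2"
}
```

With the inverted check, no input can ever produce output, which matches the "zero output" symptom reported for `--fp`.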
2. `--blacklist` extension matching never works

`path.Ext()` returns extensions with a leading dot (e.g., `".png"`), but the blacklist set stores values without dots (e.g., `"png"` from `--blacklist png,jpg`). The comparison `.png` vs `png` always fails, so blacklist filtering was silently broken.

Fix: Added `strings.TrimPrefix(ext, ".")` before checking the blacklist. Applied to both `WriteURLs` and `WriteURLsJSON`.

⚡ Performance Improvements
3. Concurrent pagination per provider (`--provider-threads`)

Added a new `--provider-threads` flag (default: 3) that enables concurrent page fetching within each provider. Previously, each provider fetched pages sequentially; for large targets with hundreds of pages (e.g., `huawei.com` with 200+ OTX pages), this was extremely slow.

Implemented for all four providers:
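The dispatcher+worker pattern named in the commit messages can be sketched roughly as follows. This is an illustration under assumptions, not the providers' code: `fetchPages` and `fakeFetch` are hypothetical, and "stop at the first empty page" stands in for whatever end-of-results check each provider uses.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fakeFetch is a hypothetical page request: pages 0..4 hold one URL each.
func fakeFetch(page int) []string {
	if page >= 5 {
		return nil
	}
	return []string{fmt.Sprintf("https://example.com/p%d", page)}
}

// fetchPages runs `threads` workers that pull page numbers from a dispatcher.
func fetchPages(threads int, fetch func(page int) []string) []string {
	pages := make(chan int)
	stop := make(chan struct{})
	var once sync.Once
	// halt may be called by several workers; sync.Once closes stop only once.
	halt := func() { once.Do(func() { close(stop) }) }

	// Dispatcher: hands out increasing page numbers until told to stop.
	go func() {
		defer close(pages)
		for p := 0; ; p++ {
			select {
			case pages <- p:
			case <-stop:
				return
			}
		}
	}()

	var (
		mu  sync.Mutex
		out []string
		wg  sync.WaitGroup
	)
	for i := 0; i < threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range pages {
				urls := fetch(p)
				if len(urls) == 0 {
					halt() // an empty page ends pagination
					continue
				}
				mu.Lock()
				out = append(out, urls...)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return out
}

func main() {
	got := fetchPages(3, fakeFetch)
	sort.Strings(got) // workers finish in no particular order
	fmt.Println(len(got)) // prints "5"
}
```

A few pages past the end may still be requested before the dispatcher observes the stop signal; in this sketch they return nothing and are discarded.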
4. Retry with exponential backoff
Added a robust retry mechanism in the HTTP client with:
- `StatusCodeError` type for better error handling

5. Per-provider timeout control
Each provider now runs with its own timeout context (capped at 5 minutes), preventing a single slow/stuck provider from blocking the entire run indefinitely.
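One way to picture the per-provider timeout is a context derived for each provider and clamped to the 5-minute cap. The sketch below is an assumption-laden illustration, not the code in `runner/runner.go`: `capTimeout`, `runProvider`, `slowFetch`, and `demo` are hypothetical, as is the choice to treat a non-positive timeout as "use the cap".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// maxProviderTimeout is the 5-minute cap stated in the PR.
const maxProviderTimeout = 5 * time.Minute

// capTimeout clamps a requested timeout to the cap. Treating d <= 0 as
// "use the cap" is an assumption of this sketch.
func capTimeout(d time.Duration) time.Duration {
	if d <= 0 || d > maxProviderTimeout {
		return maxProviderTimeout
	}
	return d
}

// runProvider gives fetch its own deadline so a stuck provider cannot block
// the rest of the run.
func runProvider(parent context.Context, timeout time.Duration, fetch func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, capTimeout(timeout))
	defer cancel()
	return fetch(ctx)
}

// slowFetch simulates a provider that needs one second unless cancelled.
func slowFetch(ctx context.Context) error {
	select {
	case <-time.After(time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo reports whether a 50ms deadline cuts slowFetch short.
func demo() bool {
	err := runProvider(context.Background(), 50*time.Millisecond, slowFetch)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(demo()) // prints "true"
}
```

The timeout only takes effect if the provider's HTTP requests are issued with the derived context, so that cancellation actually reaches the network call.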
🛡️ Reliability Improvements
6. Real-time stdout flush
Added `os.Stdout.Sync()` after each URL write to prevent data loss when the process is killed (e.g., `SIGKILL`, `Ctrl+C`, pipe break). Previously, buffered output could be lost.

7. Execution summary
Added a summary log line at the end of each run showing total URLs found and execution duration:
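The counting side of this can be sketched with an atomic counter shared by concurrent writers. Everything here is hypothetical, including `countURLs` and the printed message, which is illustrative only and is not gau's actual log line.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// countURLs simulates `writers` goroutines each emitting `perWriter` URLs and
// returns the shared total.
func countURLs(writers, perWriter int) int64 {
	var total atomic.Int64 // safe for concurrent increments without a mutex
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWriter; j++ {
				total.Add(1) // one increment per URL written
			}
		}()
	}
	wg.Wait()
	return total.Load()
}

func main() {
	start := time.Now()
	n := countURLs(4, 250)
	// Illustrative format only; the real summary line may look different.
	fmt.Printf("total_urls=%d duration=%s\n", n, time.Since(start).Round(time.Millisecond))
}
```

`atomic.Int64` requires Go 1.19 or newer; on older toolchains `atomic.AddInt64` on a plain `int64` serves the same purpose.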
8. Structured error logging
All providers now use structured logging with `logrus.WithFields` for consistent, parseable error output including provider name, domain, page number, and timeout values.

Testing
All changes have been tested against real-world targets with various flag combinations:
- `--fp` alone
- `--blacklist png,jpg`
- `--fp --blacklist` combo
- `--provider-threads`

Files Changed
- `pkg/output/output.go`: Fixed `--fp` logic inversion + blacklist dot mismatch + real-time flush + URL counter
- `pkg/httpclient/client.go`: Added retry with exponential backoff + StatusCodeError
- `pkg/providers/wayback/wayback.go`: Concurrent pagination + structured logging
- `pkg/providers/commoncrawl/commoncrawl.go`: Concurrent pagination + structured logging
- `pkg/providers/otx/otx.go`: Concurrent pagination + structured logging
- `pkg/providers/urlscan/urlscan.go`: Structured logging
- `pkg/providers/providers.go`: Added `ProviderThreads` to Config
- `runner/runner.go`: Per-provider timeout control
- `runner/flags/flags.go`: Added `--provider-threads` flag
- `cmd/gau/main.go`: Execution summary + URL count tracking

Backward Compatibility
All changes are fully backward compatible:
--provider-threads 3)