Fast-Crawl is a high-performance web crawler designed to efficiently traverse and extract links from websites. It supports concurrent crawling and writing, making it suitable for large-scale web scraping tasks.
- Concurrent crawling and writing with a configurable number of crawlers and writers.
- Graceful shutdown handling.
- Error handling and logging.
- Configurable process timeout.
- Merges output from multiple writers into a single file.
Clone the repository:
```bash
git clone https://github.com/yourusername/fast-crawl.git
cd fast-crawl
```
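If a prebuilt binary is not included, build one first. This assumes a standard Go module layout; adjust the command to match the repository:

```bash
go build -o fast-crawl .
```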
Run the crawler with the default configuration:
```bash
./fast-crawl
```

The crawler can be configured by modifying the Config struct in main.go. Key configuration options include:
- `NumCrawlers`: Number of concurrent crawlers.
- `NumWriters`: Number of concurrent writers.
- `QueueSize`: Size of the link queue.
- `ProcessTimeout`: Timeout for processing each link.
- `InitialLinks`: List of initial URLs to start crawling from.
- `filename`: Name of the output file.
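For orientation, here is a minimal sketch of what such a Config struct might look like. The field names come from the list above; the types and default values are assumptions for illustration, not the project's actual definitions:

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the options listed above. Field types and the example
// defaults are assumptions for illustration only.
type Config struct {
	NumCrawlers    int           // number of concurrent crawler goroutines
	NumWriters     int           // number of concurrent writer goroutines
	QueueSize      int           // capacity of the buffered link queue
	ProcessTimeout time.Duration // timeout for processing a single link
	InitialLinks   []string      // seed URLs to start crawling from
	filename       string        // name of the merged output file
}

// Hypothetical default configuration.
var defaultConfig = Config{
	NumCrawlers:    10,
	NumWriters:     2,
	QueueSize:      1000,
	ProcessTimeout: 10 * time.Second,
	InitialLinks:   []string{"https://example.com"},
	filename:       "output.txt",
}

func main() {
	fmt.Printf("%+v\n", defaultConfig)
}
```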
- Initialization: The crawler initializes with the specified configuration and sets up channels for communication between crawlers and writers.
- Crawling: Multiple crawler goroutines process links concurrently. Each crawler fetches a URL, extracts links, and adds new links to the queue (see the pipeline sketch after this list).
- Writing: Writer goroutines write the crawled data to part files, which are later merged into a single output file.
- Error Handling: Errors encountered during crawling and writing are logged and handled gracefully.
- Shutdown: The crawler listens for shutdown signals and stops gracefully, ensuring all goroutines complete their tasks (see the shutdown sketch below).
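The real pipeline lives in main.go; the sketch below only illustrates the crawler/writer pattern described above. The helper fetchAndExtract, the part-file names, and the hard-coded counts are placeholders standing in for the configurable values, not the project's actual code:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"sync"
	"time"
)

// fetchAndExtract stands in for the real fetch/parse logic: it would download
// the page at url and return the links found on it. (Hypothetical helper.)
func fetchAndExtract(ctx context.Context, url string) ([]string, error) {
	return nil, nil // real code would use net/http plus an HTML parser
}

func main() {
	links := make(chan string, 1000)   // buffered link queue (QueueSize)
	results := make(chan string, 1000) // crawled data handed to the writers

	// pending counts links that are enqueued but not yet processed, so we
	// know when the crawl has drained and the queue can be closed.
	var pending, crawlers, writers sync.WaitGroup

	for i := 0; i < 4; i++ { // NumCrawlers
		crawlers.Add(1)
		go func() {
			defer crawlers.Done()
			for url := range links {
				ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) // ProcessTimeout
				found, err := fetchAndExtract(ctx, url)
				cancel()
				if err != nil {
					fmt.Fprintf(os.Stderr, "crawl %s: %v\n", url, err) // log and keep going
				} else {
					results <- url
					for _, l := range found {
						select {
						case links <- l:
							pending.Add(1) // new work enqueued
						default: // queue full: drop the link in this sketch
						}
					}
				}
				pending.Done() // this link is fully processed
			}
		}()
	}

	for i := 0; i < 2; i++ { // NumWriters: one part file each
		writers.Add(1)
		go func(id int) {
			defer writers.Done()
			f, err := os.Create(fmt.Sprintf("output.part%d", id))
			if err != nil {
				fmt.Fprintln(os.Stderr, err)
				return
			}
			defer f.Close()
			for line := range results {
				fmt.Fprintln(f, line)
			}
		}(i)
	}

	for _, seed := range []string{"https://example.com"} { // InitialLinks
		pending.Add(1)
		links <- seed
	}

	pending.Wait() // every enqueued link has been processed
	close(links)
	crawlers.Wait()
	close(results)
	writers.Wait() // the part files would be merged into one output file here
}
```

The pending WaitGroup is one common way to detect that the queue has drained before closing the channels; the real crawler may coordinate termination differently.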
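Graceful shutdown in Go programs like this one usually combines os/signal with a context. The following is a minimal sketch of that pattern, assuming the crawler reacts to SIGINT/SIGTERM; it is not a copy of the actual signal handling in main.go:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

func main() {
	// ctx is cancelled when SIGINT or SIGTERM is received.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	links := make(chan string, 10)
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return // stop pulling new work once shutdown is requested
			case url, ok := <-links:
				if !ok {
					return // queue closed: normal completion
				}
				fmt.Println("processing", url) // placeholder for fetch/extract/write
			}
		}
	}()

	links <- "https://example.com"
	time.Sleep(100 * time.Millisecond) // give the worker a moment in this demo
	close(links)
	wg.Wait() // all goroutines have finished their in-flight work
}
```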