ArpBansal/DistGrep

DistGrep

DistGrep is a distributed grep system implemented in Go. It demonstrates key distributed-systems concepts, including leader election (using Raft) and distributed processing (using MapReduce).

Architecture

The system consists of a fixed 4-node cluster. Each node runs:

  1. HTTP Server: Handles client requests for job submission and status checks.
  2. Raft Consensus Module: Manages leader election and cluster coordination.
  3. MapReduce Engine: Executes the distributed grep search logic.
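Raft leader election depends on a candidate collecting votes from a majority of the cluster. In a fixed 4-node cluster the majority is 3, so the cluster can tolerate only one failed node, no more than a 3-node cluster would. The sketch below shows that arithmetic and Raft's standard vote-granting rule. It is not DistGrep's `raft/` code: the `voteState`, `voteRequest` and `handleRequestVote` names are hypothetical.

```go
package main

import "fmt"

// quorum returns the number of votes a candidate needs in an n-node cluster.
func quorum(n int) int { return n/2 + 1 }

// voteState is a hypothetical subset of a Raft node's persistent state.
type voteState struct {
	currentTerm  int
	votedFor     string // "" means no vote cast in currentTerm
	lastLogIndex int
	lastLogTerm  int
}

// voteRequest mirrors the arguments of Raft's RequestVote RPC.
type voteRequest struct {
	term         int
	candidateID  string
	lastLogIndex int
	lastLogTerm  int
}

// handleRequestVote applies Raft's vote-granting rule and returns
// (currentTerm, voteGranted).
func (s *voteState) handleRequestVote(req voteRequest) (int, bool) {
	if req.term < s.currentTerm {
		return s.currentTerm, false // candidate is from a stale term
	}
	if req.term > s.currentTerm {
		// A newer term: adopt it and forget any vote cast in the old term.
		s.currentTerm = req.term
		s.votedFor = ""
	}
	// The candidate's log must be at least as up to date as ours.
	upToDate := req.lastLogTerm > s.lastLogTerm ||
		(req.lastLogTerm == s.lastLogTerm && req.lastLogIndex >= s.lastLogIndex)
	if (s.votedFor == "" || s.votedFor == req.candidateID) && upToDate {
		s.votedFor = req.candidateID
		return s.currentTerm, true
	}
	return s.currentTerm, false
}

func main() {
	fmt.Println("votes needed in a 4-node cluster:", quorum(4))
	s := &voteState{currentTerm: 1}
	_, granted := s.handleRequestVote(voteRequest{term: 2, candidateID: "node-2"})
	fmt.Println("vote granted to node-2:", granted)
}
```

Each node grants at most one vote per term, which is what prevents two leaders from being elected in the same term.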

Cluster Configuration

By default, the 4 nodes are configured as follows:

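| Node ID | HTTP Port | Raft Port | Data Dir          |
|---------|-----------|-----------|-------------------|
| node-1  | 8081      | 9001      | /tmp/dgrep-node-1 |
| node-2  | 8082      | 9002      | /tmp/dgrep-node-2 |
| node-3  | 8083      | 9003      | /tmp/dgrep-node-3 |
| node-4  | 8084      | 9004      | /tmp/dgrep-node-4 |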
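The layout follows a simple pattern: node i uses HTTP port 8080+i, Raft port 9000+i and data directory /tmp/dgrep-node-i. A minimal sketch of generating that static configuration in Go; the `NodeConfig` type and `defaultCluster` function are hypothetical and not taken from the repository.

```go
package main

import "fmt"

// NodeConfig is a hypothetical description of one cluster member.
type NodeConfig struct {
	ID       string
	HTTPAddr string
	RaftAddr string
	DataDir  string
}

// defaultCluster builds the static n-node layout described in the table.
func defaultCluster(n int) []NodeConfig {
	nodes := make([]NodeConfig, 0, n)
	for i := 1; i <= n; i++ {
		id := fmt.Sprintf("node-%d", i)
		nodes = append(nodes, NodeConfig{
			ID:       id,
			HTTPAddr: fmt.Sprintf("localhost:%d", 8080+i),
			RaftAddr: fmt.Sprintf("localhost:%d", 9000+i),
			DataDir:  "/tmp/dgrep-" + id,
		})
	}
	return nodes
}

func main() {
	for _, n := range defaultCluster(4) {
		fmt.Printf("%s http=%s raft=%s dir=%s\n", n.ID, n.HTTPAddr, n.RaftAddr, n.DataDir)
	}
}
```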

Prerequisites

  • Go 1.18 or higher

Getting Started

1. Run the Cluster

To start the 4-node cluster in a single process (a local simulation, not a multi-machine deployment):

make run

This will spin up all 4 nodes. You will see logs indicating leader election and server startup.

2. Submit a Grep Job

Once the cluster is running and a leader has been elected, submit a grep job with an HTTP POST to the leader. Requests sent to a follower may fail, because forwarding them to the leader may not be fully implemented.

Endpoint: POST /submit

Body:

{
    "pattern": "pattern",
    "files": ["file1.txt", "/home/dir/"]
}
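On the server side, this body maps naturally onto a Go struct with `encoding/json` tags. The sketch below is illustrative only: the `SubmitRequest` type, the `parseSubmit` helper and its validation rules are assumptions, not DistGrep's handler code. It also assumes the pattern is a regular expression; the README does not say whether patterns are literal strings or regexes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// SubmitRequest is a hypothetical struct matching the /submit JSON body.
type SubmitRequest struct {
	Pattern string   `json:"pattern"`
	Files   []string `json:"files"`
}

// parseSubmit decodes and validates a request body and compiles the pattern.
// Treating the pattern as a regular expression is an assumption.
func parseSubmit(body string) (*SubmitRequest, *regexp.Regexp, error) {
	var req SubmitRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields() // reject typos such as "file" instead of "files"
	if err := dec.Decode(&req); err != nil {
		return nil, nil, fmt.Errorf("invalid JSON: %w", err)
	}
	if req.Pattern == "" {
		return nil, nil, errors.New("pattern must not be empty")
	}
	if len(req.Files) == 0 {
		return nil, nil, errors.New("files must list at least one path")
	}
	re, err := regexp.Compile(req.Pattern)
	if err != nil {
		return nil, nil, fmt.Errorf("bad pattern: %w", err)
	}
	return &req, re, nil
}

func main() {
	req, re, err := parseSubmit(`{"pattern": "error", "files": ["file1.txt", "/home/dir/"]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Files, re.MatchString("an error occurred"))
}
```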

Example: search for "error" in two files:

curl -X POST http://localhost:8081/submit \
  -H "Content-Type: application/json" \
  -d '{
    "pattern": "error",
    "files": ["/home/file1.txt", "/home/file2.txt"]
  }'

3. Check Node Status

You can check the status of any node to see if it is the leader.

Endpoint: GET /status

Example:

curl http://localhost:8081/status

Features

  • Distributed Search: Uses MapReduce to parallelize the search across available workers (simulated in this single-process version).
  • Fault Tolerance: Uses Raft for coordination (Leader/Follower states).
  • Match Highlighting: Returns matches with the pattern highlighted (ANSI colors).

Project Structure

  • main.go: Entry point, sets up the cluster.
  • internal/:
    • coordinator/: Master logic for job distribution.
    • grep/: Grep-specific Map and Reduce functions.
    • http/: HTTP API server.
    • raft/: Raft consensus implementation.
    • mapreduce/: Generic MapReduce framework.
