A complete distributed file system simulation inspired by the Google File System (GFS), featuring fault tolerance, automatic replication, and a comprehensive web-based management dashboard.
- Distributed Storage: File chunks distributed across multiple servers
- Fault Tolerance: Automatic failure detection and re-replication
- Heartbeat Monitoring: Real-time server health tracking
- Role-Based Access Control: Admin, Manager, and User roles with different permissions
- Web Dashboard: Interactive UI for system management and monitoring
- Containerized Architecture: Full Docker deployment with docker-compose
Master Node (Port 8000)
- Manages metadata and chunk assignments
- Tracks heartbeats from chunk servers
- Detects failures and triggers re-replication
- Provides REST API for dashboard and clients
Chunk Servers (3 instances: Ports 9001-9003)
- Store file chunks
- Send periodic heartbeats to Master
- Handle upload/download requests
Client Service (Port 8001)
- Processes file uploads from web UI
- Splits files into chunks
- Distributes chunks to assigned servers
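
The chunking step can be illustrated with a minimal sketch. It assumes fixed-size chunks of `CHUNK_SIZE` bytes (the 1 MB default mentioned in this README); the function name `split_into_chunks` is hypothetical and is not taken from `client_script.py`.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB, the default named in this README


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Return consecutive slices of `data`, each at most `chunk_size` bytes.

    Hypothetical helper for illustration; the real client may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    payload = b"x" * (2 * CHUNK_SIZE + 10)
    sizes = [len(chunk) for chunk in split_into_chunks(payload)]
    print(sizes)  # [1048576, 1048576, 10]
```

Only the last chunk can be shorter than `chunk_size`, and an empty file yields no chunks.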
Web Interface (Port 8080)
- Single-page application
- Role-based dashboards
- Real-time system monitoring
- Docker Engine 20.10+
- Docker Compose 2.0+
- Modern web browser (Chrome, Firefox, Safari, Edge)
Create the following directory structure:
project-root/
├── backend/
│   ├── master_node.py
│   ├── chunk_server.py
│   └── client_script.py
├── web/
│   ├── index.html
│   ├── styles.css
│   └── script.js
├── Dockerfile
├── docker-compose.yml
└── README.md
# Build and start all containers
docker-compose up -d --build
# Verify all containers are running
docker-compose ps
Expected output:
NAME STATUS PORTS
gfs_master Up (healthy) 0.0.0.0:8000->8000/tcp
gfs_chunk_1 Up
gfs_chunk_2 Up
gfs_chunk_3 Up
gfs_client Up 0.0.0.0:8001->8001/tcp
gfs_web Up 0.0.0.0:8080->80/tcp
Open your browser and navigate to:
http://localhost:8080
| Username | Password | Role | Capabilities |
|---|---|---|---|
| admin | admin123 | Admin | Full system access, user management, fault simulation |
| manager1 | manager123 | Manager | Monitor system, manage files, simulate faults |
| user1 | user123 | User | Upload files, view personal files, monitor system |
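
One way to express the role table as a permission check is sketched below. The action names (`monitor`, `upload`, and so on) are hypothetical labels chosen for this example; only the role-to-capability mapping comes from the table, and the actual check in `master_node.py` may look different.

```python
# Hypothetical action names; the mapping follows the capabilities table.
USER_ACTIONS = {"monitor", "upload", "view_own_files"}
MANAGER_ACTIONS = {"monitor", "manage_files", "simulate_failure"}
# Admin has full system access: everything above plus user management.
ADMIN_ACTIONS = USER_ACTIONS | MANAGER_ACTIONS | {"manage_users"}

ROLE_PERMISSIONS = {
    "admin": ADMIN_ACTIONS,
    "manager": MANAGER_ACTIONS,
    "user": USER_ACTIONS,
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if `role` may perform `action`; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role.lower(), set())


if __name__ == "__main__":
    print(is_allowed("Manager", "simulate_failure"))  # True
    print(is_allowed("User", "simulate_failure"))     # False
```

Denying unknown roles by default keeps a typo in a role name from granting access.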
- System Status: Real-time server health and fault tolerance metrics
- User Management: Create users, promote to Manager role
- Chunk Servers: Monitor all servers, simulate failures
- File Distribution: View all uploaded files and chunk locations
- Fault Simulation: Test system resilience by simulating server failures
- System Overview: Monitor active servers and fault tolerance
- Server Management: View server status, simulate failures
- File Transfers: Track all file uploads and distributions
- File Upload: Upload files with automatic chunking and distribution
- Upload Progress: Real-time visual feedback on upload status
- System Health: View server status and fault tolerance
- My Files: View personal uploaded files and chunk distribution
- Log in as user1 (password: user123)
- Navigate to "Upload File" section
- Enter a filename (e.g., document.txt)
- Enter or paste content
- Click "Upload File"
- Watch real-time progress as chunks are distributed
- Log in as admin or manager1
- Navigate to "Chunk Servers" section
- Click "Simulate Failure" on any active server
- Observe automatic re-replication of chunks
- System automatically redistributes affected chunks to healthy servers
- Log in as admin
- Click "+ Add User" button
- Enter username, password, and select role
- New user can immediately log in with created credentials
# All services
docker-compose logs -f
# Master node only
docker-compose logs -f master
# Specific chunk server
docker-compose logs -f chunk_server_1
# Client service
docker-compose logs -f client
# Via API
curl http://localhost:8000/status | python3 -m json.tool
# Container health
docker-compose ps
Edit backend/master_node.py:
HEARTBEAT_TIMEOUT = 15 # seconds (default)
Edit backend/master_node.py:
REPLICATION_FACTOR = 2 # default: 2 replicas per chunk
Edit both backend/master_node.py and backend/client_script.py:
CHUNK_SIZE = 1024 * 1024 # 1MB (default)
# Check logs
docker-compose logs
# Rebuild from scratch
docker-compose down -v
docker-compose up -d --build
- Verify web container is running: docker-compose ps web
- Check port isn't in use: lsof -i :8080 (Mac/Linux)
- Try accessing: http://127.0.0.1:8080
- Ensure all chunk servers are active (check dashboard)
- Verify client container is running: docker-compose ps client
- Check client logs: docker-compose logs client
# Restart specific chunk server
docker-compose restart chunk_server_1
# Check network connectivity
docker-compose exec master ping chunk_server_1
- Chunk servers send heartbeats every 5 seconds
- Master marks server as failed after 15 seconds of no heartbeat
- Automatic re-replication begins immediately
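
The timeout rule above can be sketched as a small tracker. This is an illustration under assumptions, not the code in `master_node.py`: the class name `HeartbeatTracker` and its methods are hypothetical, and the clock is injectable so the behavior can be checked without waiting.

```python
import time

HEARTBEAT_TIMEOUT = 15  # seconds, the default named in this README


class HeartbeatTracker:
    """Hypothetical tracker: remembers when each chunk server last reported."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so tests can simulate elapsed time
        self.last_seen: dict[str, float] = {}

    def record(self, server_id: str) -> None:
        """Call whenever a heartbeat arrives from `server_id`."""
        self.last_seen[server_id] = self.clock()

    def failed_servers(self) -> list[str]:
        """Servers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            server for server, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    fake_now = [0.0]
    tracker = HeartbeatTracker(clock=lambda: fake_now[0])
    tracker.record("chunk_server_1")
    tracker.record("chunk_server_2")
    fake_now[0] = 10.0
    tracker.record("chunk_server_2")   # only server 2 keeps reporting
    fake_now[0] = 16.0
    print(tracker.failed_servers())    # ['chunk_server_1']
```

With a 5-second heartbeat interval and a 15-second timeout, a server has to miss roughly three consecutive heartbeats before it is treated as failed.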
- Chunks from failed server are identified
- New server assignments calculated using hash distribution
- Chunks re-replicated to maintain replication factor
- Metadata updated atomically
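
A rough sketch of these re-replication steps follows. It assumes the Master keeps a mapping from chunk ID to the set of servers holding it, and it uses a SHA-256 hash of the chunk ID to pick a replacement server; both are assumptions made for illustration, and the function names are hypothetical. The sketch only computes the plan and updates the in-memory mapping. Copying the chunk bytes is left out.

```python
import hashlib

REPLICATION_FACTOR = 2  # default named in this README


def pick_server(chunk_id: str, candidates: list[str]) -> str:
    """Deterministically map a chunk to one candidate using a stable hash."""
    digest = hashlib.sha256(chunk_id.encode("utf-8")).digest()
    ordered = sorted(candidates)
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


def re_replicate(
    chunk_map: dict[str, set[str]],
    failed: str,
    healthy: list[str],
    replication_factor: int = REPLICATION_FACTOR,
) -> dict[str, list[str]]:
    """Remove `failed` from every replica set and choose replacement servers.

    Mutates `chunk_map` in place and returns {chunk_id: [new_server, ...]}
    listing the copies that still have to be made.
    """
    plan: dict[str, list[str]] = {}
    for chunk_id, holders in chunk_map.items():
        holders.discard(failed)
        while len(holders) < replication_factor:
            candidates = [s for s in healthy if s != failed and s not in holders]
            if not candidates:
                break  # not enough healthy servers; chunk stays under-replicated
            target = pick_server(chunk_id, candidates)
            holders.add(target)
            plan.setdefault(chunk_id, []).append(target)
    return plan
```

For example, with `{"c1": {"s1", "s2"}, "c2": {"s2", "s3"}}`, a failure of `s2`, and healthy servers `["s1", "s3"]`, the plan sends `c1` to `s3` and `c2` to `s1`, because each chunk has only one server that does not already hold it.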
- Client requests chunk allocation from Master
- Master assigns chunks to available servers
- Client uploads chunks to assigned servers (parallel)
- Client registers completed chunks with Master
- Master updates metadata persistently
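
The upload sequence above can be walked through with in-memory stand-ins. Everything here is an assumption made for illustration: `InMemoryMaster`, the round-robin allocation policy, and the `<filename>_chunk_<i>` ID format are hypothetical, and plain dictionaries take the place of the HTTP calls to the Master and chunk servers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryMaster:
    """Stand-in for the Master: round-robin allocation, metadata in a dict."""

    def __init__(self, servers, replication_factor=2):
        self.servers = list(servers)
        self.replication_factor = replication_factor
        self.metadata = {}  # chunk_id -> list of servers holding the chunk
        self._lock = threading.Lock()

    def allocate_chunks(self, filename, num_chunks):
        """Assign each chunk to `replication_factor` consecutive servers."""
        plan = {}
        for i in range(num_chunks):
            chunk_id = f"{filename}_chunk_{i}"  # hypothetical ID format
            plan[chunk_id] = [
                self.servers[(i + r) % len(self.servers)]
                for r in range(self.replication_factor)
            ]
        return plan

    def register_chunk(self, chunk_id, server):
        with self._lock:  # uploads report back from several threads
            self.metadata.setdefault(chunk_id, []).append(server)


def upload(filename, chunks, master, storage):
    """Allocate, upload in parallel, then register each stored copy."""
    plan = master.allocate_chunks(filename, len(chunks))

    def send(job):
        chunk_id, server, data = job
        storage[server][chunk_id] = data          # stands in for an HTTP upload
        master.register_chunk(chunk_id, server)   # report completion

    jobs = [
        (chunk_id, server, chunks[i])
        for i, (chunk_id, servers) in enumerate(plan.items())
        for server in servers
    ]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(send, jobs))  # consume results so errors surface
    return plan
```

Registering a chunk only after its upload succeeds means the metadata never points at a copy that was not written. The sketch assumes the replication factor does not exceed the number of servers; otherwise the same server would be chosen twice for one chunk.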
This is a simulation for educational purposes. For production use:
- Implement proper authentication (JWT, OAuth)
- Use HTTPS for all communications
- Add input validation and sanitization
- Implement rate limiting
- Add encryption for data at rest and in transit
# Stop all containers (preserves data)
docker-compose stop
# Stop and remove containers (preserves volumes)
docker-compose down
# Complete cleanup (removes all data)
docker-compose down -v
- User-defined bridge network: gfs_network
- Container-to-container communication via service names
- External access via published ports
- Master metadata: /data/master/chunks.json
- User database: /data/master/users.json
- Chunk storage: /data/chunks/<chunk_id> per server
- Docker volumes ensure data persistence across restarts
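
Since the Master's metadata lives in JSON files, one common way to avoid a half-written file after a crash is to write to a temporary file and rename it into place. The sketch below shows that pattern; it is not necessarily how `master_node.py` persists `chunks.json`, and the helper names are hypothetical.

```python
import json
import os
import tempfile


def save_json_atomically(path: str, data: dict) -> None:
    """Write `data` to `path` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must be in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic replacement of the target file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_json(path: str) -> dict:
    """Return the stored metadata, or an empty dict if the file does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

A call such as `save_json_atomically("/data/master/chunks.json", metadata)` would then replace the file in a single step.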
Master Node (Port 8000)
- GET /status - System status and metrics
- POST /heartbeat - Chunk server heartbeat
- POST /login - User authentication
- GET /users - List all users
- POST /create_user - Create new user
- POST /promote_user - Promote user to manager
- POST /allocate_chunks - Request chunk allocation
- POST /register_chunk - Register uploaded chunk
- POST /simulate_failure - Simulate server failure
Client Service (Port 8001)
- POST /upload - Upload file for distribution
This simulation demonstrates:
- Distributed system architecture
- Fault tolerance and recovery
- Metadata management
- Heartbeat-based health monitoring
- Chunk-based storage
- RESTful API design
- Container orchestration
- Role-based access control
This is an educational project. Feel free to use and modify for learning purposes.
This is a simulation project for educational purposes. Suggestions and improvements are welcome!
For issues or questions, check the troubleshooting section or review container logs for detailed error information.