A high-performance HTTP/1.1 server engineered from scratch in Modern C++20
Tags: c-plus-plus, cpp20, http-server, web-server, multithreading, network-programming, socket-programming, threadpool, http11, middleware, lru-cache, keep-alive, docker, github-actions, systems-programming, performance-engineering
- Overview
- Feature Matrix
- Project Scale
- Engineering Highlights
- Architecture
- Design Decisions
- Performance
- Security
- Testing Strategy
- Production Deployment
- Repository Structure
- Project Evolution
- Future Enhancements
- Why This Project Exists
This project is a high-performance HTTP/1.1 server engineered entirely in modern C++20. Designed with a focus on systems programming and performance engineering, this server bypasses external frameworks in favor of raw socket manipulation, custom thread-pooling, and an O(1) routing architecture.
Dynamic API endpoints run through a composable middleware pipeline, static assets are served from an in-memory LRU cache, and every accepted connection is handled by a worker from a fixed-size thread pool.
| Feature | Status |
|---|---|
| HTTP/1.1 Parser | Complete |
| Thread Pool | Complete |
| O(1) Router | Complete |
| Middleware Pipeline | Complete |
| Static File Server | Complete |
| LRU Cache | Complete |
| Keep-Alive Connections | Complete |
| Docker Deployment | Complete |
| CI/CD Pipeline | Complete |
| Metric | Value |
|---|---|
| Development Phases | 10 |
| Source Files | 20+ |
| Test Suites | 10+ |
| Platform Support | Windows / Linux |
| Containerization | Docker (Alpine) |
| CI/CD | GitHub Actions |
- Thread-safe custom ThreadPool -- A robust concurrency model utilizing worker threads, mutexes, and condition variables for optimal task scheduling.
- TCP Server built from scratch -- Direct integration with Winsock (Windows) and BSD Sockets (POSIX) for zero-dependency networking.
- Custom HTTP Parser and Protocol handling -- Strict RFC-compliant extraction of methods, paths, headers, and bodies.
- O(1) Router -- Fast request dispatching using `std::unordered_map`.
- Middleware Architecture -- Composable request/response interception for logging, metrics, and tracing.
- Static File Serving -- MIME type detection with strict path traversal protection.
- LRU In-Memory File Cache -- High-performance O(1) cache hit architecture to bypass disk I/O.
- Keep-Alive Connections -- Connection reuse across requests, so the TCP handshake is paid once per connection rather than once per request.
- Metrics Endpoint -- Live atomic counters tracking cache hits, connection reuse, and latencies.
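The O(1) Router highlight above can be illustrated with a minimal sketch. It assumes exact-match routes keyed by `"METHOD path"`; the `Router`, `Handler`, `add`, and `dispatch` names are hypothetical and not taken from the repository. Note that `std::unordered_map` lookups are O(1) on average, not in the worst case, and hashing still costs time proportional to the key length.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handler type: takes a request body, returns a response body.
using Handler = std::function<std::string(const std::string&)>;

class Router {
public:
    // Register a handler under the key "METHOD path", e.g. "GET /health".
    void add(const std::string& method, const std::string& path, Handler h) {
        routes_[method + ' ' + path] = std::move(h);
    }

    // Average O(1): one hash of the key plus one bucket probe.
    // Returns std::nullopt when no exact match exists (a 404 in practice).
    std::optional<std::string> dispatch(const std::string& method,
                                        const std::string& path,
                                        const std::string& body) const {
        const auto it = routes_.find(method + ' ' + path);
        if (it == routes_.end()) return std::nullopt;
        return it->second(body);
    }

private:
    std::unordered_map<std::string, Handler> routes_;
};
```

The trade-off of this design is that only exact paths match; parameterized routes such as `/users/:id` would need a different structure (for example a trie).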
The server handles concurrent connections by pairing a custom TcpServer with a ThreadPool: the accept loop enqueues each connection and a worker thread services it. Below is the high-level component graph and the detailed request lifecycle.
graph TD
A[Client Request] -->|TCP Socket| B[TcpServer]
B -->|Enqueues Task| C[ThreadPool]
C -->|Worker Thread| D[HttpParser]
D --> E[Middleware Pipeline]
E --> F[O1 Router]
F -->|Dynamic Route| G[Route Handlers]
F -->|Static Route| H[StaticFileHandler & LRU Cache]
G --> I[HttpResponse]
H --> I
I -->|Socket send| A
sequenceDiagram
participant Client
participant Socket
participant ThreadPool
participant TcpServer
participant HttpParser
participant Router
participant Middleware Pipeline
participant Route Handler
participant HttpResponse
Client->>Socket: TCP Connect & Send Data
Socket->>TcpServer: Accept Connection
TcpServer->>ThreadPool: Enqueue handleClient()
ThreadPool->>TcpServer: Worker Thread executes
TcpServer->>HttpParser: extract & parse(rawRequest)
HttpParser->>Router: parsed HttpRequest
Router->>Middleware Pipeline: dispatch(req)
Middleware Pipeline->>Route Handler: handle(req)
Route Handler-->>Middleware Pipeline: return HttpResponse
Middleware Pipeline-->>Router: return HttpResponse
Router-->>TcpServer: return HttpResponse
TcpServer->>HttpResponse: add headers (e.g. Keep-Alive)
TcpServer->>Socket: sendAll(res.toString())
Socket->>Client: Transmit Data
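The middleware step in the diagrams above can be sketched as a chain of wrappers around a terminal handler. This is an illustration under assumptions, not the repository's implementation: `Request`, `Response`, `Pipeline`, `use`, and `run` are hypothetical names, and the real request and response types carry more fields.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced request/response types.
struct Request  { std::string method, path; };
struct Response { int status = 200; std::string body; };

using Handler    = std::function<Response(const Request&)>;
using Middleware = std::function<Response(const Request&, const Handler& next)>;

class Pipeline {
public:
    void use(Middleware m) { chain_.push_back(std::move(m)); }

    // Middleware registered first runs outermost: it sees the request first
    // and the response last.
    Response run(const Request& req, const Handler& terminal) const {
        return invoke(0, req, terminal);
    }

private:
    Response invoke(std::size_t i, const Request& req, const Handler& terminal) const {
        if (i == chain_.size()) return terminal(req);
        // `next` hands control to the following middleware (or the handler).
        Handler next = [this, i, &terminal](const Request& r) {
            return invoke(i + 1, r, terminal);
        };
        return chain_[i](req, next);
    }

    std::vector<Middleware> chain_;
};
```

A logging or timing middleware would record state before calling `next(req)` and inspect the returned `Response` afterwards; a middleware can also short-circuit by returning without calling `next`.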
Creating threads per request causes kernel overhead, context switching, and memory waste. A fixed worker pool provides bounded concurrency, keeps system resource usage predictable, and significantly outperforms the thread-per-connection model under load.
TCP handshakes are expensive. Persistent connections significantly reduce latency and improve throughput by amortizing the connection establishment cost across multiple requests.
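Whether a connection stays open after a response follows from the HTTP version and the `Connection` header: HTTP/1.1 connections are persistent unless the client sends `close`, while HTTP/1.0 clients must opt in with `keep-alive`. The sketch below assumes the header value is a single token (real headers may carry a comma-separated list); `shouldKeepAlive` and `iequals` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Case-insensitive comparison for header tokens.
inline bool iequals(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// `version` is "HTTP/1.1" or "HTTP/1.0"; `connection` is the value of the
// Connection header, or empty when the header is absent.
inline bool shouldKeepAlive(std::string_view version, std::string_view connection) {
    if (iequals(connection, "close")) return false;
    if (version == "HTTP/1.1") return true;     // persistent by default
    return iequals(connection, "keep-alive");   // HTTP/1.0 must opt in
}
```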
Static assets are often requested repeatedly. Serving from memory avoids filesystem traversal and disk I/O entirely, yielding measurable improvements in response time under sustained traffic.
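An LRU cache with O(1) hits is commonly built from a doubly linked list (recency order) plus a hash map from key to list node. The sketch below shows that technique under assumptions: string keys and values, a capacity counted in entries rather than bytes, and no locking. The `LruCache` name and its interface are hypothetical, and a cache shared across worker threads would need a mutex around both operations.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // On a hit, move the entry to the front (most recently used) and return a copy.
    std::optional<std::string> get(const std::string& key) {
        const auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    // Insert or update; evict the least recently used entry when full.
    void put(const std::string& key, std::string value) {
        if (capacity_ == 0) return;
        const auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

private:
    using Entry = std::pair<std::string, std::string>;  // path -> file contents
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

`std::list::splice` relinks a node without copying it or invalidating iterators, which is what keeps both `get` and `put` at average constant time.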
Benchmarks were run on an 8-core CPU (test machine specifications follow the results) and include a 10,000-connection burst test.
| Component / Test | Operations | Duration | Throughput |
|---|---|---|---|
| Parser (Tiny 18B) | 100,000 iters | 42 ms | ~2.38M req/s |
| Router Dispatch | 1,000,000 iters | 104 ms | ~9.61M ops/s |
| ThreadPool Enqueue | 100,000 tasks | 20 ms | ~5.00M tasks/s |
| Static Cache (Cold) | 10,000 reqs | 2826 ms | ~3,538 req/s |
| Static Cache (Warm) | 10,000 reqs | 2084 ms | ~4,798 req/s |
| Component | Value |
|---|---|
| CPU | Ryzen 7 7435HS |
| RAM | 24 GB |
| OS | Windows 11 |
| Build | Release (-O3) |
- Queue Contention -- The custom ThreadPool is highly optimized. Worker threads eagerly process tasks to minimize mutex contention during rapid connection spikes.
- Cache Locality -- HTTP parsing operations utilize `std::string_view` extensively, resulting in zero-copy memory operations during header and boundary extraction.
- False Sharing -- Atomic variables inside the Metrics singleton are aligned to avoid cache line invalidation across CPU cores.
- Memory Allocations -- Persistent connection buffer pools dramatically reduce heap fragmentation, enabling zero-allocation parses after the first request in a Keep-Alive session.
- HTTP Parsing Costs -- The parser short-circuits on malformed requests, utilizing tightly-bound loops rather than heavy regex engines.
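The zero-copy and short-circuit points above can be illustrated with a request-line parser that returns `std::string_view` slices into the caller's buffer and bails out on the first malformed field. This is a sketch under assumptions: `RequestLine` and `parseRequestLine` are hypothetical names, and the views stay valid only as long as the underlying buffer does.

```cpp
#include <optional>
#include <string_view>

struct RequestLine {
    // All three fields are views into the buffer passed to the parser.
    std::string_view method, target, version;
};

// Parses "METHOD SP request-target SP HTTP-version" without allocating.
// Returns std::nullopt as soon as a field is missing or empty.
inline std::optional<RequestLine> parseRequestLine(std::string_view line) {
    const auto sp1 = line.find(' ');
    if (sp1 == std::string_view::npos || sp1 == 0) return std::nullopt;

    const auto sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string_view::npos || sp2 == sp1 + 1) return std::nullopt;

    RequestLine out{line.substr(0, sp1),
                    line.substr(sp1 + 1, sp2 - sp1 - 1),
                    line.substr(sp2 + 1)};
    if (out.version.empty() || out.version.find(' ') != std::string_view::npos)
        return std::nullopt;
    return out;
}
```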
Security is treated as a first-class citizen alongside performance.
- Path Traversal Protection -- Uses `std::filesystem::weakly_canonical` to enforce strict sandboxing of the public directory, blocking `../` attacks.
- Request Size Limits -- Hard limits on `MAX_BODY_SIZE` (1MB) prevent payload exhaustion.
- Header Limits -- Maximum header limits (64KB) immediately drop slowloris-style connections.
- Timeout Protection -- Enforced `SO_RCVTIMEO` and `SO_SNDTIMEO` timeouts prevent zombie connections from tying up worker threads.
- Safe File Access -- File handlers fail safely with robust error logging to prevent memory leakage on bad disk sectors.
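The path traversal check can be sketched as follows: canonicalize both the public root and the requested path, then accept the request only if every component of the root is a prefix of the result. This is an illustration assuming the request path has already been percent-decoded; `resolveUnderRoot` is a hypothetical helper, not the repository's function.

```cpp
#include <algorithm>
#include <filesystem>
#include <optional>
#include <string_view>
#include <system_error>

namespace fs = std::filesystem;

// Returns the resolved path if it stays inside `root`, otherwise std::nullopt.
inline std::optional<fs::path> resolveUnderRoot(const fs::path& root,
                                                std::string_view requestPath) {
    // Strip leading slashes so the request is appended to root instead of
    // replacing it (operator/ discards the left side for absolute paths).
    while (!requestPath.empty() && requestPath.front() == '/')
        requestPath.remove_prefix(1);

    std::error_code ec;
    const fs::path base = fs::weakly_canonical(root, ec);
    if (ec) return std::nullopt;

    // weakly_canonical resolves symlinks in the existing prefix and
    // normalizes "." and ".." in the remainder.
    const fs::path full = fs::weakly_canonical(base / fs::path(requestPath), ec);
    if (ec) return std::nullopt;

    // Component-wise prefix test; a plain string prefix test would wrongly
    // accept a sibling such as "/srv/public-old" for root "/srv/public".
    const auto baseEnd =
        std::mismatch(base.begin(), base.end(), full.begin(), full.end()).first;
    if (baseEnd != base.end()) return std::nullopt;
    return full;
}
```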
The repository strictly mandates a test-driven approach using CMake and CTest.
- Unit Tests -- Full isolation testing of `HttpParser`, `Router`, `LRUCache`, and `Metrics`.
- Integration Tests -- End-to-end socket-level testing utilizing simulated TCP fragmentation.
- Stress Tests -- Validates the ThreadPool under intense enqueue/dequeue contention without triggering data races.
- Benchmarking Framework -- Built-in macro-level benchmarking for zero-regression validation.
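TCP delivers a byte stream, so a single request may arrive split at any byte boundary, including in the middle of the `\r\n\r\n` header terminator. The sketch below shows one way a fragmentation test target could accumulate chunks and detect the end of the headers; `RequestBuffer` and `feed` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

class RequestBuffer {
public:
    // Appends one received chunk. Once the blank line ending the headers has
    // arrived, returns the header block length including that terminator.
    std::optional<std::size_t> feed(std::string_view chunk) {
        // Resume the search three bytes before the previous end so that a
        // terminator split across chunks is still found.
        const std::size_t from = data_.size() >= 3 ? data_.size() - 3 : 0;
        data_.append(chunk);
        const std::size_t pos = data_.find("\r\n\r\n", from);
        if (pos == std::string::npos) return std::nullopt;
        return pos + 4;
    }

    const std::string& data() const { return data_; }

private:
    std::string data_;
};
```

A fragmentation test can then feed the same request one byte at a time, or split at every possible offset, and check that the reported header length never changes.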
This server provides an automated, multi-stage Docker build to yield a lightweight Alpine execution image.
# Build the production image
docker build -t multithreaded-http-server:latest .
# Run the server
docker run -d -p 8080:8080 multithreaded-http-server:latest

All pull requests trigger a GitHub Actions pipeline that performs:
- CMake build verification across both Linux and Windows environments.
- Full CTest suite execution (`--output-on-failure`).

Only code that compiles cleanly and passes all end-to-end TCP socket tests is merged.
MultiThreaded-HTTP-Server/
├── CMakeLists.txt
├── Dockerfile
├── .github/
│   └── workflows/
│       └── build.yml
├── include/ # Header files (Interfaces & Abstractions)
├── src/ # Implementation files
├── tests/ # CTest validation suite
├── benchmarks/ # Performance evaluation scripts
├── public/ # Static file hosting directory
└── assets/ # README artifacts & demo GIFs
This project was built iteratively, focusing heavily on solid architectural foundations before adding layers of complexity.
graph LR
P1[Phase 1: Config & Architecture] --> P2[Phase 2: ThreadPool]
P2 --> P3[Phase 3: TCP Networking]
P3 --> P4[Phase 4: HTTP Parser]
P4 --> P5[Phase 5: O1 Router]
P5 --> P6[Phase 6: CI/CD]
P6 --> P7[Phase 7: Middleware]
P7 --> P8[Phase 8: Static Files]
P8 --> P9[Phase 9: LRU Cache]
P9 --> P10[Phase 10: Keep-Alive]
| Enhancement | Description |
|---|---|
| HTTP/2 Support | Implement binary framing and stream multiplexing |
| TLS/SSL Integration | Enable HTTPS using OpenSSL wrappers |
| Non-Blocking I/O | Shift to an epoll/IOCP event loop for C10k scalability |
| WebSockets | Bi-directional communication upgrades |
This repository was created to solidify advanced systems engineering principles. Instead of building CRUD applications on top of Express or Django, this project forces a deep understanding of:
- How sockets traverse the kernel.
- How thread contention impacts memory caching.
- How HTTP is actually parsed at the byte level.
- How an architecture evolves when it relies on the standard library instead of external packages.
For engineering managers reviewing this codebase, this project demonstrates direct competency in:
| Domain | Skills |
|---|---|
| Multithreading | Safe manipulation of std::mutex, std::condition_variable, and atomics |
| Networking | Direct sys/socket.h and winsock2.h integration |
| Performance Engineering | Profiling, cache hit ratios, and zero-copy data parsing |
| Systems Design | Modular, decoupled components following SOLID principles |
| Production Infrastructure | Dockerization, CMake build systems, and GitHub Actions CI |
