feat: add CAP Theorem, Materialized Views, and Vertical Scaling tabs #50
Add 3 new interactive tabs to the database visualizer:
- CAP Theorem: stop/start replication to simulate a network partition, write data, and observe divergence between primary and replica
- Materialized Views: compare a 3-table JOIN query against a pre-computed materialized view, and observe staleness after writes
- Vertical Scaling: adjust the InnoDB buffer pool size and benchmark 200 random queries to see the impact of RAM on performance

Includes 5 new Playwright screenshots, new lab tasks (Tasks 6-9), updated Key Concepts and Conclusions, and 12 new API endpoints.
Walkthrough
This PR adds three new database learning modules to the visualizer: CAP Theorem (demonstrating partition tolerance through replication controls), Materialized Views (showing trade-offs between read speed and data freshness), and Vertical Scaling (benchmarking buffer pool optimization). It also updates the lab documentation and adds the corresponding frontend UI controls and backend API endpoints.
Sequence Diagrams
sequenceDiagram
participant User
participant Client as Frontend
participant Server as Backend
participant DB as Primary DB
participant Replica as Replica DB
User->>Client: Click "Stop Replication"
Client->>Server: POST /api/cap/stop-replication
Server->>DB: STOP REPLICA
Server-->>Client: Replication stopped
Client->>User: Display status
User->>Client: Click "Write & Compare"
Client->>Server: POST /api/cap/test-divergence
Server->>DB: INSERT INTO students
Server->>DB: SELECT from primary
Server->>Replica: SELECT from replica
Note over Replica: Replica lags (no replication)
Server->>Client: Return divergence result
Client->>User: Display partition outcome
User->>Client: Click "Start Replication"
Client->>Server: POST /api/cap/start-replication
Server->>DB: START REPLICA
Server-->>Client: Replication resumed
Replica->>Replica: Catch up with primary
Client->>User: Display consistency restored
sequenceDiagram
participant User
participant Client as Frontend
participant Server as Backend
participant DB as Database
User->>Client: Click "Create View"
Client->>Server: POST /api/views/create
Server->>DB: CREATE TABLE enrollment_summary AS<br/>(3-table JOIN query)
Server->>Client: Return row count & latency
Client->>User: Display view created
User->>Client: Click "Query with Join"
Client->>Server: POST /api/views/query-join
Server->>DB: Run expensive JOIN<br/>Capture EXPLAIN plan
Server->>Client: Return results & metrics
Client->>User: Display join results & timing
User->>Client: Click "Query View"
Client->>Server: POST /api/views/query-view
Server->>DB: SELECT * FROM enrollment_summary
Note over DB: Fast read from materialized view
Server->>Client: Return view results
Client->>User: Display faster query time
User->>Client: Click "Insert & Show Staleness"
Client->>Server: INSERT new data
Client->>Server: POST /api/views/query-view
Server->>DB: SELECT FROM enrollment_summary
Note over DB: New data NOT visible (stale)
Server->>Client: Return stale results
Client->>User: Highlight freshness trade-off
User->>Client: Click "Refresh View"
Client->>Server: POST /api/views/refresh
Server->>DB: Recreate enrollment_summary
Server->>Client: Return updated results
Client->>User: Display refreshed data
sequenceDiagram
participant User
participant Client as Frontend
participant Server as Backend
participant DB as Database
User->>Client: Select buffer pool size<br/>Click "Set Buffer"
Client->>Server: POST /api/vertical/set-buffer
Server->>DB: SET GLOBAL innodb_buffer_pool_size = X
Server->>Client: Confirmation
Client->>User: Display buffer size updated
User->>Client: Click "Run Benchmark"
Client->>Server: POST /api/vertical/benchmark
Server->>DB: FLUSH status variables
loop 200 queries
Server->>DB: SELECT * FROM access_log<br/>(random ID)
DB-->>Server: Measure latency
end
Server->>DB: Read buffer pool metrics
Server->>Server: Calculate hit ratio,<br/>avg latency, p95, QPS
Server->>Client: Return benchmark results
Client->>User: Display throughput/latency<br/>comparison
Note over User: Observe diminishing<br/>returns at larger sizes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pull request overview
This PR expands the database visualizer with three new interactive learning tabs (CAP Theorem, Materialized Views, Vertical Scaling) and updates the lab handout to add Tasks 6–9 covering these concepts.
Changes:
- Adds new backend API routes for CAP partition simulation, materialized view creation/querying, and vertical scaling benchmarks.
- Adds new UI tabs + control panels and corresponding client-side actions/animations.
- Extends the lab documentation with new guided tasks and screenshots for the added tabs.
Reviewed changes
Copilot reviewed 4 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| 10-databases/visualizer/server.py | Adds CAP/materialized-view/vertical-scaling API endpoints and wires them into ROUTES. |
| 10-databases/visualizer/index.html | Adds three new tabs and associated control panels for CAP/views/vertical scaling. |
| 10-databases/visualizer/app.js | Adds descriptions/explanations and UI actions/animations for the new endpoints. |
| 10-databases/visualizer/LAB-VISUALIZER.md | Adds Tasks 6–9 instructions and updates conclusions/key concepts accordingly. |
| 10-databases/visualizer/screenshots/09-cap-consistent.png | Adds a new screenshot referenced by the updated lab instructions. |
    def vertical_benchmark(body):
        """Run random queries and measure performance."""
        count = int(body.get("count", 200))
        conn = get_conn(PRIMARY_HOST)
        try:
            steps = []
            total_start = time.perf_counter()
            latencies = []

            # Reset buffer pool stats
            with conn.cursor() as cur:
                cur.execute("FLUSH STATUS")

            with conn.cursor() as cur:
                import random
                for i in range(count):
                    sid = random.randint(1, 10)
                    rid = f"resource-{random.randint(1, 50)}"
                    t0 = time.perf_counter()
                    cur.execute(
                        "SELECT * FROM access_log WHERE student_id = %s AND resource = %s",
                        (sid, rid))
                    cur.fetchall()
                    t1 = time.perf_counter()
                    latencies.append((t1 - t0) * 1000)

            # Get buffer pool stats
            with conn.cursor() as cur:
                cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests'")
                read_requests = int(cur.fetchone().get("Value", 0))
                cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'")
                disk_reads = int(cur.fetchone().get("Value", 0))
                cur.execute("SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size'")
                pool_size = cur.fetchone().get("Value", "0")

            hit_ratio = round((1 - disk_reads / max(read_requests, 1)) * 100, 1)
            avg_latency = round(sum(latencies) / len(latencies), 2)
            p95 = round(sorted(latencies)[int(len(latencies) * 0.95)], 2)
            qps = round(count / ((time.perf_counter() - total_start)), 1)
vertical_benchmark assumes count > 0, but count comes from the request body and can be 0 (or negative), which will trigger division-by-zero (sum(latencies) / len(latencies)) and an index error when computing p95. Validate count (e.g., clamp to a safe range like 1..N) and return a clear error if it’s invalid.
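A minimal sketch of the validation described above, assuming the handler receives `body` as a plain dict. The helper names `parse_count` and `percentile` and the upper bound of 1000 are hypothetical and do not come from the PR. The percentile helper uses the nearest-rank method, which stays in range even for a single sample.

```python
import math

MAX_BENCHMARK_QUERIES = 1000  # hypothetical upper bound, not taken from the PR


def parse_count(body, default=200, upper=MAX_BENCHMARK_QUERIES):
    """Return a validated query count, or raise ValueError with a clear message."""
    raw = body.get("count", default)
    try:
        count = int(raw)
    except (TypeError, ValueError):
        raise ValueError("count must be an integer")
    if not 1 <= count <= upper:
        raise ValueError(f"count must be between 1 and {upper}")
    return count


def percentile(values, pct):
    """Nearest-rank percentile; values must be non-empty and pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With these in place, the handler could catch `ValueError` from `parse_count` and return an error payload before opening a connection, and compute `p95` as `percentile(latencies, 95)`.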
    cur.execute("""
        SELECT s.name, s.major, c.code, c.title, e.enrolled_at
        FROM enrollments e
        JOIN students s ON e.student_id = s.student_id
        JOIN courses c ON e.course_id = c.course_id
        ORDER BY s.name
    """)
    rows = cur.fetchall()
    t1 = time.perf_counter()
    steps.append(step_entry(seq, "SELECT with 3-table JOIN", "primary",
                            "OK", (t1 - t0) * 1000,
                            {"row_count": len(rows)}))
    seq += 1

    # Get EXPLAIN
    t0 = time.perf_counter()
    cur.execute("""
        EXPLAIN SELECT s.name, s.major, c.code, c.title, e.enrolled_at
        FROM enrollments e
        JOIN students s ON e.student_id = s.student_id
        JOIN courses c ON e.course_id = c.course_id
    """)
In views_query_join, the EXPLAIN query is not the same as the query you time above: it omits the ORDER BY s.name. That can produce a different plan (for example, one without the sort step) and mislead students when comparing timing vs plan. Use EXPLAIN on the exact same SELECT you executed (including ORDER BY) so the reported plan matches the measured query.
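One way to keep the timed query and its plan from drifting apart is to define the statement once and derive the EXPLAIN form from it. This is only a sketch: the names `JOIN_SQL` and `explain_sql` are hypothetical, and the SQL text is copied from the snippet above.

```python
# Single source of truth for the statement that is both timed and explained.
JOIN_SQL = """
    SELECT s.name, s.major, c.code, c.title, e.enrolled_at
    FROM enrollments e
    JOIN students s ON e.student_id = s.student_id
    JOIN courses c ON e.course_id = c.course_id
    ORDER BY s.name
"""


def explain_sql(sql):
    """Prefix a statement with EXPLAIN so the plan matches the executed query."""
    return "EXPLAIN " + sql.strip()
```

Inside the handler, the two calls would then be `cur.execute(JOIN_SQL)` for the timed run and `cur.execute(explain_sql(JOIN_SQL))` for the plan.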
    for (const step of result.steps) {
      logEvent(step);
      if (step.action === 'INSERT') {
        highlightExpStep(0);
During the CAP "Write & Compare" animation, the INSERT step calls highlightExpStep(0), which corresponds to "Stop Replication" in the CAP explanation. This makes the explanation highlight out of sync with the action being animated. Use the step index for "Write & Compare" (currently 1) for the INSERT/SELECT steps so the UI highlights the correct explanation section.
Suggested change

    -        highlightExpStep(0);
    +        highlightExpStep(1);
    total_ms = (time.perf_counter() - total_start) * 1000
    diverged = steps[1]["result"] != steps[2]["result"].split(" ")[0]
    return {
        "pattern": "cap", "steps": steps, "total_ms": round(total_ms, 2),
        "outcome": "DIVERGED (partition active)" if diverged else "CONSISTENT",
    }
cap_test_divergence derives diverged by comparing formatted result strings (steps[1]["result"] != steps[2]["result"].split(" ")[0]). This is brittle (e.g., primary result can be "NOT FOUND" while the replica side becomes "NOT") and can misclassify outcomes if the display text changes. Compute divergence from booleans instead (e.g., primary_found = ... is not None, replica_found = ... is not None, then diverged = primary_found != replica_found) and keep human-readable strings separate from the logic.
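A sketch of the boolean-based approach, assuming the primary and replica lookups each return the result of `cursor.fetchone()`, which is `None` when no row matches. The function name `classify_divergence` and the "FOUND" label are hypothetical; the outcome strings are the ones already used in the snippet above.

```python
def classify_divergence(primary_row, replica_row):
    """Decide divergence from row presence, independent of any display text."""
    primary_found = primary_row is not None
    replica_found = replica_row is not None
    diverged = primary_found != replica_found
    return {
        "diverged": diverged,
        "outcome": "DIVERGED (partition active)" if diverged else "CONSISTENT",
        # Human-readable labels are derived from the booleans, never parsed back.
        "primary_result": "FOUND" if primary_found else "NOT FOUND",
        "replica_result": "FOUND" if replica_found else "NOT FOUND",
    }
```

Because the labels are derived last, changing the display wording later cannot alter the classification.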
    name = body.get("name", "CAP Test")
    email = f"cap{int(time.time())}@university.edu"
    steps = []
    total_start = time.perf_counter()
    seq = 1

    # Write to primary
    conn = get_conn(PRIMARY_HOST)
    try:
        t0 = time.perf_counter()
        with conn.cursor() as cur:
            cur.execute("INSERT INTO students (name, email, major) VALUES (%s, %s, 'CAP Test')",
                        (name, email))
cap_test_divergence generates emails with int(time.time()), but students.email is UNIQUE (see 10-databases/mysql/init/primary-init.sql:8). Multiple requests within the same second will collide and throw a MySQL error. Use a higher-resolution or random suffix (e.g., time.time_ns() and/or uuid4) to guarantee uniqueness.
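A small sketch of the random-suffix option using the standard library `uuid` module. The helper name `unique_cap_email` is hypothetical, and it assumes the `students.email` column is wide enough for the 51-character result; the column length is not shown in this PR.

```python
import uuid


def unique_cap_email(domain="university.edu"):
    """Build a test email whose local part is a random 128-bit hex token."""
    # uuid4 is random, so two requests in the same second will not collide
    # the way int(time.time()) does.
    return f"cap-{uuid.uuid4().hex}@{domain}"
```

If a shorter address is needed, `time.time_ns()` is the other option mentioned above, though it relies on clock resolution rather than randomness.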
    def vertical_set_buffer(body):
        """Set InnoDB buffer pool size."""
        size = body.get("size", "64M")
        conn = get_conn(PRIMARY_HOST)
        try:
            t0 = time.perf_counter()
            with conn.cursor() as cur:
                cur.execute(f"SET GLOBAL innodb_buffer_pool_size = {size}")
            t1 = time.perf_counter()
            return {"action": "SET BUFFER POOL", "size": size,
vertical_set_buffer builds SQL with an f-string from request data (SET GLOBAL innodb_buffer_pool_size = {size}). This allows SQL injection (the value comes from the client) and can also set invalid/unsafe values. Parse and validate size as an integer number of bytes (with reasonable min/max bounds), then format it as a trusted integer; if you want to allow human-friendly inputs like 64M, explicitly parse them server-side rather than passing them through.
Suggested change

    -def vertical_set_buffer(body):
    -    """Set InnoDB buffer pool size."""
    -    size = body.get("size", "64M")
    -    conn = get_conn(PRIMARY_HOST)
    -    try:
    -        t0 = time.perf_counter()
    -        with conn.cursor() as cur:
    -            cur.execute(f"SET GLOBAL innodb_buffer_pool_size = {size}")
    -        t1 = time.perf_counter()
    -        return {"action": "SET BUFFER POOL", "size": size,
    +def parse_buffer_pool_size(size):
    +    """Parse a buffer pool size into bytes and enforce safe bounds."""
    +    min_size = 5 * 1024 * 1024
    +    max_size = 16 * 1024 * 1024 * 1024
    +    multipliers = {
    +        "K": 1024,
    +        "M": 1024 * 1024,
    +        "G": 1024 * 1024 * 1024,
    +    }
    +    if isinstance(size, int):
    +        size_bytes = size
    +    else:
    +        value = str(size).strip()
    +        if not value:
    +            raise ValueError("size must not be empty")
    +        suffix = value[-1].upper()
    +        if suffix in multipliers:
    +            number = value[:-1].strip()
    +            if not number or not number.isdigit():
    +                raise ValueError("size must be an integer number of bytes or use K, M, or G suffixes")
    +            size_bytes = int(number) * multipliers[suffix]
    +        else:
    +            if not value.isdigit():
    +                raise ValueError("size must be an integer number of bytes or use K, M, or G suffixes")
    +            size_bytes = int(value)
    +    if size_bytes < min_size or size_bytes > max_size:
    +        raise ValueError(
    +            f"size must be between {min_size} and {max_size} bytes"
    +        )
    +    return size_bytes
    +
    +def vertical_set_buffer(body):
    +    """Set InnoDB buffer pool size."""
    +    try:
    +        size_bytes = parse_buffer_pool_size(body.get("size", "64M"))
    +    except ValueError as exc:
    +        return {"error": str(exc)}
    +    conn = get_conn(PRIMARY_HOST)
    +    try:
    +        t0 = time.perf_counter()
    +        with conn.cursor() as cur:
    +            cur.execute(f"SET GLOBAL innodb_buffer_pool_size = {size_bytes}")
    +        t1 = time.perf_counter()
    +        return {"action": "SET BUFFER POOL", "size": size_bytes,