Skip to content

feat: add database scalability labs with interactive visualizer#42

Merged
gamaware merged 3 commits into
mainfrom
feat/lab-10-database-scalability
Apr 14, 2026
Merged

feat: add database scalability labs with interactive visualizer#42
gamaware merged 3 commits into
mainfrom
feat/lab-10-database-scalability

Conversation

@gamaware
Copy link
Copy Markdown
Owner

@gamaware gamaware commented Apr 14, 2026

Summary

  • Add interactive web visualizer (main lab) connected to live MySQL primary-replica cluster
    with animated SVG diagrams, real-time latency, built-in SQL console, and EC2 support
  • Add 3 optional CLI labs: MySQL (replication, ACID, indexing), MongoDB (replica set,
    read/write concerns, denormalization), Cassandra (multi-node ring, tunable consistency,
    partition keys)
  • Each lab includes docker-compose.yml, setup/cleanup scripts, CloudFormation template,
    and step-by-step instructions following the 13-section lab template
  • Update Module 10 README to match the pattern from labs 09 and 11

Test plan

  • Run cd 10-databases/visualizer && ./setup.sh and open http://localhost:8081
  • Test all 3 visualizer tabs (Replication, Consistency, Schema & Indexing)
  • Test SQL Console (expand, run SELECT query, verify tabular output)
  • Run cd 10-databases/mysql && ./setup.sh and verify replication works
  • Run cd 10-databases/mongodb && ./setup.sh and verify replica set forms
  • Run cd 10-databases/cassandra && ./setup.sh and verify 3-node ring
  • Deploy visualizer/cloudformation.yaml on EC2 and access via public IP
  • Verify all shell scripts pass shellcheck + shellharden
  • Verify all markdown passes markdownlint

Summary by CodeRabbit

  • New Features

    • Interactive Database Scalability Visualizer (web UI + backend) with exercises for replication, consistency, and indexing.
    • Three new labs: MySQL (replication & ACID), MongoDB (replica sets & tunable consistency), Cassandra (multi-node & partition keys).
    • Local Docker and optional EC2 deployment with automated setup and cleanup scripts.
  • Documentation

    • Detailed lab guides, walkthroughs, troubleshooting, and learning objectives for all new labs.
  • Chores

    • Updated secret baseline metadata and adjusted tooling ignores/pre-commit exclusions.

Add 4 labs for Module 10 (Databases) covering replication, ACID
transactions, indexing, and schema design across MySQL, MongoDB,
and Cassandra.

Main lab: interactive web visualizer connected to a live MySQL
primary-replica cluster with animated SVG diagrams, real-time
latency measurement, built-in SQL console, and EC2 CloudFormation
support. Three tabs: Replication, Consistency (ACID), Schema &
Indexing.

Optional CLI labs: MySQL (GTID replication, ACID, EXPLAIN),
MongoDB (replica set, read/write concerns, denormalization),
Cassandra (multi-node ring, tunable consistency, partition keys).

Each lab includes docker-compose.yml, setup/cleanup scripts,
CloudFormation template for EC2 deployment, and step-by-step
instructions following the 13-section lab template.
Copilot AI review requested due to automatic review settings April 14, 2026 01:23
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

Adds Lab 10 “Database Scalability”: MySQL, MongoDB, Cassandra labs plus an interactive visualizer. Introduces Docker Compose stacks, setup/cleanup scripts, CloudFormation templates, SQL/CQL/JS seed/init files, frontend assets (HTML/CSS/JS), and a Python multithreaded server exposing JSON APIs for visualizer interactions.

Changes

Cohort / File(s) Summary
Repo config & secrets
\.pre-commit-config.yaml, \.secrets.baseline
Extended check-yaml hook exclude to also skip 10-databases/.../cloudformation.yaml; updated .secrets.baseline timestamp, adjusted two line numbers and added four new secret findings.
Module docs & index
10-databases/LAB.md, 10-databases/README.md
Added Lab 10 overview, objectives, quickstarts, glossary, architecture diagram, tasks, troubleshooting, and links to sub-labs.
Visualizer (frontend)
10-databases/visualizer/index.html, 10-databases/visualizer/style.css, 10-databases/visualizer/app.js, 10-databases/visualizer/Dockerfile, 10-databases/visualizer/LAB-VISUALIZER.md
New interactive UI, styles, client controller, Dockerfile, and lab guide for browser visualizer with tabs, animations, SQL console, and polling sidebar.
Visualizer (backend & infra)
10-databases/visualizer/server.py, 10-databases/visualizer/docker-compose.yml, 10-databases/visualizer/setup.sh, 10-databases/visualizer/cleanup.sh, 10-databases/visualizer/cloudformation.yaml
New Python Threading HTTP server exposing endpoints for replication, consistency, schema, SQL exec, plus compose/CFN deployment and helper scripts.
MySQL lab
10-databases/mysql/*, 10-databases/visualizer/docker-compose.yml
Added MySQL primary/replica compose, init SQL (init/primary-init.sql), setup/cleanup scripts, CloudFormation template, and LAB-MYSQL guide; visualizer compose references these MySQL services.
MongoDB lab
10-databases/mongodb/*
Added 3-node MongoDB compose, rs-init script, setup/cleanup scripts, CloudFormation template, and LAB-MONGODB guide.
Cassandra lab
10-databases/cassandra/*
Added 3-node Cassandra compose, CQL schema/seed, setup/cleanup scripts, CloudFormation template, and LAB-CASSANDRA guide.
Misc tooling
eslint.config.js
Updated ignores to include 10-databases/mongodb/init/**.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (Browser)
    participant FrontEnd as Visualizer Frontend
    participant Server as Python Server
    participant Primary as MySQL Primary
    participant Replica as MySQL Replica

    User->>FrontEnd: Click "Write & Replicate"
    FrontEnd->>Server: POST /api/replication/write
    Server->>Primary: INSERT student record
    Primary-->>Server: Insert OK
    Server->>Replica: SELECT by student_id
    alt Row present
        Replica-->>Server: Row found
        Server->>Replica: SHOW REPLICA STATUS
        Replica-->>Server: Seconds_Behind_Source
    else Not yet replicated
        Replica-->>Server: No row
        Server-->>Server: mark lagging
    end
    Server-->>FrontEnd: steps + lag
    FrontEnd->>User: Animate & display result
Loading
sequenceDiagram
    participant User as User (Browser)
    participant FrontEnd as Visualizer Frontend
    participant Server as Python Server
    participant Primary as MySQL Primary

    User->>FrontEnd: Click "ACID Transfer"
    FrontEnd->>Server: POST /api/consistency/transfer
    Server->>Primary: BEGIN
    Primary-->>Server: OK
    Server->>Primary: DELETE (origin)
    Primary-->>Server: rows deleted
    Server->>Primary: INSERT (destination)
    Primary-->>Server: insert OK
    alt Constraint violation
        Server->>Primary: COMMIT -> error
        Primary-->>Server: constraint error
        Server->>Primary: ROLLBACK
        Primary-->>Server: rolled back
    else Success
        Server->>Primary: COMMIT
        Primary-->>Server: committed
    end
    Server-->>FrontEnd: step trace & outcome
Loading
sequenceDiagram
    participant User as User (Browser)
    participant FrontEnd as Visualizer Frontend
    participant Server as Python Server
    participant Primary as MySQL Primary

    User->>FrontEnd: Click "Analyze Query"
    FrontEnd->>Server: POST /api/schema/explain (params)
    Server->>Primary: EXPLAIN SELECT ...
    Primary-->>Server: plan
    Server->>Primary: SELECT COUNT(*) (timed)
    Primary-->>Server: rows scanned
    Server-->>FrontEnd: plan + row count
    User->>FrontEnd: Click "Add Index"
    FrontEnd->>Server: POST /api/schema/add-index
    Server->>Primary: CREATE INDEX ...
    Primary-->>Server: index created
    User->>FrontEnd: Re-run "Analyze Query"
    FrontEnd->>Server: POST /api/schema/explain
    Server->>Primary: EXPLAIN SELECT ...
    Primary-->>Server: optimized plan
    Server-->>FrontEnd: improved plan + reduced rows scanned
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 A rabbit hops where databases play,

MySQL, Mongo, Cassandra in array,
The visualizer dances, arrows take flight,
Labs spin up containers deep into night,
I nibble a carrot and cheer: "Code takes flight!" 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add database scalability labs with interactive visualizer' accurately and concisely summarizes the main change in the PR, which is adding new Lab 10 content with both an interactive visualizer and optional CLI labs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/lab-10-database-scalability

Comment @coderabbitai help to get the list of available commands and usage tips.

- Use contextlib.suppress instead of try/except/pass (ruff SIM105)
- Use strict equality (!==) instead of loose (!=) (eslint eqeqeq)
- Exclude MongoDB shell scripts from ESLint (not browser JS)
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Module 10 “Databases” lab content, centered on an interactive MySQL primary/replica visualizer (web UI + Python API bridge), plus optional CLI labs for MySQL, MongoDB, and Cassandra, along with updated module docs and supporting Docker/CloudFormation scaffolding.

Changes:

  • Add an interactive browser-based visualizer backed by a live MySQL primary/replica cluster (Docker Compose + Python API + HTML/JS/CSS).
  • Add optional CLI-based labs for MySQL, MongoDB, and Cassandra (compose files, setup/cleanup scripts, CloudFormation templates, and walkthrough docs).
  • Update Module 10 README/LAB index and adjust pre-commit YAML validation exclusions and secrets baseline.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
10-databases/visualizer/style.css Visualizer UI styling (dark theme layout/components).
10-databases/visualizer/index.html Visualizer UI structure (tabs, SVG diagram, SQL console, sidebar).
10-databases/visualizer/app.js Front-end logic: animations, polling, event log, SQL console.
10-databases/visualizer/server.py Python HTTP server + API routes bridging UI actions to MySQL.
10-databases/visualizer/Dockerfile Container image for the Python API server.
10-databases/visualizer/docker-compose.yml Compose stack for MySQL primary/replica + visualizer.
10-databases/visualizer/setup.sh Starts compose stack and configures MySQL replication.
10-databases/visualizer/cleanup.sh Tears down visualizer stack and volumes.
10-databases/visualizer/cloudformation.yaml EC2 deployment template for running the visualizer remotely.
10-databases/visualizer/LAB-VISUALIZER.md Main lab instructions for the interactive visualizer.
10-databases/mysql/docker-compose.yml Optional MySQL CLI lab compose stack (primary/replica).
10-databases/mysql/init/primary-init.sql MySQL schema + seed data (incl. access_log for indexing demo).
10-databases/mysql/setup.sh Optional MySQL lab setup incl. replication configuration.
10-databases/mysql/cleanup.sh Optional MySQL lab teardown.
10-databases/mysql/cloudformation.yaml EC2 template for MySQL CLI lab.
10-databases/mysql/LAB-MYSQL.md Optional MySQL CLI walkthrough.
10-databases/mongodb/docker-compose.yml Optional MongoDB replica set compose stack.
10-databases/mongodb/init/rs-init.js Replica set init + seed data for MongoDB lab.
10-databases/mongodb/setup.sh MongoDB lab setup (start nodes, init RS, seed).
10-databases/mongodb/cleanup.sh MongoDB lab teardown.
10-databases/mongodb/cloudformation.yaml EC2 template for MongoDB lab.
10-databases/mongodb/LAB-MONGODB.md Optional MongoDB walkthrough.
10-databases/cassandra/docker-compose.yml Optional Cassandra 3-node cluster compose stack.
10-databases/cassandra/init/schema.cql Cassandra schema + seed data for partition-key exercises.
10-databases/cassandra/setup.sh Cassandra lab setup (wait for ring, load schema).
10-databases/cassandra/cleanup.sh Cassandra lab teardown.
10-databases/cassandra/cloudformation.yaml EC2 template for Cassandra lab.
10-databases/cassandra/LAB-CASSANDRA.md Optional Cassandra walkthrough.
10-databases/README.md Module 10 README rewritten to match the new lab structure.
10-databases/LAB.md Module 10 lab index + comparison table across DB paradigms.
.pre-commit-config.yaml Adds check-yaml exclusions for Module 10 CloudFormation files.
.secrets.baseline Updates detect-secrets baseline for new lab files/strings.

Comment thread 10-databases/visualizer/server.py
Comment thread 10-databases/mongodb/setup.sh
Comment thread 10-databases/mongodb/init/rs-init.js Outdated
Comment thread 10-databases/visualizer/setup.sh
Comment thread .pre-commit-config.yaml
Comment thread 10-databases/visualizer/server.py
Comment thread 10-databases/visualizer/server.py
Comment thread 10-databases/visualizer/server.py
- Replace deprecated isMaster with hello command (MongoDB)
- Fail loudly on replica set init error instead of ignoring
- Add Docker Compose version check to visualizer setup.sh
- Remove SET/USE from SQL console allowlist (prevents state changes)
- Fix conn.autocommit usage (method call, not property assignment)
- Wrap JSON body parsing in try/except for proper error response
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (11)
10-databases/mysql/cleanup.sh-10-14 (1)

10-14: ⚠️ Potential issue | 🟡 Minor

Don't silently report a successful cleanup.

Line 11 and Line 14 discard stderr and force success, so the script still prints Cleanup complete when Docker is missing or the resources were not removed. Keep the cleanup idempotent, but surface a warning on failure so students know the lab may not be fully reset.

⚠️ Proposed fix
 echo "Stopping containers..."
-docker compose down -v --remove-orphans 2>/dev/null || true
+if ! docker compose down -v --remove-orphans; then
+  echo "Warning: failed to stop/remove MySQL lab containers." >&2
+fi
 
 echo "Removing volumes..."
-docker volume rm mysql_primary-data mysql_replica-data 2>/dev/null || true
+if ! docker volume rm mysql_primary-data mysql_replica-data >/dev/null 2>&1; then
+  echo "Warning: failed to remove one or more MySQL lab volumes." >&2
+fi

As per coding guidelines, "Review for shell scripting best practices, proper error handling, and shellcheck/shellharden compliance."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/mysql/cleanup.sh` around lines 10 - 14, The cleanup script
currently swallows errors for the commands "docker compose down -v
--remove-orphans" and "docker volume rm mysql_primary-data mysql_replica-data"
by redirecting stderr and using "|| true", causing a misleading "Cleanup
complete" message; change each command to preserve idempotency but detect
failures (remove "2>/dev/null || true"), capture the exit status, and if
non‑zero emit a clear warning (e.g., "Warning: docker compose down failed:
<error>" / "Warning: docker volume rm failed: <error>") while still allowing the
script to continue, so students see when resources weren’t removed; reference
the two commands above when making the changes.
10-databases/mongodb/init/rs-init.js-31-32 (1)

31-32: ⚠️ Potential issue | 🟡 Minor

Use direct assignment to switch databases instead of shadowing with const.

Line 32 shadows the global db with a const db declaration. While this does not trigger a temporal dead zone error (the right-hand side resolves to the global db successfully), shadowing the global db is not a recommended pattern. Since subsequent code only needs the "university" database, use direct assignment instead: db = db.getSiblingDB("university"). This is the pattern recommended by MongoDB documentation for scripts and avoids shadowing the global object.

🛠️ Recommended fix
-const db = db.getSiblingDB("university");
+db = db.getSiblingGB("university");

Alternatively, if working with multiple databases simultaneously, use a different variable name: const universityDb = db.getSiblingDB("university") and update references accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/mongodb/init/rs-init.js` around lines 31 - 32, The current line
declares a new const db that shadows the global db ("const db =
db.getSiblingDB('university')"); change this to assign to the existing global db
(use db = db.getSiblingDB("university")) or, if you need both databases, rename
the variable to a distinct identifier (e.g., universityDb =
db.getSiblingDB("university")) and update subsequent references; update the line
using the getSiblingDB call accordingly to avoid shadowing.
10-databases/cassandra/setup.sh-65-67 (1)

65-67: ⚠️ Potential issue | 🟡 Minor

Keep cqlsh stderr visible during schema load.

If this step fails, set -e will stop the script, but 2>/dev/null hides the only useful error message.

📝 Suggested fix
 docker cp init/schema.cql cass1:/tmp/schema.cql
-docker exec cass1 cqlsh -f /tmp/schema.cql 2>/dev/null
+docker exec cass1 cqlsh -f /tmp/schema.cql
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/cassandra/setup.sh` around lines 65 - 67, The schema load hides
cqlsh errors by redirecting stderr in the command 'docker exec cass1 cqlsh -f
/tmp/schema.cql 2>/dev/null'; remove the '2>/dev/null' redirection (or
alternatively redirect only stdout) so cqlsh stderr is printed during 'docker
exec cass1 cqlsh -f /tmp/schema.cql', ensuring any failure messages are visible
when the script stops under set -e.
10-databases/mongodb/LAB-MONGODB.md-299-299 (1)

299-299: ⚠️ Potential issue | 🟡 Minor

Point students to the current PRIMARY, not always mongo1.

After an election, mongo1 may be secondary or unavailable, so this troubleshooting row can send users to the wrong node.

📝 Suggested fix
-| `MongoServerError: not primary` | Writing to a secondary | Connect to mongo1 (the initial primary) |
+| `MongoServerError: not primary` | Writing to a secondary | Run `rs.status()` and connect to the current PRIMARY |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/mongodb/LAB-MONGODB.md` at line 299, The troubleshooting row
that tells users to "Connect to mongo1 (the initial primary)" is unsafe because
after an election mongo1 may not be primary; update the guidance for the
"MongoServerError: not primary" entry to instruct users to identify and connect
to the current primary (e.g., using rs.status()/rs.isMaster()/isWritablePrimary
checks or the mongo shell's primary discovery) instead of hardcoding "mongo1",
and replace the literal reference to "mongo1" with a note to connect to the node
reported as PRIMARY by the replica set status.
10-databases/mongodb/LAB-MONGODB.md-115-120 (1)

115-120: ⚠️ Potential issue | 🟡 Minor

Handle replication lag in the secondary-read example.

If Step 2.2 runs before mongo2 has applied the insert, findOne() returns null and zoe.name throws. Add a null guard so the step teaches lag instead of crashing.

💡 Suggested fix
 docker exec mongo2 mongosh --quiet --eval "
 db.getMongo().setReadPref('secondary');
 use('university');
 const zoe = db.students.findOne({ name: 'Zoe Adams' });
-print('Found on secondary: ' + zoe.name + ' (' + zoe.major + ')');
+if (zoe) {
+  print('Found on secondary: ' + zoe.name + ' (' + zoe.major + ')');
+} else {
+  print('Not visible on mongo2 yet. Wait a second and rerun this step.');
+}
 "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/mongodb/LAB-MONGODB.md` around lines 115 - 120, The
secondary-read example can crash if the replica hasn't applied the insert
because zoe will be null; update the mongosh snippet that calls
db.getMongo().setReadPref('secondary') and db.students.findOne({ name: 'Zoe
Adams' }) to guard the result: check the zoe variable for null/undefined and
only access zoe.name and zoe.major when present, otherwise print a clear message
indicating the document wasn't found on the secondary (replication lag) so the
step demonstrates lag instead of throwing.
10-databases/README.md-21-30 (1)

21-30: ⚠️ Potential issue | 🟡 Minor

Use repo-relative paths in the runnable steps.

cd visualizer and visualizer/cloudformation.yaml only work if the reader has already changed into 10-databases. Point to 10-databases/visualizer/... explicitly so the instructions work from the repo root as written. As per coding guidelines, Directory paths in instructions must match actual directory names using kebab-case with NN- prefix (e.g., \03-load-balancing-haproxy`)`.

Also applies to: 121-124, 154-157

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/README.md` around lines 21 - 30, Update the repo-relative paths
so the runnable steps work from the repository root: replace the `cd visualizer`
step with `cd 10-databases/visualizer` (and similarly update the `./setup.sh`
invocation to reference the setup script under 10-databases/visualizer), change
the `visualizer/cloudformation.yaml` reference to
`10-databases/visualizer/cloudformation.yaml`, and update the `LAB.md` link to
point to the 10-databases copy; also make the same fixes for the other
occurrences noted (the other references to `visualizer/...` in the file) so all
path references are repo-root-relative.
10-databases/visualizer/index.html-271-273 (1)

271-273: ⚠️ Potential issue | 🟡 Minor

Add an explicit accessible name to the SQL query field.

This input is currently placeholder-only, so once the placeholder disappears the field has no reliable label for screen readers. An aria-label or hidden <label> keeps the console usable.

♿ Suggested fix
-                <input type="text" id="console-input" placeholder="SELECT * FROM students LIMIT 5;" />
+                <input type="text" id="console-input" aria-label="SQL query" placeholder="SELECT * FROM students LIMIT 5;" />
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/index.html` around lines 271 - 273, The SQL input
with id "console-input" is placeholder-only and missing an accessible name; add
an explicit label (either a visually hidden <label for="console-input">SQL
query</label> placed before the input or an aria-label/aria-labelledby attribute
on the input, e.g., aria-label="SQL query input") so screen readers can identify
the field, and ensure the label text clearly describes purpose (e.g., "SQL
query" or "Console SQL input").
10-databases/visualizer/LAB-VISUALIZER.md-69-76 (1)

69-76: ⚠️ Potential issue | 🟡 Minor

Use a versioned/local template source instead of main.

That URL can drift away from the repo snapshot a learner is following, so the EC2 path stops being reproducible over time. Point them at 10-databases/visualizer/cloudformation.yaml from the same checkout/tag instead. As per coding guidelines, [0-1][0-9]-*/**: Review lab content for technical accuracy, clear step-by-step instructions, and educational value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/LAB-VISUALIZER.md` around lines 69 - 76, The link in
Step 2 currently points to the raw `main` branch URL for cloudformation.yaml
which can drift; update the step to reference the versioned/local template
`10-databases/visualizer/cloudformation.yaml` from the same checkout or tag the
learner is using (or provide an explicit tag/commit SHA) instead of the
`https://raw.githubusercontent.com/.../main/10-databases/visualizer/cloudformation.yaml`
URL so the EC2 setup is reproducible.
10-databases/cassandra/cloudformation.yaml-30-30 (1)

30-30: ⚠️ Potential issue | 🟡 Minor

Instance type t3.large is not AWS Free Tier eligible.

Per coding guidelines, lab content should work in a standard AWS Free Tier environment. The t3.large instance incurs charges (~$0.0832/hour in us-east-1). The description explains the memory requirement for a 3-node Cassandra cluster, but consider adding a cost warning in the lab documentation or offering a reduced-memory alternative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/cassandra/cloudformation.yaml` at line 30, The template
currently hardcodes InstanceType: t3.large which is not AWS Free Tier eligible;
update the CloudFormation to either use a Free Tier instance (e.g., t3.micro) or
parameterize the instance type so labs default to a free-tier-safe value and
allow overriding for higher-memory needs—adjust the resource referencing
InstanceType (the InstanceType property in the EC2/NodeGroup resource) and add a
short lab doc note/warning about cost and a suggested larger-instance
alternative for realistic 3-node Cassandra testing.
10-databases/visualizer/cloudformation.yaml-35-35 (1)

35-35: ⚠️ Potential issue | 🟡 Minor

Instance type t3.medium is not AWS Free Tier eligible.

Per coding guidelines, lab content should work in AWS Free Tier. The t3.medium instance incurs charges. Consider documenting the expected cost or testing if t3.micro (free tier) can handle the MySQL primary-replica plus visualizer workload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/cloudformation.yaml` at line 35, The CloudFormation
template currently sets the EC2 InstanceType to t3.medium which is not Free Tier
eligible; update the InstanceType value used in the template (the InstanceType
property for the visualizer/DB instance) to a Free Tier option such as t3.micro
(or parameterize it and default to t3.micro) and then test the MySQL
primary-replica plus visualizer workload for performance; if t3.micro cannot
sustain the workload, add a short comment or README entry documenting expected
costs and recommending an appropriate instance type (e.g., t3.small/t3.medium)
for non-free-tier usage.
10-databases/visualizer/app.js-236-237 (1)

236-237: ⚠️ Potential issue | 🟡 Minor

Use strict equality !== instead of !=.

Static analysis flagged this. The loose equality != can produce unexpected results with type coercion.

Proposed fix
-    const lag = state.replica?.lag;
-    $('#stat-lag').textContent = lag != null ? lag : '--';
+    const lag = state.replica?.lag;
+    $('#stat-lag').textContent = lag !== null && lag !== undefined ? lag : '--';

Or more idiomatically:

-    $('#stat-lag').textContent = lag != null ? lag : '--';
+    $('#stat-lag').textContent = lag ?? '--';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/app.js` around lines 236 - 237, Replace the loose
null check in the DOM update for $('#stat-lag').textContent by using strict
inequality: change the comparison in the block that updates
$('#stat-lag').textContent to use !== null (i.e., use "lag !== null" instead of
"lag != null") so the null/undefined check is type-safe; keep the existing
strict check for className (lag === 0) as-is.
🧹 Nitpick comments (6)
10-databases/visualizer/style.css (1)

30-32: Avoid a page-level 900px minimum width.

Line 31 hard-locks the visualizer to desktop width, so the lab overflows on smaller laptop/tablet windows and when users keep docs side-by-side. Prefer responsive breakpoints on .layout and .sidebar instead of a global min-width.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/style.css` around lines 30 - 32, Remove the global
html { min-width: 900px; } rule and implement responsive breakpoints on the
layout components instead: replace the page-level constraint with CSS media
queries targeting the .layout and .sidebar selectors so they adapt (collapse,
stack, or scroll) at smaller widths; update the styles for .layout and .sidebar
to set appropriate min-width / flex-basis / max-width behavior per breakpoint to
prevent overflow while preserving the desktop layout.
10-databases/visualizer/LAB-VISUALIZER.md (1)

429-452: Replace bold pseudo-headings with real markdown structure.

These bold lead-ins are functioning as section headers, which makes the document harder to scan in generated outlines and breaks the markdown style rule. As per coding guidelines, No bold text should be used as headings in markdown files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/LAB-VISUALIZER.md` around lines 429 - 452, Replace
the bold pseudo-headings (e.g., "**Check which students are enrolled in which
courses:**", "**Compare primary vs replica data:**", "**Test isolation by
running concurrent queries:**", "**Reset the database:**") with proper Markdown
headings (e.g., "## Check which students are enrolled in which courses", "##
Compare primary vs replica data", etc.), remove trailing colons from the heading
lines, and ensure the code block and subsequent instructions remain under those
headings so the document outline and generated TOC reflect real sections.
10-databases/visualizer/app.js (1)

87-99: API helpers lack error handling for network failures.

If fetch throws (network error, CORS issue), the promise rejects and propagates uncaught. Callers like doReplication don't catch this, which could leave the UI in a broken state with animating = true.

Proposed improvement
 async function apiPost(url, body) {
-  const res = await fetch(url, {
-    method: 'POST',
-    headers: { 'Content-Type': 'application/json' },
-    body: JSON.stringify(body),
-  });
-  return res.json();
+  try {
+    const res = await fetch(url, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify(body),
+    });
+    return res.json();
+  } catch (err) {
+    return { error: `Network error: ${err.message}` };
+  }
 }
 
 async function apiGet(url) {
-  const res = await fetch(url);
-  return res.json();
+  try {
+    const res = await fetch(url);
+    return res.json();
+  } catch (err) {
+    return { error: `Network error: ${err.message}` };
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/app.js` around lines 87 - 99, apiPost and apiGet
currently call fetch without try/catch or checking response.ok, so network
errors or non-2xx responses can reject or return invalid JSON and callers like
doReplication (which sets animating = true) never get a controlled error; wrap
fetch in try/catch inside apiPost and apiGet, check res.ok and throw or return a
standardized error object when not ok, and on network exceptions return/throw a
clear error (include the original error message) so callers can catch it and
reset animating; update references to apiPost and apiGet callers (e.g.,
doReplication) to handle the thrown error or error object accordingly.
10-databases/visualizer/server.py (2)

372-380: Multi-statement detection can be bypassed with semicolons inside string literals.

The check ";" in query.rstrip(";") would reject a legitimate query like SELECT * FROM students WHERE name = 'O;Brien'. Conversely, a malicious user familiar with MySQL quirks might find edge cases. For an educational lab with a limited audience this is acceptable, but worth noting.

A more robust approach would be to use a SQL parser or limit the console to read-only operations on the replica. However, for a lab environment the current protection is pragmatic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/server.py` around lines 372 - 380, The semicolon
check in the multi-statement rejection (the block that inspects query and
ALLOWED_SQL) can be bypassed by semicolons inside string literals; replace the
simple `";" in query.rstrip(";")` check with a scan that ignores semicolons
inside single- or double-quoted strings (or strip/replace string literal
contents first) so only semicolons outside string literals count as statement
separators, then continue to rstrip a trailing semicolon and perform the
ALLOWED_SQL check using first_word = query.split()[0].upper().

358-362: Consider whether SET should be in the allowlist.

SET allows changing session variables (e.g., SET autocommit=0, SET sql_mode=...), which could affect subsequent queries in unexpected ways. For a transient lab this is low-risk, but removing it would tighten the surface area.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/server.py` around lines 358 - 362, ALLOWED_SQL
currently includes "SET", which permits changing session variables; to tighten
security remove "SET" from the ALLOWED_SQL frozenset in server.py (symbol:
ALLOWED_SQL) or instead replace it with a stricter allowance rule that parses
the first token and only permits specific safe SET variants (e.g., "SET NAMES"
or "SET SESSION TRANSACTION ISOLATION LEVEL") by handling SET specially before
the general ALLOWED_SQL check. Update any tests or comments referencing
ALLOWED_SQL accordingly.
10-databases/cassandra/cloudformation.yaml (1)

20-25: Security group opens SSH to 0.0.0.0/0 — acceptable for short-lived lab but worth documenting.

Static analysis flagged unrestricted SSH ingress (AWS-0107). For a transient educational lab this is pragmatic, but consider adding a comment in the template or lab instructions noting that students should delete the stack promptly after completing the lab.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/cassandra/cloudformation.yaml` around lines 20 - 25, The
SecurityGroupIngress rule allowing tcp port 22 from 0.0.0.0/0 (the
SecurityGroupIngress block, FromPort: 22, ToPort: 22, CidrIp: 0.0.0.0/0) should
be documented as a deliberate, short-lived allowance for the lab; add a clear
comment in the CloudFormation template near that SecurityGroupIngress block
stating this is for transient educational use and instructing students/operators
to delete the stack immediately after the lab (or replace the CIDR with a
restricted range/authorized IP if used long-term).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.pre-commit-config.yaml:
- Around line 13-14: The check-yaml exclusion currently skips all Module 10
CloudFormation templates via the pattern "10-databases/.*/cloudformation\.yaml";
remove that pattern (or replace it with a narrowly scoped exception for the
specific parser-incompatible file) from the exclusions in
.pre-commit-config.yaml so that all Module 10 (*.yml/*.yaml) CloudFormation
templates are validated by check-yaml; if you must keep an exclusion, document
the exact filename and reason and use that exact path instead of the broad
"10-databases/.*/cloudformation\.yaml" pattern.

In `@10-databases/cassandra/docker-compose.yml`:
- Around line 7-12: The docker-compose Cassandra services currently set
CASSANDRA_CLUSTER_NAME/CASSANDRA_DC/CASSANDRA_RACK/CASSANDRA_SEEDS/CASSANDRA_ENDPOINT_SNITCH
but do not constrain JVM memory, which causes OOM on Free Tier; either add
explicit low-memory JVM env vars MAX_HEAP_SIZE=128M and HEAP_NEWSIZE=32M to each
Cassandra service (e.g., the cass1, cass2, cass3 service blocks that define
CASSANDRA_* env vars) so each container uses small heaps, or alternatively add a
short note in the compose file documenting the minimum host requirement (e.g.,
t2.small or separate instances) if you choose not to lower the heap; update the
environment sections for the three node services to include those MAX_HEAP_SIZE
and HEAP_NEWSIZE values.

In `@10-databases/cassandra/init/schema.cql`:
- Around line 51-71: The seed INSERTs into the students table use uuid() so
every run creates new rows; make seeding idempotent by replacing uuid() calls
with fixed UUID literals and add IF NOT EXISTS to each INSERT (or alternatively
TRUNCATE the students table before seeding in setup.sh) so rerunning setup.sh
does not duplicate rows; update the INSERT statements referencing the students
table and the setup.sh invocation that replays schema.cql accordingly.

In `@10-databases/cassandra/setup.sh`:
- Around line 33-61: The current readiness loop only tests CQL connectivity
(docker exec ... cqlsh) which can pass before nodes have joined the ring; change
the readiness check to verify ring convergence via nodetool status and wait
until each node reports "UN" in the cluster (e.g., use docker exec cassX
nodetool status and grep for its hostname/state or query cass1's nodetool status
until all three nodes are UN), increase the timeout to a safer value (e.g., 300
seconds) and use a slightly larger sleep interval (e.g., 2–5s) to avoid tight
polling; update the loop around cass1 cass2 cass3 (and any messages) so it
breaks only when the node is UN and on timeout prints the nodetool and container
logs for debugging.

In `@10-databases/LAB.md`:
- Around line 54-63: The "Environment Options" section and its table (headers
"Environment Options", rows "Local" and "EC2", and artifacts like `./setup.sh`
and `cloudformation.yaml`) needs EC2 sizing guidance: update this section to
call out that multi-node labs (MongoDB, Cassandra) require larger instances
(recommend minimum t3.medium/t3.large or m5.large with 2+ vCPUs and >=4–8GB RAM
and adequate EBS) or advise users on alternatives (use Local/Docker or run
single-node MySQL first) and add a short note directing low-resource users to
the Local setup or to follow smaller single-node lab options; keep the guidance
brief and place it directly below the existing table.

In `@10-databases/mongodb/cloudformation.yaml`:
- Around line 15-38: Parameterize and harden the EC2 resources: add
CloudFormation Parameters for KeyName, InstanceType and SSHCidrIp (use sensible
placeholders/defaults and descriptions) and replace the hardcoded KeyName value
"vockey" and the SecurityGroupIngress CidrIp 0.0.0.0/0 with references to those
parameters (SSHCidrIp should default to a restricted CIDR or instruct users to
supply their IP/32). In the LabSecurityGroup/SecurityGroupIngress block use the
SSHCidrIp parameter; in LabInstance use the InstanceType and KeyName parameters.
Require IMDSv2 by adding MetadataOptions with HttpTokens: required to
LabInstance, and enforce encrypted root storage by adding BlockDeviceMappings on
LabInstance with Ebs.Encrypted: true (and KmsKeyId if you want CMK support).
Finally, change any user-facing key filename in outputs or user-data SSH
instructions to a generic placeholder like "<your-key-pair.pem>" instead of
"labsuser.pem".

In `@10-databases/mongodb/setup.sh`:
- Around line 61-72: The replica-set wait loop (the while loop using i, docker
exec mongo1 mongosh --eval "rs.status()...") currently falls through on timeout;
update it to detect failure after the loop and abort: after the loop, test
whether a PRIMARY was found (re-run the same rs.status() check or set a flag
inside the loop), and if not, print a clear error like "ERROR: replica set
primary not elected after timeout" to stderr and exit non-zero (exit 1) so the
script does not print "Environment ready" on failure.

In `@10-databases/mysql/cloudformation.yaml`:
- Around line 15-38: The template currently hardcodes sensitive/academy-specific
values and insecure defaults: replace the SSH output reference to "labsuser.pem"
with a placeholder like "YOUR_LAB_KEY_FILE"; make KeyName in the LabInstance
properties parameterized (replace the literal vockey with a Parameters
reference) and document it as Academy-only if necessary; change InstanceType in
LabInstance from t3.medium to a Free Tier eligible type such as t2.micro or
t3.micro (parameterize if you want flexibility); remove the hardcoded
SecurityGroupIngress CidrIp of 0.0.0.0/0 and instead reference a new parameter
(e.g., AllowedSSHRange or StudentIpCidr) to restrict SSH; and add
MetadataOptions to LabInstance to require IMDSv2 plus BlockDeviceMappings to set
the root volume to encrypted. Update any Outputs that mention labsuser.pem to
use the YOUR_LAB_KEY_FILE placeholder or the KeyName parameter.

In `@10-databases/README.md`:
- Around line 175-180: The README currently presents "BASE" as the definitive
NoSQL trade-off and contrasts it with "ACID"; update the wording around the
"BASE" and "ACID" bullets to qualify consistency/transaction guarantees as a
spectrum rather than a binary (e.g., keep the "ACID" bullet but remove or
rephrase "NoSQL = BASE, not ACID"), explicitly note that some NoSQL systems like
MongoDB can support ACID transactions, and adjust corresponding text in the
nearby section referenced (lines around "BASE" and also the content at 212-218)
so the materials accurately state that different databases provide differing
levels of atomicity/consistency/isolation/durability rather than implying all
NoSQL systems lack ACID.

In `@10-databases/visualizer/Dockerfile`:
- Around line 1-11: The container runs the HTTP server as root (server.py binds
0.0.0.0:8080 and Dockerfile uses root at runtime); create a non-root user and
drop privileges before starting the app: add steps to create a dedicated
user/group (e.g., "app"), chown the application files and any writable dirs,
switch to that user with the Dockerfile USER directive before CMD, and ensure
the server process runs under that user rather than root so the exposed service
is not running with container root privileges.

In `@10-databases/visualizer/setup.sh`:
- Around line 48-56: The script currently runs START REPLICA and prints
"Replication configured." without verifying replication state; modify the
setup.sh sequence after START REPLICA to poll the replica status (using mysql -e
"SHOW REPLICA STATUS\G" or SHOW SLAVE STATUS\G) and check fields
Replica_IO_Running and Replica_SQL_Running are "Yes" and Last_SQL_Error is empty
(or Seconds_Behind_Master is not NULL) with a short retry loop and timeout; only
print "Replication configured." when the checks pass, and print/log a
descriptive error and exit nonzero if the timeout is reached or errors are
reported by the replica.

---

Minor comments:
In `@10-databases/cassandra/cloudformation.yaml`:
- Line 30: The template currently hardcodes InstanceType: t3.large which is not
AWS Free Tier eligible; update the CloudFormation to either use a Free Tier
instance (e.g., t3.micro) or parameterize the instance type so labs default to a
free-tier-safe value and allow overriding for higher-memory needs—adjust the
resource referencing InstanceType (the InstanceType property in the
EC2/NodeGroup resource) and add a short lab doc note/warning about cost and a
suggested larger-instance alternative for realistic 3-node Cassandra testing.

In `@10-databases/cassandra/setup.sh`:
- Around line 65-67: The schema load hides cqlsh errors by redirecting stderr in
the command 'docker exec cass1 cqlsh -f /tmp/schema.cql 2>/dev/null'; remove the
'2>/dev/null' redirection (or alternatively redirect only stdout) so cqlsh
stderr is printed during 'docker exec cass1 cqlsh -f /tmp/schema.cql', ensuring
any failure messages are visible when the script stops under set -e.

In `@10-databases/mongodb/init/rs-init.js`:
- Around line 31-32: The current line declares a new const db that shadows the
global db ("const db = db.getSiblingDB('university')"); change this to assign to
the existing global db (use db = db.getSiblingDB("university")) or, if you need
both databases, rename the variable to a distinct identifier (e.g., universityDb
= db.getSiblingDB("university")) and update subsequent references; update the
line using the getSiblingDB call accordingly to avoid shadowing.

In `@10-databases/mongodb/LAB-MONGODB.md`:
- Line 299: The troubleshooting row that tells users to "Connect to mongo1 (the
initial primary)" is unsafe because after an election mongo1 may not be primary;
update the guidance for the "MongoServerError: not primary" entry to instruct
users to identify and connect to the current primary (e.g., using
rs.status()/rs.isMaster()/isWritablePrimary checks or the mongo shell's primary
discovery) instead of hardcoding "mongo1", and replace the literal reference to
"mongo1" with a note to connect to the node reported as PRIMARY by the replica
set status.
- Around line 115-120: The secondary-read example can crash if the replica
hasn't applied the insert because zoe will be null; update the mongosh snippet
that calls db.getMongo().setReadPref('secondary') and db.students.findOne({
name: 'Zoe Adams' }) to guard the result: check the zoe variable for
null/undefined and only access zoe.name and zoe.major when present, otherwise
print a clear message indicating the document wasn't found on the secondary
(replication lag) so the step demonstrates lag instead of throwing.

In `@10-databases/mysql/cleanup.sh`:
- Around line 10-14: The cleanup script currently swallows errors for the
commands "docker compose down -v --remove-orphans" and "docker volume rm
mysql_primary-data mysql_replica-data" by redirecting stderr and using "||
true", causing a misleading "Cleanup complete" message; change each command to
preserve idempotency but detect failures (remove "2>/dev/null || true"), capture
the exit status, and if non‑zero emit a clear warning (e.g., "Warning: docker
compose down failed: <error>" / "Warning: docker volume rm failed: <error>")
while still allowing the script to continue, so students see when resources
weren’t removed; reference the two commands above when making the changes.

In `@10-databases/README.md`:
- Around line 21-30: Update the repo-relative paths so the runnable steps work
from the repository root: replace the `cd visualizer` step with `cd
10-databases/visualizer` (and similarly update the `./setup.sh` invocation to
reference the setup script under 10-databases/visualizer), change the
`visualizer/cloudformation.yaml` reference to
`10-databases/visualizer/cloudformation.yaml`, and update the `LAB.md` link to
point to the 10-databases copy; also make the same fixes for the other
occurrences noted (the other references to `visualizer/...` in the file) so all
path references are repo-root-relative.

In `@10-databases/visualizer/app.js`:
- Around line 236-237: Replace the loose null check in the DOM update for
$('#stat-lag').textContent by using strict inequality: change the comparison in
the block that updates $('#stat-lag').textContent to use !== null (i.e., use
"lag !== null" instead of "lag != null") so the null/undefined check is
type-safe; keep the existing strict check for className (lag === 0) as-is.

In `@10-databases/visualizer/cloudformation.yaml`:
- Line 35: The CloudFormation template currently sets the EC2 InstanceType to
t3.medium which is not Free Tier eligible; update the InstanceType value used in
the template (the InstanceType property for the visualizer/DB instance) to a
Free Tier option such as t3.micro (or parameterize it and default to t3.micro)
and then test the MySQL primary-replica plus visualizer workload for
performance; if t3.micro cannot sustain the workload, add a short comment or
README entry documenting expected costs and recommending an appropriate instance
type (e.g., t3.small/t3.medium) for non-free-tier usage.

In `@10-databases/visualizer/index.html`:
- Around line 271-273: The SQL input with id "console-input" is placeholder-only
and missing an accessible name; add an explicit label (either a visually hidden
<label for="console-input">SQL query</label> placed before the input or an
aria-label/aria-labelledby attribute on the input, e.g., aria-label="SQL query
input") so screen readers can identify the field, and ensure the label text
clearly describes purpose (e.g., "SQL query" or "Console SQL input").

In `@10-databases/visualizer/LAB-VISUALIZER.md`:
- Around line 69-76: The link in Step 2 currently points to the raw `main`
branch URL for cloudformation.yaml which can drift; update the step to reference
the versioned/local template `10-databases/visualizer/cloudformation.yaml` from
the same checkout or tag the learner is using (or provide an explicit tag/commit
SHA) instead of the
`https://raw.githubusercontent.com/.../main/10-databases/visualizer/cloudformation.yaml`
URL so the EC2 setup is reproducible.

---

Nitpick comments:
In `@10-databases/cassandra/cloudformation.yaml`:
- Around line 20-25: The SecurityGroupIngress rule allowing tcp port 22 from
0.0.0.0/0 (the SecurityGroupIngress block, FromPort: 22, ToPort: 22, CidrIp:
0.0.0.0/0) should be documented as a deliberate, short-lived allowance for the
lab; add a clear comment in the CloudFormation template near that
SecurityGroupIngress block stating this is for transient educational use and
instructing students/operators to delete the stack immediately after the lab (or
replace the CIDR with a restricted range/authorized IP if used long-term).

In `@10-databases/visualizer/app.js`:
- Around line 87-99: apiPost and apiGet currently call fetch without try/catch
or checking response.ok, so network errors or non-2xx responses can reject or
return invalid JSON and callers like doReplication (which sets animating = true)
never get a controlled error; wrap fetch in try/catch inside apiPost and apiGet,
check res.ok and throw or return a standardized error object when not ok, and on
network exceptions return/throw a clear error (include the original error
message) so callers can catch it and reset animating; update references to
apiPost and apiGet callers (e.g., doReplication) to handle the thrown error or
error object accordingly.

In `@10-databases/visualizer/LAB-VISUALIZER.md`:
- Around line 429-452: Replace the bold pseudo-headings (e.g., "**Check which
students are enrolled in which courses:**", "**Compare primary vs replica
data:**", "**Test isolation by running concurrent queries:**", "**Reset the
database:**") with proper Markdown headings (e.g., "## Check which students are
enrolled in which courses", "## Compare primary vs replica data", etc.), remove
trailing colons from the heading lines, and ensure the code block and subsequent
instructions remain under those headings so the document outline and generated
TOC reflect real sections.

In `@10-databases/visualizer/server.py`:
- Around line 372-380: The semicolon check in the multi-statement rejection (the
block that inspects query and ALLOWED_SQL) can be bypassed by semicolons inside
string literals; replace the simple `";" in query.rstrip(";")` check with a scan
that ignores semicolons inside single- or double-quoted strings (or
strip/replace string literal contents first) so only semicolons outside string
literals count as statement separators, then continue to rstrip a trailing
semicolon and perform the ALLOWED_SQL check using first_word =
query.split()[0].upper().
- Around line 358-362: ALLOWED_SQL currently includes "SET", which permits
changing session variables; to tighten security remove "SET" from the
ALLOWED_SQL frozenset in server.py (symbol: ALLOWED_SQL) or instead replace it
with a stricter allowance rule that parses the first token and only permits
specific safe SET variants (e.g., "SET NAMES" or "SET SESSION TRANSACTION
ISOLATION LEVEL") by handling SET specially before the general ALLOWED_SQL
check. Update any tests or comments referencing ALLOWED_SQL accordingly.

In `@10-databases/visualizer/style.css`:
- Around line 30-32: Remove the global html { min-width: 900px; } rule and
implement responsive breakpoints on the layout components instead: replace the
page-level constraint with CSS media queries targeting the .layout and .sidebar
selectors so they adapt (collapse, stack, or scroll) at smaller widths; update
the styles for .layout and .sidebar to set appropriate min-width / flex-basis /
max-width behavior per breakpoint to prevent overflow while preserving the
desktop layout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 55d5d1d5-067f-4801-b04c-94240910a10f

📥 Commits

Reviewing files that changed from the base of the PR and between 3af0da4 and 68b1b3e.

📒 Files selected for processing (32)
  • .pre-commit-config.yaml
  • .secrets.baseline
  • 10-databases/LAB.md
  • 10-databases/README.md
  • 10-databases/cassandra/LAB-CASSANDRA.md
  • 10-databases/cassandra/cleanup.sh
  • 10-databases/cassandra/cloudformation.yaml
  • 10-databases/cassandra/docker-compose.yml
  • 10-databases/cassandra/init/schema.cql
  • 10-databases/cassandra/setup.sh
  • 10-databases/mongodb/LAB-MONGODB.md
  • 10-databases/mongodb/cleanup.sh
  • 10-databases/mongodb/cloudformation.yaml
  • 10-databases/mongodb/docker-compose.yml
  • 10-databases/mongodb/init/rs-init.js
  • 10-databases/mongodb/setup.sh
  • 10-databases/mysql/LAB-MYSQL.md
  • 10-databases/mysql/cleanup.sh
  • 10-databases/mysql/cloudformation.yaml
  • 10-databases/mysql/docker-compose.yml
  • 10-databases/mysql/init/primary-init.sql
  • 10-databases/mysql/setup.sh
  • 10-databases/visualizer/Dockerfile
  • 10-databases/visualizer/LAB-VISUALIZER.md
  • 10-databases/visualizer/app.js
  • 10-databases/visualizer/cleanup.sh
  • 10-databases/visualizer/cloudformation.yaml
  • 10-databases/visualizer/docker-compose.yml
  • 10-databases/visualizer/index.html
  • 10-databases/visualizer/server.py
  • 10-databases/visualizer/setup.sh
  • 10-databases/visualizer/style.css

Comment thread .pre-commit-config.yaml
Comment on lines +13 to +14
09-distributed-file-systems/cloudformation\.yaml|
10-databases/.*/cloudformation\.yaml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not exclude Module 10 CloudFormation templates from YAML validation.

Line 13–Line 14 broadens check-yaml exclusions to skip all 10-databases/*/cloudformation.yaml, which removes syntax validation for newly added infra templates. Keep these files validated and only carve out exclusions for parser-incompatible edge cases with clear justification.

Suggested fix
       - id: check-yaml
         exclude: |
           (?x)^(
             01-introduction-horizontal-scalability/ecsdemo-.*/kubernetes/.*|
             01-introduction-horizontal-scalability/ecsdemo-.*/copilot/.*|
             08-cloud-networking-vpc/cloudformation/.*|
-            09-distributed-file-systems/cloudformation\.yaml|
-            10-databases/.*/cloudformation\.yaml
+            09-distributed-file-systems/cloudformation\.yaml
           )$

Based on learnings: Applies to .yml,.yaml : All YAML files must pass check-yaml validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml around lines 13 - 14, The check-yaml exclusion
currently skips all Module 10 CloudFormation templates via the pattern
"10-databases/.*/cloudformation\.yaml"; remove that pattern (or replace it with
a narrowly scoped exception for the specific parser-incompatible file) from the
exclusions in .pre-commit-config.yaml so that all Module 10 (*.yml/*.yaml)
CloudFormation templates are validated by check-yaml; if you must keep an
exclusion, document the exact filename and reason and use that exact path
instead of the broad "10-databases/.*/cloudformation\.yaml" pattern.

Comment on lines +7 to +12
environment:
CASSANDRA_CLUSTER_NAME: university-cluster
CASSANDRA_DC: dc1
CASSANDRA_RACK: rack1
CASSANDRA_SEEDS: cass1
CASSANDRA_ENDPOINT_SNITCH: GossipingPropertyFileSnitch
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For the official Docker image cassandra:4.1, what JVM heap settings are used by default, and is a 3-node cluster expected to fit on an AWS Free Tier EC2 instance with 1 GiB RAM?

💡 Result:

The official Docker image cassandra:4.1 uses dynamic JVM heap settings calculated by cassandra-env.sh based on available container RAM. Heap sizes are set to max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)) for MAX_HEAP_SIZE, with HEAP_NEWSIZE (young gen) as min(250MB per core capped by cores available, 1/4 heap). These are passed as -Xms/-Xmx/-Xmn if not statically defined in jvm-server.options (which has no static heap settings by default). In Docker without memory limits, it detects host RAM and sets large heaps (e.g., 1GB+). With limits like --memory=1g, it uses container-visible RAM (~1GB), yielding ~512MB heap. A 3-node Cassandra cluster on a single AWS Free Tier t2.micro EC2 (1 GiB RAM, ~0.9GB usable) will not fit. Even with custom low settings (e.g., MAX_HEAP_SIZE=128M HEAP_NEWSIZE=32M per node via env vars), total JVM heap alone is ~1GB+, exceeding RAM after OS/Docker overhead (~200-300MB). Sources confirm Cassandra struggles below ~300-500MB per instance; multi-node on 1GB leads to OOM (issue #277). Use separate instances or lighter DB.

Citations:


Free Tier instance configuration is insufficient for this 3-node cluster.

The official cassandra:4.1 image uses dynamic JVM heap sizing based on available container memory. On an AWS Free Tier t2.micro instance (1 GiB RAM, ~0.9 GB usable after OS overhead), a 3-node Cassandra cluster will exhaust available memory during bootstrapping. Even with explicit low-memory settings (e.g., MAX_HEAP_SIZE=128M HEAP_NEWSIZE=32M per node), the combined JVM heap footprint (~1 GB+) exceeds available RAM, causing OOM failures before students can run the exercises.

Either document the minimum instance requirement (at least t2.small or separate instances) or configure explicit low-memory mode (e.g., MAX_HEAP_SIZE=128M, HEAP_NEWSIZE=32M) for lines 8, 26, and 47 to make this viable for Free Tier learners.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/cassandra/docker-compose.yml` around lines 7 - 12, The
docker-compose Cassandra services currently set
CASSANDRA_CLUSTER_NAME/CASSANDRA_DC/CASSANDRA_RACK/CASSANDRA_SEEDS/CASSANDRA_ENDPOINT_SNITCH
but do not constrain JVM memory, which causes OOM on Free Tier; either add
explicit low-memory JVM env vars MAX_HEAP_SIZE=128M and HEAP_NEWSIZE=32M to each
Cassandra service (e.g., the cass1, cass2, cass3 service blocks that define
CASSANDRA_* env vars) so each container uses small heaps, or alternatively add a
short note in the compose file documenting the minimum host requirement (e.g.,
t2.small or separate instances) if you choose not to lower the heap; update the
environment sections for the three node services to include those MAX_HEAP_SIZE
and HEAP_NEWSIZE values.

Comment on lines +51 to +71
-- Seed students
INSERT INTO students (major, student_id, name, email)
VALUES ('Computer Science', uuid(), 'Alice Johnson', 'alice@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Mathematics', uuid(), 'Bob Smith', 'bob@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Computer Science', uuid(), 'Carol Davis', 'carol@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Physics', uuid(), 'David Lee', 'david@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Computer Science', uuid(), 'Eva Martinez', 'eva@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Mathematics', uuid(), 'Frank Wilson', 'frank@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Biology', uuid(), 'Grace Kim', 'grace@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Computer Science', uuid(), 'Henry Brown', 'henry@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Physics', uuid(), 'Iris Chen', 'iris@university.edu');
INSERT INTO students (major, student_id, name, email)
VALUES ('Mathematics', uuid(), 'Jack Taylor', 'jack@university.edu');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make the student seed idempotent.

setup.sh replays this file on every run, and uuid() changes the primary key each time. Re-running setup adds another 10 student rows, so the lab no longer matches its documented counts. Use fixed UUID literals with IF NOT EXISTS, or clear the table before reseeding.

Based on learnings, "Lab instructions must work from a clean environment - flag hardcoded paths, missing prerequisites, or steps that assume prior state."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/cassandra/init/schema.cql` around lines 51 - 71, The seed
INSERTs into the students table use uuid() so every run creates new rows; make
seeding idempotent by replacing uuid() calls with fixed UUID literals and add IF
NOT EXISTS to each INSERT (or alternatively TRUNCATE the students table before
seeding in setup.sh) so rerunning setup.sh does not duplicate rows; update the
INSERT statements referencing the students table and the setup.sh invocation
that replays schema.cql accordingly.

Comment on lines +33 to +61
# Start Cassandra cluster (nodes start sequentially due to depends_on)
echo "Starting 3-node Cassandra cluster..."
echo " (This takes 2-4 minutes -- each node must join the ring)"
docker compose up -d

# Wait for all nodes to be healthy
for node in cass1 cass2 cass3; do
echo ""
echo "Waiting for $node to join the cluster..."
i=0
while [ "$i" -lt 180 ]; do
i=$((i + 1))
if docker exec "$node" cqlsh -e "SELECT now() FROM system.local" > /dev/null 2>&1; then
echo " $node: ready"
break
fi
if [ "$i" -eq 180 ]; then
echo "ERROR: $node did not start within 3 minutes."
docker compose logs "$node" | tail -20
exit 1
fi
sleep 1
done
done

# Check cluster status
echo ""
echo "Cluster status:"
docker exec cass1 nodetool status 2>/dev/null | grep -E "^(UN|DN|Datacenter|=)" || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wait for ring convergence, not just cqlsh readiness.

This loop only proves each node is accepting CQL. Cassandra can do that before the node is fully UN in the ring, so the lab can seed against a partially formed cluster, and the 180-second cap is tight on slower or low-memory hosts.

🛠️ Suggested hardening
+# Give slower hosts a little more room.
+startup_timeout_seconds="${CASSANDRA_STARTUP_TIMEOUT_SECONDS:-420}"
+
 # Wait for all nodes to be healthy
 for node in cass1 cass2 cass3; do
     echo ""
     echo "Waiting for $node to join the cluster..."
     i=0
-    while [ "$i" -lt 180 ]; do
+    while [ "$i" -lt "$startup_timeout_seconds" ]; do
         i=$((i + 1))
         if docker exec "$node" cqlsh -e "SELECT now() FROM system.local" > /dev/null 2>&1; then
             echo "  $node: ready"
             break
         fi
-        if [ "$i" -eq 180 ]; then
-            echo "ERROR: $node did not start within 3 minutes."
+        if [ "$i" -eq "$startup_timeout_seconds" ]; then
+            echo "ERROR: $node did not start within ${startup_timeout_seconds} seconds."
             docker compose logs "$node" | tail -20
             exit 1
         fi
         sleep 1
     done
 done
+
+echo ""
+echo "Waiting for the ring to report 3 UN nodes..."
+i=0
+while [ "$i" -lt "$startup_timeout_seconds" ]; do
+    i=$((i + 1))
+    if [ "$(docker exec cass1 nodetool status 2>/dev/null | grep -c '^UN')" -eq 3 ]; then
+        break
+    fi
+    if [ "$i" -eq "$startup_timeout_seconds" ]; then
+        echo "ERROR: Cassandra ring did not stabilize in time."
+        docker exec cass1 nodetool status || true
+        exit 1
+    fi
+    sleep 1
+done

As per coding guidelines, "Review lab content for technical accuracy, clear step-by-step instructions, and educational value. Flag any steps that may not work in a standard AWS Free Tier environment."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/cassandra/setup.sh` around lines 33 - 61, The current readiness
loop only tests CQL connectivity (docker exec ... cqlsh) which can pass before
nodes have joined the ring; change the readiness check to verify ring
convergence via nodetool status and wait until each node reports "UN" in the
cluster (e.g., use docker exec cassX nodetool status and grep for its
hostname/state or query cass1's nodetool status until all three nodes are UN),
increase the timeout to a safer value (e.g., 300 seconds) and use a slightly
larger sleep interval (e.g., 2–5s) to avoid tight polling; update the loop
around cass1 cass2 cass3 (and any messages) so it breaks only when the node is
UN and on timeout prints the nodetool and container logs for debugging.

Comment thread 10-databases/LAB.md
Comment on lines +54 to +63
## Environment Options

Each lab supports two environments:

| Environment | What You Need | Setup |
| --- | --- | --- |
| **Local** | Docker Desktop | `./setup.sh` in the lab directory |
| **EC2** | Browser + SSH | Upload `cloudformation.yaml` via AWS Console |

Run one lab at a time to avoid port conflicts and resource contention.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add EC2 sizing guidance for the multi-node labs.

This table implies every lab is fine on a generic EC2 host, but the MongoDB and Cassandra labs run multiple database nodes and are much heavier than a standard free-tier-style instance. Call out minimum sizing here, or steer low-resource users to local setup / MySQL first.

📝 Suggested fix
 | **Local** | Docker Desktop | `./setup.sh` in the lab directory |
 | **EC2** | Browser + SSH | Upload `cloudformation.yaml` via AWS Console |
 
+> Note: the MongoDB and Cassandra labs run multiple database nodes and typically need more memory than a standard AWS Free Tier-style instance.
+> If you're on a low-resource host, prefer local Docker or start with the MySQL lab first.
+
 Run one lab at a time to avoid port conflicts and resource contention.

As per coding guidelines, "Review lab content for technical accuracy, clear step-by-step instructions, and educational value. Flag any steps that may not work in a standard AWS Free Tier environment."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/LAB.md` around lines 54 - 63, The "Environment Options" section
and its table (headers "Environment Options", rows "Local" and "EC2", and
artifacts like `./setup.sh` and `cloudformation.yaml`) needs EC2 sizing
guidance: update this section to call out that multi-node labs (MongoDB,
Cassandra) require larger instances (recommend minimum t3.medium/t3.large or
m5.large with 2+ vCPUs and >=4–8GB RAM and adequate EBS) or advise users on
alternatives (use Local/Docker or run single-node MySQL first) and add a short
note directing low-resource users to the Local setup or to follow smaller
single-node lab options; keep the guidance brief and place it directly below the
existing table.

Comment thread 10-databases/mongodb/setup.sh Outdated
Comment on lines +61 to +72
docker exec mongo1 mongosh --quiet --file /tmp/rs-init.js 2>/dev/null || true

# Wait for replica set to stabilize
echo " Waiting for primary election..."
i=0
while [ "$i" -lt 30 ]; do
i=$((i + 1))
if docker exec mongo1 mongosh --quiet --eval "rs.status().members.find(m => m.stateStr === 'PRIMARY') ? 'ok' : ''" 2>/dev/null | grep -q "ok"; then
break
fi
sleep 1
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Abort if the replica set never elects a PRIMARY.

If replica-set init never succeeds, this loop just falls through and the script still prints Environment ready. That leaves students in a broken lab with no clear failure.

🛠️ Suggested timeout handling
 docker exec mongo1 mongosh --quiet --file /tmp/rs-init.js 2>/dev/null || true
 
 # Wait for replica set to stabilize
 echo "  Waiting for primary election..."
 i=0
+primary_ready=0
 while [ "$i" -lt 30 ]; do
     i=$((i + 1))
     if docker exec mongo1 mongosh --quiet --eval "rs.status().members.find(m => m.stateStr === 'PRIMARY') ? 'ok' : ''" 2>/dev/null | grep -q "ok"; then
+        primary_ready=1
         break
     fi
     sleep 1
 done
+
+if [ "$primary_ready" -ne 1 ]; then
+    echo "ERROR: Timed out waiting for primary election."
+    docker compose logs mongo1 mongo2 mongo3
+    exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
docker exec mongo1 mongosh --quiet --file /tmp/rs-init.js 2>/dev/null || true
# Wait for replica set to stabilize
echo " Waiting for primary election..."
i=0
while [ "$i" -lt 30 ]; do
i=$((i + 1))
if docker exec mongo1 mongosh --quiet --eval "rs.status().members.find(m => m.stateStr === 'PRIMARY') ? 'ok' : ''" 2>/dev/null | grep -q "ok"; then
break
fi
sleep 1
done
docker exec mongo1 mongosh --quiet --file /tmp/rs-init.js 2>/dev/null || true
# Wait for replica set to stabilize
echo " Waiting for primary election..."
i=0
primary_ready=0
while [ "$i" -lt 30 ]; do
i=$((i + 1))
if docker exec mongo1 mongosh --quiet --eval "rs.status().members.find(m => m.stateStr === 'PRIMARY') ? 'ok' : ''" 2>/dev/null | grep -q "ok"; then
primary_ready=1
break
fi
sleep 1
done
if [ "$primary_ready" -ne 1 ]; then
echo "ERROR: Timed out waiting for primary election."
docker compose logs mongo1 mongo2 mongo3
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/mongodb/setup.sh` around lines 61 - 72, The replica-set wait
loop (the while loop using i, docker exec mongo1 mongosh --eval
"rs.status()...") currently falls through on timeout; update it to detect
failure after the loop and abort: after the loop, test whether a PRIMARY was
found (re-run the same rs.status() check or set a flag inside the loop), and if
not, print a clear error like "ERROR: replica set primary not elected after
timeout" to stderr and exit non-zero (exit 1) so the script does not print
"Environment ready" on failure.

Comment on lines +15 to +38
LabSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Lab 10A MySQL - SSH access
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 22
ToPort: 22
CidrIp: 0.0.0.0/0
Description: SSH access

LabInstance:
Type: AWS::EC2::Instance
Properties:
InstanceType: t3.medium
ImageId: !Ref LatestAmiId
KeyName: vockey
SecurityGroupIds:
- !GetAtt LabSecurityGroup.GroupId
Tags:
- Key: Name
Value: lab10a-mysql
UserData:
Fn::Base64: |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n 10-databases/mysql/cloudformation.yaml

Repository: gamaware/system-design-course

Length of output: 3128


🌐 Web query:

AWS Free Tier t3.medium EC2 instance eligible

💡 Result:

No, the AWS Free Tier t3.medium EC2 instance is not eligible. Official AWS documentation and Free Tier pages list eligible instances as t3.micro, t3.small, t4g.micro, t4g.small, c7i-flex.large, and m7i-flex.large for accounts created on or after July 15, 2025 (6-month duration), or t2.micro/t3.micro for older accounts (12-month). t3.medium is not included in any list. You can verify using the AWS CLI: aws ec2 describe-instance-types --filters Name=free-tier-eligible,Values=true --query "InstanceTypes[*].[InstanceType]".

Citations:


Remove hardcoded credential references and parameterize Academy-specific values.

This template hardcodes the credential filename labsuser.pem in the SSH output (lines 79–81), which violates the requirement to use placeholders for sensitive values. Replace with YOUR_LAB_KEY_FILE or similar. Additionally:

  • Line 31: KeyName: vockey is hardcoded and Academy-specific; parameterize or document that this is Academy-only.
  • Line 29: t3.medium is not AWS Free Tier eligible; use t2.micro or t3.micro to align with lab requirements.
  • Line 23: SSH access to 0.0.0.0/0 should be restricted to a parameter for the student's IP or documented safe range.
  • Add explicit MetadataOptions to enforce IMDSv2 and BlockDeviceMappings for encrypted root volume.

As per coding guidelines [0-1][0-9]-*/**: Review lab content for technical accuracy and ensure steps work in a standard AWS Free Tier environment.

🧰 Tools
🪛 Trivy (0.69.3)

[error] 26-38: aws_instance should activate session tokens for Instance Metadata Service.

Instance does not require IMDS access to require a token.

Rule: AWS-0028

Resource: 10-databases/mysql/cloudformation.yaml:26-38

Learn more

(IaC/AWS)


[error] 23-23: Security groups should not allow unrestricted ingress to SSH or RDP from any IP address.

Security group rule allows unrestricted ingress from any IP address.

Rule: AWS-0107

Resource: LabSecurityGroup

Learn more

(IaC/AWS)


[error] 26-38: Instance with unencrypted block device.

Root block device is not encrypted.

Rule: AWS-0131

Resource: 10-databases/mysql/cloudformation.yaml:26-38

Learn more

(IaC/AWS)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/mysql/cloudformation.yaml` around lines 15 - 38, The template
currently hardcodes sensitive/academy-specific values and insecure defaults:
replace the SSH output reference to "labsuser.pem" with a placeholder like
"YOUR_LAB_KEY_FILE"; make KeyName in the LabInstance properties parameterized
(replace the literal vockey with a Parameters reference) and document it as
Academy-only if necessary; change InstanceType in LabInstance from t3.medium to
a Free Tier eligible type such as t2.micro or t3.micro (parameterize if you want
flexibility); remove the hardcoded SecurityGroupIngress CidrIp of 0.0.0.0/0 and
instead reference a new parameter (e.g., AllowedSSHRange or StudentIpCidr) to
restrict SSH; and add MetadataOptions to LabInstance to require IMDSv2 plus
BlockDeviceMappings to set the root volume to encrypted. Update any Outputs that
mention labsuser.pem to use the YOUR_LAB_KEY_FILE placeholder or the KeyName
parameter.

Comment thread 10-databases/README.md
Comment on lines +175 to +180
| **Replication** | Copying data from a primary to replicas via binary log |
| **Read replica** | A read-only copy that offloads SELECT queries from the primary |
| **Replication lag** | Delay between a write on primary and its appearance on replica |
| **ACID** | Atomicity, Consistency, Isolation, Durability -- transaction guarantees |
| **Atomicity** | A transaction either fully succeeds (COMMIT) or fully fails (ROLLBACK) |
| **BASE** | Basically Available, Soft state, Eventually consistent -- NoSQL trade-off |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Please avoid teaching “NoSQL = BASE, not ACID.”

This module includes MongoDB, which supports ACID transactions, so framing BASE as the NoSQL trade-off is misleading. Qualify it as a spectrum of consistency/transaction guarantees instead. As per coding guidelines, [0-1][0-9]-*/**: Review lab content for technical accuracy, clear step-by-step instructions, and educational value.

Also applies to: 212-218

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/README.md` around lines 175 - 180, The README currently presents
"BASE" as the definitive NoSQL trade-off and contrasts it with "ACID"; update
the wording around the "BASE" and "ACID" bullets to qualify
consistency/transaction guarantees as a spectrum rather than a binary (e.g.,
keep the "ACID" bullet but remove or rephrase "NoSQL = BASE, not ACID"),
explicitly note that some NoSQL systems like MongoDB can support ACID
transactions, and adjust corresponding text in the nearby section referenced
(lines around "BASE" and also the content at 212-218) so the materials
accurately state that different databases provide differing levels of
atomicity/consistency/isolation/durability rather than implying all NoSQL
systems lack ACID.

Comment on lines +1 to +11
FROM python:3.13-slim

WORKDIR /app

RUN pip install --no-cache-dir pymysql==1.1.1 cryptography==44.0.3

COPY . .

EXPOSE 8080

CMD ["python", "server.py"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Drop root before starting the HTTP server.

This image currently runs a publicly exposed process as root. Given 10-databases/visualizer/server.py:495-506 binds 0.0.0.0:8080 and 10-databases/visualizer/docker-compose.yml:49-59 publishes that port, any bug in the request handler lands with full container privileges.

🔒 Proposed hardening
 FROM python:3.13-slim
 
 WORKDIR /app
 
 RUN pip install --no-cache-dir pymysql==1.1.1 cryptography==44.0.3
+
+RUN useradd --system --create-home appuser
 
 COPY . .
+
+RUN chown -R appuser:appuser /app
+USER appuser
 
 EXPOSE 8080
 
 CMD ["python", "server.py"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
FROM python:3.13-slim
WORKDIR /app
RUN pip install --no-cache-dir pymysql==1.1.1 cryptography==44.0.3
COPY . .
EXPOSE 8080
CMD ["python", "server.py"]
FROM python:3.13-slim
WORKDIR /app
RUN pip install --no-cache-dir pymysql==1.1.1 cryptography==44.0.3
RUN useradd --system --create-home appuser
COPY . .
RUN chown -R appuser:appuser /app
USER appuser
EXPOSE 8080
CMD ["python", "server.py"]
🧰 Tools
🪛 Trivy (0.69.3)

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/Dockerfile` around lines 1 - 11, The container runs
the HTTP server as root (server.py binds 0.0.0.0:8080 and Dockerfile uses root
at runtime); create a non-root user and drop privileges before starting the app:
add steps to create a dedicated user/group (e.g., "app"), chown the application
files and any writable dirs, switch to that user with the Dockerfile USER
directive before CMD, and ensure the server process runs under that user rather
than root so the exposed service is not running with container root privileges.

Comment on lines +48 to +56
docker exec mysql-replica mysql -u root -prootpass \
-e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql-primary', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1, GET_SOURCE_PUBLIC_KEY=1; START REPLICA;" \
2>/dev/null

docker exec mysql-replica mysql -u root -prootpass \
-e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;" \
2>/dev/null

echo " Replication configured."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Verify replication before declaring the lab ready.

START REPLICA only launches async workers. If the source config is wrong or the replica is still coming up, this script still says “Visualizer ready” even though the replication tab is broken.

🔁 Suggested health check before the success banner
 docker exec mysql-replica mysql -u root -prootpass \
     -e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql-primary', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1, GET_SOURCE_PUBLIC_KEY=1; START REPLICA;" \
     2>/dev/null
 
+echo "  Waiting for replica threads..."
+i=0
+replication_ready=0
+while [ "$i" -lt 30 ]; do
+    i=$((i + 1))
+    replica_status=$(docker exec mysql-replica mysql -u root -prootpass -e "SHOW REPLICA STATUS\\G" 2>/dev/null || true)
+    if grep -q "Replica_IO_Running: Yes" <<<"$replica_status" \
+        && grep -q "Replica_SQL_Running: Yes" <<<"$replica_status"; then
+        replication_ready=1
+        break
+    fi
+    sleep 1
+done
+
+if [ "$replication_ready" -ne 1 ]; then
+    echo "ERROR: Replication did not become healthy."
+    exit 1
+fi
+
 docker exec mysql-replica mysql -u root -prootpass \
     -e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;" \
     2>/dev/null
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
docker exec mysql-replica mysql -u root -prootpass \
-e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql-primary', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1, GET_SOURCE_PUBLIC_KEY=1; START REPLICA;" \
2>/dev/null
docker exec mysql-replica mysql -u root -prootpass \
-e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;" \
2>/dev/null
echo " Replication configured."
docker exec mysql-replica mysql -u root -prootpass \
-e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql-primary', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1, GET_SOURCE_PUBLIC_KEY=1; START REPLICA;" \
2>/dev/null
echo " Waiting for replica threads..."
i=0
replication_ready=0
while [ "$i" -lt 30 ]; do
i=$((i + 1))
replica_status=$(docker exec mysql-replica mysql -u root -prootpass -e "SHOW REPLICA STATUS\\G" 2>/dev/null || true)
if grep -q "Replica_IO_Running: Yes" <<<"$replica_status" \
&& grep -q "Replica_SQL_Running: Yes" <<<"$replica_status"; then
replication_ready=1
break
fi
sleep 1
done
if [ "$replication_ready" -ne 1 ]; then
echo "ERROR: Replication did not become healthy."
exit 1
fi
docker exec mysql-replica mysql -u root -prootpass \
-e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;" \
2>/dev/null
echo " Replication configured."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/setup.sh` around lines 48 - 56, The script currently
runs START REPLICA and prints "Replication configured." without verifying
replication state; modify the setup.sh sequence after START REPLICA to poll the
replica status (using mysql -e "SHOW REPLICA STATUS\G" or SHOW SLAVE STATUS\G)
and check fields Replica_IO_Running and Replica_SQL_Running are "Yes" and
Last_SQL_Error is empty (or Seconds_Behind_Master is not NULL) with a short
retry loop and timeout; only print "Replication configured." when the checks
pass, and print/log a descriptive error and exit nonzero if the timeout is
reached or errors are reported by the replica.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
.secrets.baseline (1)

221-255: Triaged demo credentials should be allowlisted before baseline refresh.

The new findings on Line 221, Line 230, Line 239, and Line 248 appear to be intentional lab/demo credentials. Keeping them as recurring baseline entries increases scanner noise and can hide real leaks. Please add explicit allowlist annotations (or scoped excludes for these lab fixtures), then regenerate the baseline.

Based on learnings: Update detect-secrets baseline for false positives via detect-secrets scan --update .secrets.baseline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.secrets.baseline around lines 221 - 255, The .secrets.baseline contains
intentional demo credentials for the entries referencing
"10-databases/mysql/docker-compose.yml", "10-databases/mysql/setup.sh",
"10-databases/visualizer/docker-compose.yml", and
"10-databases/visualizer/setup.sh" (see the hashed_secret values
e63a0ab2f8e7ecde486b42ebfec16d4434840af4 and
7667ad761bd61125e6eb84aafc63fe3bb914ab5b); update the baseline to explicitly
allowlist/scoped-exclude these lab/demo fixtures (or add allowlist annotations
for those specific hashed_secret entries) and then regenerate the baseline by
running detect-secrets scan --update .secrets.baseline so future scans treat
them as known false positives.
10-databases/visualizer/app.js (1)

81-81: Clear the remaining ESLint warnings.

pollTimer is never read, and consoleHistory never needs reassignment. Dropping the unused interval handle or actually clearing it on teardown, plus changing consoleHistory to const, will keep this file warning-free.

Also applies to: 468-468

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/app.js` at line 81, pollTimer is declared but never
read and consoleHistory is reassigned unnecessarily; change consoleHistory to a
const and either remove the unused pollTimer declaration or actually store the
interval ID into pollTimer when calling setInterval and clear it in the
component/page teardown (e.g., in the cleanup or unload handler) so the interval
gets cleared; update references to the interval creation/cleanup around
pollTimer and the consoleHistory declaration (use const consoleHistory) to
eliminate the ESLint warnings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@10-databases/visualizer/app.js`:
- Around line 87-99: The async helpers apiPost and apiGet and the caller
doExplain fail to unwind UI state on errors and dereference result.steps without
guarding result.error; wrap fetch + res.json calls in try/catch (or add .catch)
and ensure a finally block resets animating and re-enables buttons (the same
cleanup currently only on the happy path), and in doExplain check result.error
before accessing result.steps (handle/log error and return early) so backend
errors or rejected fetches don’t leave the UI stuck; update apiPost/apiGet to
propagate errors cleanly and callers (doExplain and the other flows mentioned)
to handle those errors and run the shared cleanup.
- Around line 156-171: The logEvent function and the sidebar/course-list
renderer currently assign user-derived strings directly into element.innerHTML
(e.g., entry.innerHTML using step.action, step.target, step.result and sidebar
course entries); replace these sinks by creating each span with
document.createElement, set className (e.g.,
'log-time','log-action','log-target','log-result','log-latency'), assign
textContent with the untrusted values (e.g., step.action, step.target,
step.result, `${step.latency_ms.toFixed(1)}ms`), and append them to the entry;
apply the same change to the course list rendering code so all dynamic content
is inserted via textContent/node creation instead of innerHTML.

In `@10-databases/visualizer/server.py`:
- Around line 152-169: The DELETE branch can be a no-op but the code still runs
the UPDATE that decrements courses.enrolled, causing counts to diverge; modify
the logic around the DELETE (the cur.execute("DELETE FROM enrollments ...") and
subsequent affected = cur.rowcount) to check affected == 0 and if so skip the
downstream UPDATE and any insertion into the target course (or perform a
transaction rollback/return early). Concretely: after the DELETE and reading
affected, if affected == 0 append a step_entry indicating "NO ROWS - no-op" and
return/abort the transfer before executing cur.execute("UPDATE courses SET
enrolled = enrolled - 1 ...") or any insert into the target, ensuring the
operations involving from_id, student_id, seq, steps and step_entry are not
performed when no enrollment was removed.
- Around line 336-352: db_reset currently runs on an autocommit connection and
only trims extra rows, leaving the DB in a partial state on errors and unable to
fully restore the original seed rows or access_log index; change db_reset to
open a transactional connection (use get_conn(PRIMARY_HOST) with autocommit
disabled or begin a transaction), perform all DELETE/UPDATE/INSERT/DROP/CREATE
operations inside a try block, and on success commit once at the end and on any
exception rollback and re-raise or return an error; also ensure the handler
restores the exact seed baseline (recreate missing seed rows and recreate
access_log/index if needed rather than only suppressing OperationalError) so
db_reset is a true atomic revert (refer to db_reset, get_conn, the cursor block,
and the contextlib.suppress(pymysql.err.OperationalError) usage to locate where
to implement the transactional commit/rollback and stronger restoration logic).
- Around line 479-484: The current _serve_static method uses startswith to check
the resolved file path against the current working directory which is unsafe;
change the check to compute the realpath of the base directory (e.g., base =
os.path.realpath(os.getcwd())) and use os.path.commonpath([base, resolved]) ==
base to ensure the resolved path is within the base jail before proceeding to
open or serve the file; update the error branch to return a 403 if the
commonpath check fails.
- Around line 19-25: The code currently opens a fresh autocommit connection per
request in get_conn() and sql_exec(), so transaction verbs (START, COMMIT,
ROLLBACK) in ALLOWED_SQL can't work; either remove those verbs from ALLOWED_SQL
or implement per-client session persistence: modify get_conn()/sql_exec() to
reuse a persistent connection per client (e.g., key by session cookie or client
id) stored in an in-memory dict, create connections with autocommit=False, route
SQL requests for a given client to that stored connection, expose a cleanup to
close the connection on session end, and ensure sql_exec() applies
commit/rollback on the stored connection when receiving COMMIT/ROLLBACK
statements; reference functions/variables: get_conn, sql_exec, ALLOWED_SQL.

---

Nitpick comments:
In @.secrets.baseline:
- Around line 221-255: The .secrets.baseline contains intentional demo
credentials for the entries referencing "10-databases/mysql/docker-compose.yml",
"10-databases/mysql/setup.sh", "10-databases/visualizer/docker-compose.yml", and
"10-databases/visualizer/setup.sh" (see the hashed_secret values
e63a0ab2f8e7ecde486b42ebfec16d4434840af4 and
7667ad761bd61125e6eb84aafc63fe3bb914ab5b); update the baseline to explicitly
allowlist/scoped-exclude these lab/demo fixtures (or add allowlist annotations
for those specific hashed_secret entries) and then regenerate the baseline by
running detect-secrets scan --update .secrets.baseline so future scans treat
them as known false positives.

In `@10-databases/visualizer/app.js`:
- Line 81: pollTimer is declared but never read and consoleHistory is reassigned
unnecessarily; change consoleHistory to a const and either remove the unused
pollTimer declaration or actually store the interval ID into pollTimer when
calling setInterval and clear it in the component/page teardown (e.g., in the
cleanup or unload handler) so the interval gets cleared; update references to
the interval creation/cleanup around pollTimer and the consoleHistory
declaration (use const consoleHistory) to eliminate the ESLint warnings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3fb13917-2313-4fb9-8cce-36f05e922c4c

📥 Commits

Reviewing files that changed from the base of the PR and between 68b1b3e and 5e41b4f.

📒 Files selected for processing (7)
  • .secrets.baseline
  • 10-databases/mongodb/init/rs-init.js
  • 10-databases/mongodb/setup.sh
  • 10-databases/visualizer/app.js
  • 10-databases/visualizer/server.py
  • 10-databases/visualizer/setup.sh
  • eslint.config.js
✅ Files skipped from review due to trivial changes (2)
  • eslint.config.js
  • 10-databases/mongodb/init/rs-init.js
🚧 Files skipped from review as they are similar to previous changes (2)
  • 10-databases/mongodb/setup.sh
  • 10-databases/visualizer/setup.sh

Comment on lines +87 to +99
async function apiPost(url, body) {
const res = await fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
});
return res.json();
}

async function apiGet(url) {
const res = await fetch(url);
return res.json();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Always unwind UI state in the async flows.

A rejected fetch()/res.json() leaves animating = true and all buttons disabled because cleanup only runs on the happy path. doExplain() also skips the result.error guard and dereferences result.steps unconditionally, so a normal backend error crashes the visualizer before it can recover.

As per coding guidelines, review lab content for technical accuracy, clear step-by-step instructions, and educational value.

Also applies to: 263-311, 315-376, 380-416

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/app.js` around lines 87 - 99, The async helpers
apiPost and apiGet and the caller doExplain fail to unwind UI state on errors
and dereference result.steps without guarding result.error; wrap fetch +
res.json calls in try/catch (or add .catch) and ensure a finally block resets
animating and re-enables buttons (the same cleanup currently only on the happy
path), and in doExplain check result.error before accessing result.steps
(handle/log error and return early) so backend errors or rejected fetches don’t
leave the UI stuck; update apiPost/apiGet to propagate errors cleanly and
callers (doExplain and the other flows mentioned) to handle those errors and run
the shared cleanup.

Comment on lines +156 to +171
function logEvent(step) {
const log = $('#event-log');
const entry = document.createElement('div');
entry.className = 'log-entry';

const time = new Date().toLocaleTimeString('en-US', { hour12: false });
const targetCls = step.target === 'primary' ? 'primary' : 'replica';
const resultCls = step.result === 'OK' || step.result === 'FOUND' ||
step.result === 'COMMITTED' ? 'ok' : 'error';

entry.innerHTML =
`<span class="log-time">${time}</span>` +
`<span class="log-action">${step.action}</span>` +
`<span class="log-target ${targetCls}">${step.target}</span>` +
`<span class="log-result ${resultCls}">${step.result}</span>` +
`<span class="log-latency">${step.latency_ms.toFixed(1)}ms</span>`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Replace the dynamic innerHTML sinks.

logEvent() and the course list both render dynamic strings through innerHTML. Here, Line 540 derives step.action from raw SQL input, and the sidebar renders DB-backed values, so a learner can inject markup back into the page through the built-in console. Build those nodes with textContent instead.

Also applies to: 245-250, 539-543

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/app.js` around lines 156 - 171, The logEvent function
and the sidebar/course-list renderer currently assign user-derived strings
directly into element.innerHTML (e.g., entry.innerHTML using step.action,
step.target, step.result and sidebar course entries); replace these sinks by
creating each span with document.createElement, set className (e.g.,
'log-time','log-action','log-target','log-result','log-latency'), assign
textContent with the untrusted values (e.g., step.action, step.target,
step.result, `${step.latency_ms.toFixed(1)}ms`), and append them to the entry;
apply the same change to the course list rendering code so all dynamic content
is inserted via textContent/node creation instead of innerHTML.

Comment on lines +19 to +25
def get_conn(host):
"""Return a MySQL connection to the specified host."""
return pymysql.connect(
host=host, user=MYSQL_USER, password=MYSQL_PASS,
database=MYSQL_DB, cursorclass=pymysql.cursors.DictCursor,
autocommit=True,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd server.py

Repository: gamaware/system-design-course

Length of output: 106


🏁 Script executed:

cat -n 10-databases/visualizer/server.py | head -100

Repository: gamaware/system-design-course

Length of output: 3991


🏁 Script executed:

wc -l 10-databases/visualizer/server.py

Repository: gamaware/system-design-course

Length of output: 110


🏁 Script executed:

cat -n 10-databases/visualizer/server.py | sed -n '350,410p'

Repository: gamaware/system-design-course

Length of output: 2618


🏁 Script executed:

cat -n 10-databases/visualizer/server.py | sed -n '200,360p'

Repository: gamaware/system-design-course

Length of output: 7065


Remove transaction verbs from ALLOWED_SQL or implement per-client session persistence.

sql_exec() creates a fresh connection per request and get_conn() enables autocommit=True, which means START, COMMIT, and ROLLBACK cannot function as learners would expect. Each statement commits immediately, and statements cannot span requests. Learners will be confused when these commands don't behave like transactions. Either remove the transaction verbs (lines 360) or persist a session per client.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/server.py` around lines 19 - 25, The code currently
opens a fresh autocommit connection per request in get_conn() and sql_exec(), so
transaction verbs (START, COMMIT, ROLLBACK) in ALLOWED_SQL can't work; either
remove those verbs from ALLOWED_SQL or implement per-client session persistence:
modify get_conn()/sql_exec() to reuse a persistent connection per client (e.g.,
key by session cookie or client id) stored in an in-memory dict, create
connections with autocommit=False, route SQL requests for a given client to that
stored connection, expose a cleanup to close the connection on session end, and
ensure sql_exec() applies commit/rollback on the stored connection when
receiving COMMIT/ROLLBACK statements; reference functions/variables: get_conn,
sql_exec, ALLOWED_SQL.

Comment on lines +152 to +169
# Delete old enrollment
t0 = time.perf_counter()
cur.execute("DELETE FROM enrollments WHERE student_id = %s AND course_id = %s",
(student_id, from_id))
affected = cur.rowcount
t1 = time.perf_counter()
steps.append(step_entry(seq, f"DELETE enrollment ({from_course})", "primary",
"OK" if affected > 0 else "NO ROWS",
(t1 - t0) * 1000, {"rows_affected": affected}))
seq += 1

# Update from-course count
t0 = time.perf_counter()
cur.execute("UPDATE courses SET enrolled = enrolled - 1 WHERE course_id = %s AND enrolled > 0",
(from_id,))
t1 = time.perf_counter()
steps.append(step_entry(seq, f"UPDATE {from_course} enrolled-1", "primary", "OK",
(t1 - t0) * 1000))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rollback when the source enrollment is missing.

If the DELETE touches 0 rows, the code still decrements courses.enrolled and may then insert into the target course, so courses.enrolled can diverge from the actual enrollments table. Treat that branch as a no-op or rollback before updating either course count.

As per coding guidelines, review lab content for technical accuracy, clear step-by-step instructions, and educational value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/server.py` around lines 152 - 169, The DELETE branch
can be a no-op but the code still runs the UPDATE that decrements
courses.enrolled, causing counts to diverge; modify the logic around the DELETE
(the cur.execute("DELETE FROM enrollments ...") and subsequent affected =
cur.rowcount) to check affected == 0 and if so skip the downstream UPDATE and
any insertion into the target course (or perform a transaction rollback/return
early). Concretely: after the DELETE and reading affected, if affected == 0
append a step_entry indicating "NO ROWS - no-op" and return/abort the transfer
before executing cur.execute("UPDATE courses SET enrolled = enrolled - 1 ...")
or any insert into the target, ensuring the operations involving from_id,
student_id, seq, steps and step_entry are not performed when no enrollment was
removed.

Comment on lines +336 to +352
def db_reset(_body):
"""Reset database to initial state."""
conn = get_conn(PRIMARY_HOST)
try:
with conn.cursor() as cur:
cur.execute("DELETE FROM enrollments")
cur.execute("DELETE FROM students WHERE student_id > 10")
cur.execute("DELETE FROM courses WHERE code NOT IN ('CS101','CS201','MATH101','PHYS101')")
cur.execute("""UPDATE courses SET enrolled = CASE code
WHEN 'CS101' THEN 4 WHEN 'CS201' THEN 2
WHEN 'MATH101' THEN 2 WHEN 'PHYS101' THEN 2 END""")
# Re-insert original enrollments
cur.execute("""INSERT INTO enrollments (student_id, course_id) VALUES
(1,1),(3,1),(5,1),(8,1),(1,2),(3,2),(2,3),(6,3),(4,4),(9,4)""")
with contextlib.suppress(pymysql.err.OperationalError):
cur.execute("DROP INDEX idx_student_resource ON access_log")
return {"result": "OK", "message": "Database reset to initial state"}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

db_reset() needs to be a real, atomic reset.

This runs on an autocommit connection, so any mid-reset failure leaves the lab partially mutated. It also only trims extra rows; if the SQL console removed one of the seed rows from 10-databases/mysql/init/primary-init.sql:47-60 or changed access_log, this handler cannot restore the baseline state that the exercises depend on.

As per coding guidelines, review lab content for technical accuracy, clear step-by-step instructions, and educational value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/server.py` around lines 336 - 352, db_reset currently
runs on an autocommit connection and only trims extra rows, leaving the DB in a
partial state on errors and unable to fully restore the original seed rows or
access_log index; change db_reset to open a transactional connection (use
get_conn(PRIMARY_HOST) with autocommit disabled or begin a transaction), perform
all DELETE/UPDATE/INSERT/DROP/CREATE operations inside a try block, and on
success commit once at the end and on any exception rollback and re-raise or
return an error; also ensure the handler restores the exact seed baseline
(recreate missing seed rows and recreate access_log/index if needed rather than
only suppressing OperationalError) so db_reset is a true atomic revert (refer to
db_reset, get_conn, the cursor block, and the
contextlib.suppress(pymysql.err.OperationalError) usage to locate where to
implement the transactional commit/rollback and stronger restoration logic).

Comment on lines +479 to +484
def _serve_static(self, path):
filename = path.lstrip("/")
resolved = os.path.realpath(filename)
if not resolved.startswith(os.path.realpath(os.getcwd())):
self._send_json({"error": "Forbidden"}, 403)
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use commonpath() for the static-file jail.

startswith() on real paths is only a prefix check, so a sibling path such as /app2/... still passes when the base directory is /app. Compare the resolved file path with the base directory via os.path.commonpath() before opening the file.

🔒 Proposed fix
     def _serve_static(self, path):
-        filename = path.lstrip("/")
-        resolved = os.path.realpath(filename)
-        if not resolved.startswith(os.path.realpath(os.getcwd())):
+        base = os.path.realpath(os.getcwd())
+        filename = path.lstrip("/")
+        resolved = os.path.realpath(os.path.join(base, filename))
+        if os.path.commonpath([base, resolved]) != base:
             self._send_json({"error": "Forbidden"}, 403)
             return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@10-databases/visualizer/server.py` around lines 479 - 484, The current
_serve_static method uses startswith to check the resolved file path against the
current working directory which is unsafe; change the check to compute the
realpath of the base directory (e.g., base = os.path.realpath(os.getcwd())) and
use os.path.commonpath([base, resolved]) == base to ensure the resolved path is
within the base jail before proceeding to open or serve the file; update the
error branch to return a 403 if the commonpath check fails.

@gamaware
Copy link
Copy Markdown
Owner Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Reviewed 32 files across 5 dimensions: CLAUDE.md adherence, shallow bug scan, git history context, previous PR comment patterns, and code comment compliance. Six candidate issues were identified and scored; all fell below the confidence threshold (highest was 50/100 for the ESLint MongoDB shell exclusion, which is technically justified by structurally incompatible syntax).

@gamaware gamaware merged commit 3561920 into main Apr 14, 2026
12 of 15 checks passed
@gamaware gamaware deleted the feat/lab-10-database-scalability branch April 14, 2026 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants