implemented DBservices according to new db schema#15
Conversation
📝 WalkthroughWalkthroughMigrates the database from a YouTube-centric schema to a normalized, platform-agnostic v2 schema: adds ORM models, a transactional migration CLI with dry-run support, an index-creation script, updated PostgreSQL FTS RPCs, refactors backend queries to join/aggregate the new schema, and updates types and frontend duration formatting. ChangesDatabase Schema Migration & Adaptation
Sequence Diagram(s)sequenceDiagram
participant ComponentA
participant ComponentB
ComponentA->>ComponentB: observable interaction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/src/services/supabaseService.js (1)
447-480:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't drop non-array tag payloads.
Line 448 treats anything except a JSON array as “no tags”. In this PR, the shared contracts now allow object-shaped
tags, so those rows will contribute nothing totopicsortagshere. Normalize the JSONB value first instead of silently defaulting to[].🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/src/services/supabaseService.js` around lines 447 - 480, The code currently sets rowTags = Array.isArray(row.tags) ? row.tags : [] which drops object-shaped JSONB tag payloads; instead parse/normalize row.tags into a consistent array before using it (e.g., convert single-string, object-shaped, or array payloads into an array of tag values) so Speaker loop, topic loop and tag loop operate on valid entries; update the logic around rowTags creation (the variable used in the speakerMap/topicMap/tagSet flows) to normalize JSONB inputs first (using existing helpers like cleanLabel/normalizeLabel where appropriate) rather than defaulting to [].
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/services/supabaseService.js`:
- Around line 290-303: The rank calculation omits summaryVector so matches found
only in the cached TL;DR (su.content) get zero score; update the ts_rank call in
the SQL fragment (the expression producing ts_rank(... ) AS rank) to include
summaryVector alongside titleDescVector and textVector (e.g., combine
titleDescVector || textVector || summaryVector, ${ftsQuery}) so that hits in
su.content affect ranking; ensure the same ${ftsQuery} and summaryVector symbol
names are used to match the WHERE clause that checks ${summaryVector} @@
${ftsQuery}.
- Around line 366-382: The current read-then-write using query around the
summaries table (select by transcript_id and summary_type then update or insert)
is not atomic; replace that logic in supabaseService.js (the block that
references query, transcriptId, type, content and existing.rows) with a single
upsert SQL using INSERT ... ON CONFLICT (transcript_id, summary_type) DO UPDATE
so the write is atomic; ensure the DO UPDATE sets content = EXCLUDED.content and
created_at = NOW() (or equivalent) and call query once with parameters
[transcriptId, type, content].
In `@backend/supabase/migrations/001_full_text_search.sql`:
- Around line 57-61: The ts_rank calculation currently only includes
to_tsvector('english', COALESCE(c.title, '') || ' ' || COALESCE(c.description,
'')) and the transcript vector (t.corrected_text / t.raw_text) so matches found
only in the summary field su.content will produce rank = 0; update the ts_rank
expression to also include to_tsvector('english', COALESCE(su.content, ''))
alongside the existing vectors so that ranking reflects the same fields used in
the WHERE tsquery predicate (ensure the same vector concatenation used in Lines
83-85 is included in the ts_rank call that uses tsquery_val).
In `@NEW_DB_SCRIPTS/add_indexes.py`:
- Around line 82-87: The script catches exceptions from add_indexes() but only
logs them and prints a traceback, leaving the process to exit successfully;
update the except block in add_indexes.py (the try around add_indexes()) to
terminate with a non-zero status—either re-raise the exception or call
sys.exit(1) after logging the error (and add an import for sys if needed) so
failures cause a failing process exit.
In `@NEW_DB_SCRIPTS/Description.md`:
- Around line 6-8: The deployment docs reference old paths `app/models.py`,
`scripts/migrate_schema.py`, and `scripts/add_indexes.py` but this PR moved
those files under NEW_DB_SCRIPTS; update the docs (including the block around
lines 15-28) to point to `NEW_DB_SCRIPTS/app/models.py`,
`NEW_DB_SCRIPTS/migrate_schema.py`, and `NEW_DB_SCRIPTS/add_indexes.py` (or the
exact filenames as committed) and adjust any example commands to use
NEW_DB_SCRIPTS/ so operators run the correct scripts.
In `@NEW_DB_SCRIPTS/migrate_schema.py`:
- Around line 18-19: The script imports Base from app.models but should use the
v2 metadata defined in NEW_DB_SCRIPTS/models.py to ensure DDL
(create_all/migration SQL) targets the new schema; update the import to pull the
metadata or Base-equivalent from NEW_DB_SCRIPTS.models (the v2 metadata object
used for create_all) and pass that metadata into engine.create_all / SQL
generation instead of app.models.Base, keeping get_session and _get_engine usage
unchanged.
- Around line 268-273: The except block that catches failures from
run_migration(dry_run=args.dry_run) currently only logs the error and prints a
traceback, leaving the process exit code as zero; update the handler to
terminate the process with a non-zero exit code (e.g., call sys.exit(1) or
re-raise the exception) after logging so CI/CD detects failures—modify the
except block that references logger.error and traceback.print_exc to import sys
if needed and exit with a non-zero status.
- Around line 141-143: The current backfill loads all rows via
conn.execute(text("SELECT * FROM old_transcripts")).fetchall(), which can OOM;
change this to stream or batch the query results instead—use a server-side
cursor/streaming execution (e.g., execution_options(stream_results=True) or the
DB driver's cursor) or iterate with fetchmany/LIMIT+OFFSET to process chunks,
replacing the fetchall() usage and adjusting the logger.info that reports
progress to count per-batch (refer to the transcripts variable, the conn.execute
call and the "old_transcripts" query) so rows are processed incrementally rather
than all at once.
---
Outside diff comments:
In `@backend/src/services/supabaseService.js`:
- Around line 447-480: The code currently sets rowTags = Array.isArray(row.tags)
? row.tags : [] which drops object-shaped JSONB tag payloads; instead
parse/normalize row.tags into a consistent array before using it (e.g., convert
single-string, object-shaped, or array payloads into an array of tag values) so
Speaker loop, topic loop and tag loop operate on valid entries; update the logic
around rowTags creation (the variable used in the speakerMap/topicMap/tagSet
flows) to normalize JSONB inputs first (using existing helpers like
cleanLabel/normalizeLabel where appropriate) rather than defaulting to [].
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9f902fba-dcb8-4119-88c8-73a61d93b113
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (7)
NEW_DB_SCRIPTS/Description.mdNEW_DB_SCRIPTS/add_indexes.pyNEW_DB_SCRIPTS/migrate_schema.pyNEW_DB_SCRIPTS/models.pybackend/src/services/supabaseService.jsbackend/supabase/migrations/001_full_text_search.sqltypes.ts
There was a problem hiding this comment.
8 issues found across 8 files
Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.
Re-trigger cubic
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
NEW_DB_SCRIPTS/add_indexes.py (1)
20-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn a failing process status when engine creation fails.
On Line 20-Line 23, returning after logging makes index setup appear successful to automation even though nothing executed.
Proposed fix
def add_indexes(): engine = _get_engine() if not engine: - logger.error("Could not get database engine. Check DATABASE_URL.") - return + raise RuntimeError("Could not get database engine. Check DATABASE_URL.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@NEW_DB_SCRIPTS/add_indexes.py` around lines 20 - 23, The current early-return after calling _get_engine() masks failures; change the behavior so the process exits with a non-zero status instead of returning: import sys (or use raise SystemExit) in NEW_DB_SCRIPTS/add_indexes.py and replace the bare "return" after logger.error("Could not get database engine. Check DATABASE_URL.") with sys.exit(1) (or raise SystemExit("Could not get database engine")), ensuring the failure is propagated to automation; keep the existing logger.error call and reference the _get_engine() call and engine variable in your change.NEW_DB_SCRIPTS/models.py (1)
409-411:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd a unique constraint for summary upserts.
The summaries table is missing a unique/exclusion constraint on
(transcript_id, summary_type). Upserts usingON CONFLICT (transcript_id, summary_type)will fail at runtime without it.Proposed fix
__table_args__ = ( + UniqueConstraint( + "transcript_id", + "summary_type", + name="uq_summaries_transcript_type", + ), Index("idx_summaries_type", "summary_type"), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@NEW_DB_SCRIPTS/models.py` around lines 409 - 411, The summaries model's __table_args__ currently defines a non-unique Index("idx_summaries_type", "summary_type") but lacks a unique constraint required for ON CONFLICT upserts on (transcript_id, summary_type); update __table_args__ in the summaries model to include a UniqueConstraint("transcript_id", "summary_type") (or an Index with unique=True) alongside the existing Index so the database enforces uniqueness for upserts.NEW_DB_SCRIPTS/migrate_schema.py (1)
32-35:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail fast when DB engine initialization fails.
On Line 32-Line 35, the script logs and returns when
_get_engine()is unavailable, which can still exit with code 0. That can hide failed migrations in CI/CD.Proposed fix
def run_migration(dry_run=False): engine = _get_engine() if not engine: - logger.error("Could not get database engine. Check DATABASE_URL.") - return + raise RuntimeError("Could not get database engine. Check DATABASE_URL.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@NEW_DB_SCRIPTS/migrate_schema.py` around lines 32 - 35, The script currently logs and returns when _get_engine() fails, which allows a zero exit status; update the failure path in migrate_schema.py where engine = _get_engine() to terminate with a non-zero exit (e.g., call sys.exit(1) or raise SystemExit/RuntimeError) right after logger.error("Could not get database engine. Check DATABASE_URL.") so CI sees the migration failure; ensure you import sys or raise an exception and keep the log message intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/utils/dataProcessor.js`:
- Line 189: The current ternary uses a falsy check on row.duration_seconds which
treats 0 as missing; update the condition in the duration assignment (the line
setting duration using row.duration_seconds) to check for null/undefined (e.g.
row.duration_seconds != null or Number.isFinite(row.duration_seconds)) instead
of a truthy check so that 0 is formatted as "0:00" using the existing
Math.floor/ padStart formatting logic.
---
Outside diff comments:
In `@NEW_DB_SCRIPTS/add_indexes.py`:
- Around line 20-23: The current early-return after calling _get_engine() masks
failures; change the behavior so the process exits with a non-zero status
instead of returning: import sys (or use raise SystemExit) in
NEW_DB_SCRIPTS/add_indexes.py and replace the bare "return" after
logger.error("Could not get database engine. Check DATABASE_URL.") with
sys.exit(1) (or raise SystemExit("Could not get database engine")), ensuring the
failure is propagated to automation; keep the existing logger.error call and
reference the _get_engine() call and engine variable in your change.
In `@NEW_DB_SCRIPTS/migrate_schema.py`:
- Around line 32-35: The script currently logs and returns when _get_engine()
fails, which allows a zero exit status; update the failure path in
migrate_schema.py where engine = _get_engine() to terminate with a non-zero exit
(e.g., call sys.exit(1) or raise SystemExit/RuntimeError) right after
logger.error("Could not get database engine. Check DATABASE_URL.") so CI sees
the migration failure; ensure you import sys or raise an exception and keep the
log message intact.
In `@NEW_DB_SCRIPTS/models.py`:
- Around line 409-411: The summaries model's __table_args__ currently defines a
non-unique Index("idx_summaries_type", "summary_type") but lacks a unique
constraint required for ON CONFLICT upserts on (transcript_id, summary_type);
update __table_args__ in the summaries model to include a
UniqueConstraint("transcript_id", "summary_type") (or an Index with unique=True)
alongside the existing Index so the database enforces uniqueness for upserts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 5f63ca40-1646-4342-8aed-f6aa9f357b1e
📒 Files selected for processing (7)
NEW_DB_SCRIPTS/Description.mdNEW_DB_SCRIPTS/add_indexes.pyNEW_DB_SCRIPTS/migrate_schema.pyNEW_DB_SCRIPTS/models.pybackend/src/services/supabaseService.jsbackend/src/utils/dataProcessor.jsbackend/supabase/migrations/001_full_text_search.sql
Fixes #14
Summary by cubic
Migrates the backend to a new platform‑agnostic DB schema and updates DB services to use normalized tables. Improves full‑text search and transcript versioning, and adds duration formatting.
New Features
NEW_DB_SCRIPTS/models.py,migrate_schema.py,add_indexes.py, andDescription.md(v2 schema, transactional migration, partial GIN/B‑Tree indexes, deployment steps).backend/src/services/supabaseService.jsto the new schema (joinscontent_items,transcriptswithis_current = true,summaries,speakers,taxonomies,content_sources; returns speakers/tags/status/duration; FTS ranks titles/descriptions, transcript text, and TL;DR; safer input sanitization and distinct counts).summaries; updatedtypes.tsfor JSONtags/categoriesand optionaltopics;dataProcessornow formatsduration_secondsas mm:ss.backend/supabase/migrations/001_full_text_search.sqlto provide RPCs that search across normalized tables and return better snippets.Migration
python NEW_DB_SCRIPTS/migrate_schema.py --dry-runpython NEW_DB_SCRIPTS/migrate_schema.pypython NEW_DB_SCRIPTS/add_indexes.pyWritten for commit 92034e7. Summary will update on new commits. Review in cubic
Summary by CodeRabbit
New Features
Documentation