feat(crawler): SPEC-BRAND-NODE-001 PR-Y — products.brand_node_id FK + representative selection#10
Open
bbbang105 wants to merge 4 commits into
… representative selection
- products.brand_node_id (bigint NULL FK → brand_nodes.id) is backfilled on import. Unknown brands trigger a trigram fuzzy match (Jaccard ≥ 0.85) against existing brand_nodes; a new brand_nodes row is inserted with node columns NULL, and alias candidates are enqueued to brand_node_review_queue (reason='alias_candidate').
- products.is_brand_representative boolean: source of truth (SOT) for the brand-VLM 5-image input. Replaces the SPEC-3 random.sample(5) approach.
- New CLI select-representatives.ts: diversity heuristic (category round-robin + in_stock + recency) flags 5 to 10 products per brand and syncs the cache to brand_nodes.representative_image_urls.
- Drop mood_tags from PAI prompt + AnalysisResult (search weight=0 in v6). brand_nodes.primary_node_id (brand-VLM) replaces the product-level mood channel.
Migrations 057/058 live in app repo (committed in app PR-Z).
🗿 MoAI <email@mo.ai.kr>
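The fuzzy match described in this commit can be sketched as follows. This is a minimal illustration, not the code in import-products.ts: the helper names (trigrams, jaccard, bestMatch) are hypothetical, and the word padding (two leading spaces, one trailing) is borrowed from pg_trgm's documented convention on the assumption that the crawler aims for comparable scores.

```typescript
// Hypothetical helpers; names and padding rules are assumptions, not the PR's code.
const FUZZY_THRESHOLD = 0.85; // threshold stated in this PR

// Build the set of trigrams for a string, pg_trgm style:
// lowercase, split on non-alphanumerics, pad each word with "  " and " ".
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      out.add(padded.slice(i, i + 3));
    }
  }
  return out;
}

// Jaccard similarity: shared trigrams divided by trigrams in the union.
function jaccard(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

// Return the best-scoring known brand at or above the threshold, else null.
function bestMatch(
  name: string,
  known: string[],
  threshold: number = FUZZY_THRESHOLD,
): { name: string; score: number } | null {
  let best: { name: string; score: number } | null = null;
  for (const candidate of known) {
    const score = jaccard(name, candidate);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { name: candidate, score };
    }
  }
  return best;
}
```

With these definitions, a case-only difference such as "Auralee" against "AURALEE" scores 1, while a one-letter change in a short name falls well below 0.85, so the threshold mostly admits casing and punctuation variants.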
- classify-brands.ts: CLI that batch-calls the app's /api/internal/classify-brand endpoint. Token bucket (8s start, auto-adjusts on success/429), 429/5xx retry with exponential backoff, no retry on other 4xx, per-step verbose logging (token / POST / HTTP / response / outcome), and a failure JSONL log for re-runs via --retry-failed. Options: --all, --brand-id, --limit, --force, --concurrency, --interval, --dry-run.
- select-representatives.ts: --limit N added for small-batch testing (works combined with --all, applied after fetch).
- .gitignore: .app-src (transient symlink used for editing app repo from this session).
🗿 MoAI <email@mo.ai.kr>
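The retry policy above (retry on 429/5xx with exponential backoff, no retry on other 4xx) might look roughly like the sketch below. Everything here is illustrative: the function names, the 1s base delay, the 60s cap, and the interval adjustment factors are assumptions, since the commit message only states the 8s starting interval and that it adjusts on success/429.

```typescript
// Hypothetical sketch; names, base delay, cap, and factors are assumptions.
type Decision = "ok" | "retry" | "fail";

// 2xx succeeds, 429 and 5xx are retried, anything else fails immediately.
function retryDecision(status: number): Decision {
  if (status >= 200 && status < 300) return "ok";
  if (status === 429 || status >= 500) return "retry";
  return "fail";
}

// Exponential backoff: base * 2^attempt, capped.
function backoffMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Adaptive pacing between requests: back off hard on 429/5xx, speed up
// slowly on success. The x2 and x0.9 factors are assumed for illustration.
function nextIntervalMs(currentMs: number, decision: Decision, minMs = 1000): number {
  if (decision === "retry") return currentMs * 2;
  if (decision === "ok") return Math.max(minMs, Math.round(currentMs * 0.9));
  return currentMs;
}

// Run `send` until it stops asking for a retry or attempts run out.
// `send` returns the HTTP status; `sleep` is injectable for testing.
async function withRetry(
  send: () => Promise<number>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ status: number; attempts: number }> {
  let status = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    status = await send();
    if (retryDecision(status) !== "retry") {
      return { status, attempts: attempt + 1 };
    }
    if (attempt < maxAttempts - 1) await sleep(backoffMs(attempt));
  }
  return { status, attempts: maxAttempts };
}
```

Keeping the decision and delay logic in pure functions makes the policy easy to unit-test without touching the network; only withRetry needs an actual sender.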
…tch CDN block
Schema rename (app migration 062):
- primary_node_id → primary_style_node_id
- secondary_node_id → secondary_style_node_id
- node_confidence → style_node_confidence
- node_assigned_at → style_node_assigned_at
- node_assigned_model → style_node_assigned_model
- DROP COLUMN style_node (legacy text enum)
Crawler updates:
- classify-brands.ts: primary_node_id IS NULL filter renamed
+ auto-filter brands with no products
- import-products.ts: loadBrandNodes() drops style_node select +
nodeMap removed (legacy text source gone);
products.style_node always NULL on new insert
(v5 search axis, weight=0 in v6)
- analyze-prompt.ts: doc comment updated
- select-representatives.ts: add BLOCKED_IMAGE_DOMAINS (farfetch-contents.com)
to prevent brand-VLM 5-image bundle fail on
a single 403 image URL
🗿 MoAI <email@mo.ai.kr>
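A minimal sketch of the domain block described for select-representatives.ts, assuming the check is a hostname match that also covers subdomains. Only the BLOCKED_IMAGE_DOMAINS name and the farfetch-contents.com entry come from the commit message; the helper name and the choice to treat unparseable URLs as blocked are assumptions.

```typescript
// From the commit message: image hosts known to return 403 to the brand-VLM fetch.
const BLOCKED_IMAGE_DOMAINS: string[] = ["farfetch-contents.com"];

// Hypothetical helper: true when the URL's host is a blocked domain or a
// subdomain of one. Unparseable URLs are treated as blocked (an assumption).
function isBlockedImageUrl(url: string): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return true;
  }
  return BLOCKED_IMAGE_DOMAINS.some(
    (domain) => host === domain || host.endsWith(`.${domain}`),
  );
}

// Drop blocked URLs before assembling the 5-image bundle.
function usableImageUrls(urls: string[]): string[] {
  return urls.filter((url) => !isBlockedImageUrl(url));
}
```

Matching on the parsed hostname rather than a substring avoids false positives on lookalike hosts or on URLs that merely contain the domain in a path or query string.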
Summary
- products.brand_node_id (bigint NULL FK → brand_nodes.id) backfilled on import. Unknown brand strings trigger a trigram fuzzy match (Jaccard ≥ 0.85) against existing brand_nodes; new rows are inserted with node columns NULL, and alias candidates are enqueued to brand_node_review_queue (reason=alias_candidate).
- products.is_brand_representative boolean: source of truth (SOT) for the brand-VLM 5-image input. Replaces SPEC-3 random.sample(5).
- New CLI select-representatives.ts: diversity heuristic (category round-robin + in_stock + recency) flags 5 to 10 products per brand and syncs the cache to brand_nodes.representative_image_urls.
- Drop mood_tags from PAI prompt + AnalysisResult (search weight=0 in v6). brand_nodes.primary_node_id (brand-VLM) replaces the product-level mood channel.

Coordinated with app session
- Migrations 057_products_brand_node_fk.sql / 058_products_brand_representative.sql live in app repo (committed via app PR alongside endpoint work).
- select-representatives wet-run will be followed by POST /api/internal/classify-brand calls (separate follow-up PR).
- The trigram Jaccard in import-products.ts is close (±0.05) to pg_trgm similarity(); the 0.85 threshold has enough margin.

Test plan
- pnpm typecheck clean
- pnpm test 100/100 passing
- pnpm tsx src/select-representatives.ts --all --dry-run --target 5
- pnpm tsx src/select-representatives.ts --brand "AURALEE" --target 5
- Check brand_node_review_queue for alias_candidate rows after first import sweep

🗿 MoAI email@mo.ai.kr
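As a rough illustration of the diversity heuristic named in the summary (category round-robin + in_stock + recency), the sketch below ranks products within each category by stock status and recency, then takes one per category in turn until the target is reached. The Product shape, field names, and tie-breaking order are assumptions; the actual select-representatives.ts may weigh these signals differently.

```typescript
// Hypothetical product shape; field names are assumptions for illustration.
interface Product {
  id: number;
  category: string;
  inStock: boolean;
  createdAt: string; // ISO date
}

function selectRepresentatives(products: Product[], target = 5): Product[] {
  // Rank: in-stock first, then newest first.
  const ranked = [...products].sort(
    (a, b) =>
      Number(b.inStock) - Number(a.inStock) ||
      Date.parse(b.createdAt) - Date.parse(a.createdAt),
  );

  // Bucket by category, preserving rank order inside each bucket.
  const buckets = new Map<string, Product[]>();
  for (const p of ranked) {
    const list = buckets.get(p.category) ?? [];
    list.push(p);
    buckets.set(p.category, list);
  }
  const queues = Array.from(buckets.values());

  // Round-robin: take the next-best product from each category in turn.
  const picked: Product[] = [];
  for (let round = 0; picked.length < target; round++) {
    let progressed = false;
    for (const queue of queues) {
      if (round < queue.length && picked.length < target) {
        picked.push(queue[round]);
        progressed = true;
      }
    }
    if (!progressed) break; // every category is exhausted
  }
  return picked;
}
```

Compared with random sampling, the round-robin pass guarantees that a brand with many products in one category still contributes images from its smaller categories before any category gets a second slot.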