feat(crawler): SPEC-BRAND-NODE-001 PR-Y — products.brand_node_id FK + representative selection#10

Open
bbbang105 wants to merge 4 commits into dev from feature/spec-brand-node-crawler

Conversation

@bbbang105
Member

Summary

  • products.brand_node_id (bigint NULL FK → brand_nodes.id) backfilled on import. Unknown brand strings trigger trigram fuzzy match (Jaccard ≥ 0.85) against existing brand_nodes; new rows inserted with node columns NULL, alias candidates enqueued to brand_node_review_queue (reason=alias_candidate).
  • products.is_brand_representative boolean: the source of truth (SOT) for the brand-VLM 5-image input. Replaces the SPEC-3 random.sample(5) approach.
  • New CLI select-representatives.ts: a diversity heuristic (category round-robin + in_stock + recency) flags 5 to 10 products per brand and syncs the cache to brand_nodes.representative_image_urls.
  • Drop mood_tags from PAI prompt + AnalysisResult (search weight=0 in v6). brand_nodes.primary_node_id (brand-VLM) replaces product-level mood channel.
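
The diversity heuristic is only described at a high level above, so here is a minimal sketch of one way category round-robin with in_stock and recency ordering might work. The `Product` shape, its field names, and `selectRepresentatives` are hypothetical and are not taken from select-representatives.ts.

```typescript
// Hypothetical product shape; the field names are assumptions, not the real schema.
interface Product {
  id: number;
  category: string;
  inStock: boolean;
  crawledAt: number; // epoch milliseconds, used as the recency signal
}

// Pick up to `target` products, cycling through categories so that no single
// category dominates the selection.
function selectRepresentatives(products: Product[], target: number): Product[] {
  // Group products by category, preserving first-seen category order.
  const byCategory = new Map<string, Product[]>();
  for (const p of products) {
    const group = byCategory.get(p.category) ?? [];
    group.push(p);
    byCategory.set(p.category, group);
  }

  // Within each category: in-stock products first, then newest first.
  byCategory.forEach((group) => {
    group.sort(
      (a, b) =>
        Number(b.inStock) - Number(a.inStock) || b.crawledAt - a.crawledAt,
    );
  });

  // Round-robin: take one product per category per pass until target is met
  // or every category queue is empty.
  const queues = Array.from(byCategory.values());
  const picked: Product[] = [];
  while (picked.length < target && queues.some((q) => q.length > 0)) {
    for (const q of queues) {
      if (picked.length >= target) break;
      const next = q.shift();
      if (next !== undefined) picked.push(next);
    }
  }
  return picked;
}
```

With a target of 5, a brand that has 40 tops and 2 pairs of shoes would still get both shoes considered in the first two passes, which is the point of cycling by category rather than sorting globally by recency.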

Coordinated with app session

  • Migrations 057_products_brand_node_fk.sql / 058_products_brand_representative.sql live in app repo (committed via app PR alongside endpoint work).
  • After app PR-Z merges and INTERNAL_API_KEY is shared, the select-representatives wet run will be followed by POST /api/internal/classify-brand calls (separate follow-up PR).
  • The TS-side trigram similarity used in import-products.ts stays within ±0.05 of pg_trgm similarity(), so the 0.85 threshold leaves enough margin.
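
For reference, a TS-side trigram similarity could look roughly like the sketch below. It follows the padding rule documented for pg_trgm (each word gets two leading spaces and one trailing space) and scores the Jaccard ratio of the two trigram sets. The function names are hypothetical, the tokenizer is ASCII-only as a simplification, and this is not the code in import-products.ts.

```typescript
// Build the set of trigrams for a string, approximating pg_trgm's rules:
// lowercase, split on non-alphanumeric characters, pad each word with two
// leading spaces and one trailing space.
// Assumption: ASCII-only tokenization; non-ASCII brand names need a wider class.
function trigrams(input: string): Set<string> {
  const grams = new Set<string>();
  const words = input
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  for (const word of words) {
    const padded = "  " + word + " ";
    for (let i = 0; i + 3 <= padded.length; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  }
  return grams;
}

// Jaccard similarity of the two trigram sets: |A ∩ B| / |A ∪ B|.
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(a);
  const gb = trigrams(b);
  if (ga.size === 0 && gb.size === 0) return 0;
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

// Hypothetical threshold check mirroring the 0.85 cut-off from the summary.
function isFuzzyBrandMatch(a: string, b: string, threshold = 0.85): boolean {
  return trigramSimilarity(a, b) >= threshold;
}
```

Under these rules `trigramSimilarity("word", "two words")` is 4/11 (about 0.364), which agrees with the example in the pg_trgm documentation, while "AURALEE" against "Auralee Inc" scores 8/12 and falls below the 0.85 threshold.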

Test plan

  • pnpm typecheck clean
  • pnpm test 100/100 passing
  • Dry-run after migration: pnpm tsx src/select-representatives.ts --all --dry-run --target 5
  • Wet-run on subset: pnpm tsx src/select-representatives.ts --brand "AURALEE" --target 5
  • Verify backfill stats from migration 057 logs (matched / null breakdown)
  • Sanity check brand_node_review_queue for alias_candidate rows after first import sweep

🗿 MoAI email@mo.ai.kr

… representative selection

- products.brand_node_id (bigint NULL FK → brand_nodes.id) backfill on import.
  Unknown brands trigger trigram fuzzy match (≥0.85 Jaccard) against existing
  brand_nodes; new brand_nodes row inserted with node columns NULL, alias
  candidates enqueued to brand_node_review_queue (reason='alias_candidate').
- products.is_brand_representative boolean: brand-VLM 5-image input SOT.
  Replaces SPEC-3 random.sample(5) approach.
- New CLI select-representatives.ts: diversity heuristic (category round-robin
  + in_stock + recency) flags 5~10 products per brand, syncs cache to
  brand_nodes.representative_image_urls.
- Drop mood_tags from PAI prompt + AnalysisResult (search weight=0 in v6).
  brand_nodes.primary_node_id (brand-VLM) replaces product-level mood channel.

Migrations 057/058 live in app repo (committed in app PR-Z).

🗿 MoAI <email@mo.ai.kr>
@bbbang105 bbbang105 added the 🚀 feat (new feature / partial code addition / partial code modification, as distinct from refactoring / design element change) label May 14, 2026
🗿 MoAI <email@mo.ai.kr>
@bbbang105 bbbang105 added the 🌱 style (changes that do not affect code meaning: code formatting, typo fixes, variable renames, asset additions) label May 14, 2026
- classify-brands.ts: CLI that batch-calls the app's /api/internal/classify-brand endpoint.
  token bucket (8s start, auto adjust on success/429), 429/5xx retry with
  exponential backoff, 4xx no-retry, per-step verbose logging (token /
  POST / HTTP / response / outcome), failure jsonl log for re-run via
  --retry-failed. Options: --all, --brand-id, --limit, --force,
  --concurrency, --interval, --dry-run.
- select-representatives.ts: --limit N added for small-batch testing
  (works combined with --all, applied after fetch).
- .gitignore: .app-src (transient symlink used for editing app repo
  from this session).

🗿 MoAI <email@mo.ai.kr>
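
The retry policy in the commit message above (retry on 429/5xx with exponential backoff, no retry on other 4xx) can be sketched as follows. The names `shouldRetry`, `backoffDelayMs`, and `sendWithRetry`, plus the injected `send` and `sleep` callbacks, are hypothetical; the token-bucket pacing and the failure jsonl log are not modeled here.

```typescript
// Statuses the commit message says are retried: 429 and any 5xx.
// Other 4xx responses are treated as final.
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
// `attempt` is zero-based.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

type Send = () => Promise<number>; // resolves to an HTTP status code
type Sleep = (ms: number) => Promise<void>;

// Call `send` up to `maxAttempts` times, sleeping between retryable failures.
// Returns the last status seen (0 if maxAttempts is 0).
async function sendWithRetry(
  send: Send,
  sleep: Sleep,
  maxAttempts: number,
  baseMs: number,
  maxMs: number,
): Promise<number> {
  let status = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    status = await send();
    if (!shouldRetry(status)) return status;
    // Do not sleep after the final attempt.
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt, baseMs, maxMs));
    }
  }
  return status;
}
```

Injecting `send` and `sleep` keeps the policy testable without a network or real timers; a real CLI would presumably wrap `fetch` and `setTimeout` here.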
@bbbang105 bbbang105 added the ✅ test (test code) label May 14, 2026
…tch CDN block

Schema rename (app migration 062):
- primary_node_id      → primary_style_node_id
- secondary_node_id    → secondary_style_node_id
- node_confidence      → style_node_confidence
- node_assigned_at     → style_node_assigned_at
- node_assigned_model  → style_node_assigned_model
- DROP COLUMN style_node (legacy text enum)

Crawler updates:
- classify-brands.ts:    primary_node_id IS NULL filter renamed
                          + auto-filter brands with no products
- import-products.ts:    loadBrandNodes() drops style_node select +
                          nodeMap removed (legacy text source gone);
                          products.style_node always NULL on new insert
                          (v5 search axis, weight=0 in v6)
- analyze-prompt.ts:     doc comment updated
- select-representatives.ts: add BLOCKED_IMAGE_DOMAINS (farfetch-contents.com)
                          to prevent brand-VLM 5-image bundle fail on
                          a single 403 image URL

🗿 MoAI <email@mo.ai.kr>
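
A domain blocklist like the one described above might be applied as in this sketch. `BLOCKED_IMAGE_DOMAINS` and the farfetch-contents.com entry come from the commit message; the helper names, the subdomain matching rule, and the decision to drop unparseable URLs are assumptions.

```typescript
// Domain listed in the commit message; extend as further CDNs start returning 403.
const BLOCKED_IMAGE_DOMAINS: readonly string[] = ["farfetch-contents.com"];

// True when the URL's host is a blocked domain or one of its subdomains.
// Assumption: URLs that fail to parse are treated as blocked, since a single
// bad image URL fails the whole 5-image bundle.
function isBlockedImageUrl(
  url: string,
  blocked: readonly string[] = BLOCKED_IMAGE_DOMAINS,
): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return true;
  }
  return blocked.some((d) => host === d || host.endsWith("." + d));
}

// Keep only image URLs that are safe to hand to the brand-VLM bundle.
function filterImageUrls(urls: string[]): string[] {
  return urls.filter((u) => !isBlockedImageUrl(u));
}
```

Matching on the parsed hostname rather than a substring avoids false positives such as a path segment or an unrelated host that merely contains the blocked name.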
@bbbang105 bbbang105 added ✂️ remove (package, folder, or class deletion), 🎫 rename (package, folder, or class rename), and 🔄 refactor (code refactoring) labels May 14, 2026

Labels

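🚀 feat (new feature / partial code addition / partial code modification, as distinct from refactoring / design element change), 🔄 refactor (code refactoring), ✂️ remove (package, folder, or class deletion), 🎫 rename (package, folder, or class rename), 🌱 style (changes that do not affect code meaning: code formatting, typo fixes, variable renames, asset additions), ✅ test (test code)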
