diff --git a/TASK.md b/TASK.md index 803edc7..4104ee2 100644 --- a/TASK.md +++ b/TASK.md @@ -1,23 +1,71 @@ -## 리팩토링 계획 - -### Phase 1: Repository 계층 도입 (데이터 로직 분리) - -1. **[Repo] `src/repositories` 디렉토리 생성**: 데이터베이스 쿼리 로직을 모아둘 디렉토리를 생성합니다. -2. **[Repo] `post.repository.ts` 생성 및 이전**: - * `blog_post`, `post_chunks`, `post_title_embeddings` 테이블 관련 쿼리를 이 파일로 옮깁니다. - * `qa.service.ts`의 `findPostById`, `findSimilarChunks` 로직을 이전합니다. - * `embedding.service.ts`의 `storeTitleEmbedding`, `storeContentEmbeddings` 로직을 이전합니다. -3. **[Repo] `persona.repository.ts` 생성 및 이전**: - * `persona` 테이블 관련 쿼리를 이 파일로 옮깁니다. - * `qa.service.ts`의 `getSpeechTonePrompt` 내부 DB 조회 로직을 `findPersonaById`와 같은 함수로 분리하여 이전합니다. -4. **[Service] 서비스 계층 수정**: - * `qa.service.ts`와 `embedding.service.ts`가 DB에 직접 접근하는 대신, 새로 만든 Repository의 함수를 호출하도록 코드를 수정합니다. - -### Phase 2: 프롬프트 관리 분리 - -5. **[Prompt] `src/prompts` 디렉토리 생성**: 프롬프트 템플릿을 관리할 디렉토리를 생성합니다. -6. **[Prompt] `qa.prompts.ts` 파일 생성**: - * `qa.service.ts`에 하드코딩된 시스템 프롬프트와 사용자 메시지 생성 로직을 이 파일로 옮깁니다. - * `createRagPrompt`, `createPostContextPrompt`와 같이 동적으로 프롬프트를 생성하는 함수를 만듭니다. -7. **[Service] `qa.service.ts` 수정**: - * `qa.prompts.ts`에서 프롬프트 생성 함수를 가져와(import) 사용하도록 수정합니다. \ No newline at end of file +## 하이브리드 서치 확장(질문 재작성 + 키워드) + +목표 +- 리콜 향상: 질문을 LLM이 재작성(rewrites)하고, 핵심 키워드를 생성하여 벡터+텍스트 양쪽에서 검색. +- 정밀/비용 제어: 재작성/키워드 개수를 제한하고, 융합 가중치로 결과를 안정적으로 선별. + +Plan JSON 확장(초안) +```json +{ + "rewrites": ["..."], + "keywords": ["..."], + "hybrid": { "enabled": true, "alpha": 0.7, "max_rewrites": 4, "max_keywords": 8 } +} +``` +- 기존 필드(`top_k`, `threshold`, `weights`, `filters.time`, `sort`, `limit`)와 공존. +- 서버에서 상한 강제(rewrites<=4, keywords<=8), 품질이 낮거나 중복인 항목 제거. + +프롬프트 가이드(확장) +- 역할: ‘검색 계획 + 재작성/키워드 생성’ +- 출력: 기존 계획 JSON + `rewrites[]`, `keywords[]`, `hybrid{}` +- 규칙: + - 불용/범용 단어(예: "글", "포스트", "블로그") 지양 + - 과도한 시기/주제 확장 금지(사용자 블로그 컨텍스트 벗어나지 않기) + - 중복/동의어 반복 최소화(문장 유사도 과다 시 제거) + +하이브리드 검색 파이프라인 +1) 멀티 벡터 검색: 원 질문 + rewrites 각각 임베딩 → pgvector Top-K 검색(합집합) +2) 텍스트 검색: keywords로 `post_chunks.content`/`blog_post.title` 텍스트 매칭 → Top-K 추출 +3) 랭킹 융합: 점수 정규화 후 `score = α·vec + (1-α)·text` 또는 RRF로 병합 → 상위 N 청크 선택 +4) 컨텍스트 구성: v1과 동일 프롬프트로 최종 LLM 호출 + +저장소/인덱스(제안) +- PostgreSQL 확장: `pg_trgm`(간단/범용) 또는 `tsvector`(정교) +- 1차안(pg_trgm) + - DDL: + - `CREATE EXTENSION IF NOT EXISTS pg_trgm;` + - `CREATE INDEX IF NOT EXISTS idx_pc_content_trgm ON post_chunks USING gin (content gin_trgm_ops);` + - `CREATE INDEX IF NOT EXISTS idx_bp_title_trgm ON blog_post USING gin (title gin_trgm_ops);` + - 질의: `similarity(content, $q) > $min OR content ILIKE ANY($patterns)` 등으로 text_score 계산 + +SSE 확장(선택) +- `event: rewrite` / `data: ["..."]` +- `event: keywords` / `data: ["..."]` +- `event: hybrid_result` / `data: [{ postId, postTitle }]` + +보안/안전 +- 재작성/키워드 출력 스키마 강제(JSON), 길이/개수 상한 +- 금칙어/민감어 필터(선택), 카테고리/기간 필터는 서버가 최종 결정 + +세부 구현 계획(하이브리드 서치) +1) 스키마/프롬프트 + - [ ] `src/types/ai.v2.types.ts`: Plan 스키마에 `rewrites[]`, `keywords[]`, `hybrid{enabled,alpha,max_rewrites,max_keywords}` 추가 + - [ ] `src/prompts/qa.v2.prompts.ts`: 프롬프트/JSON Schema에 확장 필드 반영 + few-shot 보강 +2) 플래너 서비스 + - [ ] `search-plan.service.ts`: 확장 필드 파싱/검증, 중복/불용어 필터링, 상한 강제 +3) 텍스트 검색 저장소 + - [ ] `post.repository.ts`: `textSearchChunksV2({ userId, query, keywords[], from?, to?, topK })` + - [ ] (옵션) DDL 문서화: pg_trgm 인덱스 생성 스크립트 추가 +4) 하이브리드 서비스 + - [ ] `hybrid-search.service.ts` 구현: 멀티 임베딩, 텍스트 검색, 정규화, α 융합/RRF, 상위 N 반환 +5) 오케스트레이션/이벤트 + - [ ] `qa.v2.service.ts`: plan.hybrid.enabled 시 하이브리드 경로 분기, (선택) `rewrite`/`keywords`/`hybrid_result` 송신 +6) 
테스트/튜닝
+   - [ ] 통합 테스트: 재작성/키워드 포함 질의에서 리콜↑ 확인
+   - [ ] 가중치 α, Top-K 상수, 불용어/중복 필터 기준 튜닝
+   - [ ] 0건/오류 폴백(v1 RAG) 검증
+
+권장 단계적 도입
+- Phase 1: 스키마/프롬프트/하이브리드 파이프라인 구현(기본 α=0.7, rewrites<=3, keywords<=6)
+- Phase 2: SSE 관측성 이벤트 추가, 품질 튜닝
+- Phase 3: 인덱스/성능 최적화, 불용어 사전/NER 보정
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..2e55bef
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,171 @@
+# Bubblog AI API 문서 (v1 ~ v2)
+
+본 문서는 `/ai` (v1)와 `/ai/v2` (v2) 엔드포인트를 정리합니다. 서버는 Express 기반이며, `POST /ask` 류는 Server-Sent Events(SSE)로 답변을 스트리밍합니다.
+
+## 기본 정보
+- Base Path
+  - v1: `/ai`
+  - v2: `/ai/v2`
+- 인증
+  - `POST /ask` 엔드포인트는 `Authorization: Bearer <token>` 필요
+  - 임베딩 생성 엔드포인트는 인증 없이 사용 가능
+- 본문 형식: `application/json`
+- SSE 수신: `Content-Type: text/event-stream`
+  - 이벤트명은 `event:` 라인으로, 데이터는 `data:` 라인으로 전송됩니다.
+  - 일반 텍스트 콘텐츠는 `event: answer`로 분할 전송되며, 종료 시 `event: end` + `data: [DONE]`가 송신됩니다.
+
+## v1 엔드포인트 (`/ai`)
+
+### GET `/ai/health`
+- 인증: 불필요
+- 응답(200): `{ "status": "ok" }`
+
+### POST `/ai/embeddings/title`
+- 인증: 불필요
+- 요청 Body
+  - `post_id`(number, required)
+  - `title`(string, required)
+- 동작: 제목 임베딩 생성 후 저장
+- 응답(200): `{ "ok": true }`
+
+### POST `/ai/embeddings/content`
+- 인증: 불필요
+- 요청 Body
+  - `post_id`(number, required)
+  - `content`(string, required)
+- 동작: 본문을 약 512 토큰 단위(중첩 50토큰)로 청킹 → 임베딩 생성/저장
+- 응답(200): `{ "post_id": number, "chunk_count": number, "success": true }`
+
+### POST `/ai/ask` (SSE)
+- 인증: 필요 (`Authorization: Bearer <token>`)
+- 요청 Body
+  - `question`(string, required)
+  - `user_id`(string, required)
+  - `category_id`(number, optional)
+  - `post_id`(number, optional) — 지정 시 해당 글 컨텍스트에 국한하여 답변
+  - `speech_tone`(number, optional)
+    - `-1`: 간결하고 명확한 말투(기본)
+    - `-2`: 해당 글의 말투를 최대한 모사
+    - 양의 정수: 페르소나 ID(해당 유저의 등록된 페르소나 참조)
+  - `llm`(object, optional)
+    - `provider`: `openai` | `gemini`
+    - `model`: string (미지정 시 서버 기본값 사용)
+    - `options`: `{ temperature?, top_p?, max_output_tokens? }`
+- SSE 이벤트(주요)
+  - `exist_in_post_status`: `true|false` — 관련 컨텍스트 존재 여부
+  - `context`: `[ { postId, postTitle }, ... ]` — 검색/선택된 컨텍스트 요약
+  - `answer`: 모델의 부분 응답 텍스트(여러 번 전송)
+  - `end`: 종료 시 `data: [DONE]`
+  - `error`: `{ code?, message }` — 예: `post_id`가 없거나 권한 없음(403), 없음(404)
+- 예시(curl)
+  ```bash
+  curl -N \
+    -H "Authorization: Bearer <token>" \
+    -H "Content-Type: application/json" \
+    -X POST http://localhost:3000/ai/ask \
+    -d '{
+      "question": "카테고리 A 관련 요약 해줘",
+      "user_id": "u_123",
+      "category_id": 1,
+      "speech_tone": -1
+    }'
+  ```
+
+## v2 엔드포인트 (`/ai/v2`)
+
+### GET `/ai/v2/health`
+- 인증: 불필요
+- 응답(200): `{ "status": "ok", "v": "v2" }`
+
+### POST `/ai/v2/ask` (SSE)
+- 인증: 필요 (`Authorization: Bearer <token>`)
+- 요청 Body
+  - `question`(string, required)
+  - `user_id`(string, required)
+  - `category_id`(number, optional)
+  - `post_id`(number, optional)
+  - `speech_tone`(number, optional)
+    - `-1`: 기본 말투(간결/명확)
+    - `-2`: 해당 글(post 모드) 말투 모사
+    - 양수: 페르소나 ID(해당 유저의 등록 페르소나)
+  - `llm`(object, optional)
+    - `provider`: `openai` | `gemini`
+    - `model`: string (미지정 시 서버 기본값 사용)
+    - `options`: `{ temperature?: number, top_p?: number, max_output_tokens?: number }`
+- 동작 개요
+  - 서버가 질문을 토대로 “검색 계획(JSON)”을 생성·검증·정규화한 뒤, 계획에 따라 시맨틱 또는 하이브리드 검색을 수행하고 결과를 SSE로 스트리밍합니다.
+  - `post_id`가 있으면 post 모드(단일 글 컨텍스트)로 처리하며, 간략한 `search_plan`/`search_result` 이벤트 후 본문 기반 답변을 스트리밍합니다.
+- 하이브리드 검색(벡터+텍스트)
+  - 계획에 `hybrid.enabled: true`인 경우 활성화됩니다.
+  - `rewrites`(재작성 질의)와 `keywords`(핵심 키워드)를 생성하여 벡터/텍스트 두 경로로 후보를 수집하고, `hybrid.retrieval_bias` 라벨을 서버가 `alpha` 값으로 매핑해 점수를 융합하여 상위 `top_k`를 선택합니다.
+    - 매핑(기본): `lexical → 0.3`, `balanced → 0.5`, `semantic → 0.75`
+    - 결합식: `score = alpha*vec + (1-alpha)*text` (각 경로 점수 min-max 정규화 후)
+  - SSE로 `rewrite`, `keywords`, `hybrid_result` 이벤트가 필요한 경우에만 송신됩니다. 하이브리드 결과가 없으면 시맨틱 검색으로 폴백합니다.
+- SSE 이벤트 순서(일반적인 흐름)
+  1) `search_plan`: 정규화된 검색 계획(JSON)
+     - 예시 데이터(정규화):
+     ```json
+     {
+       "mode": "rag",
+       "top_k": 5,
+       "threshold": 0.2,
+       "weights": { "chunk": 0.7, "title": 0.3 },
+       "filters": {
+         "time": { "type": "absolute", "from": "2025-09-01T00:00:00.000Z", "to": "2025-09-30T23:59:59.999Z" }
+       },
+       "sort": "created_at_desc",
+       "limit": 5,
+       "hybrid": { "enabled": true, "retrieval_bias": "balanced", "alpha": 0.5, "max_rewrites": 3, "max_keywords": 6 },
+       "rewrites": ["프로젝트 X 요약", "프로젝트 X 핵심"],
+       "keywords": ["프로젝트 X", "핵심", "요약"]
+     }
+     ```
+     - 비고:
+       - `filters.time`만 포함됩니다. `user_id`/`category_id`/`post_id` 등은 서버가 검색 시 내부적으로 적용합니다.
+       - `hybrid.retrieval_bias`는 LLM 라벨이며 서버가 `alpha`로 변환해 사용합니다.
+       - post 모드에서는 간략한 형태 예: `{ "mode": "post", "filters": { "post_id": 123, "user_id": "u_123" } }`.
+  2) (하이브리드 사용 시) `rewrite`: `string[]`
+  3) (하이브리드 사용 시) `keywords`: `string[]`
+  4) (하이브리드 사용 시) `hybrid_result`: `[ { postId, postTitle }, ... ]`
+  5) `search_result`: `[ { postId, postTitle }, ... ]` — 최종 컨텍스트 요약(하이브리드 또는 시맨틱)
+  6) `exist_in_post_status`: `true|false`
+  7) `context`: `[ { postId, postTitle }, ... ]`
+  8) `answer` — 모델 부분 응답(여러 번)
+  9) `end` — `data: [DONE]`
+  - 오류 시 `error`: `{ code?: number, message: string }`
+- 폴백 동작
+  - 플래너 실패 시 `search_plan`으로 `{ "mode": "rag", "fallback": true }`가 송신되며, v1 스타일 RAG로 컨텍스트를 구성합니다.
+
+- 예시(curl)
+  ```bash
+  curl -N \
+    -H "Authorization: Bearer <token>" \
+    -H "Content-Type: application/json" \
+    -X POST http://localhost:3000/ai/v2/ask \
+    -d '{
+      "question": "최근 한 달 블로그에서 프로젝트 X 관련 내용 요약",
+      "user_id": "u_123",
+      "category_id": 3,
+      "llm": { "provider": "openai", "model": "gpt-5-mini", "options": { "temperature": 0.2, "top_p": 0.9, "max_output_tokens": 800 } }
+    }'
+  ```
+
+## 참고 사항
+- `post_id`가 지정된 요청에서 해당 글이 존재하지 않으면 SSE로 `error` 이벤트(404)가 송신되고 스트림이 종료됩니다.
+- `post.is_public`이 `false`인 글은 요청 `user_id`가 글 소유자와 다르면 `error` 이벤트(403)로 차단됩니다. `post.is_public`이 `true`면 누구나 접근 가능합니다.
+- v1/v2 모두 모델 응답 텍스트는 `answer` 이벤트로 분할 전송됩니다. 클라이언트는 누적하여 최종 답변을 구성해야 합니다.
+- 브라우저 수신 예시 — `EventSource`는 GET 전용이어서 본 엔드포인트(POST + `Authorization` 헤더)에는 사용할 수 없으므로, `fetch`로 응답 스트림을 직접 읽어 SSE 블록을 파싱하는 방식을 권장합니다.
+  ```js
+  const res = await fetch('/ai/v2/ask', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
+    body: JSON.stringify({ question: '...', user_id: 'u_123' }),
+  });
+  const reader = res.body.getReader();
+  const dec = new TextDecoder();
+  let buf = '';
+  for (let r = await reader.read(); !r.done; r = await reader.read()) {
+    buf += dec.decode(r.value, { stream: true });
+    const blocks = buf.split('\n\n'); // SSE 이벤트는 빈 줄로 구분
+    buf = blocks.pop() || '';
+    for (const b of blocks) console.log(b); // `event:`/`data:` 라인을 파싱해 처리
+  }
+  ```
+
+## 요약
+- v1 `/ai/ask`: 컨텍스트 존재 여부와 요약(`exist_in_post_status`, `context`) 후 답변 스트리밍
+- v2 `/ai/v2/ask`: 위 흐름에 더해 검색 계획(`search_plan`)과 검색 결과 요약(`search_result`)을 추가로 제공
+- 임베딩 API(v1): 게시물 제목/본문 임베딩 생성 및 저장
diff --git a/docs/migrations/2025-01-pgtrgm.sql b/docs/migrations/2025-01-pgtrgm.sql
new file mode 100644
index 0000000..168a891
--- /dev/null
+++ b/docs/migrations/2025-01-pgtrgm.sql
@@ -0,0 +1,9 @@
+-- Enable pg_trgm and add GIN indexes for text search on content and title
+CREATE EXTENSION IF NOT EXISTS pg_trgm;
+
+CREATE INDEX IF NOT EXISTS idx_pc_content_trgm
+  ON post_chunks USING gin (content gin_trgm_ops);
+
+CREATE INDEX IF NOT EXISTS idx_bp_title_trgm
+  ON blog_post USING gin (title gin_trgm_ops);
+
diff --git a/docs/migrations/README.md b/docs/migrations/README.md
new file mode 100644
index 0000000..3f4de87
--- /dev/null
+++ b/docs/migrations/README.md
@@ -0,0 +1,27 @@
+# Migrations Guide
+
+This folder contains SQL scripts for optional indexes/extensions used by the AI services.
+
+## Apply pg_trgm for text search
+
+File: `2025-01-pgtrgm.sql`
+
+Purpose:
+- Enable the `pg_trgm` extension and add GIN indexes on `post_chunks.content` and `blog_post.title` to accelerate the partial/fuzzy text search used in hybrid search.
+
+Run (with `DATABASE_URL`):
+
+```bash
+psql "$DATABASE_URL" -f docs/migrations/2025-01-pgtrgm.sql
+```
+
+Or inside `psql`:
+
+```sql
+\i docs/migrations/2025-01-pgtrgm.sql
+```
+
+Notes:
+- Indexes increase disk usage and write overhead; create them only on columns used for text search.
+- The extension must be installed once per database.
+
diff --git a/docs/reports/REPORT-askv2.md b/docs/reports/REPORT-askv2.md
new file mode 100644
index 0000000..c9d4229
--- /dev/null
+++ b/docs/reports/REPORT-askv2.md
@@ -0,0 +1,134 @@
+# 보고서: /ai/v2/ask — 구조, 동작, 하이브리드 검색 계획
+
+## 1) 개요
+- 목표: v1의 고정형 RAG 한계를 보완. “검색 계획(JSON)”을 LLM이 생성 → 서버가 안전하게 표준화/검증 → 시맨틱/하이브리드 검색 수행 → 최종 답변을 SSE로 스트리밍.
+- 상태: `POST /ai/v2/ask`(SSE) 운영 중. v1은 유지하며, v2에서 계획 기반 검색과 관측성을 강화.
+
+## 2) v1의 한계와 v2 도입 효과
+- 고정 파라미터 → 동적 계획
+  - v1: 임계치(0.2), LIMIT(5), 가중치(0.7/0.3) 등 고정.
+  - v2: `top_k`, `threshold`, `weights`, `sort`, `limit`을 질문/맥락에 맞게 동적 제어(서버가 범위 강제).
+- 시간/정렬 의도 미반영 → 자연어 시간 해석
+  - v1: “최근/지난주/9월/작년” 같은 시간 의도 반영 불가.
+  - v2: `filters.time`(상대/월/분기/연도)을 받아 `KST 절대범위(from/to)`로 변환해 쿼리에 반영.
+- 리콜/정밀도 균형 한계 → 하이브리드
+  - v1: 임베딩 기반 RAG만.
+  - v2: 벡터+텍스트 하이브리드 융합으로 재현율 향상 및 키워드 민감 질의 대응.
+- 관측성 부족 → SSE 메타 이벤트
+  - v1: 검색/선택 근거가 불투명.
+  - v2: `search_plan`, `rewrite`, `keywords`, `hybrid_result`, `search_result` 등으로 의사결정 가시화.
+- 안전성/정합성
+  - v1: 클라이언트 입력 이탈 감지 어려움.
+  - v2: JSON Schema(strict) + Zod 검증 + 필터 강제 주입으로 안전한 실행계획만 수행.
+- 비용/토큰 관리
+  - v1: 고정 상수로 세밀한 제어 어려움.
+  - v2: `top_k`/향후 `limit`/dedupe로 토큰 예산을 상황별 최적화 가능.
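+
+아래는 위에서 설명한 ‘라벨 → α 매핑’과 점수 융합 단계를 요약한 TypeScript 스케치입니다. 실제 동작은 `hybrid-search.service.ts` 구현을 따르며, 여기의 함수명·시그니처와 점수 배열이 청크 순서로 정렬되어 있다는 전제는 설명을 위한 가정입니다.
+
+```ts
+// 설명용 스케치 — 서비스 코드의 세부 구현과 다를 수 있음
+type RetrievalBias = 'lexical' | 'balanced' | 'semantic';
+
+// 서버가 LLM 라벨을 α 값으로 매핑 (본문 기준 기본값)
+const ALPHA_BY_BIAS: Record<RetrievalBias, number> = { lexical: 0.3, balanced: 0.5, semantic: 0.75 };
+
+// min-max 정규화: 경로별 점수 분포를 [0, 1]로 맞춘다
+const minMax = (xs: number[]): number[] => {
+  if (xs.length === 0) return [];
+  const lo = Math.min(...xs);
+  const hi = Math.max(...xs);
+  return xs.map((x) => (hi === lo ? 0 : (x - lo) / (hi - lo)));
+};
+
+// score = α·vec + (1-α)·text 로 융합해 상위 top_k 선택의 입력으로 사용
+function fuseScores(vec: number[], text: number[], bias: RetrievalBias): number[] {
+  const alpha = ALPHA_BY_BIAS[bias];
+  const v = minMax(vec);
+  const t = minMax(text);
+  return v.map((x, i) => alpha * x + (1 - alpha) * (t[i] ?? 0));
+}
+```
+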
+ +![v1 구조](../structureDiagram/askV1-structure.png) + +## 3) 검색 계획(Planner)과 안전 표준화 +- 타입/스키마(`src/types/ai.v2.types.ts`) + - `mode`: `rag | post` + - `top_k`(1..10), `threshold`(0..1), `weights`(`chunk`,`title`; 합 1로 정규화) + - `filters`: `{ user_id, category_ids?, post_id?, time? }` + - `sort`: `created_at_desc | created_at_asc`, `limit`(1..20) + - `hybrid`: `{ enabled, retrieval_bias, alpha?, max_rewrites, max_keywords }` + - `rewrites`: string[], `keywords`: string[] +- 프롬프트 규칙(`src/prompts/qa.v2.prompts.ts`) + - 서버 제공 필터(`user_id`, `category_id`, `post_id`)는 고정(FIXED)이며 변경 금지. + - 플래너는 `top_k`, `threshold`, `filters.time`, `sort`, `limit`, `hybrid`, `rewrites`, `keywords`만 결정. + - 하이브리드 시 `hybrid.retrieval_bias ∈ {lexical, balanced, semantic}` 라벨을 결정. +- 생성/정규화(`src/services/search-plan.service.ts`) + - OpenAI Responses(JSON Schema strict) → Zod 파싱/검증. + - 값 범위 강제, `weights` 합 1로 정규화. + - 불용어/중복 제거로 `rewrites`/`keywords` 정제. + - 시간 필터를 `KST 절대범위`로 변환(`src/utils/time.ts`). + - `retrieval_bias → alpha` 매핑(서버): `lexical→0.3`, `balanced→0.5`, `semantic→0.75`(필요 시 서버가 `alpha`를 직접 클램프/주입). + - `user_id/category_id/post_id`는 서버가 최종 주입(정합성 보장). 실패 시 v1 RAG 폴백. + +## 4) 검색 엔진: 시맨틱 + 하이브리드 +- 시맨틱(`src/services/semantic-search.service.ts` → `findSimilarChunksV2`) + - 입력: 질문 임베딩, `threshold/top_k/weights/sort`, 카테고리/시간 필터. + - 점수: `w_chunk*(1 - chunk_dist) + w_title*(1 - title_dist)`. + - 저장소: `postRepository.findSimilarChunksV2(...)` 호출, 파라미터 바인딩 기반 안전 쿼리. +- 하이브리드(`src/services/hybrid-search.service.ts`) + - 입력: 원 질문 + `rewrites`(재작성), `keywords`(키워드), `alpha`(서버 매핑). + - 벡터 경로: 각 query 임베딩 → 시맨틱 후보 수집 → 동일 청크는 최대 vecScore로 병합. + - 텍스트 경로: `textSearchChunksV2`(키워드/질의 기반 텍스트 검색) → 동일 청크는 최대 textScore로 병합. + - 정규화/융합: min-max 정규화 후 `score = alpha*vec + (1-alpha)*text`로 결합 → 상위 `top_k`를 반환. + - 폴백: 하이브리드 결과 비었을 때 시맨틱 경로로 재시도. + - SSE 메타: `rewrite`, `keywords`, `hybrid_result`로 중간 산출물/요약을 별도 송신. + +![v2 구조](../structureDiagram/askV2-structure.png) + +## 5) 파이프라인(SSE 이벤트)과 모드 +- 컨트롤/오케스트레이션 + - `src/controllers/ai.v2.controller.ts`: SSE 헤더, 스트림 라우팅. + - `src/services/qa.v2.service.ts`: 계획 생성 → 검색 실행(하이브리드/시맨틱) → LLM 호출까지 오케스트레이션. +- post 모드(`post_id` 존재) + - 접근 정책: `post.is_public=false`면 소유자만(403), 미존재 시 404. + - 이벤트: `search_plan`(간략) → `search_result` → `exist_in_post_status:true` → `context` → `answer`* → `end`. + - 컨텍스트: 해당 글 본문만 사용(전처리 후). 하이브리드 미사용. +- rag/하이브리드 모드 + - 이벤트 순서(일반): + 1) `search_plan`: 정규화된 계획(JSON). `hybrid.enabled`가 true면 `retrieval_bias`와 서버 계산 `alpha`가 함께 포함. + 2) `rewrite`(선택): 계획의 재작성 질의 목록. + 3) `keywords`(선택): 계획의 핵심 키워드 목록. + 4) `hybrid_result`(선택): 하이브리드 후보 요약. + 5) `search_result`: 최종 컨텍스트 요약. + 6) `exist_in_post_status`: `true | false`. + 7) `context`: `[ { postId, postTitle }, ... ]`. + 8) `answer`*: 모델 부분 응답. + 9) `end`: 종료 시그널(`[DONE]`). + - 오류 시: `error` 송신. + +## 6) LLM/비용/프로바이더 +- 기본 모델: `openai/gpt-5-mini`(환경변수로 변경 가능), 임베딩: `text-embedding-3-small`. +- OpenAI: Responses API 스트리밍 우선, 실패 시 논스트림/Chat Completions 폴백. +- Gemini: 현재 논스트림 결과를 SSE 청크로 분할 전송. +- 비용 로깅: 프롬프트/완료 토큰 및 추정 비용 로깅(`src/llm/*`). +- 툴콜: 컨텍스트 부족 시 `report_content_insufficient` 툴을 통해 안내 문구 유도. + +## 7) 보안·정합성 +- 생성 계획 방어선: JSON Schema(strict) + Zod 검증 + 서버 측 범위 강제. +- 필터 주입: `user_id/category_id/post_id`는 서버가 최종 주입하여 일탈 방지. +- SQL 안전성: 파라미터 바인딩/화이트리스트 템플릿. +- 데이터 최소화: SSE에는 ID/제목 위주 요약만 전송. +- 접근 제어: post 모드에서 `is_public`/소유자 검사. + +## 8) 현재 동작과 한계(향후 과제) +- 현재 + - `top_k`로 리콜 폭 제어, 시맨틱/하이브리드 모두 적용. + - `limit`은 계획에는 존재하나 컨텍스트 축약에는 미적용(향후 포스트 단위 dedupe와 함께 적용 예정). + - 청크 단위 랭킹 → 동일 포스트 다수 노출 가능. 
+ - 카테고리 배열은 첫 항목만 반영. +- 향후 + - 포스트 단위 dedupe + `limit` 적용으로 다양성/비용 균형. + - `retrieval_bias → alpha` 매핑의 AB 테스트/텔레메트리 튜닝(예: {0.25,0.5,0.8}). + - 점수 정규화 고도화(z-score/quantile) 및 RRF 옵션 도입 검토. + - 텍스트 경로 향상(전처리/순위 함수/키워드 확장) 및 프로바이더 다변화. + - SSE `search_sql`/`search_debug`로 투명성 강화. + +## 9) 파일 맵(핵심) +- 라우트/컨트롤러: `src/routes/ai.v2.routes.ts`, `src/controllers/ai.v2.controller.ts`, `src/app.ts` +- 플래너/타입/프롬프트: `src/services/search-plan.service.ts`, `src/types/ai.v2.types.ts`, `src/prompts/qa.v2.prompts.ts` +- 시간 유틸: `src/utils/time.ts` +- 검색: `src/services/semantic-search.service.ts`, `src/services/hybrid-search.service.ts`, `src/repositories/post.repository.ts` +- 오케스트레이션: `src/services/qa.v2.service.ts` +- LLM: `src/llm/*` + +## 10) 예시 +- 요청 +```json +{ + "question": "지난달 프로젝트 X 관련 핵심만 3개 보여줘", + "user_id": "u_123", + "category_id": 3, + "llm": { "provider": "openai", "options": { "temperature": 0.2 } } +} +``` +- 이벤트 예시 + - `search_plan`(rag, time=지난달, top_k/threshold/weights/sort/limit/hybrid 포함; hybrid.retrieval_bias와 서버 주입 alpha 함께 표기) + - `rewrite`/`keywords`(선택) + - `hybrid_result`(선택) + - `search_result` → `exist_in_post_status` → `context` → `answer`* → `end` diff --git a/docs/structureDiagram/askV1-structure.png b/docs/structureDiagram/askV1-structure.png new file mode 100644 index 0000000..daad8e9 Binary files /dev/null and b/docs/structureDiagram/askV1-structure.png differ diff --git a/docs/structureDiagram/askV2-structure.png b/docs/structureDiagram/askV2-structure.png new file mode 100644 index 0000000..c621efc Binary files /dev/null and b/docs/structureDiagram/askV2-structure.png differ diff --git a/package-lock.json b/package-lock.json index 65109c4..5814cd7 100644 --- a/package-lock.json +++ b/package-lock.json @@ -10,6 +10,7 @@ "license": "ISC", "dependencies": { "@dqbd/tiktoken": "^1.0.13", + "@google/genai": "^0.2.0", "cors": "^2.8.5", "dotenv": "^16.4.5", "express": "^4.19.2", @@ -47,6 +48,18 @@ "resolved": "https://registry.npmjs.org/@dqbd/tiktoken/-/tiktoken-1.0.22.tgz", "integrity": "sha512-RYhO8xeHkMNX5Ixqf4M1Ve3siCYJY/dI0yLnlX4M4oIEDOvjMIQ+E+3OUpAaZcWTaMtQJzGcDAghYfllpx3i/w==" }, + "node_modules/@google/genai": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/@google/genai/-/genai-0.2.0.tgz", + "integrity": "sha512-r7EiRHSqc6D1lDIMvM4OemjUwPpUbYb9jTxe1eLCiFbooHrmPc6U9z3n56E/iWzigkZmjRh4IC0CMzoB1aql9w==", + "dependencies": { + "google-auth-library": "^9.14.2", + "ws": "^8.18.0" + }, + "engines": { + "node": ">=18.0.0" + } + }, "node_modules/@jridgewell/resolve-uri": { "version": "3.1.2", "resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz", @@ -284,6 +297,14 @@ "node": ">=0.4.0" } }, + "node_modules/agent-base": { + "version": "7.1.4", + "resolved": "https://registry.npmjs.org/agent-base/-/agent-base-7.1.4.tgz", + "integrity": "sha512-MnA+YT8fwfJPgBx3m60MNqakm30XOkyIoH1y6huTQvC0PwZG7ki8NacLBcrPbNoo8vEZy7Jpuk7+jMO+CUovTQ==", + "engines": { + "node": ">= 14" + } + }, "node_modules/agentkeepalive": { "version": "4.6.0", "resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.6.0.tgz", @@ -330,6 +351,33 @@ "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==", "dev": true }, + "node_modules/base64-js": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", + "integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==", + "funding": [ + { + "type": 
"github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ] + }, + "node_modules/bignumber.js": { + "version": "9.3.1", + "resolved": "https://registry.npmjs.org/bignumber.js/-/bignumber.js-9.3.1.tgz", + "integrity": "sha512-Ko0uX15oIUS7wJ3Rb30Fs6SkVbLmPBAKdlm7q9+ak9bbIeFf0MwuBsQV6z7+X768/cHsfg+WlysDWJcmthjsjQ==", + "engines": { + "node": "*" + } + }, "node_modules/binary-extensions": { "version": "2.3.0", "resolved": "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.3.0.tgz", @@ -535,7 +583,6 @@ "version": "4.4.1", "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.1.tgz", "integrity": "sha512-KcKCqiftBJcZr++7ykoDIEwSa3XWowTfNPo92BYxjXiyYEVrUQh2aLyhxBCwww+heortUFxEJYcRzosstTEBYQ==", - "dev": true, "dependencies": { "ms": "^2.1.3" }, @@ -747,6 +794,11 @@ "resolved": "https://registry.npmjs.org/ms/-/ms-2.0.0.tgz", "integrity": "sha512-Tpp60P6IUJDTuOq/5Z8cdskzJujfwqfOTkrwIwj7IRISpnkJnT6SyJ4PCPnGMoFjC9ddhal5KVIYtAt97ix05A==" }, + "node_modules/extend": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/extend/-/extend-3.0.2.tgz", + "integrity": "sha512-fjquC59cD7CyW6urNXK0FBufkZcoiGG80wTuPujX590cB5Ttln20E2UB4S/WARVqhXffZl2LNgS+gQdPIIim/g==" + }, "node_modules/fill-range": { "version": "7.1.1", "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz", @@ -859,6 +911,34 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/gaxios": { + "version": "6.7.1", + "resolved": "https://registry.npmjs.org/gaxios/-/gaxios-6.7.1.tgz", + "integrity": "sha512-LDODD4TMYx7XXdpwxAVRAIAuB0bzv0s+ywFonY46k126qzQHT9ygyoa9tncmOiQmmDrik65UYsEkv3lbfqQ3yQ==", + "dependencies": { + "extend": "^3.0.2", + "https-proxy-agent": "^7.0.1", + "is-stream": "^2.0.0", + "node-fetch": "^2.6.9", + "uuid": "^9.0.1" + }, + "engines": { + "node": ">=14" + } + }, + "node_modules/gcp-metadata": { + "version": "6.1.1", + "resolved": "https://registry.npmjs.org/gcp-metadata/-/gcp-metadata-6.1.1.tgz", + "integrity": "sha512-a4tiq7E0/5fTjxPAaH4jpjkSv/uCaU2p5KC6HVGrvl0cDjA8iBZv4vv1gyzlmK0ZUKqwpOyQMKzZQe3lTit77A==", + "dependencies": { + "gaxios": "^6.1.1", + "google-logging-utils": "^0.0.2", + "json-bigint": "^1.0.0" + }, + "engines": { + "node": ">=14" + } + }, "node_modules/get-intrinsic": { "version": "1.3.0", "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz", @@ -906,6 +986,49 @@ "node": ">= 6" } }, + "node_modules/google-auth-library": { + "version": "9.15.1", + "resolved": "https://registry.npmjs.org/google-auth-library/-/google-auth-library-9.15.1.tgz", + "integrity": "sha512-Jb6Z0+nvECVz+2lzSMt9u98UsoakXxA2HGHMCxh+so3n90XgYWkq5dur19JAJV7ONiJY22yBTyJB1TSkvPq9Ng==", + "dependencies": { + "base64-js": "^1.3.0", + "ecdsa-sig-formatter": "^1.0.11", + "gaxios": "^6.1.1", + "gcp-metadata": "^6.1.0", + "gtoken": "^7.0.0", + "jws": "^4.0.0" + }, + "engines": { + "node": ">=14" + } + }, + "node_modules/google-auth-library/node_modules/jwa": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/jwa/-/jwa-2.0.1.tgz", + "integrity": "sha512-hRF04fqJIP8Abbkq5NKGN0Bbr3JxlQ+qhZufXVr0DvujKy93ZCbXZMHDL4EOtodSbCWxOqR8MS1tXA5hwqCXDg==", + "dependencies": { + "buffer-equal-constant-time": "^1.0.1", + "ecdsa-sig-formatter": "1.0.11", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/google-auth-library/node_modules/jws": { + "version": "4.0.0", + "resolved": 
"https://registry.npmjs.org/jws/-/jws-4.0.0.tgz", + "integrity": "sha512-KDncfTmOZoOMTFG4mBlG0qUIOlc03fmzH+ru6RgYVZhPkyiy/92Owlt/8UEN+a4TXR1FQetfIpJE8ApdvdVxTg==", + "dependencies": { + "jwa": "^2.0.0", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/google-logging-utils": { + "version": "0.0.2", + "resolved": "https://registry.npmjs.org/google-logging-utils/-/google-logging-utils-0.0.2.tgz", + "integrity": "sha512-NEgUnEcBiP5HrPzufUkBzJOD/Sxsco3rLNo1F1TNf7ieU8ryUzBhqba8r756CjLX7rn3fHl6iLEwPYuqpoKgQQ==", + "engines": { + "node": ">=14" + } + }, "node_modules/gopd": { "version": "1.2.0", "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz", @@ -917,6 +1040,37 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/gtoken": { + "version": "7.1.0", + "resolved": "https://registry.npmjs.org/gtoken/-/gtoken-7.1.0.tgz", + "integrity": "sha512-pCcEwRi+TKpMlxAQObHDQ56KawURgyAf6jtIY046fJ5tIv3zDe/LEIubckAO8fj6JnAxLdmWkUfNyulQ2iKdEw==", + "dependencies": { + "gaxios": "^6.0.0", + "jws": "^4.0.0" + }, + "engines": { + "node": ">=14.0.0" + } + }, + "node_modules/gtoken/node_modules/jwa": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/jwa/-/jwa-2.0.1.tgz", + "integrity": "sha512-hRF04fqJIP8Abbkq5NKGN0Bbr3JxlQ+qhZufXVr0DvujKy93ZCbXZMHDL4EOtodSbCWxOqR8MS1tXA5hwqCXDg==", + "dependencies": { + "buffer-equal-constant-time": "^1.0.1", + "ecdsa-sig-formatter": "1.0.11", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/gtoken/node_modules/jws": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/jws/-/jws-4.0.0.tgz", + "integrity": "sha512-KDncfTmOZoOMTFG4mBlG0qUIOlc03fmzH+ru6RgYVZhPkyiy/92Owlt/8UEN+a4TXR1FQetfIpJE8ApdvdVxTg==", + "dependencies": { + "jwa": "^2.0.0", + "safe-buffer": "^5.0.1" + } + }, "node_modules/has-flag": { "version": "3.0.0", "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-3.0.0.tgz", @@ -977,6 +1131,18 @@ "node": ">= 0.8" } }, + "node_modules/https-proxy-agent": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/https-proxy-agent/-/https-proxy-agent-7.0.6.tgz", + "integrity": "sha512-vK9P5/iUfdl95AI+JVyUuIcVtd4ofvtrOr3HNtM2yxC9bnMbEdp3x01OhQNnjb8IJYi38VlTE3mBXwcfvywuSw==", + "dependencies": { + "agent-base": "^7.1.2", + "debug": "4" + }, + "engines": { + "node": ">= 14" + } + }, "node_modules/humanize-ms": { "version": "1.2.1", "resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz", @@ -1057,6 +1223,25 @@ "node": ">=0.12.0" } }, + "node_modules/is-stream": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/is-stream/-/is-stream-2.0.1.tgz", + "integrity": "sha512-hFoiJiTl63nn+kstHGBtewWSKnQLpyb155KHheA1l39uvtO9nWIop1p3udqPcUd/xbF1VLMO4n7OI6p7RbngDg==", + "engines": { + "node": ">=8" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/json-bigint": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/json-bigint/-/json-bigint-1.0.0.tgz", + "integrity": "sha512-SiPv/8VpZuWbvLSMtTDU8hEfrZWg/mH/nV/b4o0CYbSxu1UIQPLdwKOCIyLQX+VIPO5vrLX3i8qtqFyhdPSUSQ==", + "dependencies": { + "bignumber.js": "^9.0.0" + } + }, "node_modules/jsonwebtoken": { "version": "9.0.2", "resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz", @@ -1922,6 +2107,18 @@ "node": ">= 0.4.0" } }, + "node_modules/uuid": { + "version": "9.0.1", + "resolved": "https://registry.npmjs.org/uuid/-/uuid-9.0.1.tgz", + "integrity": "sha512-b+1eJOlsR9K8HJpow9Ok3fiWOWSIcIzXodvv0rQjVoOVNpWMpxf1wZNpt4y9h10odCNrqnYp1OBzRktckBe3sA==", + 
"funding": [ + "https://github.com/sponsors/broofa", + "https://github.com/sponsors/ctavan" + ], + "bin": { + "uuid": "dist/bin/uuid" + } + }, "node_modules/v8-compile-cache-lib": { "version": "3.0.1", "resolved": "https://registry.npmjs.org/v8-compile-cache-lib/-/v8-compile-cache-lib-3.0.1.tgz", @@ -1958,6 +2155,26 @@ "webidl-conversions": "^3.0.0" } }, + "node_modules/ws": { + "version": "8.18.3", + "resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz", + "integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==", + "engines": { + "node": ">=10.0.0" + }, + "peerDependencies": { + "bufferutil": "^4.0.1", + "utf-8-validate": ">=5.0.2" + }, + "peerDependenciesMeta": { + "bufferutil": { + "optional": true + }, + "utf-8-validate": { + "optional": true + } + } + }, "node_modules/xtend": { "version": "4.0.2", "resolved": "https://registry.npmjs.org/xtend/-/xtend-4.0.2.tgz", diff --git a/package.json b/package.json index 712d202..f4594d4 100644 --- a/package.json +++ b/package.json @@ -7,7 +7,8 @@ "dev": "nodemon --watch 'src/**/*.ts' --exec 'ts-node' src/server.ts", "build": "tsc", "start": "node dist/server.js", - "test": "echo \"Error: no test specified\" && exit 1" + "test": "echo \"Error: no test specified\" && exit 1", + "db:migrate:pgtrgm": "psql \"$DATABASE_URL\" -f docs/migrations/2025-01-pgtrgm.sql" }, "keywords": [], "author": "", @@ -24,6 +25,7 @@ }, "dependencies": { "@dqbd/tiktoken": "^1.0.13", + "@google/genai": "^0.2.0", "cors": "^2.8.5", "dotenv": "^16.4.5", "express": "^4.19.2", diff --git a/readme.md b/readme.md index a656468..fa301f2 100644 --- a/readme.md +++ b/readme.md @@ -87,3 +87,45 @@ CREATE INDEX idx_post_title_embeddings_embedding ## 4. AI 서버 작동 ### `.env` 파일 작성 (루트 디렉토리 `BUBBLOG_AI`에 위치) + +--- + +## 5. 텍스트 검색 인덱스 (pg_trgm) + +하이브리드 검색의 키워드/부분일치 성능 향상을 위해 `pg_trgm` 확장 및 GIN 인덱스를 추가합니다. + +### 5.1 확장 및 인덱스 생성 스크립트 + +프로젝트에 제공된 스크립트를 사용하세요: + +`docs/migrations/2025-01-pgtrgm.sql` + +내용: + +```sql +CREATE EXTENSION IF NOT EXISTS pg_trgm; +CREATE INDEX IF NOT EXISTS idx_pc_content_trgm ON post_chunks USING gin (content gin_trgm_ops); +CREATE INDEX IF NOT EXISTS idx_bp_title_trgm ON blog_post USING gin (title gin_trgm_ops); +``` + +### 5.2 적용 방법 + +- 환경변수 `DATABASE_URL`이 설정된 경우: + +```bash +psql "$DATABASE_URL" -f docs/migrations/2025-01-pgtrgm.sql +``` + +- 또는 npm 스크립트 사용: + +```bash +npm run db:migrate:pgtrgm +``` + +- 또는 수동 실행(PostgreSQL 쉘 접속 후): + +```sql +\i docs/migrations/2025-01-pgtrgm.sql +``` + +주의: 인덱스는 쓰기 비용과 디스크 사용량을 증가시킵니다. 텍스트 검색에 사용하는 컬럼(`post_chunks.content`, `blog_post.title`)에만 생성하세요. 
diff --git a/src/app.ts b/src/app.ts index 6968488..dfd79c4 100644 --- a/src/app.ts +++ b/src/app.ts @@ -1,6 +1,7 @@ import express, { Express, Request, Response, NextFunction } from 'express'; import cors from 'cors'; import aiRouter from './routes/ai.routes'; +import aiV2Router from './routes/ai.v2.routes'; const app: Express = express(); @@ -21,6 +22,7 @@ app.get('/', (request: Request, response: Response) => { }); app.use('/ai', aiRouter); +app.use('/ai/v2', aiV2Router); // Central Error Handler app.use((err: Error, req: Request, res: Response, next: NextFunction) => { diff --git a/src/config.ts b/src/config.ts index 18993cc..5bd402a 100644 --- a/src/config.ts +++ b/src/config.ts @@ -12,7 +12,13 @@ const configSchema = z.object({ TOKEN_AUDIENCE: z.string().default('bubblog'), ALGORITHM: z.string().default('HS256'), EMBED_MODEL: z.string().default('text-embedding-3-small'), - CHAT_MODEL: z.string().default('gpt-4o'), + // 기본 LLM 모델: GPT-5 계열 + CHAT_MODEL: z.string().default('gpt-5-mini'), + GEMINI_API_KEY: z.string().optional(), + GEMINI_CHAT_MODEL: z.string().default('gemini-2.5-flash'), + GEMINI_THINKING_BUDGET: z.string().optional(), + LLM_COST_LOG: z.string().default('false'), + LLM_COST_ROUND: z.coerce.number().default(4), }); const config = configSchema.parse(process.env); diff --git a/src/controllers/ai.controller.ts b/src/controllers/ai.controller.ts index e6ab029..d276f23 100644 --- a/src/controllers/ai.controller.ts +++ b/src/controllers/ai.controller.ts @@ -49,14 +49,48 @@ export const askHandler = async ( next: NextFunction ) => { try { - const { question, user_id, category_id, speech_tone, post_id } = req.body; + const { question, user_id, category_id, speech_tone, post_id, llm } = req.body as any; - res.setHeader('Content-Type', 'text/event-stream'); - res.setHeader('Cache-Control', 'no-cache'); + // SSE headers and anti-buffering hints + res.setHeader('Content-Type', 'text/event-stream; charset=utf-8'); + res.setHeader('Cache-Control', 'no-cache, no-transform'); res.setHeader('Connection', 'keep-alive'); + // Nginx buffering off + res.setHeader('X-Accel-Buffering', 'no'); + // Flush headers early so clients start processing immediately + (res as any).flushHeaders?.(); + // Reduce Nagle’s algorithm buffering on the socket for faster flush + (res.socket as any)?.setNoDelay?.(true); + // Prime the SSE stream to break proxy buffering thresholds + res.write(':ok\n\n'); - const stream = await answerStream(question, user_id, category_id, speech_tone, post_id); - stream.pipe(res); + const stream = await answerStream(question, user_id, category_id, speech_tone, post_id, llm); + // Manually bridge to ensure flushing of SSE deltas + stream.on('data', (chunk) => { + const buf = Buffer.isBuffer(chunk) ? 
chunk : Buffer.from(String(chunk));
+      res.write(buf);
+      const canFlush = typeof (res as any).flush === 'function';
+      // try to flush if supported by runtime/middleware
+      (res as any).flush?.();
+      try {
+        console.log(
+          JSON.stringify({ type: 'debug.sse.write', at: Date.now(), bytes: buf.length, flushed: canFlush })
+        );
+      } catch {}
+    });
+    stream.on('end', () => {
+      res.end();
+    });
+    stream.on('error', () => {
+      res.end();
+    });
+
+    // Cleanup if client disconnects
+    req.on('close', () => {
+      try {
+        stream.destroy();
+      } catch {}
+    });
   } catch (error) {
     next(error);
diff --git a/src/controllers/ai.v2.controller.ts b/src/controllers/ai.v2.controller.ts
new file mode 100644
index 0000000..ed1f579
--- /dev/null
+++ b/src/controllers/ai.v2.controller.ts
@@ -0,0 +1,29 @@
+import { Request, Response, NextFunction } from 'express';
+import { AskV2Request } from '../types/ai.v2.types';
+import { answerStreamV2 } from '../services/qa.v2.service';
+
+export const askV2Handler = async (
+  req: Request<{}, {}, AskV2Request>,
+  res: Response,
+  next: NextFunction
+) => {
+  try {
+    const { question, user_id, category_id, speech_tone, post_id, llm } = req.body as any;
+    res.setHeader('Content-Type', 'text/event-stream');
+    res.setHeader('Cache-Control', 'no-cache');
+    res.setHeader('Connection', 'keep-alive');
+
+    const stream = await answerStreamV2(
+      question,
+      user_id,
+      category_id,
+      speech_tone,
+      post_id,
+      llm
+    );
+    stream.pipe(res);
+  } catch (error) {
+    next(error);
+  }
+};
+
diff --git a/src/llm/index.ts b/src/llm/index.ts
new file mode 100644
index 0000000..3c67eb4
--- /dev/null
+++ b/src/llm/index.ts
@@ -0,0 +1,162 @@
+import { PassThrough } from 'stream';
+import { GenerateRequest } from './types';
+import { getDefaultChat } from './model-registry';
+import { generateOpenAIStream } from './providers/openai-responses';
+import { generateGeminiStream } from './providers/gemini';
+import { countChatMessagesTokens, countTextTokens } from '../utils/tokenizer';
+import { calcCost, getModelPricing } from '../utils/cost';
+import config from '../config';
+import { randomUUID } from 'crypto';
+
+export const generate = async (req: GenerateRequest): Promise<PassThrough> => {
+  const merged = { ...req };
+  if (!merged.provider || !merged.model) {
+    const def = getDefaultChat();
+    merged.provider = merged.provider || def.provider;
+    merged.model = merged.model || def.modelId;
+  }
+
+  const doLog = (config.LLM_COST_LOG || '').toString().toLowerCase() === 'true';
+  const round = config.LLM_COST_ROUND ?? 4;
+  const corrId = randomUUID();
+  const model = merged.model as string;
+  const provider = merged.provider as string;
+
+  // Pre-call logging: prompt tokens + estimated input cost
+  const messages = merged.messages || [];
+  let promptTokens = 0;
+  try {
+    promptTokens = countChatMessagesTokens(messages as any, model);
+  } catch {
+    // ignore
+  }
+  const pricing = getModelPricing(model);
+  const estInputCost = pricing ? calcCost(promptTokens, pricing.input_per_1k) : 0;
+  if (doLog) {
+    const pre = {
+      type: 'llm.request',
+      corrId,
+      provider,
+      model,
+      promptTokens,
+      estInputCost,
+      userId: merged.meta?.userId,
+      categoryId: merged.meta?.categoryId,
+      postId: merged.meta?.postId,
+    };
+    console.log(JSON.stringify(pre));
+  }
+
+  const startedAt = Date.now();
+
+  const providerStream =
+    merged.provider === 'openai'
+      ? await generateOpenAIStream(merged)
+      : merged.provider === 'gemini'
+        ? 
await generateGeminiStream(merged) + : (() => { + const s = new PassThrough(); + s.write(`event: error\n`); + s.write(`data: ${JSON.stringify({ message: 'Unknown provider' })}\n\n`); + s.end(); + return s; + })(); + + // Wrap provider stream to accumulate output tokens + const outer = new PassThrough(); + let buffer = ''; + let outputText = ''; + + // Debug: start info + try { + console.log( + JSON.stringify({ type: 'debug.llm.start', provider, model, messages: (messages || []).length }) + ); + } catch {} + + const flushBuffer = () => { + // Split by double newline to get SSE events + const chunks = buffer.split('\n\n'); + // Keep last partial + buffer = chunks.pop() || ''; + for (const block of chunks) { + const lines = block.split('\n'); + let evt: string | null = null; + let dataLine: string | null = null; + for (const line of lines) { + if (line.startsWith('event:')) evt = line.slice(6).trim(); + if (line.startsWith('data:')) dataLine = line.slice(5).trim(); + } + if (evt === 'answer' && dataLine) { + try { + const parsed = JSON.parse(dataLine); + if (typeof parsed === 'string') outputText += parsed; + else outputText += JSON.stringify(parsed); + } catch { + outputText += dataLine; + } + } + outer.write(block + '\n\n'); + } + }; + + providerStream.on('data', (chunk) => { + const str = Buffer.isBuffer(chunk) ? chunk.toString('utf8') : String(chunk); + buffer += str; + flushBuffer(); + }); + providerStream.on('end', () => { + if (buffer.length > 0) { + outer.write(buffer); + buffer = ''; + } + + const completionTokens = (() => { + try { + return countTextTokens(outputText, model); + } catch { + return 0; + } + })(); + const durationMs = Date.now() - startedAt; + if (doLog) { + const inputCost = pricing ? calcCost(promptTokens, pricing.input_per_1k) : 0; + const outputCost = pricing ? calcCost(completionTokens, pricing.output_per_1k) : 0; + const totalCost = inputCost + outputCost; + const post = { + type: 'llm.response', + corrId, + provider, + model, + promptTokens, + completionTokens, + inputCost, + outputCost, + totalCost, + durationMs, + }; + console.log(JSON.stringify(post)); + } + try { + console.log( + JSON.stringify({ type: 'debug.llm.end', provider, model, durationMs, outputChars: outputText.length }) + ); + } catch {} + outer.end(); + }); + providerStream.on('error', (e) => { + if (doLog) { + console.log( + JSON.stringify({ type: 'llm.error', corrId, provider, model, message: (e as any)?.message || 'error' }) + ); + } + try { + console.error( + JSON.stringify({ type: 'debug.llm.error', provider, model, message: (e as any)?.message || 'error' }) + ); + } catch {} + outer.emit('error', e); + }); + + return outer; +}; diff --git a/src/llm/model-registry.ts b/src/llm/model-registry.ts new file mode 100644 index 0000000..14cbdfe --- /dev/null +++ b/src/llm/model-registry.ts @@ -0,0 +1,12 @@ +import { ProviderName } from './types'; + +type ModelEntry = { + provider: ProviderName; + modelId: string; +}; + +// Minimal registry for now; can expand with tokenizer/pricing later. 
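+// 확장 예시(가정): 모델별 단가/토크나이저를 레지스트리에 함께 두려면 아래와 같은 형태를 고려할 수 있다.
+// (현재 단가 조회는 utils/cost의 getModelPricing이 담당하므로, 실제 도입 시 중복을 피할 것.)
+// type PricedModelEntry = ModelEntry & {
+//   pricing?: { input_per_1k: number; output_per_1k: number };
+//   tokenizerId?: string; // 예: 'o200k_base' — 가정된 필드명
+// };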
+const DEFAULT_CHAT: ModelEntry = { provider: 'openai', modelId: 'gpt-5-mini' };
+
+export const getDefaultChat = (): ModelEntry => DEFAULT_CHAT;
+
diff --git a/src/llm/providers/gemini.ts b/src/llm/providers/gemini.ts
new file mode 100644
index 0000000..a399317
--- /dev/null
+++ b/src/llm/providers/gemini.ts
@@ -0,0 +1,75 @@
+import { PassThrough } from 'stream';
+import config from '../../config';
+import { GenerateRequest } from '../types';
+
+// Using @google/genai per project plan; keep types loose for compatibility
+// eslint-disable-next-line @typescript-eslint/no-var-requires
+const { GoogleGenAI } = require('@google/genai');
+
+const buildPromptFromMessages = (messages: { role: string; content: string }[]) => {
+  // Simple concatenation preserving roles
+  return messages
+    .map((m) => `[${m.role}]\n${m.content}`)
+    .join('\n\n');
+};
+
+export const generateGeminiStream = async (req: GenerateRequest): Promise<PassThrough> => {
+  const stream = new PassThrough();
+  try {
+    const modelId = req.model || process.env.GEMINI_CHAT_MODEL || 'gemini-2.5-flash';
+    const apiKey = (config as any).GEMINI_API_KEY || process.env.GEMINI_API_KEY;
+    if (!apiKey) {
+      stream.write(`event: error\n`);
+      stream.write(`data: ${JSON.stringify({ message: 'Gemini API key not configured' })}\n\n`);
+      stream.end();
+      return stream;
+    }
+
+    const ai = new GoogleGenAI({ apiKey });
+
+    const text = buildPromptFromMessages(req.messages || []);
+
+    const generationConfig: any = {};
+    if (req.options?.temperature != null) generationConfig.temperature = req.options.temperature;
+    if (req.options?.top_p != null) generationConfig.topP = req.options.top_p;
+    if (req.options?.max_output_tokens != null) generationConfig.maxOutputTokens = req.options.max_output_tokens;
+
+    const thinkingBudget = parseInt(process.env.GEMINI_THINKING_BUDGET || '0', 10) || 0;
+    // @google/genai expects a single `config` object; merge the generation options
+    // into it (a separate top-level `generationConfig` key would be ignored by this SDK).
+    const configBlock: any = {
+      ...generationConfig,
+      ...(thinkingBudget > 0 ? { thinkingConfig: { thinkingBudget } } : {}),
+    };
+
+    // Non-streaming first, then chunk SSE
+    const result = await ai.models.generateContent({
+      model: modelId,
+      contents: [
+        {
+          role: 'user',
+          parts: [{ text }],
+        },
+      ],
+      config: configBlock,
+    });
+
+    // Try common text access paths: depending on the SDK version, `text` is either
+    // a getter (string) or a method — handle both so a string is never invoked.
+    const pickText = (obj: any): string => {
+      const t = obj?.text;
+      if (typeof t === 'function') return t.call(obj) ?? '';
+      if (typeof t === 'string') return t;
+      return '';
+    };
+    const outputText = pickText(result) || pickText((result as any)?.response);
+
+    const finalText = typeof outputText === 'string' ? outputText : '';
+
+    const chunkSize = 400;
+    for (let i = 0; i < finalText.length; i += chunkSize) {
+      const chunk = finalText.slice(i, i + chunkSize);
+      stream.write(`event: answer\n`);
+      stream.write(`data: ${JSON.stringify(chunk)}\n\n`);
+    }
+    stream.write(`event: end\n`);
+    stream.write(`data: [DONE]\n\n`);
+    stream.end();
+    return stream;
+  } catch (err) {
+    stream.write(`event: error\n`);
+    stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+    stream.end();
+    return stream;
+  }
+};
+
diff --git a/src/llm/providers/openai-responses.ts b/src/llm/providers/openai-responses.ts
new file mode 100644
index 0000000..65cc1ec
--- /dev/null
+++ b/src/llm/providers/openai-responses.ts
@@ -0,0 +1,320 @@
+import { PassThrough } from 'stream';
+import OpenAI from 'openai';
+import config from '../../config';
+import { GenerateRequest, OpenAIStyleMessage, OpenAIStyleTool } from '../types';
+
+const openai = new OpenAI({ apiKey: config.OPENAI_API_KEY });
+
+const toResponsesInput = (messages: OpenAIStyleMessage[] = []) => {
+  // Convert simple chat-style messages to Responses API input format
+  // Responses API expects 'input_text' as the content type (not 'text').
+  return messages.map((m) => ({
+    role: m.role,
+    content: [{ type: 'input_text', text: m.content }],
+  }));
+};
+
+const toResponsesTools = (tools: OpenAIStyleTool[] = []) => {
+  // Map Chat Completions style tool definitions to Responses API format
+  // Chat: { type: 'function', function: { name, description, parameters } }
+  // Responses: { type: 'function', name, description, parameters }
+  return tools
+    .filter((t) => t && (t as any).type === 'function' && (t as any).function?.name)
+    .map((t) => ({
+      type: 'function',
+      name: t.function.name,
+      description: t.function.description,
+      parameters: t.function.parameters,
+    }));
+};
+
+export const generateOpenAIStream = async (req: GenerateRequest): Promise<PassThrough> => {
+  const stream = new PassThrough();
+  // Guard to avoid writing after stream end
+  let closed = false;
+  const safeWrite = (chunk: string) => {
+    if (!closed && !stream.writableEnded && !stream.destroyed) {
+      stream.write(chunk);
+    }
+  };
+  const safeEnd = () => {
+    if (!closed && !stream.writableEnded && !stream.destroyed) {
+      closed = true;
+      stream.end();
+    } else {
+      closed = true;
+    }
+  };
+  const model = req.model || 'gpt-5-mini';
+  const messages = req.messages || [];
+  const toolsChat = (req.tools || []) as OpenAIStyleTool[];
+
+  // For gpt-5-* prefer Responses API. For other models, fall back to Chat Completions streaming.
+  const isGpt5Family = /(^|\b)gpt-5/i.test(model);
+
+  // Debug: basic call info
+  try {
+    console.log(
+      JSON.stringify({
+        type: 'debug.openai.start',
+        model,
+        isGpt5Family,
+        hasTools: Array.isArray(req.tools) && req.tools.length > 0,
+        options: {
+          temperature: req.options?.temperature,
+          top_p: req.options?.top_p,
+          max_output_tokens: req.options?.max_output_tokens,
+          reasoning_effort: (req as any)?.options?.reasoning_effort,
+          text_verbosity: (req as any)?.options?.text_verbosity,
+        },
+      })
+    );
+  } catch {}
+
+  try {
+    if (isGpt5Family) {
+      // Prefer Responses API streaming for gpt-5
+      try {
+        const respParams: any = {
+          model,
+          input: toResponsesInput(messages) as any,
+          tools: toolsChat && toolsChat.length > 0 ? 
toResponsesTools(toolsChat) : undefined, + max_output_tokens: req.options?.max_output_tokens, + }; + // GPT-5 family: omit temperature/top_p; allow reasoning/text controls + if (req.options?.reasoning_effort) { + respParams.reasoning = { effort: req.options.reasoning_effort }; + } else { + // 기본값: 생각(추론) 강도를 최소화하여 지연을 줄임 + respParams.reasoning = { effort: 'minimal' }; + } + if (req.options?.text_verbosity) { + respParams.text = { verbosity: req.options.text_verbosity }; + } else { + // Encourage text output on GPT-5 if not specified + respParams.text = { verbosity: 'low' }; + } + try { + console.log( + JSON.stringify({ type: 'debug.openai.path', path: 'responses.stream', paramsKeys: Object.keys(respParams) }) + ); + } catch {} + const responsesStream: any = await (openai as any).responses.stream(respParams); + + // let loggedFirstDelta = false; + responsesStream.on('response.output_text.delta', (ev: any) => { + const text = typeof ev === 'string' ? ev : ev?.delta ?? ''; + if (text) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(text)}\n\n`); + // try { console.log(JSON.stringify({ type: 'debug.openai.delta', len: String(text).length, at: Date.now() })); } catch {} + // if (!loggedFirstDelta) { + // try { console.log(JSON.stringify({ type: 'debug.openai.delta', len: String(text).length })); } catch {} + // loggedFirstDelta = true; + // } + } + }); + + // Stream tool-call arguments as answer chunks to maintain SSE shape + responsesStream.on('response.tool_call.delta', (ev: any) => { + const argsDelta = ev?.arguments_delta || ev?.arguments || ev?.delta || ''; + if (argsDelta) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(argsDelta)}\n\n`); + } + }); + // Also handle non-delta tool_call events + responsesStream.on('response.tool_call', (ev: any) => { + const args = ev?.arguments || ev?.arguments_delta || ''; + if (args) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(args)}\n\n`); + } + }); + + // Catch-all messages to ensure we don't miss alternative text events + responsesStream.on('message', (msg: any) => { + try { + const m = typeof msg === 'string' ? 
JSON.parse(msg) : msg; + if (!m) return; + // Prefer explicit output_text delta + if (m.type === 'response.output_text.delta' && m.delta) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(m.delta)}\n\n`); + } + // Some SDKs may emit full output_text chunk at once + else if (m.type === 'response.output_text' && typeof m.text === 'string') { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(m.text)}\n\n`); + } + // Generic delta fallback + else if (m.type === 'response.delta' && typeof m.delta === 'string') { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(m.delta)}\n\n`); + } + // Log for visibility + console.log( + JSON.stringify({ type: 'debug.openai.msg', mtype: m.type, keys: Object.keys(m || {}) }) + ); + } catch (e) { + try { console.log(JSON.stringify({ type: 'debug.openai.msg_parse_error' })); } catch {} + } + }); + + responsesStream.on('response.completed', () => { + safeWrite(`event: end\n`); + safeWrite(`data: [DONE]\n\n`); + safeEnd(); + try { console.log(JSON.stringify({ type: 'debug.openai.completed' })); } catch {} + }); + + responsesStream.on('error', (e: any) => { + safeWrite(`event: error\n`); + safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`); + safeEnd(); + try { + console.error( + JSON.stringify({ type: 'debug.openai.error', path: 'responses.stream', message: (e as any)?.message }) + ); + } catch {} + }); + + // Do not await completion here; return immediately so callers can consume deltas in real-time + (async () => { + try { + await responsesStream.done(); + } catch {} + })(); + return stream; + } catch (e) { + // Fallback to non-streaming Responses if streaming path fails + try { + const createParams: any = { + model, + input: toResponsesInput(messages) as any, + // Avoid tools in non-streaming mode to ensure text output + max_output_tokens: req.options?.max_output_tokens, + }; + if (req.options?.reasoning_effort) createParams.reasoning = { effort: req.options.reasoning_effort }; + else createParams.reasoning = { effort: 'low' }; + if (req.options?.text_verbosity) createParams.text = { verbosity: req.options.text_verbosity }; + try { + console.log( + JSON.stringify({ type: 'debug.openai.path', path: 'responses.create', paramsKeys: Object.keys(createParams) }) + ); + } catch {} + const response = await openai.responses.create(createParams); + const text = (response as any).output_text ?? ''; + const answerText = typeof text === 'string' ? 
text : ''; + const fallbackText = (() => { + try { + const outputs = (response as any).output || []; + if (Array.isArray(outputs) && outputs.length > 0) { + const parts = outputs + .flatMap((o: any) => o.content || []) + .filter((c: any) => c.type === 'output_text') + .map((c: any) => c.text) + .join(''); + return parts || ''; + } + } catch { + // ignore + } + return ''; + })(); + const finalText = answerText || fallbackText; + const chunkSize = 400; + for (let i = 0; i < finalText.length; i += chunkSize) { + const chunk = finalText.slice(i, i + chunkSize); + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(chunk)}\n\n`); + } + safeWrite(`event: end\n`); + safeWrite(`data: [DONE]\n\n`); + safeEnd(); + return stream; + } catch (e2) { + // fall through to chat completions streaming below + try { + console.error( + JSON.stringify({ type: 'debug.openai.error', path: 'responses.create', message: (e2 as any)?.message }) + ); + } catch {} + } + } + } + + // Chat Completions streaming as universal fallback + try { console.log(JSON.stringify({ type: 'debug.openai.path', path: 'chat.completions.stream' })); } catch {} + let chatStream: any; + + // temperature/top_p are not supported on reasoning models (e.g., GPT-5 family) + if (!isGpt5Family) { + chatStream = await openai.chat.completions.create({ + model, + messages: messages as any, + tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined, + tool_choice: req.tools && req.tools.length > 0 ? 'auto' : undefined, + stream: true, + temperature: req.options?.temperature, + top_p: req.options?.top_p, + max_tokens: req.options?.max_output_tokens as any, + }); + } else { + chatStream = await openai.chat.completions.create({ + model, + messages: messages as any, + tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined, + tool_choice: req.tools && req.tools.length > 0 ? 
'auto' : undefined,
+        stream: true,
+        // Reasoning models (GPT-5 family) reject `max_tokens` on Chat Completions;
+        // they expect `max_completion_tokens` instead.
+        max_completion_tokens: req.options?.max_output_tokens as any,
+      });
+    }
+
+    // Iterate asynchronously; return stream immediately to allow real-time consumption
+    (async () => {
+      try {
+        for await (const chunk of chatStream) {
+          const content = chunk.choices[0]?.delta?.content || '';
+          const toolCalls = chunk.choices[0]?.delta?.tool_calls;
+
+          if (toolCalls) {
+            for (const toolCall of toolCalls) {
+              if (toolCall.function?.arguments) {
+                safeWrite(`event: answer\n`);
+                safeWrite(`data: ${JSON.stringify(toolCall.function.arguments)}\n\n`);
+              }
+            }
+          } else if (content) {
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(content)}\n\n`);
+          }
+
+          if (chunk.choices[0]?.finish_reason) {
+            safeWrite(`event: end\n`);
+            safeWrite(`data: [DONE]\n\n`);
+            safeEnd();
+            try { console.log(JSON.stringify({ type: 'debug.openai.completed', path: 'chat.completions.stream' })); } catch {}
+            break;
+          }
+        }
+      } catch (e) {
+        safeWrite(`event: error\n`);
+        safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+        safeEnd();
+      }
+    })();
+
+    return stream;
+  } catch (err) {
+    safeWrite(`event: error\n`);
+    safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+    safeEnd();
+    try {
+      console.error(
+        JSON.stringify({ type: 'debug.openai.error', path: 'top', message: (err as any)?.message, model, isGpt5Family })
+      );
+    } catch {}
+    return stream;
+  }
+};
diff --git a/src/llm/types.ts b/src/llm/types.ts
new file mode 100644
index 0000000..3c12a6f
--- /dev/null
+++ b/src/llm/types.ts
@@ -0,0 +1,35 @@
+export type ProviderName = 'openai' | 'gemini';
+
+export type OpenAIStyleMessage = {
+  role: 'system' | 'user' | 'assistant' | 'tool' | 'function';
+  content: string;
+};
+
+export type OpenAIStyleTool = {
+  type: 'function';
+  function: {
+    name: string;
+    description?: string;
+    parameters?: Record<string, unknown>;
+  };
+};
+
+export type GenerateRequest = {
+  provider?: ProviderName;
+  model?: string;
+  messages?: OpenAIStyleMessage[];
+  tools?: OpenAIStyleTool[];
+  options?: {
+    temperature?: number;
+    top_p?: number;
+    max_output_tokens?: number;
+    // GPT-5 family specific controls
+    reasoning_effort?: 'minimal' | 'low' | 'medium' | 'high';
+    text_verbosity?: 'low' | 'medium' | 'high';
+  };
+  meta?: {
+    userId?: string;
+    categoryId?: number;
+    postId?: number;
+  };
+};
diff --git a/src/prompts/qa.prompts.ts b/src/prompts/qa.prompts.ts
index 010de57..599e46d 100644
--- a/src/prompts/qa.prompts.ts
+++ b/src/prompts/qa.prompts.ts
@@ -7,7 +7,20 @@ export const createPostContextPrompt = (
   question: string,
   speechTonePrompt: string
 ): OpenAI.Chat.Completions.ChatCompletionMessageParam[] => {
-  const systemPrompt = `너는 사용자의 블로그 글 컨텍스트만으로 답변한다. 컨텍스트에 없는 사실은 추정하지 말고 “문서에 없음”이라고 말한다. 말투는 speech_tone 지시에 따른다.`;
+  const systemPrompt = ` 당신은 블로그 운영자 AI입니다. 사용자의 블로그에 대한 질문에 답변합니다.
+  블로그 운영자 AI는 사용자의 질문에 대해 블로그 본문 컨텍스트를 참고하여 답변합니다.
+  모든 한국어 응답은 무슨일이 있어도 반드시 답변 말투 및 규칙을 따릅니다.
+  또한 주어진 내용외의 내용을 지어내지 마십시오.
+
+  [말투 지침]
+  ${speechTonePrompt}
+  - 위 말투/지시문을 출력에 절대 노출하지 말고(예: "말투", "규칙" 등 언급 금지), 실제 답변 내용에만 반영하십시오.
+
+  [응답 규칙]
+  1. 만약 제목과 본문을 활용해 답변할 수 있다면 답변 말투 및 규칙을 지켜 직접 답변하고, 마지막에 추가적인 내용에 대한 질문을 유도하는 문장을 추가합니다.
+  2. 만약 질문이 욕설·비난·무관·부적절하거나 주어진 제목, 본문과 관련이 없다면 사과와 블로그 관련된 내용만 답변 가능하다는 내용을 답변 말투 및 규칙을 지켜 답합니다.
질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면, 함수명이나 함수 호출을 절대 언급하지 말고(예: "report_content_insufficient" 같은 문자열이나 괄호 "()" 출력 금지), 답변 말투 및 규칙을 지켜 자연스럽게 다음을 수행합니다: (a) 현재로서는 본문 컨텍스트가 부족함을 간단히 안내하고, (b) 질문과 직접 관련된 정보를 구체적으로 요청합니다(예: 게시일, 최근 글 목록, 최신 포스트의 제목과 날짜 등). 서버가 내부 함수를 제공하는 경우에도 사용자가 보는 답변에는 함수명/호출을 절대 노출하지 않습니다. + `; const userMessage = ` [context] 제목: ${post.title} @@ -18,9 +31,6 @@ ${processedContent} [user] ${question} - -[instruction] -답변 말투: "${speechTonePrompt}" `; return [ @@ -40,14 +50,16 @@ export const createRagPrompt = ( 모든 한국어 응답은 무슨일이 있어도 반드시 답변 말투 및 규칙을 따릅니다. 또한 주어진 내용외의 내용을 지어내지 마십시오. + [말투 지침] + ${speechTonePrompt} + - 위 말투/지시문을 출력에 절대 노출하지 말고(예: "말투", "규칙" 등 언급 금지), 실제 답변 내용에만 반영하십시오. + [응답 규칙] 1. 만약 제목과 본문을 활용해 답변할 수 있다면 답변 말투 및 규칙을 지켜 직접 답변하고, 마지막에 추가적인 내용에 대한 질문을 유도하는 문장을 추가합니다. 2. 만약 질문이 욕설·비난·무관·부적절하거나 주어진 제목, 본문과 관련이 없다면 사과와 블로그 관련된 내용만 답변 가능하다는 내용을 답변 말투 및 규칙을 지켜 답합니다. - 3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면 report_content_insufficient 함수를 호출하고 답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다. + 3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면, 함수명이나 함수 호출을 절대 언급하지 말고(예: "report_content_insufficient" 같은 문자열이나 괄호 "()" 출력 금지), 답변 말투 및 규칙을 지켜 자연스럽게 다음을 수행합니다: (a) 현재로서는 본문 컨텍스트가 부족함을 간단히 안내하고, (b) 질문과 직접 관련된 정보를 구체적으로 요청합니다(예: 게시일, 최근 글 목록, 최신 포스트의 제목과 날짜 등). 서버가 내부 함수를 제공하는 경우에도 사용자가 보는 답변에는 함수명/호출을 절대 노출하지 않습니다. `; const userMessage = ` - 답변 말투 및 규칙: "${speechTonePrompt}" - 반드시 말투 및 규칙에 따라 대답하세요! 아래의 질문과 블로그 본문 컨텍스트를 참고하여 답변하세요. 사용자의 질문: ${question} 가장 근접한 블로그 본문 컨텍스트: diff --git a/src/prompts/qa.v2.prompts.ts b/src/prompts/qa.v2.prompts.ts new file mode 100644 index 0000000..a752324 --- /dev/null +++ b/src/prompts/qa.v2.prompts.ts @@ -0,0 +1,117 @@ +// Search Plan prompt templates for v2 + +import { SearchPlan } from '../types/ai.v2.types'; + +export const getSearchPlanSchemaJson = (): Record => ({ + type: 'object', + additionalProperties: false, + properties: { + mode: { enum: ['rag', 'post'] }, + top_k: { type: 'integer', minimum: 1, maximum: 10 }, + threshold: { type: 'number', minimum: 0, maximum: 1 }, + weights: { + type: 'object', + additionalProperties: false, + properties: { + chunk: { type: 'number', minimum: 0, maximum: 1 }, + title: { type: 'number', minimum: 0, maximum: 1 }, + }, + required: ['chunk', 'title'], + }, + rewrites: { type: 'array', items: { type: 'string' } }, + keywords: { type: 'array', items: { type: 'string' } }, + hybrid: { + type: 'object', + additionalProperties: false, + properties: { + enabled: { type: 'boolean' }, + retrieval_bias: { enum: ['lexical', 'balanced', 'semantic'] }, + max_rewrites: { type: 'integer', minimum: 0, maximum: 4 }, + max_keywords: { type: 'integer', minimum: 0, maximum: 8 }, + }, + required: ['enabled', 'retrieval_bias', 'max_rewrites', 'max_keywords'], + }, + // Only time is allowed under filters. Responses JSON Schema requires closed objects + // with explicit required fields at each level. We constrain time to label-form only. 
+    filters: {
+      type: 'object',
+      additionalProperties: false,
+      properties: {
+        time: {
+          type: 'object',
+          additionalProperties: false,
+          properties: {
+            type: { type: 'string', enum: ['label'] },
+            label: { type: 'string', minLength: 1 },
+          },
+          required: ['type', 'label'],
+        },
+      },
+      required: ['time'],
+    },
+    sort: { enum: ['created_at_desc', 'created_at_asc'] },
+    limit: { type: 'integer', minimum: 1, maximum: 20 },
+  },
+  required: ['mode', 'top_k', 'threshold', 'weights', 'rewrites', 'keywords', 'hybrid', 'filters', 'sort', 'limit'],
+});
+
+export const buildSearchPlanPrompt = (params: {
+  now_utc: string;
+  now_kst: string;
+  timezone: string;
+  user_id: string;
+  category_id?: number;
+  post_id?: number;
+  defaults?: Partial<SearchPlan>;
+  question: string;
+}): string => {
+  const defaults = JSON.stringify(
+    params.defaults || {
+      top_k: 5,
+      threshold: 0.2,
+      weights: { chunk: 0.7, title: 0.3 },
+      sort: 'created_at_desc',
+      limit: 5,
+    },
+  );
+
+  const schemaHint = JSON.stringify(getSearchPlanSchemaJson());
+
+  return [
+    'You are a Search Plan Generator for a Korean blogging platform.',
+    'Your task is to read the user question and output ONLY a JSON object that defines a safe search plan.',
+    '',
+    `now_utc: ${params.now_utc}`,
+    `now_kst: ${params.now_kst}`,
+    `timezone: ${params.timezone}`,
+    `user_id: ${params.user_id}`,
+    `category_id: ${params.category_id ?? ''}`,
+    `post_id: ${params.post_id ?? ''}`,
+    '',
+    'Rules:',
+    '1) Output ONLY a single JSON object matching the schema. No extra text.',
+    '2) Respect bounds: top_k 1..10, limit 1..20, threshold 0..1, weights in [0,1]. The server normalizes their sum.',
+    '3) Do NOT output any filters other than filters.time. The server injects user_id/category_id/post_id.',
+    '   - Your job: decide top_k, threshold, sort, limit; and optionally rewrites/keywords/hybrid only.',
+    '4) Mode must follow constraints: if post_id exists, use mode="post"; otherwise use mode="rag".',
+    '5) Time MUST be provided via a label only: filters.time = { "type": "label", "label": "..." }',
+    '   - Allowed labels (examples): "all_time"(no filter), "today", "yesterday", "last_7_days", "last_14_days", "last_30_days", "this_month", "last_month",',
+    '     or structured: "2006_to_now", "2019-2022", "2024-Q3", "Q3-2024", "2024-09", "2024".',
+    '   - For queries like "최근 글", prefer a short window label such as "last_7_days" or "last_30_days" (choose appropriately).',
+    '6) Do NOT include any temporal words or ranges inside rewrites/keywords. Temporal intent must live ONLY in filters.time.',
+    '7) If the question asks for N items (e.g., “N개”), set limit=N within bounds.',
+    '8) Keep weights to defaults unless a clear need implies otherwise.',
+    '9) When helpful for recall, set hybrid.enabled=true and choose hybrid.retrieval_bias ∈ {lexical, balanced, semantic}. Then generate concise rewrites (<= max_rewrites) and focused keywords (<= max_keywords).',
+    '   - lexical: keyword/정확 매칭이 중요할 때 (숫자, 버전, 고유명사 등).',
+    '   - balanced: 일반 질의.',
+    '   - semantic: 개념/요약/의도 중심일 때.',
+    '10) Avoid stop/common words (예: "글", "포스트", "블로그", "소개", "정리"). Keep within the user context; avoid over-broad topics or time spans.',
+    '11) Remove near-duplicates: if rewrites/keywords are synonymous or highly similar, include only one.',
+    '',
+    `Schema: ${schemaHint}`,
+    '',
+    `Question (Korean):\n${params.question}`,
+    '',
+    'Respond with ONLY the JSON object. No markdown, no explanation.',
+  ].join('\n');
+};
diff --git a/src/repositories/post.repository.ts b/src/repositories/post.repository.ts
index 545dc57..ffc17c0 100644
--- a/src/repositories/post.repository.ts
+++ b/src/repositories/post.repository.ts
@@ -9,6 +9,7 @@ export interface Post {
   tags: string[];
   created_at: string;
   user_id: string;
+  is_public: boolean;
 }
 
 export interface SimilarChunk {
@@ -18,14 +19,36 @@ export interface SimilarChunk {
   similarityScore: number;
 }
 
+export interface TextSearchHit {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  textScore: number;
+}
+
 // ========= READ QUERIES =========
 export const findPostById = async (postId: number): Promise<Post | null> => {
   const pool = getDb();
-  const { rows } = await pool.query(
-    'SELECT id, title, content, tags, created_at, user_id FROM blog_post WHERE id = $1',
+  // Some databases may not have a `tags` column on blog_post.
+  // Select existing columns and populate `tags` as an empty array fallback.
+  const { rows } = await pool.query(
+    'SELECT id, title, content, created_at, user_id, is_public FROM blog_post WHERE id = $1',
     [postId]
   );
-  return rows.length > 0 ? rows[0] : null;
+  if (rows.length === 0) return null;
+
+  const row = rows[0] as any;
+  const post: Post = {
+    id: row.id,
+    title: row.title,
+    content: row.content,
+    // Fallback: DB has no tags column; keep empty list so prompts render gracefully
+    tags: Array.isArray(row.tags) ? row.tags : [],
+    created_at: row.created_at,
+    user_id: row.user_id,
+    is_public: row.is_public,
+  };
+  return post;
 };
 
 export const findSimilarChunks = async (
@@ -131,4 +154,208 @@ export const storeContentEmbeddings = async (
     await pool.query('ROLLBACK');
     throw error;
   }
-};
\ No newline at end of file
+};
+
+// ========= READ QUERIES (V2 dynamic) =========
+export const findSimilarChunksV2 = async (params: {
+  userId: string;
+  embedding: number[];
+  categoryId?: number;
+  from?: string; // ISO UTC
+  to?: string; // ISO UTC
+  threshold?: number; // 0..1
+  topK?: number; // default 5, max 10
+  weights?: { chunk: number; title: number };
+  sort?: 'created_at_desc' | 'created_at_asc';
+}): Promise<SimilarChunk[]> => {
+  const pool = getDb();
+  const wChunk = Math.max(0, Math.min(1, params.weights?.chunk ?? 0.7));
+  const wTitle = Math.max(0, Math.min(1, params.weights?.title ?? 0.3));
+  const thr = params.threshold != null ? Math.max(0, Math.min(1, params.threshold)) : 0.2;
+  const limit = Math.min(10, Math.max(1, params.topK ?? 5));
+
+  const parts: string[] = [];
+  const values: any[] = [];
+
+  // $1: userId, $2: embedding
+  values.push(params.userId);
+  values.push(pgvector.toSql(params.embedding));
+
+  const hasCategory = typeof params.categoryId === 'number';
+  const hasTime = !!(params.from && params.to);
+
+  if (hasCategory) {
+    const catParam = values.length + 1; // next index
+    parts.push(`
+      WITH category_ids AS (
+        SELECT DISTINCT cc.descendant_id
+        FROM category_closure cc
+        WHERE cc.ancestor_id = $${catParam}
+      ),
+      filtered_posts AS (
+        SELECT bp.id AS post_id, bp.title AS post_title, bp.created_at
+        FROM blog_post bp
+        WHERE bp.user_id = $1 AND bp.category_id IN (SELECT descendant_id FROM category_ids)
+      )`);
+    values.push(params.categoryId);
+  } else {
+    parts.push(`
+      WITH filtered_posts AS (
+        SELECT id AS post_id, title AS post_title, created_at
+        FROM blog_post
+        WHERE user_id = $1
+      )`);
+  }
+
+  // base select and threshold
+  const thrParam = values.length + 1;
+  parts.push(`
+    SELECT
+      fp.post_id,
+      fp.post_title,
+      pc.content AS post_chunk,
+      (${wChunk} * (1.0 - (pc.embedding <=> $2))) + (${wTitle} * (1.0 - (pte.embedding <=> $2))) AS similarity_score,
+      fp.created_at
+    FROM filtered_posts fp
+    JOIN post_chunks pc ON pc.post_id = fp.post_id
+    JOIN post_title_embeddings pte ON pte.post_id = fp.post_id
+    WHERE (1.0 - (pc.embedding <=> $2)) > $${thrParam}
+  `);
+  values.push(thr);
+
+  if (hasTime) {
+    const fromParam = values.length + 1;
+    const toParam = values.length + 2;
+    parts.push(` AND fp.created_at BETWEEN $${fromParam} AND $${toParam}`);
+    values.push(params.from, params.to);
+  }
+
+  let orderBy = 'similarity_score DESC';
+  if (params.sort === 'created_at_desc') orderBy = 'similarity_score DESC, fp.created_at DESC';
+  if (params.sort === 'created_at_asc') orderBy = 'similarity_score DESC, fp.created_at ASC';
+
+  const limitParam = values.length + 1;
+  parts.push(` ORDER BY ${orderBy} LIMIT $${limitParam}`);
+  values.push(limit);
+
+  const sql = parts.join('\n');
+
+  const { rows } = await pool.query(sql, values);
+  return rows.map((row) => ({
+    postId: row.post_id,
+    postTitle: row.post_title,
+    postChunk: row.post_chunk,
+    similarityScore: row.similarity_score,
+  }));
+};
+
+export const textSearchChunksV2 = async (params: {
+  userId: string;
+  query?: string;
+  keywords?: string[];
+  categoryId?: number;
+  from?: string;
+  to?: string;
+  topK?: number;
+  sort?: 'created_at_desc' | 'created_at_asc';
+}): Promise<TextSearchHit[]> => {
+  const pool = getDb();
+  const limit = Math.min(10, Math.max(1, params.topK ??
5)); + + const values: any[] = []; + values.push(params.userId); + + const hasCategory = typeof params.categoryId === 'number'; + const hasTime = !!(params.from && params.to); + const hasQuery = !!params.query && params.query.trim().length > 0; + const keywords = (params.keywords || []).filter((k) => typeof k === 'string' && k.trim().length > 0); + + const withParts: string[] = []; + if (hasCategory) { + const catParam = values.length + 1; + withParts.push(` + category_ids AS ( + SELECT DISTINCT cc.descendant_id FROM category_closure cc WHERE cc.ancestor_id = $${catParam} + ), + filtered_posts AS ( + SELECT bp.id AS post_id, bp.title AS post_title, bp.created_at + FROM blog_post bp + WHERE bp.user_id = $1 AND bp.category_id IN (SELECT descendant_id FROM category_ids) + )`); + values.push(params.categoryId); + } else { + withParts.push(` + filtered_posts AS ( + SELECT id AS post_id, title AS post_title, created_at + FROM blog_post + WHERE user_id = $1 + )`); + } + + let base = ` + SELECT + fp.post_id, + fp.post_title, + pc.content AS post_chunk, + 0::float8 AS content_sim, + 0::float8 AS title_sim, + fp.created_at + FROM filtered_posts fp + JOIN post_chunks pc ON pc.post_id = fp.post_id + `; + if (hasQuery) { + const qParam = values.length + 1; + base = ` + SELECT + fp.post_id, + fp.post_title, + pc.content AS post_chunk, + COALESCE(similarity(pc.content, $${qParam}), 0) AS content_sim, + COALESCE(similarity(fp.post_title, $${qParam}), 0) AS title_sim, + fp.created_at + FROM filtered_posts fp + JOIN post_chunks pc ON pc.post_id = fp.post_id + `; + values.push(params.query); + } + + const whereParts: string[] = []; + if (hasTime) { + const fromParam = values.length + 1; + const toParam = values.length + 2; + whereParts.push(`fp.created_at BETWEEN $${fromParam} AND $${toParam}`); + values.push(params.from, params.to); + } + + const likePatterns: string[] = []; + for (const k of keywords) { + likePatterns.push(`%${k}%`); + } + if (likePatterns.length > 0) { + const arrParam = values.length + 1; + whereParts.push(`(pc.content ILIKE ANY($${arrParam}) OR fp.post_title ILIKE ANY($${arrParam}))`); + values.push(likePatterns); + } + + const whereSql = whereParts.length > 0 ? `WHERE ${whereParts.join(' AND ')}` : ''; + + let orderBy = 'content_sim DESC'; + if (params.sort === 'created_at_desc') orderBy = 'content_sim DESC, fp.created_at DESC'; + if (params.sort === 'created_at_asc') orderBy = 'content_sim DESC, fp.created_at ASC'; + + const limitParam = values.length + 1; + const sql = `${withParts.length > 0 ? 
'WITH ' + withParts.join(',\n') : ''}
+${base}
+${whereSql}
+ORDER BY ${orderBy}
+LIMIT $${limitParam}`;
+  values.push(limit);
+
+  const { rows } = await pool.query(sql, values);
+  return rows.map((row) => ({
+    postId: row.post_id,
+    postTitle: row.post_title,
+    postChunk: row.post_chunk,
+    textScore: Math.max(Number(row.content_sim) || 0, Number(row.title_sim) || 0),
+  }));
+};
diff --git a/src/routes/ai.v2.routes.ts b/src/routes/ai.v2.routes.ts
new file mode 100644
index 0000000..2558023
--- /dev/null
+++ b/src/routes/ai.v2.routes.ts
@@ -0,0 +1,14 @@
+import { Router } from 'express';
+import { askV2Handler } from '../controllers/ai.v2.controller';
+import { authMiddleware } from '../middlewares/auth.middleware';
+
+const aiV2Router = Router();
+
+aiV2Router.get('/health', (req, res) => {
+  res.status(200).json({ status: 'ok', v: 'v2' });
+});
+
+aiV2Router.post('/ask', authMiddleware, askV2Handler);
+
+export default aiV2Router;
+
diff --git a/src/services/hybrid-search.service.ts b/src/services/hybrid-search.service.ts
new file mode 100644
index 0000000..592e723
--- /dev/null
+++ b/src/services/hybrid-search.service.ts
@@ -0,0 +1,104 @@
+import { createEmbeddings } from './embedding.service';
+import * as postRepository from '../repositories/post.repository';
+import { SearchPlan } from '../types/ai.v2.types';
+
+export type HybridSearchResult = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  similarityScore: number;
+}[];
+
+type Candidate = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  vecScore?: number;
+  textScore?: number;
+};
+
+export const runHybridSearch = async (
+  question: string,
+  userId: string,
+  plan: SearchPlan
+): Promise<HybridSearchResult> => {
+  const queries = [question, ...((plan.rewrites as string[]) || [])];
+  const alpha = Math.max(0, Math.min(1, (plan.hybrid as any)?.alpha ?? 0.7));
+
+  const from = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.from : undefined;
+  const to = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.to : undefined;
+  const categoryId = (plan.filters as any)?.category_ids?.[0];
+
+  const embeddings = await createEmbeddings(queries);
+
+  const byKey = new Map<string, Candidate>();
+
+  for (let i = 0; i < embeddings.length; i++) {
+    const emb = embeddings[i];
+    const rows = await postRepository.findSimilarChunksV2({
+      userId,
+      embedding: emb,
+      categoryId,
+      from,
+      to,
+      threshold: plan.threshold,
+      topK: plan.top_k,
+      weights: plan.weights,
+      sort: plan.sort,
+    });
+    for (const r of rows) {
+      const key = `${r.postId}:${r.postChunk}`;
+      const prev = byKey.get(key);
+      const curVec = Number(r.similarityScore) || 0;
+      if (!prev) {
+        byKey.set(key, { postId: r.postId, postTitle: r.postTitle, postChunk: r.postChunk, vecScore: curVec });
+      } else {
+        prev.vecScore = Math.max(prev.vecScore || 0, curVec);
+      }
+    }
+  }
+
+  const textRows = await postRepository.textSearchChunksV2({
+    userId,
+    query: question,
+    keywords: (plan.keywords as string[]) || [],
+    categoryId,
+    from,
+    to,
+    topK: plan.top_k,
+    sort: plan.sort,
+  });
+  for (const r of textRows) {
+    const key = `${r.postId}:${r.postChunk}`;
+    const prev = byKey.get(key);
+    const curText = Number(r.textScore) || 0;
+    if (!prev) {
+      byKey.set(key, { postId: r.postId, postTitle: r.postTitle, postChunk: r.postChunk, textScore: curText });
+    } else {
+      prev.textScore = Math.max(prev.textScore || 0, curText);
+    }
+  }
+
+  const list = Array.from(byKey.values());
+  const vecVals = list.map((c) => c.vecScore || 0);
+  const textVals = list.map((c) => c.textScore || 0);
+  const vMin = Math.min(...vecVals, 0);
+  const vMax = Math.max(...vecVals, 0);
+  const tMin = Math.min(...textVals, 0);
+  const tMax = Math.max(...textVals, 0);
+
+  const norm = (v: number, lo: number, hi: number) => (hi > lo ? (v - lo) / (hi - lo) : 0);
+
+  const fused = list
+    .map((c) => {
+      const v = norm(c.vecScore || 0, vMin, vMax);
+      const t = norm(c.textScore || 0, tMin, tMax);
+      const score = alpha * v + (1 - alpha) * t;
+      return { postId: c.postId, postTitle: c.postTitle, postChunk: c.postChunk, similarityScore: score };
+    })
+    .sort((a, b) => b.similarityScore - a.similarityScore)
+    .slice(0, Math.min(10, Math.max(1, plan.top_k || 5)));
+
+  return fused;
+};
+
diff --git a/src/services/qa.service.ts b/src/services/qa.service.ts
index 2361bbc..e0b308f 100644
--- a/src/services/qa.service.ts
+++ b/src/services/qa.service.ts
@@ -1,14 +1,10 @@
 import { createEmbeddings } from './embedding.service';
 import { PassThrough } from 'stream';
-import OpenAI from 'openai';
 import config from '../config';
 import * as postRepository from '../repositories/post.repository';
 import * as personaRepository from '../repositories/persona.repository';
 import * as qaPrompts from '../prompts/qa.prompts';
-
-const openai = new OpenAI({
-  apiKey: config.OPENAI_API_KEY,
-});
+import { generate } from '../llm';
 
 const preprocessContent = (content: string): string => {
   const plainText = content.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
@@ -27,20 +23,45 @@
   return "간결하고 명확한 말투로 답변해"; // Default
 }
 
+type LlmOverride = {
+  provider?: 'openai' | 'gemini';
+  model?: string;
+  options?: { temperature?: number; top_p?: number; max_output_tokens?: number };
+};
+
 export const answerStream = async (
   question: string,
   userId: string,
   categoryId?: number,
   speechTone: number = -1,
-  postId?: number
+  postId?: number,
+  llm?: LlmOverride
 ): Promise<PassThrough> => {
   const stream = new PassThrough();
-
-  let messages:
OpenAI.Chat.Completions.ChatCompletionMessageParam[] = []; - let tools: OpenAI.Chat.Completions.ChatCompletionTool[] | undefined = undefined; + try { + console.log( + JSON.stringify({ type: 'debug.qa.start', questionLen: question?.length || 0, userId, categoryId, postId, speechTone, llm }) + ); + } catch {} + + let messages: { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] = []; + let tools: + | { + type: 'function'; + function: { name: string; description?: string; parameters?: Record }; + }[] + | undefined = undefined; (async () => { const speechTonePrompt = await getSpeechTonePrompt(speechTone, userId); + const toSimpleMessages = ( + raw: any[] + ): { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] => { + return (raw || []).map((m: any) => ({ + role: m.role, + content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content), + })); + }; if (postId) { const post = await postRepository.findPostById(postId); @@ -48,20 +69,31 @@ export const answerStream = async ( if (!post) { stream.write(`event: error\ndata: ${JSON.stringify({ code: 404, message: 'Post not found' })}\n\n`); stream.end(); + try { console.warn(JSON.stringify({ type: 'debug.qa.post', status: 'not_found', postId })); } catch {} return; } - - if (post.user_id !== userId) { - stream.write(`event: error\ndata: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`); - stream.end(); - return; + + // Enforce conditional ownership: if post is not public, require owner + if (!post.is_public && post.user_id !== userId) { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`); + stream.end(); + try { console.warn(JSON.stringify({ type: 'debug.qa.post', status: 'forbidden', postId })); } catch {} + return; } const processedContent = preprocessContent(post.content); stream.write(`event: exist_in_post_status\ndata: true\n\n`); stream.write(`event: context\ndata: ${JSON.stringify([{ postId: post.id, postTitle: post.title }])}\n\n`); + try { + console.log( + JSON.stringify({ type: 'debug.qa.path', mode: 'post', postId: post.id, processedLen: processedContent.length }) + ); + } catch {} - messages = qaPrompts.createPostContextPrompt(post, processedContent, question, speechTonePrompt); + messages = toSimpleMessages( + qaPrompts.createPostContextPrompt(post, processedContent, question, speechTonePrompt) + ); } else { const [questionEmbedding] = await createEmbeddings([question]); @@ -72,8 +104,15 @@ export const answerStream = async ( const context = similarChunks.map(chunk => ({ postId: chunk.postId, postTitle: chunk.postTitle })); stream.write(`event: context\ndata: ${JSON.stringify(context)}\n\n`); - - messages = qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt); + try { + console.log( + JSON.stringify({ type: 'debug.qa.path', mode: 'rag', similarChunks: similarChunks.length, contextPreview: context.slice(0, 3) }) + ); + } catch {} + + messages = toSimpleMessages( + qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt) + ); tools = [ { type: "function", @@ -92,34 +131,47 @@ export const answerStream = async ( ]; } - const responseStream = await openai.chat.completions.create({ - model: config.CHAT_MODEL, + const llmStream = await generate({ + provider: llm?.provider || 'openai', + model: llm?.model || config.CHAT_MODEL, messages, tools, - tool_choice: tools ? 
'auto' : undefined, - stream: true, + options: llm?.options, + meta: { userId, categoryId, postId }, + }); + try { + console.log( + JSON.stringify({ + type: 'debug.qa.call', + provider: llm?.provider || 'openai', + model: llm?.model || config.CHAT_MODEL, + messages: messages.length, + tools: (tools || []).length, + hasOptions: !!llm?.options, + }) + ); + } catch {} + + llmStream.on('data', (chunk) => { + try { + const str = Buffer.isBuffer(chunk) ? chunk.toString('utf8') : String(chunk); + console.log( + JSON.stringify({ + type: 'debug.qa.chunk', + at: Date.now(), + bytes: Buffer.byteLength(str, 'utf8'), + preview: str.slice(0, 40).replace(/\n/g, '\\n'), + }) + ); + } catch {} + stream.write(chunk); + }); + llmStream.on('end', () => { + stream.end(); + }); + llmStream.on('error', (e) => { + try { console.error(JSON.stringify({ type: 'debug.qa.llmError', message: (e as any)?.message || 'error' })); } catch {} }); - - for await (const chunk of responseStream) { - const content = chunk.choices[0]?.delta?.content || ""; - const toolCalls = chunk.choices[0]?.delta?.tool_calls; - - if (toolCalls) { - for (const toolCall of toolCalls) { - if (toolCall.function?.arguments) { - stream.write(`event: answer\ndata: ${JSON.stringify(toolCall.function.arguments)}\n\n`); - } - } - } else if (content) { - stream.write(`event: answer\ndata: ${JSON.stringify(content)}\n\n`); - } - - if (chunk.choices[0]?.finish_reason) { - stream.write(`event: end\ndata: [DONE]\n\n`); - stream.end(); - break; - } - } })().catch(err => { console.error('Stream process error:', err); diff --git a/src/services/qa.v2.service.ts b/src/services/qa.v2.service.ts new file mode 100644 index 0000000..8a8e651 --- /dev/null +++ b/src/services/qa.v2.service.ts @@ -0,0 +1,271 @@ +import { PassThrough } from 'stream'; +import { generate } from '../llm'; +import config from '../config'; +import * as qaPrompts from '../prompts/qa.prompts'; +import * as postRepository from '../repositories/post.repository'; +import * as personaRepository from '../repositories/persona.repository'; +import { generateSearchPlan } from './search-plan.service'; +import { runSemanticSearch } from './semantic-search.service'; +import { runHybridSearch } from './hybrid-search.service'; +import { createEmbeddings } from './embedding.service'; + +const preprocessContent = (content: string): string => { + const plainText = content.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim(); + return plainText.length > 40000 ? 
plainText.substring(0, 40000) : plainText; +}; + +const getSpeechTonePrompt = async (speechTone: number, userId: string): Promise => { + if (speechTone === -1) return '간결하고 명확한 말투로 답변해'; + if (speechTone === -2) + return '아래의 블로그 본문 컨텍스트를 참고하여 본문의 말투를 파악해 최대한 비슷한 말투로 답변해'; + + const persona = await personaRepository.findPersonaById(speechTone, userId); + if (persona) return `${persona.name}: ${persona.description}`; + return '간결하고 명확한 말투로 답변해'; +}; + +type LlmOverride = { + provider?: 'openai' | 'gemini'; + model?: string; + options?: { temperature?: number; top_p?: number; max_output_tokens?: number }; +}; + +export const answerStreamV2 = async ( + question: string, + userId: string, + categoryId?: number, + speechTone: number = -1, + postId?: number, + llm?: LlmOverride +): Promise => { + const stream = new PassThrough(); + + (async () => { + const speechTonePrompt = await getSpeechTonePrompt(speechTone, userId); + + let messages: { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] = []; + let tools: + | { + type: 'function'; + function: { name: string; description?: string; parameters?: Record }; + }[] + | undefined = undefined; + + const toSimpleMessages = ( + raw: any[] + ): { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] => { + return (raw || []).map((m: any) => ({ + role: m.role, + content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content), + })); + }; + + if (postId) { + // Post-centric path (same as v1 with added v2 pre-events) + const post = await postRepository.findPostById(postId); + if (!post) { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ code: 404, message: 'Post not found' })}\n\n`); + stream.end(); + return; + } + if (!post.is_public && post.user_id !== userId) { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`); + stream.end(); + return; + } + // Emit plan event for transparency + stream.write(`event: search_plan\n`); + stream.write( + `data: ${JSON.stringify({ mode: 'post', filters: { post_id: postId, user_id: userId } })}\n\n` + ); + + const processed = preprocessContent(post.content); + const ctx = [{ postId: post.id, postTitle: post.title }]; + stream.write(`event: search_result\n`); + stream.write(`data: ${JSON.stringify(ctx)}\n\n`); + stream.write(`event: exist_in_post_status\n`); + stream.write(`data: true\n\n`); + stream.write(`event: context\n`); + stream.write(`data: ${JSON.stringify(ctx)}\n\n`); + + messages = toSimpleMessages( + qaPrompts.createPostContextPrompt(post, processed, question, speechTonePrompt) + ); + } else { + // Plan generation + const planPair = await generateSearchPlan(question, { user_id: userId, category_id: categoryId }); + if (!planPair) { + // Fallback to v1 RAG silently + const [questionEmbedding] = await createEmbeddings([question]); + const similarChunks = await postRepository.findSimilarChunks(userId, questionEmbedding, categoryId); + const context = similarChunks.map((c) => ({ postId: c.postId, postTitle: c.postTitle })); + stream.write(`event: search_plan\n`); + stream.write(`data: ${JSON.stringify({ mode: 'rag', fallback: true })}\n\n`); + stream.write(`event: search_result\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + stream.write(`event: exist_in_post_status\n`); + stream.write(`data: ${JSON.stringify(similarChunks.length > 0)}\n\n`); + stream.write(`event: context\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + + 
messages = toSimpleMessages( + qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt) + ); + if (similarChunks.length === 0) { + tools = undefined; + } else { + tools = [ + { + type: 'function', + function: { + name: 'report_content_insufficient', + description: '카테고리는 맞지만 본문 컨텍스트가 부족할 때 호출', + parameters: { + type: 'object', + properties: { + text: { + type: 'string', + description: + '답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다.', + }, + }, + required: ['text'], + }, + }, + }, + ]; + } + } else { + const plan: any = planPair.normalized; + stream.write(`event: search_plan\n`); + stream.write(`data: ${JSON.stringify(plan)}\n\n`); + // Console debug for emitted search plan + try { + console.log( + JSON.stringify( + { + type: 'debug.sse.search_plan', + userId, + categoryId, + plan_summary: { + mode: plan.mode, + top_k: plan.top_k, + threshold: plan.threshold, + weights: plan.weights, + sort: plan.sort, + limit: plan.limit, + hybrid: plan.hybrid, + time: plan?.filters?.time || null, + rewrites_len: Array.isArray(plan.rewrites) ? plan.rewrites.length : 0, + keywords_len: Array.isArray(plan.keywords) ? plan.keywords.length : 0, + }, + }, + null, + 0, + ), + ); + } catch {} + + let rows: { postId: string; postTitle: string; postChunk: string; similarityScore: number }[] = []; + if (plan.hybrid?.enabled) { + if (Array.isArray(plan.rewrites) && plan.rewrites.length > 0) { + stream.write(`event: rewrite\n`); + stream.write(`data: ${JSON.stringify(plan.rewrites)}\n\n`); + } + if (Array.isArray(plan.keywords) && plan.keywords.length > 0) { + stream.write(`event: keywords\n`); + stream.write(`data: ${JSON.stringify(plan.keywords)}\n\n`); + } + rows = await runHybridSearch( + question, + userId, + plan + ); + const hybridContext = rows.map((r) => ({ postId: r.postId, postTitle: r.postTitle })); + stream.write(`event: hybrid_result\n`); + stream.write(`data: ${JSON.stringify(hybridContext)}\n\n`); + + if (!rows.length) { + rows = await runSemanticSearch(question, userId, plan); + } + } else { + rows = await runSemanticSearch(question, userId, plan); + } + + const context = rows.map((r) => ({ postId: r.postId, postTitle: r.postTitle })); + stream.write(`event: search_result\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + stream.write(`event: exist_in_post_status\n`); + stream.write(`data: ${JSON.stringify(rows.length > 0)}\n\n`); + stream.write(`event: context\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + + messages = toSimpleMessages( + qaPrompts.createRagPrompt( + question, + rows.map((r) => ({ + postId: r.postId, + postTitle: r.postTitle, + postChunk: r.postChunk, + similarityScore: r.similarityScore, + })) as any, + speechTonePrompt + ) + ); + // If no context was found, avoid tool-calls to force direct natural-language guidance + if (rows.length === 0) { + tools = undefined; + } else { + tools = [ + { + type: 'function', + function: { + name: 'report_content_insufficient', + description: '카테고리는 맞지만 본문 컨텍스트가 부족할 때 호출', + parameters: { + type: 'object', + properties: { + text: { + type: 'string', + description: + '답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 
그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다.', + }, + }, + required: ['text'], + }, + }, + }, + ]; + } + } + } + + const llmStream = await generate({ + provider: llm?.provider || 'openai', + model: llm?.model || config.CHAT_MODEL, + messages, + tools, + options: llm?.options, + meta: { userId, categoryId, postId }, + }); + + llmStream.on('data', (chunk) => stream.write(chunk)); + llmStream.on('end', () => stream.end()); + llmStream.on('error', () => { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`); + stream.end(); + }); + })().catch((err) => { + try { + console.error('v2 Stream process error:', err); + } catch {} + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`); + stream.end(); + }); + + return stream; +}; diff --git a/src/services/search-plan.service.ts b/src/services/search-plan.service.ts new file mode 100644 index 0000000..507a1c4 --- /dev/null +++ b/src/services/search-plan.service.ts @@ -0,0 +1,376 @@ +import OpenAI from 'openai'; +import config from '../config'; +import { buildSearchPlanPrompt, getSearchPlanSchemaJson } from '../prompts/qa.v2.prompts'; +import { planSchema, type SearchPlan } from '../types/ai.v2.types'; +import { nowUtc, toAbsoluteRangeKst } from '../utils/time'; + +const openai = new OpenAI({ apiKey: config.OPENAI_API_KEY }); + +export type PlanContext = { + user_id: string; + category_id?: number; + post_id?: number; + timezone?: string; // default Asia/Seoul +}; + +export const generateSearchPlan = async ( + question: string, + ctx: PlanContext +): Promise<{ plan: SearchPlan; normalized: SearchPlan } | null> => { + const timezone = ctx.timezone || 'Asia/Seoul'; + const now = nowUtc(); + const nowUtcIso = now.toISOString(); + const nowKstIso = new Date(now.getTime() + 9 * 3600 * 1000).toISOString(); + + const prompt = buildSearchPlanPrompt({ + now_utc: nowUtcIso, + now_kst: nowKstIso, + timezone, + user_id: ctx.user_id, + category_id: ctx.category_id, + post_id: ctx.post_id, + question, + }); + + + try { + // Debug prompt before request + try { + console.log( + JSON.stringify( + { + type: 'debug.plan.prompt', + model: 'gpt-5-mini', + prompt_len: prompt.length, + head: prompt.slice(0, 600), + tail: prompt.slice(Math.max(0, prompt.length - 600)), + }, + null, + 0, + ), + ); + } catch {} + + const response: any = await (openai as any).responses.create({ + model: 'gpt-5-mini', + input: prompt, + text: { format: { type: 'json_schema', name: 'SearchPlan', schema: getSearchPlanSchemaJson()} }, + reasoning: { effort: 'minimal' }, + max_output_tokens: 1500, + }); + + // Debug peek: log response shapes before JSON extraction + try { + const outputs = (response as any)?.output || []; + const outputSummary = outputs.map((o: any) => ({ + role: o?.role, + content: (o?.content || []).map((c: any) => ({ + type: c?.type, + hasText: typeof c?.text === 'string', + textLen: typeof c?.text === 'string' ? (c.text as string).length : undefined, + hasJson: !!c?.json, + })), + })); + const outputText = (response as any)?.output_text; + console.log( + JSON.stringify( + { type: 'debug.plan.response.peek', has_output_text: !!outputText, output_text_len: typeof outputText === 'string' ? 
outputText.length : undefined, output_summary: outputSummary }, + null, + 0, + ), + ); + } catch {} + + // Extract structured JSON if available, otherwise parse text + let parsed: any = null; + try { + const outputs = (response as any)?.output || []; + for (const o of outputs) { + for (const c of (o?.content || [])) { + if (c && (c.type === 'json' || c.type === 'output_json') && c.json) { + parsed = c.json; + break; + } + } + if (parsed) break; + } + } catch {} + // Also check output_text for JSON string if using Responses API response_format + if (!parsed && typeof (response as any)?.output_text === 'string') { + const s = ((response as any).output_text as string).trim(); + if (s.startsWith('{')) { + try { parsed = JSON.parse(s); } catch {} + } + } + if (!parsed) { + const texts: string[] = []; + const ot = (response as any)?.output_text; + if (typeof ot === 'string') texts.push(ot); + try { + const outputs = (response as any)?.output || []; + for (const o of outputs) { + for (const c of (o?.content || [])) { + const t = typeof c?.text === 'string' ? c.text : undefined; + if (t) texts.push(t); + } + } + } catch {} + const raw = texts.join('').trim(); + // Debug: log raw text before JSON.parse + try { + console.log( + JSON.stringify( + { type: 'debug.plan.raw_text', len: raw.length, head: raw.slice(0, 200) }, + null, + 0, + ), + ); + } catch {} + if (!raw) { + // Graceful fallback: unable to parse structured output + return null; + } + + // Robust extraction of first balanced JSON object + const tryParse = (s: string): any | null => { + try { return JSON.parse(s); } catch { return null; } + }; + let candidate = tryParse(raw); + if (!candidate) { + const start = raw.indexOf('{'); + if (start >= 0) { + let depth = 0; + let inStr = false; + let esc = false; + for (let i = start; i < raw.length; i++) { + const ch = raw[i]; + if (inStr) { + if (esc) esc = false; + else if (ch === '\\') esc = true; + else if (ch === '"') inStr = false; + } else { + if (ch === '"') inStr = true; + else if (ch === '{') depth++; + else if (ch === '}') { + depth--; + if (depth === 0) { + const sub = raw.slice(start, i + 1); + candidate = tryParse(sub); + if (candidate) break; + } + } + } + } + if (!candidate) { + const last = raw.lastIndexOf('}'); + if (last > start) candidate = tryParse(raw.slice(start, last + 1)); + } + } + } + if (!candidate) { + try { + console.warn(JSON.stringify({ type: 'debug.plan.parse_fail', note: 'could not extract JSON from raw' })); + } catch {} + return null; + } + parsed = candidate; + } + + // If still no parsed plan at this point, try a fallback call without text.format + if (!parsed) { + try { + const response2: any = await (openai as any).responses.create({ + model: config.CHAT_MODEL || 'gpt-5-mini', + input: prompt, + max_output_tokens: 700, + }); + // Debug peek for fallback + try { + const outputs = (response2 as any)?.output || []; + const outputSummary = outputs.map((o: any) => ({ + role: o?.role, + content: (o?.content || []).map((c: any) => ({ type: c?.type, hasText: typeof c?.text === 'string', textLen: typeof c?.text === 'string' ? (c.text as string).length : undefined })) + })); + const outputText = (response2 as any)?.output_text; + console.log(JSON.stringify({ type: 'debug.plan.fallback.peek', has_output_text: !!outputText, output_text_len: typeof outputText === 'string' ? 
outputText.length : undefined, output_summary: outputSummary })); + } catch {} + + // Parse fallback response + let parsed2: any = null; + try { + const outputs = (response2 as any)?.output || []; + for (const o of outputs) { + for (const c of (o?.content || [])) { + const t = typeof c?.text === 'string' ? c.text : undefined; + if (t) { + const s = t.trim(); + if (s.startsWith('{')) { try { parsed2 = JSON.parse(s); } catch {} } + if (parsed2) break; + } + } + if (parsed2) break; + } + } catch {} + if (!parsed2) { + const ot = (response2 as any)?.output_text; + if (typeof ot === 'string') { + const s = ot.trim(); + try { parsed2 = JSON.parse(s); } catch {} + } + } + if (!parsed2) { + console.warn(JSON.stringify({ type: 'debug.plan.fallback.parse_fail' })); + // proceed to chat completions fallback + } + if (parsed2) parsed = parsed2; + } catch { + // continue to chat completions fallback + } + } + + // Final fallback: Chat Completions with JSON object mode + if (!parsed) { + try { + const sys = 'You output ONLY a single JSON object matching the SearchPlan shape. No extra text.'; + const userMsg = prompt; + const cc: any = await (openai as any).chat.completions.create({ + model: config.CHAT_MODEL || 'gpt-5-mini', + messages: [ + { role: 'system', content: sys }, + { role: 'user', content: userMsg }, + ], + response_format: { type: 'json_object' }, + max_tokens: 700, + }); + // Debug + try { + console.log(JSON.stringify({ type: 'debug.plan.cc.peek', choices: (cc as any)?.choices?.length || 0 })); + } catch {} + const content = (cc as any)?.choices?.[0]?.message?.content || ''; + if (typeof content === 'string' && content.trim().startsWith('{')) { + parsed = JSON.parse(content); + } + } catch (e) { + try { console.warn(JSON.stringify({ type: 'debug.plan.cc.error', message: (e as any)?.message || 'error' })); } catch {} + return null; + } + } + const plan = planSchema.parse(parsed); + + // Normalize weights sum to 1 + const sum = (plan.weights?.chunk ?? 0) + (plan.weights?.title ?? 0); + const weights = sum > 0 ? { chunk: plan.weights.chunk / sum, title: plan.weights.title / sum } : { chunk: 0.7, title: 0.3 }; + + // Normalize time range to absolute if provided + let normPlan: SearchPlan = { ...plan, weights }; + + const clamp = (n: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, n)); + + const stopwords = new Set([ + '글', + '포스트', + '블로그', + '소개', + '정리', + '내용', + '최신', + '최근', + '정보', + ]); + + const cleanList = (arr: string[] | undefined, max: number) => { + const uniq = new Set(); + for (const s of arr || []) { + const t = String(s || '').trim(); + if (!t) continue; + if (t.length < 2) continue; + if (stopwords.has(t)) continue; + const key = t.toLowerCase(); + if (uniq.has(key)) continue; + uniq.add(key); + } + return Array.from(uniq).slice(0, max); + }; + if ((plan as any)?.filters?.time) { + const abs = toAbsoluteRangeKst(plan.filters?.time as any, now); + if (abs) { + normPlan = { + ...normPlan, + filters: { ...normPlan.filters, time: { type: 'absolute', from: abs.from, to: abs.to } as any }, + }; + } else { + // drop invalid time + const { time, ...rest } = normPlan.filters || ({} as any); + normPlan = { ...normPlan, filters: rest as any }; + } + } + + // Enforce bounds just in case + normPlan.top_k = Math.min(10, Math.max(1, normPlan.top_k || 5)); + normPlan.limit = Math.min(20, Math.max(1, normPlan.limit || 5)); + normPlan.threshold = Math.min(1, Math.max(0, normPlan.threshold ?? 0.2)); + const maxRewrites = clamp(plan.hybrid?.max_rewrites ?? 
3, 0, 4); + const maxKeywords = clamp(plan.hybrid?.max_keywords ?? 6, 0, 8); + + // Map retrieval_bias -> alpha (fallback to provided alpha or default) + const bias = (plan.hybrid as any)?.retrieval_bias || 'balanced'; + const biasAlpha = bias === 'lexical' ? 0.3 : bias === 'semantic' ? 0.75 : 0.5; + const alpha = clamp(((plan.hybrid as any)?.alpha ?? biasAlpha) as number, 0, 1); + + normPlan.hybrid = { + enabled: !!plan.hybrid?.enabled, + retrieval_bias: bias, + alpha, + max_rewrites: maxRewrites, + max_keywords: maxKeywords, + } as any; + normPlan.rewrites = cleanList(plan.rewrites, maxRewrites) as any; + normPlan.keywords = cleanList(plan.keywords, maxKeywords) as any; + if (!normPlan.mode) normPlan.mode = (ctx.post_id ? 'post' : 'rag') as any; + + // Note: Only filters.time is kept here to satisfy the SearchPlan schema. + // user_id/category_ids/post_id will be injected later by the query layer. + + // Console debug: final parsed + normalized plan + try { + const timeInfo = (normPlan as any)?.filters?.time; + console.log( + JSON.stringify( + { + type: 'debug.plan.final', + ctx: { user_id: ctx.user_id, category_id: ctx.category_id, post_id: ctx.post_id }, + summary: { + mode: normPlan.mode, + top_k: normPlan.top_k, + threshold: normPlan.threshold, + weights: normPlan.weights, + sort: normPlan.sort, + limit: normPlan.limit, + hybrid: { + enabled: !!normPlan.hybrid?.enabled, + retrieval_bias: normPlan.hybrid?.retrieval_bias, + alpha: normPlan.hybrid?.alpha, + max_rewrites: normPlan.hybrid?.max_rewrites, + max_keywords: normPlan.hybrid?.max_keywords, + }, + time: timeInfo ? { type: timeInfo.type, from: timeInfo.from, to: timeInfo.to } : null, + rewrites_len: (normPlan.rewrites || []).length, + keywords_len: (normPlan.keywords || []).length, + }, + plan, + normalized: normPlan, + }, + null, + 0, + ), + ); + } catch {} + + return { plan, normalized: normPlan }; + } catch (e) { + try { + console.error(JSON.stringify({ type: 'debug.plan.error', message: (e as any)?.message || 'error' })); + } catch {} + return null; + } +}; diff --git a/src/services/semantic-search.service.ts b/src/services/semantic-search.service.ts new file mode 100644 index 0000000..05561ac --- /dev/null +++ b/src/services/semantic-search.service.ts @@ -0,0 +1,36 @@ +import { createEmbeddings } from './embedding.service'; +import * as postRepository from '../repositories/post.repository'; +import { SearchPlan } from '../types/ai.v2.types'; + +export type SemanticSearchResult = { + postId: string; + postTitle: string; + postChunk: string; + similarityScore: number; +}[]; + +export const runSemanticSearch = async ( + question: string, + userId: string, + plan: SearchPlan +): Promise => { + const [embedding] = await createEmbeddings([question]); + + const from = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.from : undefined; + const to = (plan.filters as any)?.time?.type === 'absolute' ? 
(plan.filters as any).time.to : undefined; + + const rows = await postRepository.findSimilarChunksV2({ + userId, + embedding, + categoryId: (plan.filters as any)?.category_ids?.[0], + from, + to, + threshold: plan.threshold, + topK: plan.top_k, + weights: plan.weights, + sort: plan.sort, + }); + + return rows; +}; + diff --git a/src/types/ai.types.ts b/src/types/ai.types.ts index e596d1a..b9ed539 100644 --- a/src/types/ai.types.ts +++ b/src/types/ai.types.ts @@ -28,6 +28,19 @@ export const askSchema = z.object({ category_id: z.number().optional(), post_id: z.number().optional(), speech_tone: z.number().optional(), + llm: z + .object({ + provider: z.enum(['openai', 'gemini']).optional(), + model: z.string().optional(), + options: z + .object({ + temperature: z.number().optional(), + top_p: z.number().optional(), + max_output_tokens: z.number().optional(), + }) + .optional(), + }) + .optional(), }), }); diff --git a/src/types/ai.v2.types.ts b/src/types/ai.v2.types.ts new file mode 100644 index 0000000..6e3353b --- /dev/null +++ b/src/types/ai.v2.types.ts @@ -0,0 +1,106 @@ +import { z } from 'zod'; + +// ===== Plan JSON Schema (Zod) ===== + +// Time filter schema: support multiple shapes to reduce LLM fragility +export const timeFilterSchema = z.discriminatedUnion('type', [ + // Absolute ISO range + z + .object({ type: z.literal('absolute'), from: z.string(), to: z.string() }) + .strict(), + // Relative window: N units up to today (KST) + z + .object({ + type: z.literal('relative'), + unit: z.enum(['day', 'week', 'month', 'year']), + value: z.number().int().min(1).max(365), + }) + .strict(), + // Month of a year (default year=now) + z + .object({ type: z.literal('month'), year: z.number().int().optional(), month: z.number().int().min(1).max(12) }) + .strict(), + // Quarter of a year (default year=now) + z + .object({ type: z.literal('quarter'), year: z.number().int().optional(), quarter: z.number().int().min(1).max(4) }) + .strict(), + // Single year + z.object({ type: z.literal('year'), year: z.number().int() }).strict(), + // Named presets (limited set) + z + .object({ + type: z.literal('named'), + preset: z.enum([ + 'all_time', + 'all', + 'today', + 'yesterday', + 'last_7_days', + 'last_14_days', + 'last_30_days', + 'this_month', + 'last_month', + ]), + }) + .strict(), + // Free-form label, e.g., "2006_to_now", "2024-Q3", "2019-2022", "2024-09" + z.object({ type: z.literal('label'), label: z.string().min(1) }).strict(), +]); + +export const planSchema = z.object({ + mode: z.enum(['rag', 'post']).default('rag'), + top_k: z.number().int().min(1).max(10).default(5), + threshold: z.number().min(0).max(1).default(0.2), + weights: z + .object({ chunk: z.number().min(0).max(1), title: z.number().min(0).max(1) }) + .default({ chunk: 0.7, title: 0.3 }), + rewrites: z.array(z.string()).default([]), + keywords: z.array(z.string()).default([]), + hybrid: z + .object({ + enabled: z.boolean().default(false), + // LLM outputs retrieval_bias label; server maps to alpha + retrieval_bias: z.enum(['lexical', 'balanced', 'semantic']).default('balanced'), + alpha: z.number().min(0).max(1).optional(), + max_rewrites: z.number().int().min(0).max(4).default(3), + max_keywords: z.number().int().min(0).max(8).default(6), + }) + .default({ enabled: false, retrieval_bias: 'balanced', max_rewrites: 3, max_keywords: 6 }), + filters: z + .object({ + time: timeFilterSchema.optional(), + }) + .strict() + .optional(), + sort: z.enum(['created_at_desc', 'created_at_asc']).default('created_at_desc'), + limit: 
z.number().int().min(1).max(20).default(5),
+});
+
+export type SearchPlan = z.infer<typeof planSchema>;
+
+// ===== API: /ai/v2/ask =====
+
+export const askV2Schema = z.object({
+  body: z.object({
+    question: z.string(),
+    user_id: z.string(),
+    category_id: z.number().optional(),
+    post_id: z.number().optional(),
+    speech_tone: z.number().optional(),
+    llm: z
+      .object({
+        provider: z.enum(['openai', 'gemini']).optional(),
+        model: z.string().optional(),
+        options: z
+          .object({
+            temperature: z.number().optional(),
+            top_p: z.number().optional(),
+            max_output_tokens: z.number().optional(),
+          })
+          .optional(),
+      })
+      .optional(),
+  }),
+});
+
+export type AskV2Request = z.infer<typeof askV2Schema>['body'];
diff --git a/src/utils/cost.ts b/src/utils/cost.ts
new file mode 100644
index 0000000..62334db
--- /dev/null
+++ b/src/utils/cost.ts
@@ -0,0 +1,40 @@
+export type Pricing = {
+  input_per_1k: number;
+  output_per_1k: number;
+  cached_input_per_1k?: number;
+  currency: 'USD' | 'KRW' | string;
+};
+
+const PRICING_TABLE: Record<string, Pricing> = {
+  // OpenAI
+  'gpt-5-mini': { input_per_1k: 0.00025, output_per_1k: 0.002, cached_input_per_1k: 0.000025, currency: 'USD' },
+  'gpt-5-nano': { input_per_1k: 0.00005, output_per_1k: 0.0004, cached_input_per_1k: 0.000005, currency: 'USD' },
+  'gpt-4o': { input_per_1k: 0.005, output_per_1k: 0.015, currency: 'USD' },
+  'gpt-4o-mini': { input_per_1k: 0.0005, output_per_1k: 0.0015, currency: 'USD' },
+  // Embeddings
+  'text-embedding-3-small': { input_per_1k: 0.00002, output_per_1k: 0, currency: 'USD' },
+  // Gemini (example values — update per official pricing if needed)
+  'gemini-2.5-flash': { input_per_1k: 0.0001, output_per_1k: 0.0004, currency: 'USD' },
+};
+
+export const getModelPricing = (model: string): Pricing | null => {
+  if (!model) return null;
+  const key = model.toLowerCase();
+  if (PRICING_TABLE[key]) return PRICING_TABLE[key];
+  // naive aliasing for common variants; check the more specific prefix first
+  if (key.startsWith('gpt-4o-mini')) return PRICING_TABLE['gpt-4o-mini'];
+  if (key.startsWith('gpt-4o')) return PRICING_TABLE['gpt-4o'];
+  if (key.startsWith('gpt-5-mini')) return PRICING_TABLE['gpt-5-mini'];
+  return null;
+};
+
+export const calcCost = (tokens: number, per_1k: number): number => {
+  if (!per_1k) return 0;
+  return (tokens / 1000) * per_1k;
+};
+
+export const formatCost = (amount: number, currency: string, round: number = 4): string => {
+  const factor = Math.pow(10, Math.max(0, round));
+  const rounded = Math.round(amount * factor) / factor;
+  return `${rounded} ${currency}`;
+};
+
diff --git a/src/utils/time.ts b/src/utils/time.ts
new file mode 100644
index 0000000..b950068
--- /dev/null
+++ b/src/utils/time.ts
@@ -0,0 +1,211 @@
+// Minimal KST time utilities and range normalization
+
+const KST_OFFSET_MINUTES = 9 * 60; // UTC+9
+
+const toDate = (isoOrDate: string | Date): Date => (isoOrDate instanceof Date ?
isoOrDate : new Date(isoOrDate)); + +export const nowUtc = (): Date => new Date(); + +export const toKst = (d: Date): Date => { + // Convert UTC date to KST by adding offset + return new Date(d.getTime() + KST_OFFSET_MINUTES * 60 * 1000); +}; + +export const fromKstToUtc = (d: Date): Date => { + return new Date(d.getTime() - KST_OFFSET_MINUTES * 60 * 1000); +}; + +const startOfDay = (d: Date): Date => new Date(d.getFullYear(), d.getMonth(), d.getDate(), 0, 0, 0, 0); +const endOfDay = (d: Date): Date => new Date(d.getFullYear(), d.getMonth(), d.getDate(), 23, 59, 59, 999); + +export const startOfMonth = (year: number, monthIndex0: number): Date => new Date(year, monthIndex0, 1, 0, 0, 0, 0); +export const endOfMonth = (year: number, monthIndex0: number): Date => new Date(year, monthIndex0 + 1, 0, 23, 59, 59, 999); + +export const startOfQuarter = (year: number, quarter: number): Date => { + const m0 = (quarter - 1) * 3; // 0-based month index + return new Date(year, m0, 1, 0, 0, 0, 0); +}; +export const endOfQuarter = (year: number, quarter: number): Date => { + const m0 = quarter * 3 - 1; // end month index + return new Date(year, m0 + 1, 0, 23, 59, 59, 999); +}; + +export type AbsoluteRange = { from: string; to: string }; + +export const toAbsoluteRangeKst = (input: { type: string; [k: string]: any }, base: Date = nowUtc()): AbsoluteRange | null => { + try { + const baseKst = toKst(base); + const year = baseKst.getFullYear(); + // Named presets + if (input.type === 'named') { + const p = String(input.preset || '').toLowerCase(); + const endK = endOfDay(baseKst); + const beginOfTodayK = startOfDay(baseKst); + const endUtc = fromKstToUtc(endK).toISOString(); + const todayStartUtc = fromKstToUtc(beginOfTodayK).toISOString(); + if (p === 'all' || p === 'all_time') return null; // no time filter + if (p === 'today') return { from: todayStartUtc, to: endUtc }; + if (p === 'yesterday') { + const yK = new Date(beginOfTodayK.getTime()); + yK.setDate(yK.getDate() - 1); + return { from: fromKstToUtc(startOfDay(yK)).toISOString(), to: fromKstToUtc(endOfDay(yK)).toISOString() }; + } + const daysBack = (n: number) => { + const toK = endK; + const fromK = new Date(toK.getTime()); + fromK.setDate(fromK.getDate() - (n - 1)); + return { from: fromKstToUtc(startOfDay(fromK)).toISOString(), to: fromKstToUtc(toK).toISOString() }; + }; + if (p === 'last_7_days') return daysBack(7); + if (p === 'last_14_days') return daysBack(14); + if (p === 'last_30_days') return daysBack(30); + if (p === 'this_month') { + const fromK = startOfMonth(year, baseKst.getMonth()); + const toK = endOfMonth(year, baseKst.getMonth()); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (p === 'last_month') { + const m = baseKst.getMonth(); + const yAdj = m === 0 ? year - 1 : year; + const mAdj = m === 0 ? 
11 : m - 1; + const fromK = startOfMonth(yAdj, mAdj); + const toK = endOfMonth(yAdj, mAdj); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + return null; // unknown named: drop filter + } + if (input.type === 'relative') { + const unit = String(input.unit); + const value = Math.max(1, parseInt(String(input.value || '1'), 10)); + const toK = endOfDay(baseKst); + const fromK = new Date(toK.getTime()); + if (unit === 'day') fromK.setDate(fromK.getDate() - value + 1); + else if (unit === 'week') fromK.setDate(fromK.getDate() - value * 7 + 1); + else if (unit === 'month') fromK.setMonth(fromK.getMonth() - value); + else if (unit === 'year') fromK.setFullYear(fromK.getFullYear() - value); + const fromUtc = fromKstToUtc(startOfDay(fromK)); + const toUtc = fromKstToUtc(toK); + return { from: fromUtc.toISOString(), to: toUtc.toISOString() }; + } + if (input.type === 'absolute') { + const from = new Date(input.from); + const to = new Date(input.to); + return { from: from.toISOString(), to: to.toISOString() }; + } + if (input.type === 'month') { + const m = Math.max(1, Math.min(12, parseInt(String(input.month), 10))); + const y = input.year ? parseInt(String(input.year), 10) : year; + const fromK = startOfMonth(y, m - 1); + const toK = endOfMonth(y, m - 1); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (input.type === 'year') { + const y = parseInt(String(input.year), 10); + const fromK = startOfMonth(y, 0); + const toK = endOfMonth(y, 11); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (input.type === 'quarter') { + const q = Math.max(1, Math.min(4, parseInt(String(input.quarter), 10))); + const y = input.year ? parseInt(String(input.year), 10) : year; + const fromK = startOfQuarter(y, q); + const toK = endOfQuarter(y, q); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (input.type === 'label') { + const raw = String(input.label || '').trim(); + if (!raw) return null; + const s = raw.replace(/\s+/g, '').toLowerCase(); + const endK = endOfDay(baseKst); + // Support common named tokens expressed as labels + const startTodayK = startOfDay(baseKst); + const toUtcStr = fromKstToUtc(endK).toISOString(); + const fromTodayUtcStr = fromKstToUtc(startTodayK).toISOString(); + const daysBack = (n: number) => { + const toK = endK; + const fromK = new Date(toK.getTime()); + fromK.setDate(fromK.getDate() - (n - 1)); + return { from: fromKstToUtc(startOfDay(fromK)).toISOString(), to: fromKstToUtc(toK).toISOString() }; + }; + if (s === 'all' || s === 'all_time') return null; // drop filter + if (s === 'today') return { from: fromTodayUtcStr, to: toUtcStr }; + if (s === 'yesterday') { + const yK = new Date(startTodayK.getTime()); + yK.setDate(yK.getDate() - 1); + return { from: fromKstToUtc(startOfDay(yK)).toISOString(), to: fromKstToUtc(endOfDay(yK)).toISOString() }; + } + if (s === 'last_7_days') return daysBack(7); + if (s === 'last_14_days') return daysBack(14); + if (s === 'last_30_days') return daysBack(30); + if (s === 'this_month') { + const fromK = startOfMonth(year, baseKst.getMonth()); + const toK = endOfMonth(year, baseKst.getMonth()); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (s === 'last_month') { + const m = baseKst.getMonth(); + const yAdj = m === 0 ? year - 1 : year; + const mAdj = m === 0 ? 
11 : m - 1; + const fromK = startOfMonth(yAdj, mAdj); + const toK = endOfMonth(yAdj, mAdj); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 1) YYYY_to_now / YYYY-to-now + let m = s.match(/^(\d{4})(?:_|-|to)+now$/); + if (m) { + const y = parseInt(m[1], 10); + const fromK = startOfMonth(y, 0); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(endK).toISOString() }; + } + // 2) YYYY-YYYY / YYYY..YYYY / YYYY_to_YYYY + m = s.match(/^(\d{4})(?:\.|_|-|to){1,2}(\d{4})$/); + if (m) { + const y1 = parseInt(m[1], 10); + const y2 = parseInt(m[2], 10); + const a = Math.min(y1, y2); + const b = Math.max(y1, y2); + const fromK = startOfMonth(a, 0); + const toK = endOfMonth(b, 11); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 3) YYYY-Qn or Qn-YYYY + m = s.match(/^(\d{4})(?:-|_)q([1-4])$/); + if (m) { + const y = parseInt(m[1], 10); + const q = parseInt(m[2], 10); + const fromK = startOfQuarter(y, q); + const toK = endOfQuarter(y, q); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + m = s.match(/^q([1-4])(?:-|_)?(\d{4})$/); + if (m) { + const q = parseInt(m[1], 10); + const y = parseInt(m[2], 10); + const fromK = startOfQuarter(y, q); + const toK = endOfQuarter(y, q); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 4) YYYY-MM + m = s.match(/^(\d{4})(?:-|_)?(\d{1,2})$/); + if (m) { + const y = parseInt(m[1], 10); + const month = Math.max(1, Math.min(12, parseInt(m[2], 10))); + const fromK = startOfMonth(y, month - 1); + const toK = endOfMonth(y, month - 1); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 5) YYYY + m = s.match(/^(\d{4})$/); + if (m) { + const y = parseInt(m[1], 10); + const fromK = startOfMonth(y, 0); + const toK = endOfMonth(y, 11); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + return null; // unrecognized label + } + } catch { + // ignore + } + return null; +}; diff --git a/src/utils/tokenizer.ts b/src/utils/tokenizer.ts new file mode 100644 index 0000000..bc662f4 --- /dev/null +++ b/src/utils/tokenizer.ts @@ -0,0 +1,32 @@ +import { get_encoding, type TiktokenEncoding } from '@dqbd/tiktoken'; + +const encodingForModel = (model?: string): TiktokenEncoding => { + const lower = (model ?? '').toLowerCase(); + if (lower.includes('gpt-5') || lower.includes('gpt-4o') || lower.includes('o1') || lower.includes('o3')) { + return 'o200k_base' as TiktokenEncoding; + } + return 'cl100k_base' as TiktokenEncoding; +}; + +export const countTextTokens = (text: string, model: string): number => { + const encKey = encodingForModel(model); + const enc = get_encoding(encKey); + try { + const tokens = enc.encode(text || ''); + return tokens.length; + } finally { + // no explicit free in @dqbd/tiktoken browser build; safe to let GC handle + } +}; + +type SimpleMessage = { role: string; content: string }; + +export const countChatMessagesTokens = (messages: SimpleMessage[], model: string): number => { + // Approximate: sum content token counts + minimal role overhead + const overheadPerMsg = 3; // rough + const roleOverhead = 1; + return messages.reduce((sum, m) => { + return sum + countTextTokens(m.content || '', model) + overheadPerMsg + roleOverhead; + }, 0); +}; +
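
For reference, here is a plan in the shape the planner is prompted to emit; this object parses under `planSchema` from `src/types/ai.v2.types.ts` as written (the values are illustrative, and the relative import path assumes a scratch file directly under `src/`):

```ts
// A hybrid-enabled SearchPlan that validates against planSchema (illustrative values).
import { planSchema } from './types/ai.v2.types';

const plan = planSchema.parse({
  mode: 'rag',
  top_k: 5,
  threshold: 0.2,
  weights: { chunk: 0.7, title: 0.3 },
  rewrites: ['pgvector 인덱스 설정 방법'],
  keywords: ['pgvector', 'ivfflat'],
  hybrid: { enabled: true, retrieval_bias: 'lexical', max_rewrites: 2, max_keywords: 4 },
  filters: { time: { type: 'label', label: 'last_30_days' } },
  sort: 'created_at_desc',
  limit: 5,
});

// No alpha is set here on purpose: search-plan.service.ts maps retrieval_bias
// to alpha during normalization (lexical -> 0.3, balanced -> 0.5, semantic -> 0.75).
```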
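`toAbsoluteRangeKst` in `src/utils/time.ts` is the single funnel for the label grammar. A quick check of three label families, assuming the process runs with TZ=UTC (the shift-based day and month helpers rely on that):

```ts
// Exercising the label-form time filter normalization (illustrative).
import { toAbsoluteRangeKst } from './utils/time';

const base = new Date('2024-10-05T03:00:00Z'); // 12:00 KST on 2024-10-05, fixed for reproducibility

// Quarter token: KST day bounds of Q3 2024, returned as UTC ISO strings
toAbsoluteRangeKst({ type: 'label', label: '2024-Q3' }, base);
// -> { from: '2024-06-30T15:00:00.000Z', to: '2024-09-30T14:59:59.999Z' } under TZ=UTC

// Rolling window: start of 2024-09-29 KST through end of 2024-10-05 KST
toAbsoluteRangeKst({ type: 'label', label: 'last_7_days' }, base);

// 'all_time' -> null, meaning the caller drops the time filter entirely
toAbsoluteRangeKst({ type: 'label', label: 'all_time' }, base);
```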
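Wiring `src/utils/tokenizer.ts` into `src/utils/cost.ts` gives a rough per-request estimate; the pricing table holds example values per its own comment, so the printed amount is illustrative only:

```ts
// Approximate request cost = input tokens * input price + output tokens * output price.
import { countChatMessagesTokens, countTextTokens } from './utils/tokenizer';
import { calcCost, formatCost, getModelPricing } from './utils/cost';

const model = 'gpt-4o-mini';
const messages = [
  { role: 'system', content: '너는 블로그 운영자 AI야.' },
  { role: 'user', content: '최근 글 3개만 요약해줘' },
];
const answer = '최근 글 세 편을 요약하면 다음과 같아요...';

const pricing = getModelPricing(model);
if (pricing) {
  const inputTokens = countChatMessagesTokens(messages, model); // approximate: adds per-message overhead
  const outputTokens = countTextTokens(answer, model);
  const total = calcCost(inputTokens, pricing.input_per_1k) + calcCost(outputTokens, pricing.output_per_1k);
  // Six decimals here, since short chats round to "0 USD" at the default 4.
  console.log(formatCost(total, pricing.currency, 6));
}
```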
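Finally, a minimal Node 18+ consumer for the v2 SSE stream. Two assumptions are flagged in the code: the router from `src/routes/ai.v2.routes.ts` is mounted at `/ai/v2`, and `authMiddleware` accepts a bearer token; neither is established by this diff.

```ts
// Sketch of a v2 /ask client; the mount path and auth scheme are assumptions.
async function askV2(question: string, userId: string, token: string): Promise<void> {
  const res = await fetch('http://localhost:3000/ai/v2/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ question, user_id: userId }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    buf += decoder.decode(value, { stream: true });
    let sep: number;
    // Each frame is "event: <name>\ndata: <payload>" terminated by a blank line.
    while ((sep = buf.indexOf('\n\n')) >= 0) {
      const frame = buf.slice(0, sep);
      buf = buf.slice(sep + 2);
      const event = /^event: (.+)$/m.exec(frame)?.[1];
      const data = /^data: (.+)$/m.exec(frame)?.[1];
      if (event === 'search_plan' && data) console.log('plan:', JSON.parse(data));
      if (event === 'answer' && data) process.stdout.write(JSON.parse(data)); // answer payloads are JSON-encoded strings
      if (event === 'end') return; // payload is the literal [DONE], not JSON
    }
  }
}
```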