diff --git a/TASK.md b/TASK.md index 803edc7..4104ee2 100644 --- a/TASK.md +++ b/TASK.md @@ -1,23 +1,71 @@ -## 리팩토링 계획 - -### Phase 1: Repository 계층 도입 (데이터 로직 분리) - -1. **[Repo] `src/repositories` 디렉토리 생성**: 데이터베이스 쿼리 로직을 모아둘 디렉토리를 생성합니다. -2. **[Repo] `post.repository.ts` 생성 및 이전**: - * `blog_post`, `post_chunks`, `post_title_embeddings` 테이블 관련 쿼리를 이 파일로 옮깁니다. - * `qa.service.ts`의 `findPostById`, `findSimilarChunks` 로직을 이전합니다. - * `embedding.service.ts`의 `storeTitleEmbedding`, `storeContentEmbeddings` 로직을 이전합니다. -3. **[Repo] `persona.repository.ts` 생성 및 이전**: - * `persona` 테이블 관련 쿼리를 이 파일로 옮깁니다. - * `qa.service.ts`의 `getSpeechTonePrompt` 내부 DB 조회 로직을 `findPersonaById`와 같은 함수로 분리하여 이전합니다. -4. **[Service] 서비스 계층 수정**: - * `qa.service.ts`와 `embedding.service.ts`가 DB에 직접 접근하는 대신, 새로 만든 Repository의 함수를 호출하도록 코드를 수정합니다. - -### Phase 2: 프롬프트 관리 분리 - -5. **[Prompt] `src/prompts` 디렉토리 생성**: 프롬프트 템플릿을 관리할 디렉토리를 생성합니다. -6. **[Prompt] `qa.prompts.ts` 파일 생성**: - * `qa.service.ts`에 하드코딩된 시스템 프롬프트와 사용자 메시지 생성 로직을 이 파일로 옮깁니다. - * `createRagPrompt`, `createPostContextPrompt`와 같이 동적으로 프롬프트를 생성하는 함수를 만듭니다. -7. **[Service] `qa.service.ts` 수정**: - * `qa.prompts.ts`에서 프롬프트 생성 함수를 가져와(import) 사용하도록 수정합니다. \ No newline at end of file +## 하이브리드 서치 확장(질문 재작성 + 키워드) + +목표 +- 리콜 향상: 질문을 LLM이 재작성(rewrites)하고, 핵심 키워드를 생성하여 벡터+텍스트 양쪽에서 검색. +- 정밀/비용 제어: 재작성/키워드 개수를 제한하고, 융합 가중치로 결과를 안정적으로 선별. + +Plan JSON 확장(초안) +```json +{ + "rewrites": ["..."], + "keywords": ["..."], + "hybrid": { "enabled": true, "alpha": 0.7, "max_rewrites": 4, "max_keywords": 8 } +} +``` +- 기존 필드(`top_k`, `threshold`, `weights`, `filters.time`, `sort`, `limit`)와 공존. +- 서버에서 상한 강제(rewrites<=4, keywords<=8), 품질이 낮거나 중복인 항목 제거. + +프롬프트 가이드(확장) +- 역할: ‘검색 계획 + 재작성/키워드 생성’ +- 출력: 기존 계획 JSON + `rewrites[]`, `keywords[]`, `hybrid{}` +- 규칙: + - 불용/범용 단어(예: "글", "포스트", "블로그") 지양 + - 과도한 시기/주제 확장 금지(사용자 블로그 컨텍스트 벗어나지 않기) + - 중복/동의어 반복 최소화(문장 유사도 과다 시 제거) + +하이브리드 검색 파이프라인 +1) 멀티 벡터 검색: 원 질문 + rewrites 각각 임베딩 → pgvector Top-K 검색(합집합) +2) 텍스트 검색: keywords로 `post_chunks.content`/`blog_post.title` 텍스트 매칭 → Top-K 추출 +3) 랭킹 융합: 점수 정규화 후 `score = α·vec + (1-α)·text` 또는 RRF로 병합 → 상위 N 청크 선택 +4) 컨텍스트 구성: v1과 동일 프롬프트로 최종 LLM 호출 + +저장소/인덱스(제안) +- PostgreSQL 확장: `pg_trgm`(간단/범용) 또는 `tsvector`(정교) +- 1차안(pg_trgm) + - DDL: + - `CREATE EXTENSION IF NOT EXISTS pg_trgm;` + - `CREATE INDEX IF NOT EXISTS idx_pc_content_trgm ON post_chunks USING gin (content gin_trgm_ops);` + - `CREATE INDEX IF NOT EXISTS idx_bp_title_trgm ON blog_post USING gin (title gin_trgm_ops);` + - 질의: `similarity(content, $q) > $min OR content ILIKE ANY($patterns)` 등으로 text_score 계산 + +SSE 확장(선택) +- `event: rewrite` / `data: ["..."]` +- `event: keywords` / `data: ["..."]` +- `event: hybrid_result` / `data: [{ postId, postTitle }]` + +보안/안전 +- 재작성/키워드 출력 스키마 강제(JSON), 길이/개수 상한 +- 금칙어/민감어 필터(선택), 카테고리/기간 필터는 서버가 최종 결정 + +세부 구현 계획(하이브리드 서치) +1) 스키마/프롬프트 + - [ ] `src/types/ai.v2.types.ts`: Plan 스키마에 `rewrites[]`, `keywords[]`, `hybrid{enabled,alpha,max_rewrites,max_keywords}` 추가 + - [ ] `src/prompts/qa.v2.prompts.ts`: 프롬프트/JSON Schema에 확장 필드 반영 + few-shot 보강 +2) 플래너 서비스 + - [ ] `search-plan.service.ts`: 확장 필드 파싱/검증, 중복/불용어 필터링, 상한 강제 +3) 텍스트 검색 저장소 + - [ ] `post.repository.ts`: `textSearchChunksV2({ userId, query, keywords[], from?, to?, topK })` + - [ ] (옵션) DDL 문서화: pg_trgm 인덱스 생성 스크립트 추가 +4) 하이브리드 서비스 + - [ ] `hybrid-search.service.ts` 구현: 멀티 임베딩, 텍스트 검색, 정규화, α 융합/RRF, 상위 N 반환 +5) 오케스트레이션/이벤트 + - [ ] `qa.v2.service.ts`: plan.hybrid.enabled 시 하이브리드 경로 분기, (선택) `rewrite`/`keywords`/`hybrid_result` 송신 +6) 
테스트/튜닝
+   - [ ] 통합 테스트: 재작성/키워드 포함 질의에서 리콜↑ 확인
+   - [ ] 가중치 α, Top-K 상수, 불용어/중복 필터 기준 튜닝
+   - [ ] 0건/오류 폴백(v1 RAG) 검증
+
+권장 단계적 도입
+- Phase 1: 스키마/프롬프트/하이브리드 파이프라인 구현(기본 α=0.7, rewrites<=3, keywords<=6)
+- Phase 2: SSE 관측성 이벤트 추가, 품질 튜닝
+- Phase 3: 인덱스/성능 최적화, 불용어 사전/NER 보정
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..2e55bef
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,171 @@
+# Bubblog AI API 문서 (v1 ~ v2)
+
+본 문서는 `/ai` (v1)와 `/ai/v2` (v2) 엔드포인트를 정리합니다. 서버는 Express 기반이며, `POST /ask` 류는 Server-Sent Events(SSE)로 답변을 스트리밍합니다.
+
+## 기본 정보
+- Base Path
+  - v1: `/ai`
+  - v2: `/ai/v2`
+- 인증
+  - `POST /ask` 엔드포인트는 `Authorization: Bearer <token>` 필요
+  - 임베딩 생성 엔드포인트는 인증 없이 사용 가능
+- 본문 형식: `application/json`
+- SSE 수신: `Content-Type: text/event-stream`
+  - 이벤트명은 `event:` 라인으로, 데이터는 `data:` 라인으로 전송됩니다.
+  - 일반 텍스트 콘텐츠는 `event: answer`로 분할 전송되며, 종료 시 `event: end` + `data: [DONE]`가 송신됩니다.
+
+## v1 엔드포인트 (`/ai`)
+
+### GET `/ai/health`
+- 인증: 불필요
+- 응답(200): `{ "status": "ok" }`
+
+### POST `/ai/embeddings/title`
+- 인증: 불필요
+- 요청 Body
+  - `post_id`(number, required)
+  - `title`(string, required)
+- 동작: 제목 임베딩 생성 후 저장
+- 응답(200): `{ "ok": true }`
+
+### POST `/ai/embeddings/content`
+- 인증: 불필요
+- 요청 Body
+  - `post_id`(number, required)
+  - `content`(string, required)
+- 동작: 본문을 약 512 토큰 단위(중첩 50토큰)로 청킹 → 임베딩 생성/저장
+- 응답(200): `{ "post_id": number, "chunk_count": number, "success": true }`
+
+### POST `/ai/ask` (SSE)
+- 인증: 필요 (`Authorization: Bearer <token>`)
+- 요청 Body
+  - `question`(string, required)
+  - `user_id`(string, required)
+  - `category_id`(number, optional)
+  - `post_id`(number, optional) — 지정 시 해당 글 컨텍스트에 국한하여 답변
+  - `speech_tone`(number, optional)
+    - `-1`: 간결하고 명확한 말투(기본)
+    - `-2`: 해당 글의 말투를 최대한 모사
+    - 양의 정수: 페르소나 ID(해당 유저의 등록된 페르소나 참조)
+  - `llm`(object, optional)
+    - `provider`: `openai` | `gemini`
+    - `model`: string (미지정 시 서버 기본값 사용)
+    - `options`: `{ temperature?, top_p?, max_output_tokens? }`
+- SSE 이벤트(주요)
+  - `exist_in_post_status`: `true|false` — 관련 컨텍스트 존재 여부
+  - `context`: `[ { postId, postTitle }, ... ]` — 검색/선택된 컨텍스트 요약
+  - `answer`: 모델의 부분 응답 텍스트(여러 번 전송)
+  - `end`: 종료 시 `data: [DONE]`
+  - `error`: `{ code?, message }` — 예: `post_id`가 없거나 권한 없음(403), 없음(404)
+- 예시(curl)
+  ```bash
+  curl -N \
+    -H "Authorization: Bearer <token>" \
+    -H "Content-Type: application/json" \
+    -X POST http://localhost:3000/ai/ask \
+    -d '{
+      "question": "카테고리 A 관련 요약 해줘",
+      "user_id": "u_123",
+      "category_id": 1,
+      "speech_tone": -1
+    }'
+  ```
+
+## v2 엔드포인트 (`/ai/v2`)
+
+### GET `/ai/v2/health`
+- 인증: 불필요
+- 응답(200): `{ "status": "ok", "v": "v2" }`
+
+### POST `/ai/v2/ask` (SSE)
+- 인증: 필요 (`Authorization: Bearer <token>`)
+- 요청 Body
+  - `question`(string, required)
+  - `user_id`(string, required)
+  - `category_id`(number, optional)
+  - `post_id`(number, optional)
+  - `speech_tone`(number, optional)
+    - `-1`: 기본 말투(간결/명확)
+    - `-2`: 해당 글(post 모드) 말투 모사
+    - 양수: 페르소나 ID(해당 유저의 등록 페르소나)
+  - `llm`(object, optional)
+    - `provider`: `openai` | `gemini`
+    - `model`: string (미지정 시 서버 기본값 사용)
+    - `options`: `{ temperature?: number, top_p?: number, max_output_tokens?: number }`
+- 동작 개요
+  - 서버가 질문을 토대로 “검색 계획(JSON)”을 생성·검증·정규화한 뒤, 계획에 따라 시맨틱 또는 하이브리드 검색을 수행하고 결과를 SSE로 스트리밍합니다.
+  - `post_id`가 있으면 post 모드(단일 글 컨텍스트)로 처리하며, 간략한 `search_plan`/`search_result` 이벤트 후 본문 기반 답변을 스트리밍합니다.
+- 하이브리드 검색(벡터+텍스트)
+  - 계획에 `hybrid.enabled: true`인 경우 활성화됩니다.
+  - `rewrites`(재작성 질의)와 `keywords`(핵심 키워드)를 생성하여 벡터/텍스트 두 경로로 후보를 수집하고, `hybrid.retrieval_bias` 라벨을 서버가 `alpha` 값으로 매핑해 점수를 융합하여 상위 `top_k`를 선택합니다.
+    - 매핑(기본): `lexical → 0.3`, `balanced → 0.5`, `semantic → 0.75`
+    - 결합식: `score = alpha*vec + (1-alpha)*text` (각 경로 점수 min-max 정규화 후)
+  - SSE로 `rewrite`, `keywords`, `hybrid_result` 이벤트가 필요한 경우에만 송신됩니다. 하이브리드 결과가 없으면 시맨틱 검색으로 폴백합니다.
+- SSE 이벤트 순서(일반적인 흐름)
+  1) `search_plan`: 정규화된 검색 계획(JSON)
+     - 예시 데이터(정규화):
+     ```json
+     {
+       "mode": "rag",
+       "top_k": 5,
+       "threshold": 0.2,
+       "weights": { "chunk": 0.7, "title": 0.3 },
+       "filters": {
+         "time": { "type": "absolute", "from": "2025-09-01T00:00:00.000Z", "to": "2025-09-30T23:59:59.999Z" }
+       },
+       "sort": "created_at_desc",
+       "limit": 5,
+       "hybrid": { "enabled": true, "retrieval_bias": "balanced", "alpha": 0.5, "max_rewrites": 3, "max_keywords": 6 },
+       "rewrites": ["프로젝트 X 요약", "프로젝트 X 핵심"],
+       "keywords": ["프로젝트 X", "핵심", "요약"]
+     }
+     ```
+     - 비고:
+       - `filters.time`만 포함됩니다. `user_id`/`category_id`/`post_id` 등은 서버가 검색 시 내부적으로 적용합니다.
+       - `hybrid.retrieval_bias`는 LLM 라벨이며 서버가 `alpha`로 변환해 사용합니다.
+       - post 모드에서는 간략한 형태 예: `{ "mode": "post", "filters": { "post_id": 123, "user_id": "u_123" } }`.
+  2) (하이브리드 사용 시) `rewrite`: `string[]`
+  3) (하이브리드 사용 시) `keywords`: `string[]`
+  4) (하이브리드 사용 시) `hybrid_result`: `[ { postId, postTitle }, ... ]`
+  5) `search_result`: `[ { postId, postTitle }, ... ]` — 최종 컨텍스트 요약(하이브리드 또는 시맨틱)
+  6) `exist_in_post_status`: `true|false`
+  7) `context`: `[ { postId, postTitle }, ... ]`
+  8) `answer` — 모델 부분 응답(여러 번)
+  9) `end` — `data: [DONE]`
+  - 오류 시 `error`: `{ code?: number, message: string }`
+- 폴백 동작
+  - 플래너 실패 시 `search_plan`으로 `{ "mode": "rag", "fallback": true }`가 송신되며, v1 스타일 RAG로 컨텍스트를 구성합니다.
+
+- 예시(curl)
+  ```bash
+  curl -N \
+    -H "Authorization: Bearer <token>" \
+    -H "Content-Type: application/json" \
+    -X POST http://localhost:3000/ai/v2/ask \
+    -d '{
+      "question": "최근 한 달 블로그에서 프로젝트 X 관련 내용 요약",
+      "user_id": "u_123",
+      "category_id": 3,
+      "llm": { "provider": "openai", "model": "gpt-5-mini", "options": { "temperature": 0.2, "top_p": 0.9, "max_output_tokens": 800 } }
+    }'
+  ```
+
+## 참고 사항
+- `post_id`가 지정된 요청에서 해당 글이 존재하지 않으면 SSE로 `error` 이벤트(404)가 송신되고 스트림이 종료됩니다.
+- `post.is_public`이 `false`인 글은 요청 `user_id`가 글 소유자와 다르면 `error` 이벤트(403)로 차단됩니다. `post.is_public`이 `true`면 누구나 접근 가능합니다.
+- v1/v2 모두 모델 응답 텍스트는 `answer` 이벤트로 분할 전송됩니다. 클라이언트는 누적하여 최종 답변을 구성해야 합니다.
+- 브라우저 수신 예시 — `EventSource`는 GET 전용이어서 본 엔드포인트(POST + `Authorization` 헤더)에는 사용할 수 없으므로, `fetch`로 응답 스트림을 직접 읽어 SSE 블록을 파싱하는 방식을 권장합니다.
+  ```js
+  const res = await fetch('/ai/v2/ask', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
+    body: JSON.stringify({ question: '...', user_id: 'u_123' }),
+  });
+  const reader = res.body.getReader();
+  const dec = new TextDecoder();
+  let buf = '';
+  for (let r = await reader.read(); !r.done; r = await reader.read()) {
+    buf += dec.decode(r.value, { stream: true });
+    const blocks = buf.split('\n\n'); // SSE 이벤트는 빈 줄로 구분
+    buf = blocks.pop() || '';
+    for (const b of blocks) console.log(b); // `event:`/`data:` 라인을 파싱해 처리
+  }
+  ```
+
+## 요약
+- v1 `/ai/ask`: 컨텍스트 존재 여부와 요약(`exist_in_post_status`, `context`) 후 답변 스트리밍
+- v2 `/ai/v2/ask`: 위 흐름에 더해 검색 계획(`search_plan`)과 검색 결과 요약(`search_result`)을 추가로 제공
+- 임베딩 API(v1): 게시물 제목/본문 임베딩 생성 및 저장
diff --git a/docs/migrations/2025-01-pgtrgm.sql b/docs/migrations/2025-01-pgtrgm.sql
new file mode 100644
index 0000000..168a891
--- /dev/null
+++ b/docs/migrations/2025-01-pgtrgm.sql
@@ -0,0 +1,9 @@
+-- Enable pg_trgm and add GIN indexes for text search on content and title
+CREATE EXTENSION IF NOT EXISTS pg_trgm;
+
+CREATE INDEX IF NOT EXISTS idx_pc_content_trgm
+  ON post_chunks USING gin (content gin_trgm_ops);
+
+CREATE INDEX IF NOT EXISTS idx_bp_title_trgm
+  ON blog_post USING gin (title gin_trgm_ops);
+
diff --git a/docs/migrations/README.md b/docs/migrations/README.md
new file mode 100644
index 0000000..3f4de87
--- /dev/null
+++ b/docs/migrations/README.md
@@ -0,0 +1,27 @@
+# Migrations Guide
+
+This folder contains SQL scripts for optional indexes/extensions used by the AI services.
+
+## Apply pg_trgm for text search
+
+File: `2025-01-pgtrgm.sql`
+
+Purpose:
+- Enable the `pg_trgm` extension and add GIN indexes on `post_chunks.content` and `blog_post.title` to accelerate the partial/fuzzy text search used in hybrid search.
+
+Run (with `DATABASE_URL`):
+
+```bash
+psql "$DATABASE_URL" -f docs/migrations/2025-01-pgtrgm.sql
+```
+
+Or inside `psql`:
+
+```sql
+\i docs/migrations/2025-01-pgtrgm.sql
+```
+
+Notes:
+- Indexes increase disk usage and write overhead; create them only on columns used for text search.
+- The extension must be installed once per database.
+
diff --git a/docs/reports/REPORT-askv2.md b/docs/reports/REPORT-askv2.md
new file mode 100644
index 0000000..c9d4229
--- /dev/null
+++ b/docs/reports/REPORT-askv2.md
@@ -0,0 +1,134 @@
+# 보고서: /ai/v2/ask — 구조, 동작, 하이브리드 검색 계획
+
+## 1) 개요
+- 목표: v1의 고정형 RAG 한계를 보완. “검색 계획(JSON)”을 LLM이 생성 → 서버가 안전하게 표준화/검증 → 시맨틱/하이브리드 검색 수행 → 최종 답변을 SSE로 스트리밍.
+- 상태: `POST /ai/v2/ask`(SSE) 운영 중. v1은 유지하며, v2에서 계획 기반 검색과 관측성을 강화.
+
+## 2) v1의 한계와 v2 도입 효과
+- 고정 파라미터 → 동적 계획
+  - v1: 임계치(0.2), LIMIT(5), 가중치(0.7/0.3) 등 고정.
+  - v2: `top_k`, `threshold`, `weights`, `sort`, `limit`을 질문/맥락에 맞게 동적 제어(서버가 범위 강제).
+- 시간/정렬 의도 미반영 → 자연어 시간 해석
+  - v1: “최근/지난주/9월/작년” 같은 시간 의도 반영 불가.
+  - v2: `filters.time`(상대/월/분기/연도)을 받아 `KST 절대범위(from/to)`로 변환해 쿼리에 반영.
+- 리콜/정밀도 균형 한계 → 하이브리드
+  - v1: 임베딩 기반 RAG만.
+  - v2: 벡터+텍스트 하이브리드 융합으로 재현율 향상 및 키워드 민감 질의 대응.
+- 관측성 부족 → SSE 메타 이벤트
+  - v1: 검색/선택 근거가 불투명.
+  - v2: `search_plan`, `rewrite`, `keywords`, `hybrid_result`, `search_result` 등으로 의사결정 가시화.
+- 안전성/정합성
+  - v1: 클라이언트 입력 이탈 감지 어려움.
+  - v2: JSON Schema(strict) + Zod 검증 + 필터 강제 주입으로 안전한 실행계획만 수행.
+- 비용/토큰 관리
+  - v1: 고정 상수로 세밀한 제어 어려움.
+  - v2: `top_k`/향후 `limit`/dedupe로 토큰 예산을 상황별 최적화 가능.
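+
+아래는 위에서 설명한 ‘라벨 → α 매핑’과 점수 융합 단계를 요약한 TypeScript 스케치입니다. 실제 동작은 `hybrid-search.service.ts` 구현을 따르며, 여기의 함수명·시그니처와 점수 배열이 청크 순서로 정렬되어 있다는 전제는 설명을 위한 가정입니다.
+
+```ts
+// 설명용 스케치 — 서비스 코드의 세부 구현과 다를 수 있음
+type RetrievalBias = 'lexical' | 'balanced' | 'semantic';
+
+// 서버가 LLM 라벨을 α 값으로 매핑 (본문 기준 기본값)
+const ALPHA_BY_BIAS: Record<RetrievalBias, number> = { lexical: 0.3, balanced: 0.5, semantic: 0.75 };
+
+// min-max 정규화: 경로별 점수 분포를 [0, 1]로 맞춘다
+const minMax = (xs: number[]): number[] => {
+  if (xs.length === 0) return [];
+  const lo = Math.min(...xs);
+  const hi = Math.max(...xs);
+  return xs.map((x) => (hi === lo ? 0 : (x - lo) / (hi - lo)));
+};
+
+// score = α·vec + (1-α)·text 로 융합해 상위 top_k 선택의 입력으로 사용
+function fuseScores(vec: number[], text: number[], bias: RetrievalBias): number[] {
+  const alpha = ALPHA_BY_BIAS[bias];
+  const v = minMax(vec);
+  const t = minMax(text);
+  return v.map((x, i) => alpha * x + (1 - alpha) * (t[i] ?? 0));
+}
+```
+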
+ +![v1 구조](../structureDiagram/askV1-structure.png) + +## 3) 검색 계획(Planner)과 안전 표준화 +- 타입/스키마(`src/types/ai.v2.types.ts`) + - `mode`: `rag | post` + - `top_k`(1..10), `threshold`(0..1), `weights`(`chunk`,`title`; 합 1로 정규화) + - `filters`: `{ user_id, category_ids?, post_id?, time? }` + - `sort`: `created_at_desc | created_at_asc`, `limit`(1..20) + - `hybrid`: `{ enabled, retrieval_bias, alpha?, max_rewrites, max_keywords }` + - `rewrites`: string[], `keywords`: string[] +- 프롬프트 규칙(`src/prompts/qa.v2.prompts.ts`) + - 서버 제공 필터(`user_id`, `category_id`, `post_id`)는 고정(FIXED)이며 변경 금지. + - 플래너는 `top_k`, `threshold`, `filters.time`, `sort`, `limit`, `hybrid`, `rewrites`, `keywords`만 결정. + - 하이브리드 시 `hybrid.retrieval_bias ∈ {lexical, balanced, semantic}` 라벨을 결정. +- 생성/정규화(`src/services/search-plan.service.ts`) + - OpenAI Responses(JSON Schema strict) → Zod 파싱/검증. + - 값 범위 강제, `weights` 합 1로 정규화. + - 불용어/중복 제거로 `rewrites`/`keywords` 정제. + - 시간 필터를 `KST 절대범위`로 변환(`src/utils/time.ts`). + - `retrieval_bias → alpha` 매핑(서버): `lexical→0.3`, `balanced→0.5`, `semantic→0.75`(필요 시 서버가 `alpha`를 직접 클램프/주입). + - `user_id/category_id/post_id`는 서버가 최종 주입(정합성 보장). 실패 시 v1 RAG 폴백. + +## 4) 검색 엔진: 시맨틱 + 하이브리드 +- 시맨틱(`src/services/semantic-search.service.ts` → `findSimilarChunksV2`) + - 입력: 질문 임베딩, `threshold/top_k/weights/sort`, 카테고리/시간 필터. + - 점수: `w_chunk*(1 - chunk_dist) + w_title*(1 - title_dist)`. + - 저장소: `postRepository.findSimilarChunksV2(...)` 호출, 파라미터 바인딩 기반 안전 쿼리. +- 하이브리드(`src/services/hybrid-search.service.ts`) + - 입력: 원 질문 + `rewrites`(재작성), `keywords`(키워드), `alpha`(서버 매핑). + - 벡터 경로: 각 query 임베딩 → 시맨틱 후보 수집 → 동일 청크는 최대 vecScore로 병합. + - 텍스트 경로: `textSearchChunksV2`(키워드/질의 기반 텍스트 검색) → 동일 청크는 최대 textScore로 병합. + - 정규화/융합: min-max 정규화 후 `score = alpha*vec + (1-alpha)*text`로 결합 → 상위 `top_k`를 반환. + - 폴백: 하이브리드 결과 비었을 때 시맨틱 경로로 재시도. + - SSE 메타: `rewrite`, `keywords`, `hybrid_result`로 중간 산출물/요약을 별도 송신. + +![v2 구조](../structureDiagram/askV2-structure.png) + +## 5) 파이프라인(SSE 이벤트)과 모드 +- 컨트롤/오케스트레이션 + - `src/controllers/ai.v2.controller.ts`: SSE 헤더, 스트림 라우팅. + - `src/services/qa.v2.service.ts`: 계획 생성 → 검색 실행(하이브리드/시맨틱) → LLM 호출까지 오케스트레이션. +- post 모드(`post_id` 존재) + - 접근 정책: `post.is_public=false`면 소유자만(403), 미존재 시 404. + - 이벤트: `search_plan`(간략) → `search_result` → `exist_in_post_status:true` → `context` → `answer`* → `end`. + - 컨텍스트: 해당 글 본문만 사용(전처리 후). 하이브리드 미사용. +- rag/하이브리드 모드 + - 이벤트 순서(일반): + 1) `search_plan`: 정규화된 계획(JSON). `hybrid.enabled`가 true면 `retrieval_bias`와 서버 계산 `alpha`가 함께 포함. + 2) `rewrite`(선택): 계획의 재작성 질의 목록. + 3) `keywords`(선택): 계획의 핵심 키워드 목록. + 4) `hybrid_result`(선택): 하이브리드 후보 요약. + 5) `search_result`: 최종 컨텍스트 요약. + 6) `exist_in_post_status`: `true | false`. + 7) `context`: `[ { postId, postTitle }, ... ]`. + 8) `answer`*: 모델 부분 응답. + 9) `end`: 종료 시그널(`[DONE]`). + - 오류 시: `error` 송신. + +## 6) LLM/비용/프로바이더 +- 기본 모델: `openai/gpt-5-mini`(환경변수로 변경 가능), 임베딩: `text-embedding-3-small`. +- OpenAI: Responses API 스트리밍 우선, 실패 시 논스트림/Chat Completions 폴백. +- Gemini: 현재 논스트림 결과를 SSE 청크로 분할 전송. +- 비용 로깅: 프롬프트/완료 토큰 및 추정 비용 로깅(`src/llm/*`). +- 툴콜: 컨텍스트 부족 시 `report_content_insufficient` 툴을 통해 안내 문구 유도. + +## 7) 보안·정합성 +- 생성 계획 방어선: JSON Schema(strict) + Zod 검증 + 서버 측 범위 강제. +- 필터 주입: `user_id/category_id/post_id`는 서버가 최종 주입하여 일탈 방지. +- SQL 안전성: 파라미터 바인딩/화이트리스트 템플릿. +- 데이터 최소화: SSE에는 ID/제목 위주 요약만 전송. +- 접근 제어: post 모드에서 `is_public`/소유자 검사. + +## 8) 현재 동작과 한계(향후 과제) +- 현재 + - `top_k`로 리콜 폭 제어, 시맨틱/하이브리드 모두 적용. + - `limit`은 계획에는 존재하나 컨텍스트 축약에는 미적용(향후 포스트 단위 dedupe와 함께 적용 예정). + - 청크 단위 랭킹 → 동일 포스트 다수 노출 가능. 
+ - 카테고리 배열은 첫 항목만 반영. +- 향후 + - 포스트 단위 dedupe + `limit` 적용으로 다양성/비용 균형. + - `retrieval_bias → alpha` 매핑의 AB 테스트/텔레메트리 튜닝(예: {0.25,0.5,0.8}). + - 점수 정규화 고도화(z-score/quantile) 및 RRF 옵션 도입 검토. + - 텍스트 경로 향상(전처리/순위 함수/키워드 확장) 및 프로바이더 다변화. + - SSE `search_sql`/`search_debug`로 투명성 강화. + +## 9) 파일 맵(핵심) +- 라우트/컨트롤러: `src/routes/ai.v2.routes.ts`, `src/controllers/ai.v2.controller.ts`, `src/app.ts` +- 플래너/타입/프롬프트: `src/services/search-plan.service.ts`, `src/types/ai.v2.types.ts`, `src/prompts/qa.v2.prompts.ts` +- 시간 유틸: `src/utils/time.ts` +- 검색: `src/services/semantic-search.service.ts`, `src/services/hybrid-search.service.ts`, `src/repositories/post.repository.ts` +- 오케스트레이션: `src/services/qa.v2.service.ts` +- LLM: `src/llm/*` + +## 10) 예시 +- 요청 +```json +{ + "question": "지난달 프로젝트 X 관련 핵심만 3개 보여줘", + "user_id": "u_123", + "category_id": 3, + "llm": { "provider": "openai", "options": { "temperature": 0.2 } } +} +``` +- 이벤트 예시 + - `search_plan`(rag, time=지난달, top_k/threshold/weights/sort/limit/hybrid 포함; hybrid.retrieval_bias와 서버 주입 alpha 함께 표기) + - `rewrite`/`keywords`(선택) + - `hybrid_result`(선택) + - `search_result` → `exist_in_post_status` → `context` → `answer`* → `end` diff --git a/docs/structureDiagram/askV1-structure.png b/docs/structureDiagram/askV1-structure.png new file mode 100644 index 0000000..daad8e9 Binary files /dev/null and b/docs/structureDiagram/askV1-structure.png differ diff --git a/docs/structureDiagram/askV2-structure.png b/docs/structureDiagram/askV2-structure.png new file mode 100644 index 0000000..c621efc Binary files /dev/null and b/docs/structureDiagram/askV2-structure.png differ diff --git a/package-lock.json b/package-lock.json index 65109c4..5814cd7 100644 --- a/package-lock.json +++ b/package-lock.json @@ -10,6 +10,7 @@ "license": "ISC", "dependencies": { "@dqbd/tiktoken": "^1.0.13", + "@google/genai": "^0.2.0", "cors": "^2.8.5", "dotenv": "^16.4.5", "express": "^4.19.2", @@ -47,6 +48,18 @@ "resolved": "https://registry.npmjs.org/@dqbd/tiktoken/-/tiktoken-1.0.22.tgz", "integrity": "sha512-RYhO8xeHkMNX5Ixqf4M1Ve3siCYJY/dI0yLnlX4M4oIEDOvjMIQ+E+3OUpAaZcWTaMtQJzGcDAghYfllpx3i/w==" }, + "node_modules/@google/genai": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/@google/genai/-/genai-0.2.0.tgz", + "integrity": "sha512-r7EiRHSqc6D1lDIMvM4OemjUwPpUbYb9jTxe1eLCiFbooHrmPc6U9z3n56E/iWzigkZmjRh4IC0CMzoB1aql9w==", + "dependencies": { + "google-auth-library": "^9.14.2", + "ws": "^8.18.0" + }, + "engines": { + "node": ">=18.0.0" + } + }, "node_modules/@jridgewell/resolve-uri": { "version": "3.1.2", "resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz", @@ -284,6 +297,14 @@ "node": ">=0.4.0" } }, + "node_modules/agent-base": { + "version": "7.1.4", + "resolved": "https://registry.npmjs.org/agent-base/-/agent-base-7.1.4.tgz", + "integrity": "sha512-MnA+YT8fwfJPgBx3m60MNqakm30XOkyIoH1y6huTQvC0PwZG7ki8NacLBcrPbNoo8vEZy7Jpuk7+jMO+CUovTQ==", + "engines": { + "node": ">= 14" + } + }, "node_modules/agentkeepalive": { "version": "4.6.0", "resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.6.0.tgz", @@ -330,6 +351,33 @@ "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==", "dev": true }, + "node_modules/base64-js": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", + "integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==", + "funding": [ + { + "type": 
"github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ] + }, + "node_modules/bignumber.js": { + "version": "9.3.1", + "resolved": "https://registry.npmjs.org/bignumber.js/-/bignumber.js-9.3.1.tgz", + "integrity": "sha512-Ko0uX15oIUS7wJ3Rb30Fs6SkVbLmPBAKdlm7q9+ak9bbIeFf0MwuBsQV6z7+X768/cHsfg+WlysDWJcmthjsjQ==", + "engines": { + "node": "*" + } + }, "node_modules/binary-extensions": { "version": "2.3.0", "resolved": "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.3.0.tgz", @@ -535,7 +583,6 @@ "version": "4.4.1", "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.1.tgz", "integrity": "sha512-KcKCqiftBJcZr++7ykoDIEwSa3XWowTfNPo92BYxjXiyYEVrUQh2aLyhxBCwww+heortUFxEJYcRzosstTEBYQ==", - "dev": true, "dependencies": { "ms": "^2.1.3" }, @@ -747,6 +794,11 @@ "resolved": "https://registry.npmjs.org/ms/-/ms-2.0.0.tgz", "integrity": "sha512-Tpp60P6IUJDTuOq/5Z8cdskzJujfwqfOTkrwIwj7IRISpnkJnT6SyJ4PCPnGMoFjC9ddhal5KVIYtAt97ix05A==" }, + "node_modules/extend": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/extend/-/extend-3.0.2.tgz", + "integrity": "sha512-fjquC59cD7CyW6urNXK0FBufkZcoiGG80wTuPujX590cB5Ttln20E2UB4S/WARVqhXffZl2LNgS+gQdPIIim/g==" + }, "node_modules/fill-range": { "version": "7.1.1", "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz", @@ -859,6 +911,34 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/gaxios": { + "version": "6.7.1", + "resolved": "https://registry.npmjs.org/gaxios/-/gaxios-6.7.1.tgz", + "integrity": "sha512-LDODD4TMYx7XXdpwxAVRAIAuB0bzv0s+ywFonY46k126qzQHT9ygyoa9tncmOiQmmDrik65UYsEkv3lbfqQ3yQ==", + "dependencies": { + "extend": "^3.0.2", + "https-proxy-agent": "^7.0.1", + "is-stream": "^2.0.0", + "node-fetch": "^2.6.9", + "uuid": "^9.0.1" + }, + "engines": { + "node": ">=14" + } + }, + "node_modules/gcp-metadata": { + "version": "6.1.1", + "resolved": "https://registry.npmjs.org/gcp-metadata/-/gcp-metadata-6.1.1.tgz", + "integrity": "sha512-a4tiq7E0/5fTjxPAaH4jpjkSv/uCaU2p5KC6HVGrvl0cDjA8iBZv4vv1gyzlmK0ZUKqwpOyQMKzZQe3lTit77A==", + "dependencies": { + "gaxios": "^6.1.1", + "google-logging-utils": "^0.0.2", + "json-bigint": "^1.0.0" + }, + "engines": { + "node": ">=14" + } + }, "node_modules/get-intrinsic": { "version": "1.3.0", "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz", @@ -906,6 +986,49 @@ "node": ">= 6" } }, + "node_modules/google-auth-library": { + "version": "9.15.1", + "resolved": "https://registry.npmjs.org/google-auth-library/-/google-auth-library-9.15.1.tgz", + "integrity": "sha512-Jb6Z0+nvECVz+2lzSMt9u98UsoakXxA2HGHMCxh+so3n90XgYWkq5dur19JAJV7ONiJY22yBTyJB1TSkvPq9Ng==", + "dependencies": { + "base64-js": "^1.3.0", + "ecdsa-sig-formatter": "^1.0.11", + "gaxios": "^6.1.1", + "gcp-metadata": "^6.1.0", + "gtoken": "^7.0.0", + "jws": "^4.0.0" + }, + "engines": { + "node": ">=14" + } + }, + "node_modules/google-auth-library/node_modules/jwa": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/jwa/-/jwa-2.0.1.tgz", + "integrity": "sha512-hRF04fqJIP8Abbkq5NKGN0Bbr3JxlQ+qhZufXVr0DvujKy93ZCbXZMHDL4EOtodSbCWxOqR8MS1tXA5hwqCXDg==", + "dependencies": { + "buffer-equal-constant-time": "^1.0.1", + "ecdsa-sig-formatter": "1.0.11", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/google-auth-library/node_modules/jws": { + "version": "4.0.0", + "resolved": 
"https://registry.npmjs.org/jws/-/jws-4.0.0.tgz", + "integrity": "sha512-KDncfTmOZoOMTFG4mBlG0qUIOlc03fmzH+ru6RgYVZhPkyiy/92Owlt/8UEN+a4TXR1FQetfIpJE8ApdvdVxTg==", + "dependencies": { + "jwa": "^2.0.0", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/google-logging-utils": { + "version": "0.0.2", + "resolved": "https://registry.npmjs.org/google-logging-utils/-/google-logging-utils-0.0.2.tgz", + "integrity": "sha512-NEgUnEcBiP5HrPzufUkBzJOD/Sxsco3rLNo1F1TNf7ieU8ryUzBhqba8r756CjLX7rn3fHl6iLEwPYuqpoKgQQ==", + "engines": { + "node": ">=14" + } + }, "node_modules/gopd": { "version": "1.2.0", "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz", @@ -917,6 +1040,37 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/gtoken": { + "version": "7.1.0", + "resolved": "https://registry.npmjs.org/gtoken/-/gtoken-7.1.0.tgz", + "integrity": "sha512-pCcEwRi+TKpMlxAQObHDQ56KawURgyAf6jtIY046fJ5tIv3zDe/LEIubckAO8fj6JnAxLdmWkUfNyulQ2iKdEw==", + "dependencies": { + "gaxios": "^6.0.0", + "jws": "^4.0.0" + }, + "engines": { + "node": ">=14.0.0" + } + }, + "node_modules/gtoken/node_modules/jwa": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/jwa/-/jwa-2.0.1.tgz", + "integrity": "sha512-hRF04fqJIP8Abbkq5NKGN0Bbr3JxlQ+qhZufXVr0DvujKy93ZCbXZMHDL4EOtodSbCWxOqR8MS1tXA5hwqCXDg==", + "dependencies": { + "buffer-equal-constant-time": "^1.0.1", + "ecdsa-sig-formatter": "1.0.11", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/gtoken/node_modules/jws": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/jws/-/jws-4.0.0.tgz", + "integrity": "sha512-KDncfTmOZoOMTFG4mBlG0qUIOlc03fmzH+ru6RgYVZhPkyiy/92Owlt/8UEN+a4TXR1FQetfIpJE8ApdvdVxTg==", + "dependencies": { + "jwa": "^2.0.0", + "safe-buffer": "^5.0.1" + } + }, "node_modules/has-flag": { "version": "3.0.0", "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-3.0.0.tgz", @@ -977,6 +1131,18 @@ "node": ">= 0.8" } }, + "node_modules/https-proxy-agent": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/https-proxy-agent/-/https-proxy-agent-7.0.6.tgz", + "integrity": "sha512-vK9P5/iUfdl95AI+JVyUuIcVtd4ofvtrOr3HNtM2yxC9bnMbEdp3x01OhQNnjb8IJYi38VlTE3mBXwcfvywuSw==", + "dependencies": { + "agent-base": "^7.1.2", + "debug": "4" + }, + "engines": { + "node": ">= 14" + } + }, "node_modules/humanize-ms": { "version": "1.2.1", "resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz", @@ -1057,6 +1223,25 @@ "node": ">=0.12.0" } }, + "node_modules/is-stream": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/is-stream/-/is-stream-2.0.1.tgz", + "integrity": "sha512-hFoiJiTl63nn+kstHGBtewWSKnQLpyb155KHheA1l39uvtO9nWIop1p3udqPcUd/xbF1VLMO4n7OI6p7RbngDg==", + "engines": { + "node": ">=8" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/json-bigint": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/json-bigint/-/json-bigint-1.0.0.tgz", + "integrity": "sha512-SiPv/8VpZuWbvLSMtTDU8hEfrZWg/mH/nV/b4o0CYbSxu1UIQPLdwKOCIyLQX+VIPO5vrLX3i8qtqFyhdPSUSQ==", + "dependencies": { + "bignumber.js": "^9.0.0" + } + }, "node_modules/jsonwebtoken": { "version": "9.0.2", "resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.2.tgz", @@ -1922,6 +2107,18 @@ "node": ">= 0.4.0" } }, + "node_modules/uuid": { + "version": "9.0.1", + "resolved": "https://registry.npmjs.org/uuid/-/uuid-9.0.1.tgz", + "integrity": "sha512-b+1eJOlsR9K8HJpow9Ok3fiWOWSIcIzXodvv0rQjVoOVNpWMpxf1wZNpt4y9h10odCNrqnYp1OBzRktckBe3sA==", + 
"funding": [ + "https://github.com/sponsors/broofa", + "https://github.com/sponsors/ctavan" + ], + "bin": { + "uuid": "dist/bin/uuid" + } + }, "node_modules/v8-compile-cache-lib": { "version": "3.0.1", "resolved": "https://registry.npmjs.org/v8-compile-cache-lib/-/v8-compile-cache-lib-3.0.1.tgz", @@ -1958,6 +2155,26 @@ "webidl-conversions": "^3.0.0" } }, + "node_modules/ws": { + "version": "8.18.3", + "resolved": "https://registry.npmjs.org/ws/-/ws-8.18.3.tgz", + "integrity": "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg==", + "engines": { + "node": ">=10.0.0" + }, + "peerDependencies": { + "bufferutil": "^4.0.1", + "utf-8-validate": ">=5.0.2" + }, + "peerDependenciesMeta": { + "bufferutil": { + "optional": true + }, + "utf-8-validate": { + "optional": true + } + } + }, "node_modules/xtend": { "version": "4.0.2", "resolved": "https://registry.npmjs.org/xtend/-/xtend-4.0.2.tgz", diff --git a/package.json b/package.json index 712d202..f4594d4 100644 --- a/package.json +++ b/package.json @@ -7,7 +7,8 @@ "dev": "nodemon --watch 'src/**/*.ts' --exec 'ts-node' src/server.ts", "build": "tsc", "start": "node dist/server.js", - "test": "echo \"Error: no test specified\" && exit 1" + "test": "echo \"Error: no test specified\" && exit 1", + "db:migrate:pgtrgm": "psql \"$DATABASE_URL\" -f docs/migrations/2025-01-pgtrgm.sql" }, "keywords": [], "author": "", @@ -24,6 +25,7 @@ }, "dependencies": { "@dqbd/tiktoken": "^1.0.13", + "@google/genai": "^0.2.0", "cors": "^2.8.5", "dotenv": "^16.4.5", "express": "^4.19.2", diff --git a/readme.md b/readme.md index a656468..fa301f2 100644 --- a/readme.md +++ b/readme.md @@ -87,3 +87,45 @@ CREATE INDEX idx_post_title_embeddings_embedding ## 4. AI 서버 작동 ### `.env` 파일 작성 (루트 디렉토리 `BUBBLOG_AI`에 위치) + +--- + +## 5. 텍스트 검색 인덱스 (pg_trgm) + +하이브리드 검색의 키워드/부분일치 성능 향상을 위해 `pg_trgm` 확장 및 GIN 인덱스를 추가합니다. + +### 5.1 확장 및 인덱스 생성 스크립트 + +프로젝트에 제공된 스크립트를 사용하세요: + +`docs/migrations/2025-01-pgtrgm.sql` + +내용: + +```sql +CREATE EXTENSION IF NOT EXISTS pg_trgm; +CREATE INDEX IF NOT EXISTS idx_pc_content_trgm ON post_chunks USING gin (content gin_trgm_ops); +CREATE INDEX IF NOT EXISTS idx_bp_title_trgm ON blog_post USING gin (title gin_trgm_ops); +``` + +### 5.2 적용 방법 + +- 환경변수 `DATABASE_URL`이 설정된 경우: + +```bash +psql "$DATABASE_URL" -f docs/migrations/2025-01-pgtrgm.sql +``` + +- 또는 npm 스크립트 사용: + +```bash +npm run db:migrate:pgtrgm +``` + +- 또는 수동 실행(PostgreSQL 쉘 접속 후): + +```sql +\i docs/migrations/2025-01-pgtrgm.sql +``` + +주의: 인덱스는 쓰기 비용과 디스크 사용량을 증가시킵니다. 텍스트 검색에 사용하는 컬럼(`post_chunks.content`, `blog_post.title`)에만 생성하세요. 
diff --git a/src/app.ts b/src/app.ts index 6968488..dfd79c4 100644 --- a/src/app.ts +++ b/src/app.ts @@ -1,6 +1,7 @@ import express, { Express, Request, Response, NextFunction } from 'express'; import cors from 'cors'; import aiRouter from './routes/ai.routes'; +import aiV2Router from './routes/ai.v2.routes'; const app: Express = express(); @@ -21,6 +22,7 @@ app.get('/', (request: Request, response: Response) => { }); app.use('/ai', aiRouter); +app.use('/ai/v2', aiV2Router); // Central Error Handler app.use((err: Error, req: Request, res: Response, next: NextFunction) => { diff --git a/src/config.ts b/src/config.ts index 18993cc..5bd402a 100644 --- a/src/config.ts +++ b/src/config.ts @@ -12,7 +12,13 @@ const configSchema = z.object({ TOKEN_AUDIENCE: z.string().default('bubblog'), ALGORITHM: z.string().default('HS256'), EMBED_MODEL: z.string().default('text-embedding-3-small'), - CHAT_MODEL: z.string().default('gpt-4o'), + // 기본 LLM 모델: GPT-5 계열 + CHAT_MODEL: z.string().default('gpt-5-mini'), + GEMINI_API_KEY: z.string().optional(), + GEMINI_CHAT_MODEL: z.string().default('gemini-2.5-flash'), + GEMINI_THINKING_BUDGET: z.string().optional(), + LLM_COST_LOG: z.string().default('false'), + LLM_COST_ROUND: z.coerce.number().default(4), }); const config = configSchema.parse(process.env); diff --git a/src/controllers/ai.controller.ts b/src/controllers/ai.controller.ts index e6ab029..d276f23 100644 --- a/src/controllers/ai.controller.ts +++ b/src/controllers/ai.controller.ts @@ -49,14 +49,48 @@ export const askHandler = async ( next: NextFunction ) => { try { - const { question, user_id, category_id, speech_tone, post_id } = req.body; + const { question, user_id, category_id, speech_tone, post_id, llm } = req.body as any; - res.setHeader('Content-Type', 'text/event-stream'); - res.setHeader('Cache-Control', 'no-cache'); + // SSE headers and anti-buffering hints + res.setHeader('Content-Type', 'text/event-stream; charset=utf-8'); + res.setHeader('Cache-Control', 'no-cache, no-transform'); res.setHeader('Connection', 'keep-alive'); + // Nginx buffering off + res.setHeader('X-Accel-Buffering', 'no'); + // Flush headers early so clients start processing immediately + (res as any).flushHeaders?.(); + // Reduce Nagle’s algorithm buffering on the socket for faster flush + (res.socket as any)?.setNoDelay?.(true); + // Prime the SSE stream to break proxy buffering thresholds + res.write(':ok\n\n'); - const stream = await answerStream(question, user_id, category_id, speech_tone, post_id); - stream.pipe(res); + const stream = await answerStream(question, user_id, category_id, speech_tone, post_id, llm); + // Manually bridge to ensure flushing of SSE deltas + stream.on('data', (chunk) => { + const buf = Buffer.isBuffer(chunk) ? 
chunk : Buffer.from(String(chunk));
+      res.write(buf);
+      const canFlush = typeof (res as any).flush === 'function';
+      // try to flush if supported by runtime/middleware
+      (res as any).flush?.();
+      try {
+        console.log(
+          JSON.stringify({ type: 'debug.sse.write', at: Date.now(), bytes: buf.length, flushed: canFlush })
+        );
+      } catch {}
+    });
+    stream.on('end', () => {
+      res.end();
+    });
+    stream.on('error', () => {
+      res.end();
+    });
+
+    // Cleanup if client disconnects
+    req.on('close', () => {
+      try {
+        stream.destroy();
+      } catch {}
+    });
   } catch (error) {
     next(error);
diff --git a/src/controllers/ai.v2.controller.ts b/src/controllers/ai.v2.controller.ts
new file mode 100644
index 0000000..ed1f579
--- /dev/null
+++ b/src/controllers/ai.v2.controller.ts
@@ -0,0 +1,29 @@
+import { Request, Response, NextFunction } from 'express';
+import { AskV2Request } from '../types/ai.v2.types';
+import { answerStreamV2 } from '../services/qa.v2.service';
+
+export const askV2Handler = async (
+  req: Request<{}, {}, AskV2Request>,
+  res: Response,
+  next: NextFunction
+) => {
+  try {
+    const { question, user_id, category_id, speech_tone, post_id, llm } = req.body as any;
+    res.setHeader('Content-Type', 'text/event-stream');
+    res.setHeader('Cache-Control', 'no-cache');
+    res.setHeader('Connection', 'keep-alive');
+
+    const stream = await answerStreamV2(
+      question,
+      user_id,
+      category_id,
+      speech_tone,
+      post_id,
+      llm
+    );
+    stream.pipe(res);
+  } catch (error) {
+    next(error);
+  }
+};
+
diff --git a/src/llm/index.ts b/src/llm/index.ts
new file mode 100644
index 0000000..3c67eb4
--- /dev/null
+++ b/src/llm/index.ts
@@ -0,0 +1,162 @@
+import { PassThrough } from 'stream';
+import { GenerateRequest } from './types';
+import { getDefaultChat } from './model-registry';
+import { generateOpenAIStream } from './providers/openai-responses';
+import { generateGeminiStream } from './providers/gemini';
+import { countChatMessagesTokens, countTextTokens } from '../utils/tokenizer';
+import { calcCost, getModelPricing } from '../utils/cost';
+import config from '../config';
+import { randomUUID } from 'crypto';
+
+export const generate = async (req: GenerateRequest): Promise<PassThrough> => {
+  const merged = { ...req };
+  if (!merged.provider || !merged.model) {
+    const def = getDefaultChat();
+    merged.provider = merged.provider || def.provider;
+    merged.model = merged.model || def.modelId;
+  }
+
+  const doLog = (config.LLM_COST_LOG || '').toString().toLowerCase() === 'true';
+  const round = config.LLM_COST_ROUND ?? 4;
+  const corrId = randomUUID();
+  const model = merged.model as string;
+  const provider = merged.provider as string;
+
+  // Pre-call logging: prompt tokens + estimated input cost
+  const messages = merged.messages || [];
+  let promptTokens = 0;
+  try {
+    promptTokens = countChatMessagesTokens(messages as any, model);
+  } catch {
+    // ignore
+  }
+  const pricing = getModelPricing(model);
+  const estInputCost = pricing ? calcCost(promptTokens, pricing.input_per_1k) : 0;
+  if (doLog) {
+    const pre = {
+      type: 'llm.request',
+      corrId,
+      provider,
+      model,
+      promptTokens,
+      estInputCost,
+      userId: merged.meta?.userId,
+      categoryId: merged.meta?.categoryId,
+      postId: merged.meta?.postId,
+    };
+    console.log(JSON.stringify(pre));
+  }
+
+  const startedAt = Date.now();
+
+  const providerStream =
+    merged.provider === 'openai'
+      ? await generateOpenAIStream(merged)
+      : merged.provider === 'gemini'
+        ? 
await generateGeminiStream(merged) + : (() => { + const s = new PassThrough(); + s.write(`event: error\n`); + s.write(`data: ${JSON.stringify({ message: 'Unknown provider' })}\n\n`); + s.end(); + return s; + })(); + + // Wrap provider stream to accumulate output tokens + const outer = new PassThrough(); + let buffer = ''; + let outputText = ''; + + // Debug: start info + try { + console.log( + JSON.stringify({ type: 'debug.llm.start', provider, model, messages: (messages || []).length }) + ); + } catch {} + + const flushBuffer = () => { + // Split by double newline to get SSE events + const chunks = buffer.split('\n\n'); + // Keep last partial + buffer = chunks.pop() || ''; + for (const block of chunks) { + const lines = block.split('\n'); + let evt: string | null = null; + let dataLine: string | null = null; + for (const line of lines) { + if (line.startsWith('event:')) evt = line.slice(6).trim(); + if (line.startsWith('data:')) dataLine = line.slice(5).trim(); + } + if (evt === 'answer' && dataLine) { + try { + const parsed = JSON.parse(dataLine); + if (typeof parsed === 'string') outputText += parsed; + else outputText += JSON.stringify(parsed); + } catch { + outputText += dataLine; + } + } + outer.write(block + '\n\n'); + } + }; + + providerStream.on('data', (chunk) => { + const str = Buffer.isBuffer(chunk) ? chunk.toString('utf8') : String(chunk); + buffer += str; + flushBuffer(); + }); + providerStream.on('end', () => { + if (buffer.length > 0) { + outer.write(buffer); + buffer = ''; + } + + const completionTokens = (() => { + try { + return countTextTokens(outputText, model); + } catch { + return 0; + } + })(); + const durationMs = Date.now() - startedAt; + if (doLog) { + const inputCost = pricing ? calcCost(promptTokens, pricing.input_per_1k) : 0; + const outputCost = pricing ? calcCost(completionTokens, pricing.output_per_1k) : 0; + const totalCost = inputCost + outputCost; + const post = { + type: 'llm.response', + corrId, + provider, + model, + promptTokens, + completionTokens, + inputCost, + outputCost, + totalCost, + durationMs, + }; + console.log(JSON.stringify(post)); + } + try { + console.log( + JSON.stringify({ type: 'debug.llm.end', provider, model, durationMs, outputChars: outputText.length }) + ); + } catch {} + outer.end(); + }); + providerStream.on('error', (e) => { + if (doLog) { + console.log( + JSON.stringify({ type: 'llm.error', corrId, provider, model, message: (e as any)?.message || 'error' }) + ); + } + try { + console.error( + JSON.stringify({ type: 'debug.llm.error', provider, model, message: (e as any)?.message || 'error' }) + ); + } catch {} + outer.emit('error', e); + }); + + return outer; +}; diff --git a/src/llm/model-registry.ts b/src/llm/model-registry.ts new file mode 100644 index 0000000..14cbdfe --- /dev/null +++ b/src/llm/model-registry.ts @@ -0,0 +1,12 @@ +import { ProviderName } from './types'; + +type ModelEntry = { + provider: ProviderName; + modelId: string; +}; + +// Minimal registry for now; can expand with tokenizer/pricing later. 
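+// 확장 예시(가정): 모델별 단가/토크나이저를 레지스트리에 함께 두려면 아래와 같은 형태를 고려할 수 있다.
+// (현재 단가 조회는 utils/cost의 getModelPricing이 담당하므로, 실제 도입 시 중복을 피할 것.)
+// type PricedModelEntry = ModelEntry & {
+//   pricing?: { input_per_1k: number; output_per_1k: number };
+//   tokenizerId?: string; // 예: 'o200k_base' — 가정된 필드명
+// };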
+const DEFAULT_CHAT: ModelEntry = { provider: 'openai', modelId: 'gpt-5-mini' };
+
+export const getDefaultChat = (): ModelEntry => DEFAULT_CHAT;
+
diff --git a/src/llm/providers/gemini.ts b/src/llm/providers/gemini.ts
new file mode 100644
index 0000000..a399317
--- /dev/null
+++ b/src/llm/providers/gemini.ts
@@ -0,0 +1,75 @@
+import { PassThrough } from 'stream';
+import config from '../../config';
+import { GenerateRequest } from '../types';
+
+// Using @google/genai per project plan; keep types loose for compatibility
+// eslint-disable-next-line @typescript-eslint/no-var-requires
+const { GoogleGenAI } = require('@google/genai');
+
+const buildPromptFromMessages = (messages: { role: string; content: string }[]) => {
+  // Simple concatenation preserving roles
+  return messages
+    .map((m) => `[${m.role}]\n${m.content}`)
+    .join('\n\n');
+};
+
+export const generateGeminiStream = async (req: GenerateRequest): Promise<PassThrough> => {
+  const stream = new PassThrough();
+  try {
+    const modelId = req.model || process.env.GEMINI_CHAT_MODEL || 'gemini-2.5-flash';
+    const apiKey = (config as any).GEMINI_API_KEY || process.env.GEMINI_API_KEY;
+    if (!apiKey) {
+      stream.write(`event: error\n`);
+      stream.write(`data: ${JSON.stringify({ message: 'Gemini API key not configured' })}\n\n`);
+      stream.end();
+      return stream;
+    }
+
+    const ai = new GoogleGenAI({ apiKey });
+
+    const text = buildPromptFromMessages(req.messages || []);
+
+    const generationConfig: any = {};
+    if (req.options?.temperature != null) generationConfig.temperature = req.options.temperature;
+    if (req.options?.top_p != null) generationConfig.topP = req.options.top_p;
+    if (req.options?.max_output_tokens != null) generationConfig.maxOutputTokens = req.options.max_output_tokens;
+
+    const thinkingBudget = parseInt(process.env.GEMINI_THINKING_BUDGET || '0', 10) || 0;
+    // @google/genai expects a single `config` object; merge the generation options
+    // into it (a separate top-level `generationConfig` key would be ignored by this SDK).
+    const configBlock: any = {
+      ...generationConfig,
+      ...(thinkingBudget > 0 ? { thinkingConfig: { thinkingBudget } } : {}),
+    };
+
+    // Non-streaming first, then chunk SSE
+    const result = await ai.models.generateContent({
+      model: modelId,
+      contents: [
+        {
+          role: 'user',
+          parts: [{ text }],
+        },
+      ],
+      config: configBlock,
+    });
+
+    // Try common text access paths: depending on the SDK version, `text` is either
+    // a getter (string) or a method — handle both so a string is never invoked.
+    const pickText = (obj: any): string => {
+      const t = obj?.text;
+      if (typeof t === 'function') return t.call(obj) ?? '';
+      if (typeof t === 'string') return t;
+      return '';
+    };
+    const outputText = pickText(result) || pickText((result as any)?.response);
+
+    const finalText = typeof outputText === 'string' ? outputText : '';
+
+    const chunkSize = 400;
+    for (let i = 0; i < finalText.length; i += chunkSize) {
+      const chunk = finalText.slice(i, i + chunkSize);
+      stream.write(`event: answer\n`);
+      stream.write(`data: ${JSON.stringify(chunk)}\n\n`);
+    }
+    stream.write(`event: end\n`);
+    stream.write(`data: [DONE]\n\n`);
+    stream.end();
+    return stream;
+  } catch (err) {
+    stream.write(`event: error\n`);
+    stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+    stream.end();
+    return stream;
+  }
+};
+
diff --git a/src/llm/providers/openai-responses.ts b/src/llm/providers/openai-responses.ts
new file mode 100644
index 0000000..65cc1ec
--- /dev/null
+++ b/src/llm/providers/openai-responses.ts
@@ -0,0 +1,320 @@
+import { PassThrough } from 'stream';
+import OpenAI from 'openai';
+import config from '../../config';
+import { GenerateRequest, OpenAIStyleMessage, OpenAIStyleTool } from '../types';
+
+const openai = new OpenAI({ apiKey: config.OPENAI_API_KEY });
+
+const toResponsesInput = (messages: OpenAIStyleMessage[] = []) => {
+  // Convert simple chat-style messages to Responses API input format
+  // Responses API expects 'input_text' as the content type (not 'text').
+  return messages.map((m) => ({
+    role: m.role,
+    content: [{ type: 'input_text', text: m.content }],
+  }));
+};
+
+const toResponsesTools = (tools: OpenAIStyleTool[] = []) => {
+  // Map Chat Completions style tool definitions to Responses API format
+  // Chat: { type: 'function', function: { name, description, parameters } }
+  // Responses: { type: 'function', name, description, parameters }
+  return tools
+    .filter((t) => t && (t as any).type === 'function' && (t as any).function?.name)
+    .map((t) => ({
+      type: 'function',
+      name: t.function.name,
+      description: t.function.description,
+      parameters: t.function.parameters,
+    }));
+};
+
+export const generateOpenAIStream = async (req: GenerateRequest): Promise<PassThrough> => {
+  const stream = new PassThrough();
+  // Guard to avoid writing after stream end
+  let closed = false;
+  const safeWrite = (chunk: string) => {
+    if (!closed && !stream.writableEnded && !stream.destroyed) {
+      stream.write(chunk);
+    }
+  };
+  const safeEnd = () => {
+    if (!closed && !stream.writableEnded && !stream.destroyed) {
+      closed = true;
+      stream.end();
+    } else {
+      closed = true;
+    }
+  };
+  const model = req.model || 'gpt-5-mini';
+  const messages = req.messages || [];
+  const toolsChat = (req.tools || []) as OpenAIStyleTool[];
+
+  // For gpt-5-* prefer Responses API. For other models, fall back to Chat Completions streaming.
+  const isGpt5Family = /(^|\b)gpt-5/i.test(model);
+
+  // Debug: basic call info
+  try {
+    console.log(
+      JSON.stringify({
+        type: 'debug.openai.start',
+        model,
+        isGpt5Family,
+        hasTools: Array.isArray(req.tools) && req.tools.length > 0,
+        options: {
+          temperature: req.options?.temperature,
+          top_p: req.options?.top_p,
+          max_output_tokens: req.options?.max_output_tokens,
+          reasoning_effort: (req as any)?.options?.reasoning_effort,
+          text_verbosity: (req as any)?.options?.text_verbosity,
+        },
+      })
+    );
+  } catch {}
+
+  try {
+    if (isGpt5Family) {
+      // Prefer Responses API streaming for gpt-5
+      try {
+        const respParams: any = {
+          model,
+          input: toResponsesInput(messages) as any,
+          tools: toolsChat && toolsChat.length > 0 ? 
toResponsesTools(toolsChat) : undefined, + max_output_tokens: req.options?.max_output_tokens, + }; + // GPT-5 family: omit temperature/top_p; allow reasoning/text controls + if (req.options?.reasoning_effort) { + respParams.reasoning = { effort: req.options.reasoning_effort }; + } else { + // 기본값: 생각(추론) 강도를 최소화하여 지연을 줄임 + respParams.reasoning = { effort: 'minimal' }; + } + if (req.options?.text_verbosity) { + respParams.text = { verbosity: req.options.text_verbosity }; + } else { + // Encourage text output on GPT-5 if not specified + respParams.text = { verbosity: 'low' }; + } + try { + console.log( + JSON.stringify({ type: 'debug.openai.path', path: 'responses.stream', paramsKeys: Object.keys(respParams) }) + ); + } catch {} + const responsesStream: any = await (openai as any).responses.stream(respParams); + + // let loggedFirstDelta = false; + responsesStream.on('response.output_text.delta', (ev: any) => { + const text = typeof ev === 'string' ? ev : ev?.delta ?? ''; + if (text) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(text)}\n\n`); + // try { console.log(JSON.stringify({ type: 'debug.openai.delta', len: String(text).length, at: Date.now() })); } catch {} + // if (!loggedFirstDelta) { + // try { console.log(JSON.stringify({ type: 'debug.openai.delta', len: String(text).length })); } catch {} + // loggedFirstDelta = true; + // } + } + }); + + // Stream tool-call arguments as answer chunks to maintain SSE shape + responsesStream.on('response.tool_call.delta', (ev: any) => { + const argsDelta = ev?.arguments_delta || ev?.arguments || ev?.delta || ''; + if (argsDelta) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(argsDelta)}\n\n`); + } + }); + // Also handle non-delta tool_call events + responsesStream.on('response.tool_call', (ev: any) => { + const args = ev?.arguments || ev?.arguments_delta || ''; + if (args) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(args)}\n\n`); + } + }); + + // Catch-all messages to ensure we don't miss alternative text events + responsesStream.on('message', (msg: any) => { + try { + const m = typeof msg === 'string' ? 
JSON.parse(msg) : msg; + if (!m) return; + // Prefer explicit output_text delta + if (m.type === 'response.output_text.delta' && m.delta) { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(m.delta)}\n\n`); + } + // Some SDKs may emit full output_text chunk at once + else if (m.type === 'response.output_text' && typeof m.text === 'string') { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(m.text)}\n\n`); + } + // Generic delta fallback + else if (m.type === 'response.delta' && typeof m.delta === 'string') { + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(m.delta)}\n\n`); + } + // Log for visibility + console.log( + JSON.stringify({ type: 'debug.openai.msg', mtype: m.type, keys: Object.keys(m || {}) }) + ); + } catch (e) { + try { console.log(JSON.stringify({ type: 'debug.openai.msg_parse_error' })); } catch {} + } + }); + + responsesStream.on('response.completed', () => { + safeWrite(`event: end\n`); + safeWrite(`data: [DONE]\n\n`); + safeEnd(); + try { console.log(JSON.stringify({ type: 'debug.openai.completed' })); } catch {} + }); + + responsesStream.on('error', (e: any) => { + safeWrite(`event: error\n`); + safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`); + safeEnd(); + try { + console.error( + JSON.stringify({ type: 'debug.openai.error', path: 'responses.stream', message: (e as any)?.message }) + ); + } catch {} + }); + + // Do not await completion here; return immediately so callers can consume deltas in real-time + (async () => { + try { + await responsesStream.done(); + } catch {} + })(); + return stream; + } catch (e) { + // Fallback to non-streaming Responses if streaming path fails + try { + const createParams: any = { + model, + input: toResponsesInput(messages) as any, + // Avoid tools in non-streaming mode to ensure text output + max_output_tokens: req.options?.max_output_tokens, + }; + if (req.options?.reasoning_effort) createParams.reasoning = { effort: req.options.reasoning_effort }; + else createParams.reasoning = { effort: 'low' }; + if (req.options?.text_verbosity) createParams.text = { verbosity: req.options.text_verbosity }; + try { + console.log( + JSON.stringify({ type: 'debug.openai.path', path: 'responses.create', paramsKeys: Object.keys(createParams) }) + ); + } catch {} + const response = await openai.responses.create(createParams); + const text = (response as any).output_text ?? ''; + const answerText = typeof text === 'string' ? 
text : ''; + const fallbackText = (() => { + try { + const outputs = (response as any).output || []; + if (Array.isArray(outputs) && outputs.length > 0) { + const parts = outputs + .flatMap((o: any) => o.content || []) + .filter((c: any) => c.type === 'output_text') + .map((c: any) => c.text) + .join(''); + return parts || ''; + } + } catch { + // ignore + } + return ''; + })(); + const finalText = answerText || fallbackText; + const chunkSize = 400; + for (let i = 0; i < finalText.length; i += chunkSize) { + const chunk = finalText.slice(i, i + chunkSize); + safeWrite(`event: answer\n`); + safeWrite(`data: ${JSON.stringify(chunk)}\n\n`); + } + safeWrite(`event: end\n`); + safeWrite(`data: [DONE]\n\n`); + safeEnd(); + return stream; + } catch (e2) { + // fall through to chat completions streaming below + try { + console.error( + JSON.stringify({ type: 'debug.openai.error', path: 'responses.create', message: (e2 as any)?.message }) + ); + } catch {} + } + } + } + + // Chat Completions streaming as universal fallback + try { console.log(JSON.stringify({ type: 'debug.openai.path', path: 'chat.completions.stream' })); } catch {} + let chatStream: any; + + // temperature/top_p are not supported on reasoning models (e.g., GPT-5 family) + if (!isGpt5Family) { + chatStream = await openai.chat.completions.create({ + model, + messages: messages as any, + tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined, + tool_choice: req.tools && req.tools.length > 0 ? 'auto' : undefined, + stream: true, + temperature: req.options?.temperature, + top_p: req.options?.top_p, + max_tokens: req.options?.max_output_tokens as any, + }); + } else { + chatStream = await openai.chat.completions.create({ + model, + messages: messages as any, + tools: (req.tools as OpenAI.Chat.Completions.ChatCompletionTool[]) || undefined, + tool_choice: req.tools && req.tools.length > 0 ? 
'auto' : undefined,
+        stream: true,
+        // Reasoning models (GPT-5 family) reject `max_tokens` on Chat Completions;
+        // they expect `max_completion_tokens` instead.
+        max_completion_tokens: req.options?.max_output_tokens as any,
+      });
+    }
+
+    // Iterate asynchronously; return stream immediately to allow real-time consumption
+    (async () => {
+      try {
+        for await (const chunk of chatStream) {
+          const content = chunk.choices[0]?.delta?.content || '';
+          const toolCalls = chunk.choices[0]?.delta?.tool_calls;
+
+          if (toolCalls) {
+            for (const toolCall of toolCalls) {
+              if (toolCall.function?.arguments) {
+                safeWrite(`event: answer\n`);
+                safeWrite(`data: ${JSON.stringify(toolCall.function.arguments)}\n\n`);
+              }
+            }
+          } else if (content) {
+            safeWrite(`event: answer\n`);
+            safeWrite(`data: ${JSON.stringify(content)}\n\n`);
+          }
+
+          if (chunk.choices[0]?.finish_reason) {
+            safeWrite(`event: end\n`);
+            safeWrite(`data: [DONE]\n\n`);
+            safeEnd();
+            try { console.log(JSON.stringify({ type: 'debug.openai.completed', path: 'chat.completions.stream' })); } catch {}
+            break;
+          }
+        }
+      } catch (e) {
+        safeWrite(`event: error\n`);
+        safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+        safeEnd();
+      }
+    })();
+
+    return stream;
+  } catch (err) {
+    safeWrite(`event: error\n`);
+    safeWrite(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`);
+    safeEnd();
+    try {
+      console.error(
+        JSON.stringify({ type: 'debug.openai.error', path: 'top', message: (err as any)?.message, model, isGpt5Family })
+      );
+    } catch {}
+    return stream;
+  }
+};
diff --git a/src/llm/types.ts b/src/llm/types.ts
new file mode 100644
index 0000000..3c12a6f
--- /dev/null
+++ b/src/llm/types.ts
@@ -0,0 +1,35 @@
+export type ProviderName = 'openai' | 'gemini';
+
+export type OpenAIStyleMessage = {
+  role: 'system' | 'user' | 'assistant' | 'tool' | 'function';
+  content: string;
+};
+
+export type OpenAIStyleTool = {
+  type: 'function';
+  function: {
+    name: string;
+    description?: string;
+    parameters?: Record<string, unknown>;
+  };
+};
+
+export type GenerateRequest = {
+  provider?: ProviderName;
+  model?: string;
+  messages?: OpenAIStyleMessage[];
+  tools?: OpenAIStyleTool[];
+  options?: {
+    temperature?: number;
+    top_p?: number;
+    max_output_tokens?: number;
+    // GPT-5 family specific controls
+    reasoning_effort?: 'minimal' | 'low' | 'medium' | 'high';
+    text_verbosity?: 'low' | 'medium' | 'high';
+  };
+  meta?: {
+    userId?: string;
+    categoryId?: number;
+    postId?: number;
+  };
+};
diff --git a/src/prompts/qa.prompts.ts b/src/prompts/qa.prompts.ts
index 010de57..599e46d 100644
--- a/src/prompts/qa.prompts.ts
+++ b/src/prompts/qa.prompts.ts
@@ -7,7 +7,20 @@ export const createPostContextPrompt = (
   question: string,
   speechTonePrompt: string
 ): OpenAI.Chat.Completions.ChatCompletionMessageParam[] => {
-  const systemPrompt = `너는 사용자의 블로그 글 컨텍스트만으로 답변한다. 컨텍스트에 없는 사실은 추정하지 말고 “문서에 없음”이라고 말한다. 말투는 speech_tone 지시에 따른다.`;
+  const systemPrompt = ` 당신은 블로그 운영자 AI입니다. 사용자의 블로그에 대한 질문에 답변합니다.
+  블로그 운영자 AI는 사용자의 질문에 대해 블로그 본문 컨텍스트를 참고하여 답변합니다.
+  모든 한국어 응답은 무슨일이 있어도 반드시 답변 말투 및 규칙을 따릅니다.
+  또한 주어진 내용외의 내용을 지어내지 마십시오.
+
+  [말투 지침]
+  ${speechTonePrompt}
+  - 위 말투/지시문을 출력에 절대 노출하지 말고(예: "말투", "규칙" 등 언급 금지), 실제 답변 내용에만 반영하십시오.
+
+  [응답 규칙]
+  1. 만약 제목과 본문을 활용해 답변할 수 있다면 답변 말투 및 규칙을 지켜 직접 답변하고, 마지막에 추가적인 내용에 대한 질문을 유도하는 문장을 추가합니다.
+  2. 만약 질문이 욕설·비난·무관·부적절하거나 주어진 제목, 본문과 관련이 없다면 사과와 블로그 관련된 내용만 답변 가능하다는 내용을 답변 말투 및 규칙을 지켜 답합니다.
질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면, 함수명이나 함수 호출을 절대 언급하지 말고(예: "report_content_insufficient" 같은 문자열이나 괄호 "()" 출력 금지), 답변 말투 및 규칙을 지켜 자연스럽게 다음을 수행합니다: (a) 현재로서는 본문 컨텍스트가 부족함을 간단히 안내하고, (b) 질문과 직접 관련된 정보를 구체적으로 요청합니다(예: 게시일, 최근 글 목록, 최신 포스트의 제목과 날짜 등). 서버가 내부 함수를 제공하는 경우에도 사용자가 보는 답변에는 함수명/호출을 절대 노출하지 않습니다. + `; const userMessage = ` [context] 제목: ${post.title} @@ -18,9 +31,6 @@ ${processedContent} [user] ${question} - -[instruction] -답변 말투: "${speechTonePrompt}" `; return [ @@ -40,14 +50,16 @@ export const createRagPrompt = ( 모든 한국어 응답은 무슨일이 있어도 반드시 답변 말투 및 규칙을 따릅니다. 또한 주어진 내용외의 내용을 지어내지 마십시오. + [말투 지침] + ${speechTonePrompt} + - 위 말투/지시문을 출력에 절대 노출하지 말고(예: "말투", "규칙" 등 언급 금지), 실제 답변 내용에만 반영하십시오. + [응답 규칙] 1. 만약 제목과 본문을 활용해 답변할 수 있다면 답변 말투 및 규칙을 지켜 직접 답변하고, 마지막에 추가적인 내용에 대한 질문을 유도하는 문장을 추가합니다. 2. 만약 질문이 욕설·비난·무관·부적절하거나 주어진 제목, 본문과 관련이 없다면 사과와 블로그 관련된 내용만 답변 가능하다는 내용을 답변 말투 및 규칙을 지켜 답합니다. - 3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면 report_content_insufficient 함수를 호출하고 답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다. + 3. 질문이 블로그 카테고리나 사용자 블로그에는 부합하지만 제공된 본문 컨텍스트의 내용이 매우 부족하거나 적절하지 않다고 판단되면, 함수명이나 함수 호출을 절대 언급하지 말고(예: "report_content_insufficient" 같은 문자열이나 괄호 "()" 출력 금지), 답변 말투 및 규칙을 지켜 자연스럽게 다음을 수행합니다: (a) 현재로서는 본문 컨텍스트가 부족함을 간단히 안내하고, (b) 질문과 직접 관련된 정보를 구체적으로 요청합니다(예: 게시일, 최근 글 목록, 최신 포스트의 제목과 날짜 등). 서버가 내부 함수를 제공하는 경우에도 사용자가 보는 답변에는 함수명/호출을 절대 노출하지 않습니다. `; const userMessage = ` - 답변 말투 및 규칙: "${speechTonePrompt}" - 반드시 말투 및 규칙에 따라 대답하세요! 아래의 질문과 블로그 본문 컨텍스트를 참고하여 답변하세요. 사용자의 질문: ${question} 가장 근접한 블로그 본문 컨텍스트: diff --git a/src/prompts/qa.v2.prompts.ts b/src/prompts/qa.v2.prompts.ts new file mode 100644 index 0000000..a752324 --- /dev/null +++ b/src/prompts/qa.v2.prompts.ts @@ -0,0 +1,117 @@ +// Search Plan prompt templates for v2 + +import { SearchPlan } from '../types/ai.v2.types'; + +export const getSearchPlanSchemaJson = (): Record => ({ + type: 'object', + additionalProperties: false, + properties: { + mode: { enum: ['rag', 'post'] }, + top_k: { type: 'integer', minimum: 1, maximum: 10 }, + threshold: { type: 'number', minimum: 0, maximum: 1 }, + weights: { + type: 'object', + additionalProperties: false, + properties: { + chunk: { type: 'number', minimum: 0, maximum: 1 }, + title: { type: 'number', minimum: 0, maximum: 1 }, + }, + required: ['chunk', 'title'], + }, + rewrites: { type: 'array', items: { type: 'string' } }, + keywords: { type: 'array', items: { type: 'string' } }, + hybrid: { + type: 'object', + additionalProperties: false, + properties: { + enabled: { type: 'boolean' }, + retrieval_bias: { enum: ['lexical', 'balanced', 'semantic'] }, + max_rewrites: { type: 'integer', minimum: 0, maximum: 4 }, + max_keywords: { type: 'integer', minimum: 0, maximum: 8 }, + }, + required: ['enabled', 'retrieval_bias', 'max_rewrites', 'max_keywords'], + }, + // Only time is allowed under filters. Responses JSON Schema requires closed objects + // with explicit required fields at each level. We constrain time to label-form only. 
+    filters: {
+      type: 'object',
+      additionalProperties: false,
+      properties: {
+        time: {
+          type: 'object',
+          additionalProperties: false,
+          properties: {
+            type: { type: 'string', enum: ['label'] },
+            label: { type: 'string', minLength: 1 },
+          },
+          required: ['type', 'label'],
+        },
+      },
+      required: ['time'],
+    },
+    sort: { enum: ['created_at_desc', 'created_at_asc'] },
+    limit: { type: 'integer', minimum: 1, maximum: 20 },
+  },
+  required: ['mode', 'top_k', 'threshold', 'weights', 'rewrites', 'keywords', 'hybrid', 'filters', 'sort', 'limit'],
+});
+
+export const buildSearchPlanPrompt = (params: {
+  now_utc: string;
+  now_kst: string;
+  timezone: string;
+  user_id: string;
+  category_id?: number;
+  post_id?: number;
+  defaults?: Partial<SearchPlan>;
+  question: string;
+}): string => {
+  const defaults = JSON.stringify(
+    params.defaults || {
+      top_k: 5,
+      threshold: 0.2,
+      weights: { chunk: 0.7, title: 0.3 },
+      sort: 'created_at_desc',
+      limit: 5,
+    },
+  );
+
+  const schemaHint = JSON.stringify(getSearchPlanSchemaJson());
+
+  return [
+    'You are a Search Plan Generator for a Korean blogging platform.',
+    'Your task is to read the user question and output ONLY a JSON object that defines a safe search plan.',
+    '',
+    `now_utc: ${params.now_utc}`,
+    `now_kst: ${params.now_kst}`,
+    `timezone: ${params.timezone}`,
+    `user_id: ${params.user_id}`,
+    `category_id: ${params.category_id ?? ''}`,
+    `post_id: ${params.post_id ?? ''}`,
+    '',
+    'Rules:',
+    '1) Output ONLY a single JSON object matching the schema. No extra text.',
+    '2) Respect bounds: top_k 1..10, limit 1..20, threshold 0..1, weights in [0,1]. The server normalizes their sum.',
+    '3) Do NOT output any filters other than filters.time. The server injects user_id/category_id/post_id.',
+    '   - Your job: decide top_k, threshold, sort, limit; and optionally rewrites/keywords/hybrid only.',
+    '4) Mode must follow constraints: if post_id exists, use mode="post"; otherwise use mode="rag".',
+    '5) Time MUST be provided via a label only: filters.time = { "type": "label", "label": "..." }',
+    '   - Allowed labels (examples): "all_time"(no filter), "today", "yesterday", "last_7_days", "last_14_days", "last_30_days", "this_month", "last_month",',
+    '     or structured: "2006_to_now", "2019-2022", "2024-Q3", "Q3-2024", "2024-09", "2024".',
+    '   - For queries like "최근 글", prefer a short window label such as "last_7_days" or "last_30_days" (choose appropriately).',
+    '6) Do NOT include any temporal words or ranges inside rewrites/keywords. Temporal intent must live ONLY in filters.time.',
+    '7) If the question asks for N items (e.g., “N개”), set limit=N within bounds.',
+    '8) Keep weights to defaults unless a clear need implies otherwise.',
+    '9) When helpful for recall, set hybrid.enabled=true and choose hybrid.retrieval_bias ∈ {lexical, balanced, semantic}. Then generate concise rewrites (<= max_rewrites) and focused keywords (<= max_keywords).',
+    '   - lexical: keyword/정확 매칭이 중요할 때 (숫자, 버전, 고유명사 등).',
+    '   - balanced: 일반 질의.',
+    '   - semantic: 개념/요약/의도 중심일 때.',
+    '10) Avoid stop/common words (예: "글", "포스트", "블로그", "소개", "정리"). Keep within the user context; avoid over-broad topics or time spans.',
+    '11) Remove near-duplicates: if rewrites/keywords are synonymous or highly similar, include only one.',
+    '',
+    `Schema: ${schemaHint}`,
+    '',
+    `Question (Korean):\n${params.question}`,
+    '',
+    'Respond with ONLY the JSON object. No markdown, no explanation.',
+  ].join('\n');
+};
diff --git a/src/repositories/post.repository.ts b/src/repositories/post.repository.ts
index 545dc57..ffc17c0 100644
--- a/src/repositories/post.repository.ts
+++ b/src/repositories/post.repository.ts
@@ -9,6 +9,7 @@ export interface Post {
   tags: string[];
   created_at: string;
   user_id: string;
+  is_public: boolean;
 }
 
 export interface SimilarChunk {
@@ -18,14 +19,36 @@ export interface SimilarChunk {
   similarityScore: number;
 }
 
+export interface TextSearchHit {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  textScore: number;
+}
+
 // ========= READ QUERIES =========
 export const findPostById = async (postId: number): Promise<Post | null> => {
   const pool = getDb();
-  const { rows } = await pool.query(
-    'SELECT id, title, content, tags, created_at, user_id FROM blog_post WHERE id = $1',
+  // Some databases may not have a `tags` column on blog_post.
+  // Select existing columns and populate `tags` as an empty array fallback.
+  const { rows } = await pool.query(
+    'SELECT id, title, content, created_at, user_id, is_public FROM blog_post WHERE id = $1',
     [postId]
   );
-  return rows.length > 0 ? rows[0] : null;
+  if (rows.length === 0) return null;
+
+  const row = rows[0] as any;
+  const post: Post = {
+    id: row.id,
+    title: row.title,
+    content: row.content,
+    // Fallback: DB has no tags column; keep empty list so prompts render gracefully
+    tags: Array.isArray(row.tags) ? row.tags : [],
+    created_at: row.created_at,
+    user_id: row.user_id,
+    is_public: row.is_public,
+  };
+  return post;
 };
 
 export const findSimilarChunks = async (
@@ -131,4 +154,208 @@ export const storeContentEmbeddings = async (
     await pool.query('ROLLBACK');
     throw error;
   }
-};
\ No newline at end of file
+};
+
+// ========= READ QUERIES (V2 dynamic) =========
+export const findSimilarChunksV2 = async (params: {
+  userId: string;
+  embedding: number[];
+  categoryId?: number;
+  from?: string; // ISO UTC
+  to?: string; // ISO UTC
+  threshold?: number; // 0..1
+  topK?: number; // default 5, max 10
+  weights?: { chunk: number; title: number };
+  sort?: 'created_at_desc' | 'created_at_asc';
+}): Promise<SimilarChunk[]> => {
+  const pool = getDb();
+  const wChunk = Math.max(0, Math.min(1, params.weights?.chunk ?? 0.7));
+  const wTitle = Math.max(0, Math.min(1, params.weights?.title ?? 0.3));
+  const thr = params.threshold != null ? Math.max(0, Math.min(1, params.threshold)) : 0.2;
+  const limit = Math.min(10, Math.max(1, params.topK ?? 5));
+
+  const parts: string[] = [];
+  const values: any[] = [];
+
+  // $1: userId, $2: embedding
+  values.push(params.userId);
+  values.push(pgvector.toSql(params.embedding));
+
+  const hasCategory = typeof params.categoryId === 'number';
+  const hasTime = !!(params.from && params.to);
+
+  if (hasCategory) {
+    const catParam = values.length + 1; // next index
+    parts.push(`
+      WITH category_ids AS (
+        SELECT DISTINCT cc.descendant_id
+        FROM category_closure cc
+        WHERE cc.ancestor_id = $${catParam}
+      ),
+      filtered_posts AS (
+        SELECT bp.id AS post_id, bp.title AS post_title, bp.created_at
+        FROM blog_post bp
+        WHERE bp.user_id = $1 AND bp.category_id IN (SELECT descendant_id FROM category_ids)
+      )`);
+    values.push(params.categoryId);
+  } else {
+    parts.push(`
+      WITH filtered_posts AS (
+        SELECT id AS post_id, title AS post_title, created_at
+        FROM blog_post
+        WHERE user_id = $1
+      )`);
+  }
+
+  // base select and threshold
+  const thrParam = values.length + 1;
+  parts.push(`
+    SELECT
+      fp.post_id,
+      fp.post_title,
+      pc.content AS post_chunk,
+      (${wChunk} * (1.0 - (pc.embedding <=> $2))) + (${wTitle} * (1.0 - (pte.embedding <=> $2))) AS similarity_score,
+      fp.created_at
+    FROM filtered_posts fp
+    JOIN post_chunks pc ON pc.post_id = fp.post_id
+    JOIN post_title_embeddings pte ON pte.post_id = fp.post_id
+    WHERE (1.0 - (pc.embedding <=> $2)) > $${thrParam}
+  `);
+  values.push(thr);
+
+  if (hasTime) {
+    const fromParam = values.length + 1;
+    const toParam = values.length + 2;
+    parts.push(` AND fp.created_at BETWEEN $${fromParam} AND $${toParam}`);
+    values.push(params.from, params.to);
+  }
+
+  let orderBy = 'similarity_score DESC';
+  if (params.sort === 'created_at_desc') orderBy = 'similarity_score DESC, fp.created_at DESC';
+  if (params.sort === 'created_at_asc') orderBy = 'similarity_score DESC, fp.created_at ASC';
+
+  const limitParam = values.length + 1;
+  parts.push(` ORDER BY ${orderBy} LIMIT $${limitParam}`);
+  values.push(limit);
+
+  const sql = parts.join('\n');
+
+  const { rows } = await pool.query(sql, values);
+  return rows.map((row) => ({
+    postId: row.post_id,
+    postTitle: row.post_title,
+    postChunk: row.post_chunk,
+    similarityScore: row.similarity_score,
+  }));
+};
+
+export const textSearchChunksV2 = async (params: {
+  userId: string;
+  query?: string;
+  keywords?: string[];
+  categoryId?: number;
+  from?: string;
+  to?: string;
+  topK?: number;
+  sort?: 'created_at_desc' | 'created_at_asc';
+}): Promise<TextSearchHit[]> => {
+  const pool = getDb();
+  const limit = Math.min(10, Math.max(1, params.topK ??
5)); + + const values: any[] = []; + values.push(params.userId); + + const hasCategory = typeof params.categoryId === 'number'; + const hasTime = !!(params.from && params.to); + const hasQuery = !!params.query && params.query.trim().length > 0; + const keywords = (params.keywords || []).filter((k) => typeof k === 'string' && k.trim().length > 0); + + const withParts: string[] = []; + if (hasCategory) { + const catParam = values.length + 1; + withParts.push(` + category_ids AS ( + SELECT DISTINCT cc.descendant_id FROM category_closure cc WHERE cc.ancestor_id = $${catParam} + ), + filtered_posts AS ( + SELECT bp.id AS post_id, bp.title AS post_title, bp.created_at + FROM blog_post bp + WHERE bp.user_id = $1 AND bp.category_id IN (SELECT descendant_id FROM category_ids) + )`); + values.push(params.categoryId); + } else { + withParts.push(` + filtered_posts AS ( + SELECT id AS post_id, title AS post_title, created_at + FROM blog_post + WHERE user_id = $1 + )`); + } + + let base = ` + SELECT + fp.post_id, + fp.post_title, + pc.content AS post_chunk, + 0::float8 AS content_sim, + 0::float8 AS title_sim, + fp.created_at + FROM filtered_posts fp + JOIN post_chunks pc ON pc.post_id = fp.post_id + `; + if (hasQuery) { + const qParam = values.length + 1; + base = ` + SELECT + fp.post_id, + fp.post_title, + pc.content AS post_chunk, + COALESCE(similarity(pc.content, $${qParam}), 0) AS content_sim, + COALESCE(similarity(fp.post_title, $${qParam}), 0) AS title_sim, + fp.created_at + FROM filtered_posts fp + JOIN post_chunks pc ON pc.post_id = fp.post_id + `; + values.push(params.query); + } + + const whereParts: string[] = []; + if (hasTime) { + const fromParam = values.length + 1; + const toParam = values.length + 2; + whereParts.push(`fp.created_at BETWEEN $${fromParam} AND $${toParam}`); + values.push(params.from, params.to); + } + + const likePatterns: string[] = []; + for (const k of keywords) { + likePatterns.push(`%${k}%`); + } + if (likePatterns.length > 0) { + const arrParam = values.length + 1; + whereParts.push(`(pc.content ILIKE ANY($${arrParam}) OR fp.post_title ILIKE ANY($${arrParam}))`); + values.push(likePatterns); + } + + const whereSql = whereParts.length > 0 ? `WHERE ${whereParts.join(' AND ')}` : ''; + + let orderBy = 'content_sim DESC'; + if (params.sort === 'created_at_desc') orderBy = 'content_sim DESC, fp.created_at DESC'; + if (params.sort === 'created_at_asc') orderBy = 'content_sim DESC, fp.created_at ASC'; + + const limitParam = values.length + 1; + const sql = `${withParts.length > 0 ? 
'WITH ' + withParts.join(',\n') : ''}
+${base}
+${whereSql}
+ORDER BY ${orderBy}
+LIMIT $${limitParam}`;
+  values.push(limit);
+
+  const { rows } = await pool.query(sql, values);
+  return rows.map((row) => ({
+    postId: row.post_id,
+    postTitle: row.post_title,
+    postChunk: row.post_chunk,
+    textScore: Math.max(Number(row.content_sim) || 0, Number(row.title_sim) || 0),
+  }));
+};
diff --git a/src/routes/ai.v2.routes.ts b/src/routes/ai.v2.routes.ts
new file mode 100644
index 0000000..2558023
--- /dev/null
+++ b/src/routes/ai.v2.routes.ts
@@ -0,0 +1,14 @@
+import { Router } from 'express';
+import { askV2Handler } from '../controllers/ai.v2.controller';
+import { authMiddleware } from '../middlewares/auth.middleware';
+
+const aiV2Router = Router();
+
+aiV2Router.get('/health', (req, res) => {
+  res.status(200).json({ status: 'ok', v: 'v2' });
+});
+
+aiV2Router.post('/ask', authMiddleware, askV2Handler);
+
+export default aiV2Router;
+
diff --git a/src/services/hybrid-search.service.ts b/src/services/hybrid-search.service.ts
new file mode 100644
index 0000000..592e723
--- /dev/null
+++ b/src/services/hybrid-search.service.ts
@@ -0,0 +1,104 @@
+import { createEmbeddings } from './embedding.service';
+import * as postRepository from '../repositories/post.repository';
+import { SearchPlan } from '../types/ai.v2.types';
+
+export type HybridSearchResult = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  similarityScore: number;
+}[];
+
+type Candidate = {
+  postId: string;
+  postTitle: string;
+  postChunk: string;
+  vecScore?: number;
+  textScore?: number;
+};
+
+export const runHybridSearch = async (
+  question: string,
+  userId: string,
+  plan: SearchPlan
+): Promise<HybridSearchResult> => {
+  const queries = [question, ...((plan.rewrites as string[]) || [])];
+  const alpha = Math.max(0, Math.min(1, (plan.hybrid as any)?.alpha ?? 0.7));
+
+  const from = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.from : undefined;
+  const to = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.to : undefined;
+  const categoryId = (plan.filters as any)?.category_ids?.[0];
+
+  const embeddings = await createEmbeddings(queries);
+
+  const byKey = new Map<string, Candidate>();
+
+  for (let i = 0; i < embeddings.length; i++) {
+    const emb = embeddings[i];
+    const rows = await postRepository.findSimilarChunksV2({
+      userId,
+      embedding: emb,
+      categoryId,
+      from,
+      to,
+      threshold: plan.threshold,
+      topK: plan.top_k,
+      weights: plan.weights,
+      sort: plan.sort,
+    });
+    for (const r of rows) {
+      const key = `${r.postId}:${r.postChunk}`;
+      const prev = byKey.get(key);
+      const curVec = Number(r.similarityScore) || 0;
+      if (!prev) {
+        byKey.set(key, { postId: r.postId, postTitle: r.postTitle, postChunk: r.postChunk, vecScore: curVec });
+      } else {
+        prev.vecScore = Math.max(prev.vecScore || 0, curVec);
+      }
+    }
+  }
+
+  const textRows = await postRepository.textSearchChunksV2({
+    userId,
+    query: question,
+    keywords: (plan.keywords as string[]) || [],
+    categoryId,
+    from,
+    to,
+    topK: plan.top_k,
+    sort: plan.sort,
+  });
+  for (const r of textRows) {
+    const key = `${r.postId}:${r.postChunk}`;
+    const prev = byKey.get(key);
+    const curText = Number(r.textScore) || 0;
+    if (!prev) {
+      byKey.set(key, { postId: r.postId, postTitle: r.postTitle, postChunk: r.postChunk, textScore: curText });
+    } else {
+      prev.textScore = Math.max(prev.textScore || 0, curText);
+    }
+  }
+
+  const list = Array.from(byKey.values());
+  const vecVals = list.map((c) => c.vecScore || 0);
+  const textVals = list.map((c) => c.textScore || 0);
+  const vMin = Math.min(...vecVals, 0);
+  const vMax = Math.max(...vecVals, 0);
+  const tMin = Math.min(...textVals, 0);
+  const tMax = Math.max(...textVals, 0);
+
+  const norm = (v: number, lo: number, hi: number) => (hi > lo ? (v - lo) / (hi - lo) : 0);
+
+  const fused = list
+    .map((c) => {
+      const v = norm(c.vecScore || 0, vMin, vMax);
+      const t = norm(c.textScore || 0, tMin, tMax);
+      const score = alpha * v + (1 - alpha) * t;
+      return { postId: c.postId, postTitle: c.postTitle, postChunk: c.postChunk, similarityScore: score };
+    })
+    .sort((a, b) => b.similarityScore - a.similarityScore)
+    .slice(0, Math.min(10, Math.max(1, plan.top_k || 5)));
+
+  return fused;
+};
+
diff --git a/src/services/qa.service.ts b/src/services/qa.service.ts
index 2361bbc..e0b308f 100644
--- a/src/services/qa.service.ts
+++ b/src/services/qa.service.ts
@@ -1,14 +1,10 @@
 import { createEmbeddings } from './embedding.service';
 import { PassThrough } from 'stream';
-import OpenAI from 'openai';
 import config from '../config';
 import * as postRepository from '../repositories/post.repository';
 import * as personaRepository from '../repositories/persona.repository';
 import * as qaPrompts from '../prompts/qa.prompts';
-
-const openai = new OpenAI({
-  apiKey: config.OPENAI_API_KEY,
-});
+import { generate } from '../llm';
 
 const preprocessContent = (content: string): string => {
   const plainText = content.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
@@ -27,20 +23,45 @@
   return "간결하고 명확한 말투로 답변해"; // Default
 }
 
+type LlmOverride = {
+  provider?: 'openai' | 'gemini';
+  model?: string;
+  options?: { temperature?: number; top_p?: number; max_output_tokens?: number };
+};
+
 export const answerStream = async (
   question: string,
   userId: string,
   categoryId?: number,
   speechTone: number = -1,
-  postId?: number
+  postId?: number,
+  llm?: LlmOverride
 ): Promise<PassThrough> => {
   const stream = new PassThrough();
-
-  let messages:
OpenAI.Chat.Completions.ChatCompletionMessageParam[] = []; - let tools: OpenAI.Chat.Completions.ChatCompletionTool[] | undefined = undefined; + try { + console.log( + JSON.stringify({ type: 'debug.qa.start', questionLen: question?.length || 0, userId, categoryId, postId, speechTone, llm }) + ); + } catch {} + + let messages: { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] = []; + let tools: + | { + type: 'function'; + function: { name: string; description?: string; parameters?: Record }; + }[] + | undefined = undefined; (async () => { const speechTonePrompt = await getSpeechTonePrompt(speechTone, userId); + const toSimpleMessages = ( + raw: any[] + ): { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] => { + return (raw || []).map((m: any) => ({ + role: m.role, + content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content), + })); + }; if (postId) { const post = await postRepository.findPostById(postId); @@ -48,20 +69,31 @@ export const answerStream = async ( if (!post) { stream.write(`event: error\ndata: ${JSON.stringify({ code: 404, message: 'Post not found' })}\n\n`); stream.end(); + try { console.warn(JSON.stringify({ type: 'debug.qa.post', status: 'not_found', postId })); } catch {} return; } - - if (post.user_id !== userId) { - stream.write(`event: error\ndata: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`); - stream.end(); - return; + + // Enforce conditional ownership: if post is not public, require owner + if (!post.is_public && post.user_id !== userId) { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`); + stream.end(); + try { console.warn(JSON.stringify({ type: 'debug.qa.post', status: 'forbidden', postId })); } catch {} + return; } const processedContent = preprocessContent(post.content); stream.write(`event: exist_in_post_status\ndata: true\n\n`); stream.write(`event: context\ndata: ${JSON.stringify([{ postId: post.id, postTitle: post.title }])}\n\n`); + try { + console.log( + JSON.stringify({ type: 'debug.qa.path', mode: 'post', postId: post.id, processedLen: processedContent.length }) + ); + } catch {} - messages = qaPrompts.createPostContextPrompt(post, processedContent, question, speechTonePrompt); + messages = toSimpleMessages( + qaPrompts.createPostContextPrompt(post, processedContent, question, speechTonePrompt) + ); } else { const [questionEmbedding] = await createEmbeddings([question]); @@ -72,8 +104,15 @@ export const answerStream = async ( const context = similarChunks.map(chunk => ({ postId: chunk.postId, postTitle: chunk.postTitle })); stream.write(`event: context\ndata: ${JSON.stringify(context)}\n\n`); - - messages = qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt); + try { + console.log( + JSON.stringify({ type: 'debug.qa.path', mode: 'rag', similarChunks: similarChunks.length, contextPreview: context.slice(0, 3) }) + ); + } catch {} + + messages = toSimpleMessages( + qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt) + ); tools = [ { type: "function", @@ -92,34 +131,47 @@ export const answerStream = async ( ]; } - const responseStream = await openai.chat.completions.create({ - model: config.CHAT_MODEL, + const llmStream = await generate({ + provider: llm?.provider || 'openai', + model: llm?.model || config.CHAT_MODEL, messages, tools, - tool_choice: tools ? 
'auto' : undefined, - stream: true, + options: llm?.options, + meta: { userId, categoryId, postId }, + }); + try { + console.log( + JSON.stringify({ + type: 'debug.qa.call', + provider: llm?.provider || 'openai', + model: llm?.model || config.CHAT_MODEL, + messages: messages.length, + tools: (tools || []).length, + hasOptions: !!llm?.options, + }) + ); + } catch {} + + llmStream.on('data', (chunk) => { + try { + const str = Buffer.isBuffer(chunk) ? chunk.toString('utf8') : String(chunk); + console.log( + JSON.stringify({ + type: 'debug.qa.chunk', + at: Date.now(), + bytes: Buffer.byteLength(str, 'utf8'), + preview: str.slice(0, 40).replace(/\n/g, '\\n'), + }) + ); + } catch {} + stream.write(chunk); + }); + llmStream.on('end', () => { + stream.end(); + }); + llmStream.on('error', (e) => { + try { console.error(JSON.stringify({ type: 'debug.qa.llmError', message: (e as any)?.message || 'error' })); } catch {} }); - - for await (const chunk of responseStream) { - const content = chunk.choices[0]?.delta?.content || ""; - const toolCalls = chunk.choices[0]?.delta?.tool_calls; - - if (toolCalls) { - for (const toolCall of toolCalls) { - if (toolCall.function?.arguments) { - stream.write(`event: answer\ndata: ${JSON.stringify(toolCall.function.arguments)}\n\n`); - } - } - } else if (content) { - stream.write(`event: answer\ndata: ${JSON.stringify(content)}\n\n`); - } - - if (chunk.choices[0]?.finish_reason) { - stream.write(`event: end\ndata: [DONE]\n\n`); - stream.end(); - break; - } - } })().catch(err => { console.error('Stream process error:', err); diff --git a/src/services/qa.v2.service.ts b/src/services/qa.v2.service.ts new file mode 100644 index 0000000..8a8e651 --- /dev/null +++ b/src/services/qa.v2.service.ts @@ -0,0 +1,271 @@ +import { PassThrough } from 'stream'; +import { generate } from '../llm'; +import config from '../config'; +import * as qaPrompts from '../prompts/qa.prompts'; +import * as postRepository from '../repositories/post.repository'; +import * as personaRepository from '../repositories/persona.repository'; +import { generateSearchPlan } from './search-plan.service'; +import { runSemanticSearch } from './semantic-search.service'; +import { runHybridSearch } from './hybrid-search.service'; +import { createEmbeddings } from './embedding.service'; + +const preprocessContent = (content: string): string => { + const plainText = content.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim(); + return plainText.length > 40000 ? 
plainText.substring(0, 40000) : plainText; +}; + +const getSpeechTonePrompt = async (speechTone: number, userId: string): Promise => { + if (speechTone === -1) return '간결하고 명확한 말투로 답변해'; + if (speechTone === -2) + return '아래의 블로그 본문 컨텍스트를 참고하여 본문의 말투를 파악해 최대한 비슷한 말투로 답변해'; + + const persona = await personaRepository.findPersonaById(speechTone, userId); + if (persona) return `${persona.name}: ${persona.description}`; + return '간결하고 명확한 말투로 답변해'; +}; + +type LlmOverride = { + provider?: 'openai' | 'gemini'; + model?: string; + options?: { temperature?: number; top_p?: number; max_output_tokens?: number }; +}; + +export const answerStreamV2 = async ( + question: string, + userId: string, + categoryId?: number, + speechTone: number = -1, + postId?: number, + llm?: LlmOverride +): Promise => { + const stream = new PassThrough(); + + (async () => { + const speechTonePrompt = await getSpeechTonePrompt(speechTone, userId); + + let messages: { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] = []; + let tools: + | { + type: 'function'; + function: { name: string; description?: string; parameters?: Record }; + }[] + | undefined = undefined; + + const toSimpleMessages = ( + raw: any[] + ): { role: 'system' | 'user' | 'assistant' | 'tool' | 'function'; content: string }[] => { + return (raw || []).map((m: any) => ({ + role: m.role, + content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content), + })); + }; + + if (postId) { + // Post-centric path (same as v1 with added v2 pre-events) + const post = await postRepository.findPostById(postId); + if (!post) { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ code: 404, message: 'Post not found' })}\n\n`); + stream.end(); + return; + } + if (!post.is_public && post.user_id !== userId) { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ code: 403, message: 'Forbidden' })}\n\n`); + stream.end(); + return; + } + // Emit plan event for transparency + stream.write(`event: search_plan\n`); + stream.write( + `data: ${JSON.stringify({ mode: 'post', filters: { post_id: postId, user_id: userId } })}\n\n` + ); + + const processed = preprocessContent(post.content); + const ctx = [{ postId: post.id, postTitle: post.title }]; + stream.write(`event: search_result\n`); + stream.write(`data: ${JSON.stringify(ctx)}\n\n`); + stream.write(`event: exist_in_post_status\n`); + stream.write(`data: true\n\n`); + stream.write(`event: context\n`); + stream.write(`data: ${JSON.stringify(ctx)}\n\n`); + + messages = toSimpleMessages( + qaPrompts.createPostContextPrompt(post, processed, question, speechTonePrompt) + ); + } else { + // Plan generation + const planPair = await generateSearchPlan(question, { user_id: userId, category_id: categoryId }); + if (!planPair) { + // Fallback to v1 RAG silently + const [questionEmbedding] = await createEmbeddings([question]); + const similarChunks = await postRepository.findSimilarChunks(userId, questionEmbedding, categoryId); + const context = similarChunks.map((c) => ({ postId: c.postId, postTitle: c.postTitle })); + stream.write(`event: search_plan\n`); + stream.write(`data: ${JSON.stringify({ mode: 'rag', fallback: true })}\n\n`); + stream.write(`event: search_result\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + stream.write(`event: exist_in_post_status\n`); + stream.write(`data: ${JSON.stringify(similarChunks.length > 0)}\n\n`); + stream.write(`event: context\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + + 
messages = toSimpleMessages( + qaPrompts.createRagPrompt(question, similarChunks, speechTonePrompt) + ); + if (similarChunks.length === 0) { + tools = undefined; + } else { + tools = [ + { + type: 'function', + function: { + name: 'report_content_insufficient', + description: '카테고리는 맞지만 본문 컨텍스트가 부족할 때 호출', + parameters: { + type: 'object', + properties: { + text: { + type: 'string', + description: + '답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다.', + }, + }, + required: ['text'], + }, + }, + }, + ]; + } + } else { + const plan: any = planPair.normalized; + stream.write(`event: search_plan\n`); + stream.write(`data: ${JSON.stringify(plan)}\n\n`); + // Console debug for emitted search plan + try { + console.log( + JSON.stringify( + { + type: 'debug.sse.search_plan', + userId, + categoryId, + plan_summary: { + mode: plan.mode, + top_k: plan.top_k, + threshold: plan.threshold, + weights: plan.weights, + sort: plan.sort, + limit: plan.limit, + hybrid: plan.hybrid, + time: plan?.filters?.time || null, + rewrites_len: Array.isArray(plan.rewrites) ? plan.rewrites.length : 0, + keywords_len: Array.isArray(plan.keywords) ? plan.keywords.length : 0, + }, + }, + null, + 0, + ), + ); + } catch {} + + let rows: { postId: string; postTitle: string; postChunk: string; similarityScore: number }[] = []; + if (plan.hybrid?.enabled) { + if (Array.isArray(plan.rewrites) && plan.rewrites.length > 0) { + stream.write(`event: rewrite\n`); + stream.write(`data: ${JSON.stringify(plan.rewrites)}\n\n`); + } + if (Array.isArray(plan.keywords) && plan.keywords.length > 0) { + stream.write(`event: keywords\n`); + stream.write(`data: ${JSON.stringify(plan.keywords)}\n\n`); + } + rows = await runHybridSearch( + question, + userId, + plan + ); + const hybridContext = rows.map((r) => ({ postId: r.postId, postTitle: r.postTitle })); + stream.write(`event: hybrid_result\n`); + stream.write(`data: ${JSON.stringify(hybridContext)}\n\n`); + + if (!rows.length) { + rows = await runSemanticSearch(question, userId, plan); + } + } else { + rows = await runSemanticSearch(question, userId, plan); + } + + const context = rows.map((r) => ({ postId: r.postId, postTitle: r.postTitle })); + stream.write(`event: search_result\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + stream.write(`event: exist_in_post_status\n`); + stream.write(`data: ${JSON.stringify(rows.length > 0)}\n\n`); + stream.write(`event: context\n`); + stream.write(`data: ${JSON.stringify(context)}\n\n`); + + messages = toSimpleMessages( + qaPrompts.createRagPrompt( + question, + rows.map((r) => ({ + postId: r.postId, + postTitle: r.postTitle, + postChunk: r.postChunk, + similarityScore: r.similarityScore, + })) as any, + speechTonePrompt + ) + ); + // If no context was found, avoid tool-calls to force direct natural-language guidance + if (rows.length === 0) { + tools = undefined; + } else { + tools = [ + { + type: 'function', + function: { + name: 'report_content_insufficient', + description: '카테고리는 맞지만 본문 컨텍스트가 부족할 때 호출', + parameters: { + type: 'object', + properties: { + text: { + type: 'string', + description: + '답변 말투 및 규칙을 지켜 해당 내용이 아직 부족하다는 안내를 합니다. 
그 후 본문 컨텍스트를 참고해 질문과 관련된 답변할 수 있는 내용을 언급하고 해당 내용에 대한 질문을 직접적으로 유도합니다.', + }, + }, + required: ['text'], + }, + }, + }, + ]; + } + } + } + + const llmStream = await generate({ + provider: llm?.provider || 'openai', + model: llm?.model || config.CHAT_MODEL, + messages, + tools, + options: llm?.options, + meta: { userId, categoryId, postId }, + }); + + llmStream.on('data', (chunk) => stream.write(chunk)); + llmStream.on('end', () => stream.end()); + llmStream.on('error', () => { + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`); + stream.end(); + }); + })().catch((err) => { + try { + console.error('v2 Stream process error:', err); + } catch {} + stream.write(`event: error\n`); + stream.write(`data: ${JSON.stringify({ message: 'Internal server error' })}\n\n`); + stream.end(); + }); + + return stream; +}; diff --git a/src/services/search-plan.service.ts b/src/services/search-plan.service.ts new file mode 100644 index 0000000..507a1c4 --- /dev/null +++ b/src/services/search-plan.service.ts @@ -0,0 +1,376 @@ +import OpenAI from 'openai'; +import config from '../config'; +import { buildSearchPlanPrompt, getSearchPlanSchemaJson } from '../prompts/qa.v2.prompts'; +import { planSchema, type SearchPlan } from '../types/ai.v2.types'; +import { nowUtc, toAbsoluteRangeKst } from '../utils/time'; + +const openai = new OpenAI({ apiKey: config.OPENAI_API_KEY }); + +export type PlanContext = { + user_id: string; + category_id?: number; + post_id?: number; + timezone?: string; // default Asia/Seoul +}; + +export const generateSearchPlan = async ( + question: string, + ctx: PlanContext +): Promise<{ plan: SearchPlan; normalized: SearchPlan } | null> => { + const timezone = ctx.timezone || 'Asia/Seoul'; + const now = nowUtc(); + const nowUtcIso = now.toISOString(); + const nowKstIso = new Date(now.getTime() + 9 * 3600 * 1000).toISOString(); + + const prompt = buildSearchPlanPrompt({ + now_utc: nowUtcIso, + now_kst: nowKstIso, + timezone, + user_id: ctx.user_id, + category_id: ctx.category_id, + post_id: ctx.post_id, + question, + }); + + + try { + // Debug prompt before request + try { + console.log( + JSON.stringify( + { + type: 'debug.plan.prompt', + model: 'gpt-5-mini', + prompt_len: prompt.length, + head: prompt.slice(0, 600), + tail: prompt.slice(Math.max(0, prompt.length - 600)), + }, + null, + 0, + ), + ); + } catch {} + + const response: any = await (openai as any).responses.create({ + model: 'gpt-5-mini', + input: prompt, + text: { format: { type: 'json_schema', name: 'SearchPlan', schema: getSearchPlanSchemaJson()} }, + reasoning: { effort: 'minimal' }, + max_output_tokens: 1500, + }); + + // Debug peek: log response shapes before JSON extraction + try { + const outputs = (response as any)?.output || []; + const outputSummary = outputs.map((o: any) => ({ + role: o?.role, + content: (o?.content || []).map((c: any) => ({ + type: c?.type, + hasText: typeof c?.text === 'string', + textLen: typeof c?.text === 'string' ? (c.text as string).length : undefined, + hasJson: !!c?.json, + })), + })); + const outputText = (response as any)?.output_text; + console.log( + JSON.stringify( + { type: 'debug.plan.response.peek', has_output_text: !!outputText, output_text_len: typeof outputText === 'string' ? 
outputText.length : undefined, output_summary: outputSummary }, + null, + 0, + ), + ); + } catch {} + + // Extract structured JSON if available, otherwise parse text + let parsed: any = null; + try { + const outputs = (response as any)?.output || []; + for (const o of outputs) { + for (const c of (o?.content || [])) { + if (c && (c.type === 'json' || c.type === 'output_json') && c.json) { + parsed = c.json; + break; + } + } + if (parsed) break; + } + } catch {} + // Also check output_text for JSON string if using Responses API response_format + if (!parsed && typeof (response as any)?.output_text === 'string') { + const s = ((response as any).output_text as string).trim(); + if (s.startsWith('{')) { + try { parsed = JSON.parse(s); } catch {} + } + } + if (!parsed) { + const texts: string[] = []; + const ot = (response as any)?.output_text; + if (typeof ot === 'string') texts.push(ot); + try { + const outputs = (response as any)?.output || []; + for (const o of outputs) { + for (const c of (o?.content || [])) { + const t = typeof c?.text === 'string' ? c.text : undefined; + if (t) texts.push(t); + } + } + } catch {} + const raw = texts.join('').trim(); + // Debug: log raw text before JSON.parse + try { + console.log( + JSON.stringify( + { type: 'debug.plan.raw_text', len: raw.length, head: raw.slice(0, 200) }, + null, + 0, + ), + ); + } catch {} + if (!raw) { + // Graceful fallback: unable to parse structured output + return null; + } + + // Robust extraction of first balanced JSON object + const tryParse = (s: string): any | null => { + try { return JSON.parse(s); } catch { return null; } + }; + let candidate = tryParse(raw); + if (!candidate) { + const start = raw.indexOf('{'); + if (start >= 0) { + let depth = 0; + let inStr = false; + let esc = false; + for (let i = start; i < raw.length; i++) { + const ch = raw[i]; + if (inStr) { + if (esc) esc = false; + else if (ch === '\\') esc = true; + else if (ch === '"') inStr = false; + } else { + if (ch === '"') inStr = true; + else if (ch === '{') depth++; + else if (ch === '}') { + depth--; + if (depth === 0) { + const sub = raw.slice(start, i + 1); + candidate = tryParse(sub); + if (candidate) break; + } + } + } + } + if (!candidate) { + const last = raw.lastIndexOf('}'); + if (last > start) candidate = tryParse(raw.slice(start, last + 1)); + } + } + } + if (!candidate) { + try { + console.warn(JSON.stringify({ type: 'debug.plan.parse_fail', note: 'could not extract JSON from raw' })); + } catch {} + return null; + } + parsed = candidate; + } + + // If still no parsed plan at this point, try a fallback call without text.format + if (!parsed) { + try { + const response2: any = await (openai as any).responses.create({ + model: config.CHAT_MODEL || 'gpt-5-mini', + input: prompt, + max_output_tokens: 700, + }); + // Debug peek for fallback + try { + const outputs = (response2 as any)?.output || []; + const outputSummary = outputs.map((o: any) => ({ + role: o?.role, + content: (o?.content || []).map((c: any) => ({ type: c?.type, hasText: typeof c?.text === 'string', textLen: typeof c?.text === 'string' ? (c.text as string).length : undefined })) + })); + const outputText = (response2 as any)?.output_text; + console.log(JSON.stringify({ type: 'debug.plan.fallback.peek', has_output_text: !!outputText, output_text_len: typeof outputText === 'string' ? 
outputText.length : undefined, output_summary: outputSummary })); + } catch {} + + // Parse fallback response + let parsed2: any = null; + try { + const outputs = (response2 as any)?.output || []; + for (const o of outputs) { + for (const c of (o?.content || [])) { + const t = typeof c?.text === 'string' ? c.text : undefined; + if (t) { + const s = t.trim(); + if (s.startsWith('{')) { try { parsed2 = JSON.parse(s); } catch {} } + if (parsed2) break; + } + } + if (parsed2) break; + } + } catch {} + if (!parsed2) { + const ot = (response2 as any)?.output_text; + if (typeof ot === 'string') { + const s = ot.trim(); + try { parsed2 = JSON.parse(s); } catch {} + } + } + if (!parsed2) { + console.warn(JSON.stringify({ type: 'debug.plan.fallback.parse_fail' })); + // proceed to chat completions fallback + } + if (parsed2) parsed = parsed2; + } catch { + // continue to chat completions fallback + } + } + + // Final fallback: Chat Completions with JSON object mode + if (!parsed) { + try { + const sys = 'You output ONLY a single JSON object matching the SearchPlan shape. No extra text.'; + const userMsg = prompt; + const cc: any = await (openai as any).chat.completions.create({ + model: config.CHAT_MODEL || 'gpt-5-mini', + messages: [ + { role: 'system', content: sys }, + { role: 'user', content: userMsg }, + ], + response_format: { type: 'json_object' }, + max_tokens: 700, + }); + // Debug + try { + console.log(JSON.stringify({ type: 'debug.plan.cc.peek', choices: (cc as any)?.choices?.length || 0 })); + } catch {} + const content = (cc as any)?.choices?.[0]?.message?.content || ''; + if (typeof content === 'string' && content.trim().startsWith('{')) { + parsed = JSON.parse(content); + } + } catch (e) { + try { console.warn(JSON.stringify({ type: 'debug.plan.cc.error', message: (e as any)?.message || 'error' })); } catch {} + return null; + } + } + const plan = planSchema.parse(parsed); + + // Normalize weights sum to 1 + const sum = (plan.weights?.chunk ?? 0) + (plan.weights?.title ?? 0); + const weights = sum > 0 ? { chunk: plan.weights.chunk / sum, title: plan.weights.title / sum } : { chunk: 0.7, title: 0.3 }; + + // Normalize time range to absolute if provided + let normPlan: SearchPlan = { ...plan, weights }; + + const clamp = (n: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, n)); + + const stopwords = new Set([ + '글', + '포스트', + '블로그', + '소개', + '정리', + '내용', + '최신', + '최근', + '정보', + ]); + + const cleanList = (arr: string[] | undefined, max: number) => { + const uniq = new Set(); + for (const s of arr || []) { + const t = String(s || '').trim(); + if (!t) continue; + if (t.length < 2) continue; + if (stopwords.has(t)) continue; + const key = t.toLowerCase(); + if (uniq.has(key)) continue; + uniq.add(key); + } + return Array.from(uniq).slice(0, max); + }; + if ((plan as any)?.filters?.time) { + const abs = toAbsoluteRangeKst(plan.filters?.time as any, now); + if (abs) { + normPlan = { + ...normPlan, + filters: { ...normPlan.filters, time: { type: 'absolute', from: abs.from, to: abs.to } as any }, + }; + } else { + // drop invalid time + const { time, ...rest } = normPlan.filters || ({} as any); + normPlan = { ...normPlan, filters: rest as any }; + } + } + + // Enforce bounds just in case + normPlan.top_k = Math.min(10, Math.max(1, normPlan.top_k || 5)); + normPlan.limit = Math.min(20, Math.max(1, normPlan.limit || 5)); + normPlan.threshold = Math.min(1, Math.max(0, normPlan.threshold ?? 0.2)); + const maxRewrites = clamp(plan.hybrid?.max_rewrites ?? 
3, 0, 4); + const maxKeywords = clamp(plan.hybrid?.max_keywords ?? 6, 0, 8); + + // Map retrieval_bias -> alpha (fallback to provided alpha or default) + const bias = (plan.hybrid as any)?.retrieval_bias || 'balanced'; + const biasAlpha = bias === 'lexical' ? 0.3 : bias === 'semantic' ? 0.75 : 0.5; + const alpha = clamp(((plan.hybrid as any)?.alpha ?? biasAlpha) as number, 0, 1); + + normPlan.hybrid = { + enabled: !!plan.hybrid?.enabled, + retrieval_bias: bias, + alpha, + max_rewrites: maxRewrites, + max_keywords: maxKeywords, + } as any; + normPlan.rewrites = cleanList(plan.rewrites, maxRewrites) as any; + normPlan.keywords = cleanList(plan.keywords, maxKeywords) as any; + if (!normPlan.mode) normPlan.mode = (ctx.post_id ? 'post' : 'rag') as any; + + // Note: Only filters.time is kept here to satisfy the SearchPlan schema. + // user_id/category_ids/post_id will be injected later by the query layer. + + // Console debug: final parsed + normalized plan + try { + const timeInfo = (normPlan as any)?.filters?.time; + console.log( + JSON.stringify( + { + type: 'debug.plan.final', + ctx: { user_id: ctx.user_id, category_id: ctx.category_id, post_id: ctx.post_id }, + summary: { + mode: normPlan.mode, + top_k: normPlan.top_k, + threshold: normPlan.threshold, + weights: normPlan.weights, + sort: normPlan.sort, + limit: normPlan.limit, + hybrid: { + enabled: !!normPlan.hybrid?.enabled, + retrieval_bias: normPlan.hybrid?.retrieval_bias, + alpha: normPlan.hybrid?.alpha, + max_rewrites: normPlan.hybrid?.max_rewrites, + max_keywords: normPlan.hybrid?.max_keywords, + }, + time: timeInfo ? { type: timeInfo.type, from: timeInfo.from, to: timeInfo.to } : null, + rewrites_len: (normPlan.rewrites || []).length, + keywords_len: (normPlan.keywords || []).length, + }, + plan, + normalized: normPlan, + }, + null, + 0, + ), + ); + } catch {} + + return { plan, normalized: normPlan }; + } catch (e) { + try { + console.error(JSON.stringify({ type: 'debug.plan.error', message: (e as any)?.message || 'error' })); + } catch {} + return null; + } +}; diff --git a/src/services/semantic-search.service.ts b/src/services/semantic-search.service.ts new file mode 100644 index 0000000..05561ac --- /dev/null +++ b/src/services/semantic-search.service.ts @@ -0,0 +1,36 @@ +import { createEmbeddings } from './embedding.service'; +import * as postRepository from '../repositories/post.repository'; +import { SearchPlan } from '../types/ai.v2.types'; + +export type SemanticSearchResult = { + postId: string; + postTitle: string; + postChunk: string; + similarityScore: number; +}[]; + +export const runSemanticSearch = async ( + question: string, + userId: string, + plan: SearchPlan +): Promise => { + const [embedding] = await createEmbeddings([question]); + + const from = (plan.filters as any)?.time?.type === 'absolute' ? (plan.filters as any).time.from : undefined; + const to = (plan.filters as any)?.time?.type === 'absolute' ? 
(plan.filters as any).time.to : undefined; + + const rows = await postRepository.findSimilarChunksV2({ + userId, + embedding, + categoryId: (plan.filters as any)?.category_ids?.[0], + from, + to, + threshold: plan.threshold, + topK: plan.top_k, + weights: plan.weights, + sort: plan.sort, + }); + + return rows; +}; + diff --git a/src/types/ai.types.ts b/src/types/ai.types.ts index e596d1a..b9ed539 100644 --- a/src/types/ai.types.ts +++ b/src/types/ai.types.ts @@ -28,6 +28,19 @@ export const askSchema = z.object({ category_id: z.number().optional(), post_id: z.number().optional(), speech_tone: z.number().optional(), + llm: z + .object({ + provider: z.enum(['openai', 'gemini']).optional(), + model: z.string().optional(), + options: z + .object({ + temperature: z.number().optional(), + top_p: z.number().optional(), + max_output_tokens: z.number().optional(), + }) + .optional(), + }) + .optional(), }), }); diff --git a/src/types/ai.v2.types.ts b/src/types/ai.v2.types.ts new file mode 100644 index 0000000..6e3353b --- /dev/null +++ b/src/types/ai.v2.types.ts @@ -0,0 +1,106 @@ +import { z } from 'zod'; + +// ===== Plan JSON Schema (Zod) ===== + +// Time filter schema: support multiple shapes to reduce LLM fragility +export const timeFilterSchema = z.discriminatedUnion('type', [ + // Absolute ISO range + z + .object({ type: z.literal('absolute'), from: z.string(), to: z.string() }) + .strict(), + // Relative window: N units up to today (KST) + z + .object({ + type: z.literal('relative'), + unit: z.enum(['day', 'week', 'month', 'year']), + value: z.number().int().min(1).max(365), + }) + .strict(), + // Month of a year (default year=now) + z + .object({ type: z.literal('month'), year: z.number().int().optional(), month: z.number().int().min(1).max(12) }) + .strict(), + // Quarter of a year (default year=now) + z + .object({ type: z.literal('quarter'), year: z.number().int().optional(), quarter: z.number().int().min(1).max(4) }) + .strict(), + // Single year + z.object({ type: z.literal('year'), year: z.number().int() }).strict(), + // Named presets (limited set) + z + .object({ + type: z.literal('named'), + preset: z.enum([ + 'all_time', + 'all', + 'today', + 'yesterday', + 'last_7_days', + 'last_14_days', + 'last_30_days', + 'this_month', + 'last_month', + ]), + }) + .strict(), + // Free-form label, e.g., "2006_to_now", "2024-Q3", "2019-2022", "2024-09" + z.object({ type: z.literal('label'), label: z.string().min(1) }).strict(), +]); + +export const planSchema = z.object({ + mode: z.enum(['rag', 'post']).default('rag'), + top_k: z.number().int().min(1).max(10).default(5), + threshold: z.number().min(0).max(1).default(0.2), + weights: z + .object({ chunk: z.number().min(0).max(1), title: z.number().min(0).max(1) }) + .default({ chunk: 0.7, title: 0.3 }), + rewrites: z.array(z.string()).default([]), + keywords: z.array(z.string()).default([]), + hybrid: z + .object({ + enabled: z.boolean().default(false), + // LLM outputs retrieval_bias label; server maps to alpha + retrieval_bias: z.enum(['lexical', 'balanced', 'semantic']).default('balanced'), + alpha: z.number().min(0).max(1).optional(), + max_rewrites: z.number().int().min(0).max(4).default(3), + max_keywords: z.number().int().min(0).max(8).default(6), + }) + .default({ enabled: false, retrieval_bias: 'balanced', max_rewrites: 3, max_keywords: 6 }), + filters: z + .object({ + time: timeFilterSchema.optional(), + }) + .strict() + .optional(), + sort: z.enum(['created_at_desc', 'created_at_asc']).default('created_at_desc'), + limit: 
z.number().int().min(1).max(20).default(5),
+});
+
+export type SearchPlan = z.infer<typeof planSchema>;
+
+// ===== API: /ai/v2/ask =====
+
+export const askV2Schema = z.object({
+  body: z.object({
+    question: z.string(),
+    user_id: z.string(),
+    category_id: z.number().optional(),
+    post_id: z.number().optional(),
+    speech_tone: z.number().optional(),
+    llm: z
+      .object({
+        provider: z.enum(['openai', 'gemini']).optional(),
+        model: z.string().optional(),
+        options: z
+          .object({
+            temperature: z.number().optional(),
+            top_p: z.number().optional(),
+            max_output_tokens: z.number().optional(),
+          })
+          .optional(),
+      })
+      .optional(),
+  }),
+});
+
+export type AskV2Request = z.infer<typeof askV2Schema>['body'];
diff --git a/src/utils/cost.ts b/src/utils/cost.ts
new file mode 100644
index 0000000..62334db
--- /dev/null
+++ b/src/utils/cost.ts
@@ -0,0 +1,40 @@
+export type Pricing = {
+  input_per_1k: number;
+  output_per_1k: number;
+  cached_input_per_1k?: number;
+  currency: 'USD' | 'KRW' | string;
+};
+
+const PRICING_TABLE: Record<string, Pricing> = {
+  // OpenAI
+  'gpt-5-mini': { input_per_1k: 0.00025, output_per_1k: 0.002, cached_input_per_1k: 0.000025, currency: 'USD' },
+  'gpt-5-nano': { input_per_1k: 0.00005, output_per_1k: 0.0004, cached_input_per_1k: 0.000005, currency: 'USD' },
+  'gpt-4o': { input_per_1k: 0.005, output_per_1k: 0.015, currency: 'USD' },
+  'gpt-4o-mini': { input_per_1k: 0.0005, output_per_1k: 0.0015, currency: 'USD' },
+  // Embeddings
+  'text-embedding-3-small': { input_per_1k: 0.00002, output_per_1k: 0, currency: 'USD' },
+  // Gemini (example values — update per official pricing if needed)
+  'gemini-2.5-flash': { input_per_1k: 0.0001, output_per_1k: 0.0004, currency: 'USD' },
+};
+
+export const getModelPricing = (model: string): Pricing | null => {
+  if (!model) return null;
+  const key = model.toLowerCase();
+  if (PRICING_TABLE[key]) return PRICING_TABLE[key];
+  // naive aliasing for common variants; check the more specific prefix first
+  if (key.startsWith('gpt-4o-mini')) return PRICING_TABLE['gpt-4o-mini'];
+  if (key.startsWith('gpt-4o')) return PRICING_TABLE['gpt-4o'];
+  if (key.startsWith('gpt-5-mini')) return PRICING_TABLE['gpt-5-mini'];
+  return null;
+};
+
+export const calcCost = (tokens: number, per_1k: number): number => {
+  if (!per_1k) return 0;
+  return (tokens / 1000) * per_1k;
+};
+
+export const formatCost = (amount: number, currency: string, round: number = 4): string => {
+  const factor = Math.pow(10, Math.max(0, round));
+  const rounded = Math.round(amount * factor) / factor;
+  return `${rounded} ${currency}`;
+};
+
diff --git a/src/utils/time.ts b/src/utils/time.ts
new file mode 100644
index 0000000..b950068
--- /dev/null
+++ b/src/utils/time.ts
@@ -0,0 +1,211 @@
+// Minimal KST time utilities and range normalization
+
+const KST_OFFSET_MINUTES = 9 * 60; // UTC+9
+
+const toDate = (isoOrDate: string | Date): Date => (isoOrDate instanceof Date ?
isoOrDate : new Date(isoOrDate)); + +export const nowUtc = (): Date => new Date(); + +export const toKst = (d: Date): Date => { + // Convert UTC date to KST by adding offset + return new Date(d.getTime() + KST_OFFSET_MINUTES * 60 * 1000); +}; + +export const fromKstToUtc = (d: Date): Date => { + return new Date(d.getTime() - KST_OFFSET_MINUTES * 60 * 1000); +}; + +const startOfDay = (d: Date): Date => new Date(d.getFullYear(), d.getMonth(), d.getDate(), 0, 0, 0, 0); +const endOfDay = (d: Date): Date => new Date(d.getFullYear(), d.getMonth(), d.getDate(), 23, 59, 59, 999); + +export const startOfMonth = (year: number, monthIndex0: number): Date => new Date(year, monthIndex0, 1, 0, 0, 0, 0); +export const endOfMonth = (year: number, monthIndex0: number): Date => new Date(year, monthIndex0 + 1, 0, 23, 59, 59, 999); + +export const startOfQuarter = (year: number, quarter: number): Date => { + const m0 = (quarter - 1) * 3; // 0-based month index + return new Date(year, m0, 1, 0, 0, 0, 0); +}; +export const endOfQuarter = (year: number, quarter: number): Date => { + const m0 = quarter * 3 - 1; // end month index + return new Date(year, m0 + 1, 0, 23, 59, 59, 999); +}; + +export type AbsoluteRange = { from: string; to: string }; + +export const toAbsoluteRangeKst = (input: { type: string; [k: string]: any }, base: Date = nowUtc()): AbsoluteRange | null => { + try { + const baseKst = toKst(base); + const year = baseKst.getFullYear(); + // Named presets + if (input.type === 'named') { + const p = String(input.preset || '').toLowerCase(); + const endK = endOfDay(baseKst); + const beginOfTodayK = startOfDay(baseKst); + const endUtc = fromKstToUtc(endK).toISOString(); + const todayStartUtc = fromKstToUtc(beginOfTodayK).toISOString(); + if (p === 'all' || p === 'all_time') return null; // no time filter + if (p === 'today') return { from: todayStartUtc, to: endUtc }; + if (p === 'yesterday') { + const yK = new Date(beginOfTodayK.getTime()); + yK.setDate(yK.getDate() - 1); + return { from: fromKstToUtc(startOfDay(yK)).toISOString(), to: fromKstToUtc(endOfDay(yK)).toISOString() }; + } + const daysBack = (n: number) => { + const toK = endK; + const fromK = new Date(toK.getTime()); + fromK.setDate(fromK.getDate() - (n - 1)); + return { from: fromKstToUtc(startOfDay(fromK)).toISOString(), to: fromKstToUtc(toK).toISOString() }; + }; + if (p === 'last_7_days') return daysBack(7); + if (p === 'last_14_days') return daysBack(14); + if (p === 'last_30_days') return daysBack(30); + if (p === 'this_month') { + const fromK = startOfMonth(year, baseKst.getMonth()); + const toK = endOfMonth(year, baseKst.getMonth()); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (p === 'last_month') { + const m = baseKst.getMonth(); + const yAdj = m === 0 ? year - 1 : year; + const mAdj = m === 0 ? 
11 : m - 1; + const fromK = startOfMonth(yAdj, mAdj); + const toK = endOfMonth(yAdj, mAdj); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + return null; // unknown named: drop filter + } + if (input.type === 'relative') { + const unit = String(input.unit); + const value = Math.max(1, parseInt(String(input.value || '1'), 10)); + const toK = endOfDay(baseKst); + const fromK = new Date(toK.getTime()); + if (unit === 'day') fromK.setDate(fromK.getDate() - value + 1); + else if (unit === 'week') fromK.setDate(fromK.getDate() - value * 7 + 1); + else if (unit === 'month') fromK.setMonth(fromK.getMonth() - value); + else if (unit === 'year') fromK.setFullYear(fromK.getFullYear() - value); + const fromUtc = fromKstToUtc(startOfDay(fromK)); + const toUtc = fromKstToUtc(toK); + return { from: fromUtc.toISOString(), to: toUtc.toISOString() }; + } + if (input.type === 'absolute') { + const from = new Date(input.from); + const to = new Date(input.to); + return { from: from.toISOString(), to: to.toISOString() }; + } + if (input.type === 'month') { + const m = Math.max(1, Math.min(12, parseInt(String(input.month), 10))); + const y = input.year ? parseInt(String(input.year), 10) : year; + const fromK = startOfMonth(y, m - 1); + const toK = endOfMonth(y, m - 1); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (input.type === 'year') { + const y = parseInt(String(input.year), 10); + const fromK = startOfMonth(y, 0); + const toK = endOfMonth(y, 11); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (input.type === 'quarter') { + const q = Math.max(1, Math.min(4, parseInt(String(input.quarter), 10))); + const y = input.year ? parseInt(String(input.year), 10) : year; + const fromK = startOfQuarter(y, q); + const toK = endOfQuarter(y, q); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (input.type === 'label') { + const raw = String(input.label || '').trim(); + if (!raw) return null; + const s = raw.replace(/\s+/g, '').toLowerCase(); + const endK = endOfDay(baseKst); + // Support common named tokens expressed as labels + const startTodayK = startOfDay(baseKst); + const toUtcStr = fromKstToUtc(endK).toISOString(); + const fromTodayUtcStr = fromKstToUtc(startTodayK).toISOString(); + const daysBack = (n: number) => { + const toK = endK; + const fromK = new Date(toK.getTime()); + fromK.setDate(fromK.getDate() - (n - 1)); + return { from: fromKstToUtc(startOfDay(fromK)).toISOString(), to: fromKstToUtc(toK).toISOString() }; + }; + if (s === 'all' || s === 'all_time') return null; // drop filter + if (s === 'today') return { from: fromTodayUtcStr, to: toUtcStr }; + if (s === 'yesterday') { + const yK = new Date(startTodayK.getTime()); + yK.setDate(yK.getDate() - 1); + return { from: fromKstToUtc(startOfDay(yK)).toISOString(), to: fromKstToUtc(endOfDay(yK)).toISOString() }; + } + if (s === 'last_7_days') return daysBack(7); + if (s === 'last_14_days') return daysBack(14); + if (s === 'last_30_days') return daysBack(30); + if (s === 'this_month') { + const fromK = startOfMonth(year, baseKst.getMonth()); + const toK = endOfMonth(year, baseKst.getMonth()); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + if (s === 'last_month') { + const m = baseKst.getMonth(); + const yAdj = m === 0 ? year - 1 : year; + const mAdj = m === 0 ? 
11 : m - 1; + const fromK = startOfMonth(yAdj, mAdj); + const toK = endOfMonth(yAdj, mAdj); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 1) YYYY_to_now / YYYY-to-now + let m = s.match(/^(\d{4})(?:_|-|to)+now$/); + if (m) { + const y = parseInt(m[1], 10); + const fromK = startOfMonth(y, 0); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(endK).toISOString() }; + } + // 2) YYYY-YYYY / YYYY..YYYY / YYYY_to_YYYY + m = s.match(/^(\d{4})(?:\.|_|-|to){1,2}(\d{4})$/); + if (m) { + const y1 = parseInt(m[1], 10); + const y2 = parseInt(m[2], 10); + const a = Math.min(y1, y2); + const b = Math.max(y1, y2); + const fromK = startOfMonth(a, 0); + const toK = endOfMonth(b, 11); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 3) YYYY-Qn or Qn-YYYY + m = s.match(/^(\d{4})(?:-|_)q([1-4])$/); + if (m) { + const y = parseInt(m[1], 10); + const q = parseInt(m[2], 10); + const fromK = startOfQuarter(y, q); + const toK = endOfQuarter(y, q); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + m = s.match(/^q([1-4])(?:-|_)?(\d{4})$/); + if (m) { + const q = parseInt(m[1], 10); + const y = parseInt(m[2], 10); + const fromK = startOfQuarter(y, q); + const toK = endOfQuarter(y, q); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 4) YYYY-MM + m = s.match(/^(\d{4})(?:-|_)?(\d{1,2})$/); + if (m) { + const y = parseInt(m[1], 10); + const month = Math.max(1, Math.min(12, parseInt(m[2], 10))); + const fromK = startOfMonth(y, month - 1); + const toK = endOfMonth(y, month - 1); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + // 5) YYYY + m = s.match(/^(\d{4})$/); + if (m) { + const y = parseInt(m[1], 10); + const fromK = startOfMonth(y, 0); + const toK = endOfMonth(y, 11); + return { from: fromKstToUtc(fromK).toISOString(), to: fromKstToUtc(toK).toISOString() }; + } + return null; // unrecognized label + } + } catch { + // ignore + } + return null; +}; diff --git a/src/utils/tokenizer.ts b/src/utils/tokenizer.ts new file mode 100644 index 0000000..bc662f4 --- /dev/null +++ b/src/utils/tokenizer.ts @@ -0,0 +1,32 @@ +import { get_encoding, type TiktokenEncoding } from '@dqbd/tiktoken'; + +const encodingForModel = (model?: string): TiktokenEncoding => { + const lower = (model ?? '').toLowerCase(); + if (lower.includes('gpt-5') || lower.includes('gpt-4o') || lower.includes('o1') || lower.includes('o3')) { + return 'o200k_base' as TiktokenEncoding; + } + return 'cl100k_base' as TiktokenEncoding; +}; + +export const countTextTokens = (text: string, model: string): number => { + const encKey = encodingForModel(model); + const enc = get_encoding(encKey); + try { + const tokens = enc.encode(text || ''); + return tokens.length; + } finally { + // no explicit free in @dqbd/tiktoken browser build; safe to let GC handle + } +}; + +type SimpleMessage = { role: string; content: string }; + +export const countChatMessagesTokens = (messages: SimpleMessage[], model: string): number => { + // Approximate: sum content token counts + minimal role overhead + const overheadPerMsg = 3; // rough + const roleOverhead = 1; + return messages.reduce((sum, m) => { + return sum + countTextTokens(m.content || '', model) + overheadPerMsg + roleOverhead; + }, 0); +}; +
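
For reference, here is a plan in the shape the planner is prompted to emit; this object parses under `planSchema` from `src/types/ai.v2.types.ts` as written (the values are illustrative, and the relative import path assumes a scratch file directly under `src/`):

```ts
// A hybrid-enabled SearchPlan that validates against planSchema (illustrative values).
import { planSchema } from './types/ai.v2.types';

const plan = planSchema.parse({
  mode: 'rag',
  top_k: 5,
  threshold: 0.2,
  weights: { chunk: 0.7, title: 0.3 },
  rewrites: ['pgvector 인덱스 설정 방법'],
  keywords: ['pgvector', 'ivfflat'],
  hybrid: { enabled: true, retrieval_bias: 'lexical', max_rewrites: 2, max_keywords: 4 },
  filters: { time: { type: 'label', label: 'last_30_days' } },
  sort: 'created_at_desc',
  limit: 5,
});

// No alpha is set here on purpose: search-plan.service.ts maps retrieval_bias
// to alpha during normalization (lexical -> 0.3, balanced -> 0.5, semantic -> 0.75).
```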
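`toAbsoluteRangeKst` in `src/utils/time.ts` is the single funnel for the label grammar. A quick check of three label families, assuming the process runs with TZ=UTC (the shift-based day and month helpers rely on that):

```ts
// Exercising the label-form time filter normalization (illustrative).
import { toAbsoluteRangeKst } from './utils/time';

const base = new Date('2024-10-05T03:00:00Z'); // 12:00 KST on 2024-10-05, fixed for reproducibility

// Quarter token: KST day bounds of Q3 2024, returned as UTC ISO strings
toAbsoluteRangeKst({ type: 'label', label: '2024-Q3' }, base);
// -> { from: '2024-06-30T15:00:00.000Z', to: '2024-09-30T14:59:59.999Z' } under TZ=UTC

// Rolling window: start of 2024-09-29 KST through end of 2024-10-05 KST
toAbsoluteRangeKst({ type: 'label', label: 'last_7_days' }, base);

// 'all_time' -> null, meaning the caller drops the time filter entirely
toAbsoluteRangeKst({ type: 'label', label: 'all_time' }, base);
```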
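Wiring `src/utils/tokenizer.ts` into `src/utils/cost.ts` gives a rough per-request estimate; the pricing table holds example values per its own comment, so the printed amount is illustrative only:

```ts
// Approximate request cost = input tokens * input price + output tokens * output price.
import { countChatMessagesTokens, countTextTokens } from './utils/tokenizer';
import { calcCost, formatCost, getModelPricing } from './utils/cost';

const model = 'gpt-4o-mini';
const messages = [
  { role: 'system', content: '너는 블로그 운영자 AI야.' },
  { role: 'user', content: '최근 글 3개만 요약해줘' },
];
const answer = '최근 글 세 편을 요약하면 다음과 같아요...';

const pricing = getModelPricing(model);
if (pricing) {
  const inputTokens = countChatMessagesTokens(messages, model); // approximate: adds per-message overhead
  const outputTokens = countTextTokens(answer, model);
  const total = calcCost(inputTokens, pricing.input_per_1k) + calcCost(outputTokens, pricing.output_per_1k);
  // Six decimals here, since short chats round to "0 USD" at the default 4.
  console.log(formatCost(total, pricing.currency, 6));
}
```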
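Finally, a minimal Node 18+ consumer for the v2 SSE stream. Two assumptions are flagged in the code: the router from `src/routes/ai.v2.routes.ts` is mounted at `/ai/v2`, and `authMiddleware` accepts a bearer token; neither is established by this diff.

```ts
// Sketch of a v2 /ask client; the mount path and auth scheme are assumptions.
async function askV2(question: string, userId: string, token: string): Promise<void> {
  const res = await fetch('http://localhost:3000/ai/v2/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ question, user_id: userId }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    buf += decoder.decode(value, { stream: true });
    let sep: number;
    // Each frame is "event: <name>\ndata: <payload>" terminated by a blank line.
    while ((sep = buf.indexOf('\n\n')) >= 0) {
      const frame = buf.slice(0, sep);
      buf = buf.slice(sep + 2);
      const event = /^event: (.+)$/m.exec(frame)?.[1];
      const data = /^data: (.+)$/m.exec(frame)?.[1];
      if (event === 'search_plan' && data) console.log('plan:', JSON.parse(data));
      if (event === 'answer' && data) process.stdout.write(JSON.parse(data)); // answer payloads are JSON-encoded strings
      if (event === 'end') return; // payload is the literal [DONE], not JSON
    }
  }
}
```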