leenamho2000/VeriAI

VeriAI 🧠🌿

Document reliability and ESG/greenwashing risk analysis tool

VeriAI analyzes environmental advertisements, ESG documents, and general reports sentence by sentence. It scores issues such as missing evidence, vague wording, and overstated scope, then sends only the highest-risk sentences to an LLM for in-depth analysis and produces a report.


🔎 Project Overview (Overview)

VeriAI works through the following workflow.

  1. The user pastes text or enters a URL, and VeriAI extracts the body text.
  2. Following predefined rules (config/ad_rules.json, config/report_rules.json), it quantifies
    each sentence's evidence, vagueness, scope, timing, language risk, offset dependence, and so on.
  3. Each sentence receives a risk score (risk) between 0 and 100 and a grade of High, Medium, or Low.
  4. The K highest-risk sentences are sent to an OpenAI LLM, which returns the following in JSON form:
    • Environmental ad mode: why there is a greenwashing risk and what evidence should be added
    • General report mode: which figures, methods, tables, or citations are missing
  5. The full results can be explored in a Streamlit dashboard and
    exported as CSV or PDF reports.

Project Overview (Overview - EN)

VeriAI is an AI-assisted document checker for:

  • ESG / environmental advertisements (greenwashing risk)
  • General business / technical reports (evidence & clarity)

It:

  • Splits documents into sentences
  • Scores each sentence with rule-based features
  • Sends only the riskiest ones to an LLM for deeper review
  • Provides interactive visualizations and exports (CSV, PDF)

🏅 Awards and Achievements (Achievements)

  • 🏆 Excellence Award, 2025 Capstone Design Competition
  • 🏆 ESG Excellence Award, 2025 Capstone Design Competition
  • 🌎 Selected to present the project at a Japan academic exchange meeting

The project has been recognized in both practical and academic settings.


✨ Key Features (Features)

  • 📥 Input
    • Direct text input
    • URL input → body text extracted automatically with trafilatura
  • ⚖️ Rule-based quantitative analysis
    • Evidence score (evidence_score) from figures, years, standards, third-party verification, and so on
    • Vagueness score (vagueness_score) from vague terms, exaggeration, future tense, and so on
    • Risk from coverage, timing, and use of offsets
    • 0-100 risk score (risk) plus a High/Medium/Low label
  • 🧾 Two analysis modes
    • Environmental Ad (Ad): rules focused on ESG and greenwashing
    • General Report (Report): evidence rules for research, technical, and business reports
  • 🧠 LLM in-depth analysis
    • Only the top K risk sentences are sent to the LLM
    • Ad mode: risk_reasons, explanation, evidence_needed, suggested_queries, and so on
    • Report mode: issues, what_to_add (metrics/method/tables_figures/citations), and so on
  • 📊 Visualization & XAI
    • Scatter plot of per-sentence risk
    • Stacked bar chart of risk factors
    • SHAP waterfall plot explaining how each score was built up
  • 📤 Export
    • Download the full results as CSV
    • Generate a PDF report that includes the summary and the LLM results

📸 Screenshots (Screenshots)

Main screen

main_ui

Analysis results screen

result_ui

PDF report example

report_example


🏭 System Architecture (Architecture)

VeriAI uses a hybrid structure that combines rule-based risk analysis with LLM in-depth analysis.
The overall flow is as follows:

User Input (Text / URL)
↓
parsers.py

  • Crawls the body text when a URL is given (trafilatura)
  • Strips HTML and unneeded elements, then extracts plain text

↓
rules.py

  • Sentence splitting
  • Rule-based feature extraction
    (evidence, vagueness, coverage, temporal, language-risk, and so on)
  • Applies weights to produce a 0-100 risk score (risk)
  • Classifies each sentence as High / Medium / Low

↓
llm.py

  • Selects the top K sentences by risk
  • Asks the LLM for a JSON-formatted analysis suited to ad or report mode
  • Returns the needed evidence, vagueness, and issues in structured form

↓
report.py

  • Combines the rule-based and LLM analysis results
  • Generates PDF/CSV reports (FPDF)

↓
app.py (Streamlit UI)

  • Provides the whole workflow as a user interface:
    text input → analysis → visualization (tables/graphs/SHAP) → report download
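The top-K selection step described for llm.py can be sketched as follows. This is a minimal illustration, not the project's code: the function name `select_top_k` and the row layout (a dict with `sentence` and `risk` keys) are assumptions made for the example.

```python
import heapq
from typing import Dict, List


def select_top_k(scored: List[Dict], k: int) -> List[Dict]:
    """Return the k rows with the highest 'risk', highest first.

    Hypothetical helper: the real llm.py may select sentences differently.
    """
    if k <= 0:
        return []
    return heapq.nlargest(k, scored, key=lambda row: row["risk"])


# Made-up scored sentences, standing in for the output of the rule stage.
scored_sentences = [
    {"sentence": "We are 100% eco-friendly.", "risk": 82.0},
    {"sentence": "Emissions fell 12% vs. 2020 (verified).", "risk": 18.5},
    {"sentence": "We will be carbon neutral soon.", "risk": 74.0},
]
top = select_top_k(scored_sentences, k=2)
```

Sending only these rows to the LLM keeps the number of API calls bounded by K regardless of document length.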

✔ Component summary

  • parsers.py: crawls the body text of a URL
  • rules.py: rule-based score calculation (uses ad_rules.json / report_rules.json)
  • llm.py: in-depth analysis based on the OpenAI API (GPT family)
  • report.py: PDF/CSV report generation
  • app.py: Streamlit UI and orchestration of the whole process

📚 Rule Configuration Files (Rule Configuration)

VeriAI's rule-based scoring is separated out into JSON files in the config folder.
Editing these files lets you extend and tune the rules without touching the Python code.

  • config/ad_rules.json: rule set for environmental advertising / ESG marketing sentences

    • Drawing on the expression types and example phrases given in the Korea Fair Trade Commission's review guidelines on environment-related labeling and advertising, it organizes vague eco-friendly wording, exaggerated or absolute claims, unsupported environmental claims, and carbon-neutrality/offset terms into lexicons.
    • Main components
      • weights: weights for evidence, vagueness, coverage, temporal, language, offset_risk, and so on
      • thresholds: cut-off values that separate the High / Medium / Low grades
      • regex: patterns for number+unit, year, scope, URL, monetary amount, reduction/increase %, time expressions, and so on
      • lexicons: lists of key words and phrases such as vague / overclaim / future / offset_terms / greenwashing_keywords
  • config/report_rules.json: rule set for general report / research and business sentences

    • Designed around evidence-related elements such as research method, sample, statistical information, citations and references, and mentions of tables and figures.
    • The structure is the same as ad_rules.json, but it uses regular expressions (regex) and lexicons suited to the paper and report domain.
      • Example: detects citation formats ([1], (Kim, 2024)), DOIs, tables and figures (Figure/Table), and statistical indicators (p-value, CI, n=…)

Based on these rule configuration files, rules.py extracts per-sentence features and
applies the weights to compute a final risk score between 0 and 100.
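To make the config layout more concrete, here is a small sketch of how a rule set with the four sections named above (weights, thresholds, regex, lexicons) could drive feature extraction. Everything inside `RULES` is hypothetical: the key names under each section, the patterns, and the numbers are illustrative stand-ins, not the contents of config/report_rules.json, and `extract_features` is not a function from rules.py.

```python
import re
from typing import Dict

# Hypothetical, heavily trimmed stand-in for a rule file; json.load() on the
# real file would yield a dict, but its exact keys and values are not shown here.
RULES = {
    "weights": {"evidence": 0.5, "vagueness": 0.5},   # not used in this sketch
    "thresholds": {"high": 70, "medium": 40},         # not used in this sketch
    "regex": {
        "citation": r"\[\d+\]|\([A-Z][A-Za-z]+,\s*\d{4}\)",
        "doi": r"\b10\.\d{4,9}/\S+",
        "figure_table": r"\b(?:Figure|Table)\s+\d+",
        "sample_size": r"\bn\s*=\s*\d+",
    },
    "lexicons": {"vague": ["significantly", "many", "various"]},
}


def extract_features(sentence: str, rules: Dict) -> Dict[str, int]:
    """Count regex matches and lexicon term occurrences for one sentence."""
    hits = {name: len(re.findall(pattern, sentence))
            for name, pattern in rules["regex"].items()}
    lowered = sentence.lower()
    for name, terms in rules["lexicons"].items():
        hits[name] = sum(lowered.count(term.lower()) for term in terms)
    return hits


features = extract_features(
    "Accuracy improved significantly (Kim, 2024), see Table 2 (n = 120).",
    RULES,
)
```

Because the patterns and word lists live in data rather than code, adding a new detector in this sketch only means adding one entry to the dict.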


🛠️ Tech Stack (Tech Stack)

🔹 Language & Environment

  • Python 3.10+
  • Virtual Environment (venv)

🔹 Framework & UI

  • Streamlit: web UI and result visualization
  • Plotly / Matplotlib: graphs and risk distribution visualization
  • SHAP: visualization of rule-based score contributions

🔹 AI / NLP

  • OpenAI API (GPT family): in-depth analysis of high-risk sentences
  • RapidFuzz: sentence comparison and similarity-based checks
  • Trafilatura: URL body crawling and extraction

🔹 Document & Report

  • FPDF / Pillow: PDF report generation and image processing

🔹 Data Processing

  • pandas / numpy: score calculation and table handling
  • json: loading the rules (ad_rules/report_rules) and handling settings

🔹 Configuration & Security

  • python-dotenv: environment variable management (.env)
  • config/: stores the rule-based scoring JSON files

🔹 Development & Version Control

  • Git / GitHub

⚙️ Installation and Running (Installation & Usage)

1️⃣ Clone the repository

git clone https://github.com/leenamho2000/VeriAI.git
cd VeriAI

2️⃣ Create and activate a virtual environment (recommended)

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

3️⃣ Install packages

pip install -r requirements.txt

requirements.txt includes roughly the following packages:

  • streamlit, openai, python-dotenv
  • pandas, numpy, plotly, matplotlib, shap
  • rapidfuzz, trafilatura, fpdf, pillow, and others

🔐 Environment Variables (.env)

Create a .env file in the root directory with the following settings.

OPENAI_API_KEY=your_openai_api_key_here
# Optional
OPENAI_MODEL=gpt-4o-mini
OPENAI_MAX_OUT_TOKENS=1200

  • OPENAI_API_KEY is required.
  • The model and the output token limit can be changed if needed.
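As a rough sketch of how these three variables might be read at startup: the function `read_settings` below is hypothetical, and treating gpt-4o-mini and 1200 as fallbacks is an assumption taken from the example values above, not a confirmed default of the app. In the real project the .env file would first be loaded into the process environment (python-dotenv is listed in the tech stack); the sketch takes a plain mapping so it runs without extra packages.

```python
import os
from typing import Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables named in the .env example.

    Hypothetical helper; the fallback values are assumptions.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # The README states that the API key is mandatory.
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        "api_key": api_key,
        "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
        "max_out_tokens": int(env.get("OPENAI_MAX_OUT_TOKENS", "1200")),
    }


# Passing an explicit mapping instead of os.environ keeps the example reproducible.
settings = read_settings({"OPENAI_API_KEY": "sk-test"})
```

Failing early on a missing key gives a clearer error than a rejected API call later in the workflow.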

🖥️ Running the App

streamlit run app.py

By default, Streamlit serves the app at http://localhost:8501, which you can open in a browser.


🧭 How to Use (How to Use)

1. Select the analysis mode: in the sidebar, choose Environmental Ad (Ad) or General Report (Report).

2. Enter text or a URL

  • Paste the content to analyze into the text box, or
  • Enter a URL and press the Load URL Body (URL 본문 불러오기) button.

3. Click 🔎 Analyze (분석하기)

  • The text is split into sentences automatically, and the risk, grade, and component scores are computed for each sentence.

4. Explore the results

  • Overview (Table) tab (개요(표)): per-sentence score table with search
  • Sentence Explorer tab (문장별 탐색): detailed analysis of the selected sentence, including rule hits and a SHAP waterfall plot
  • Visualization tab (시각화): risk distribution and a bar chart of component contributions

5. LLM post-processing & report

  • In the Export tab (내보내기):
    • Run the LLM analysis on the Top-K risk sentences (according to ad or report mode)
    • Download the full results as CSV
    • Generate and download the PDF report

⚖️ Rule-based Scoring Overview (Rule-based Scoring)

Each sentence is scored from the following elements.

  • Evidence score 0-16
    • Number + unit, year, standard/methodology, third-party verification, URL/certification/monetary information, and so on
  • Vagueness score 0-16
    • Vague ESG marketing terms, exaggerated wording, future tense, greenwashing hot keywords, and so on
  • Coverage / Temporal / Language / Offset
    • Overstated scope, such as "company-wide / all products / worldwide"
    • Goals stated without a deadline or milestones
    • Risk from exaggerated or boastful wording, reliance on offsets, and so on

These features are combined as a weighted sum into a risk score between 0 and 100, and a High / Medium / Low label is assigned according to thresholds.
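The weighted-sum step can be illustrated with a short sketch. The weights, the thresholds, the assumption that the coverage/temporal/language/offset features are already scaled to 0-1, and the choice to invert the evidence score (more evidence, less risk) are all assumptions made for this example; the actual values and formula live in the config files and rules.py.

```python
from typing import Dict

# Hypothetical weights and thresholds, chosen only for illustration.
WEIGHTS = {"evidence": 0.35, "vagueness": 0.35, "coverage": 0.10,
           "temporal": 0.10, "language": 0.05, "offset_risk": 0.05}
THRESHOLDS = {"high": 70.0, "medium": 40.0}
MAX_SCORE = 16  # evidence and vagueness scores are described as 0-16


def risk_score(features: Dict[str, float]) -> float:
    """Weighted sum of per-feature risk contributions, scaled to 0-100."""
    contributions = {
        # Assumption: more evidence lowers risk, so the evidence score is inverted.
        "evidence": 1.0 - features["evidence_score"] / MAX_SCORE,
        "vagueness": features["vagueness_score"] / MAX_SCORE,
        # Assumption: the remaining features are already in the range 0-1.
        "coverage": features["coverage"],
        "temporal": features["temporal"],
        "language": features["language"],
        "offset_risk": features["offset_risk"],
    }
    total = sum(WEIGHTS[name] * value for name, value in contributions.items())
    return round(max(0.0, min(100.0, 100.0 * total)), 1)


def risk_label(risk: float) -> str:
    """Map a 0-100 risk score to a High / Medium / Low label."""
    if risk >= THRESHOLDS["high"]:
        return "High"
    if risk >= THRESHOLDS["medium"]:
        return "Medium"
    return "Low"


# A sentence with no evidence, moderate vagueness, overstated scope, and no deadline.
example = {"evidence_score": 0, "vagueness_score": 8, "coverage": 1.0,
           "temporal": 1.0, "language": 0.0, "offset_risk": 0.0}
risk = risk_score(example)
label = risk_label(risk)
```

With these illustrative numbers the example sentence scores 72.5 and is labeled High; a fully evidenced sentence with no other hits scores 0 and is labeled Low.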


🧠 LLM Analysis Result Examples (LLM Analysis)

  • Environmental ad mode (Ad)
    • risk_reasons: vague terms, overstated scope, lack of evidence, future tense without a plan, and so on
    • explanation: describes why the sentence carries a greenwashing risk
    • evidence_needed: the figures, baseline year, scope, external verification, and so on that are needed
    • suggested_queries: search queries for verifying the claim and gathering evidence
  • General report mode (Report)
    • issues: weasel words, missing evidence or sources, insufficient sample information, and so on
    • what_to_add.metrics/method/tables_figures/citations: structured suggestions on which metrics, methods, tables/figures, and citations to add

These results are also included in the PDF report.
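A minimal sketch of handling an ad-mode reply is shown below. The four field names come from the list above; their value types (lists of strings, a single explanation string), the sample reply, and the helper `parse_ad_result` are assumptions for illustration, not the project's actual schema or code.

```python
import json
from typing import Any, Dict

# Field names as listed for ad mode; the value types are assumed.
AD_FIELDS = ("risk_reasons", "explanation", "evidence_needed", "suggested_queries")


def parse_ad_result(raw: str) -> Dict[str, Any]:
    """Parse an ad-mode reply and fill any missing field with an empty value."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result: Dict[str, Any] = {}
    for field in AD_FIELDS:
        default: Any = "" if field == "explanation" else []
        result[field] = data.get(field, default)
    return result


# Made-up reply text; a real reply would come from the OpenAI API.
raw_reply = """
{
  "risk_reasons": ["vague term", "no supporting evidence"],
  "explanation": "The claim 'eco-friendly' has no stated scope or measurement.",
  "evidence_needed": ["baseline year", "third-party verification"]
}
"""
parsed = parse_ad_result(raw_reply)
```

Normalizing the reply this way means the report stage can rely on every key being present even when the model omits one.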


🔧 Planned Improvements (Roadmap)

  • Expand input channels: URL + PDF + image OCR, and so on
  • Add new analysis modes beyond ad and report
  • Improve AI-based automatic summarization and report templates
  • User accounts and saved history
  • Deploy as a web service on AWS/GCP
  • Keep the rule sets (ad_rules, report_rules) up to date

📄 License (License)

This project is distributed under the MIT License. See the LICENSE file for details.


📌 Contact (Contact)
