
Feature request: Knowledge base — persistent, reusable instructions for agent sessions #454

@phoenixking25

Description

Problem

Today, every session starts with a blank slate. If your team has domain-specific conventions, debugging runbooks, architecture decisions, or coding standards, you have three options:

  1. Repeat them in every prompt
  2. Bake them into repo-level CLAUDE.md files (which only apply to one repo)
  3. Hope the agent figures it out from code context

There's no way to store reusable knowledge at the platform level that gets injected into sessions intelligently.

Proposed Feature: Knowledge Base

A managed knowledge system where teams store reusable instructions, conventions, and context that the platform automatically injects into agent sessions based on relevance.

How it works

Three-tier selection — each knowledge item has a pin_mode that controls when it's injected:

| Pin mode | Behavior | Example |
|---|---|---|
| `all` | Always injected into every session | "Our team uses conventional commits. Always run tests before creating a PR." |
| `repo` | Injected when the session targets a pinned repo | "The backend uses FastAPI + SQLAlchemy 2.0. Use async sessions." (pinned to the backend repo) |
| `auto` | An LLM judge (Haiku) decides relevance based on the task | "When debugging Stripe webhooks, check the idempotency key first." |

Data model

Knowledge item:
  id: string
  name: string                    # Short title (e.g., "Backend coding standards")
  content: text                   # Full instructions injected into agent prompt
  description: string             # Short description for LLM judge (auto mode)
  enabled: boolean                # Toggle without deleting
  pin_mode: "all" | "repo" | "auto"
  pinned_repos: string[]          # Repos this applies to (repo mode only)
  author: string                  # Who created it
  created_at / updated_at
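The data model above could be expressed as TypeScript types, e.g. for the control plane's CRUD API. This is a minimal sketch; the camelCase field names and the `PinMode` alias are illustrative choices, not part of the proposal.

```typescript
// Hypothetical types mirroring the knowledge-item data model above.
type PinMode = "all" | "repo" | "auto";

interface KnowledgeItem {
  id: string;
  name: string;          // short title, e.g. "Backend coding standards"
  content: string;       // full instructions injected into the agent prompt
  description: string;   // short description for the LLM judge (auto mode)
  enabled: boolean;      // toggle without deleting
  pinMode: PinMode;
  pinnedRepos: string[]; // repos this applies to (repo mode only)
  author: string;        // who created it
  createdAt: string;     // ISO timestamp
  updatedAt: string;     // ISO timestamp
}
```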

Selection flow at session start

1. Fetch all enabled knowledge items
2. Filter:
   - pin_mode="all"  → always include
   - pin_mode="repo" → include if session repo matches pinned_repos
   - pin_mode="auto" → send to Haiku judge with task description
3. Apply token budget (e.g., max 15k chars) to prevent prompt bloat
4. Inject selected items into the agent's system prompt
5. Log which items were injected (for analytics / optimization)
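The selection flow could be sketched roughly as follows. The `judgeRelevant` callback stands in for the Haiku relevance call, and the character budget is applied greedily in item order; both are assumptions, not a committed design.

```typescript
// Sketch of the session-start selection flow (steps 1-3 above).
type PinMode = "all" | "repo" | "auto";

interface KnowledgeItem {
  id: string;
  content: string;
  description: string;
  enabled: boolean;
  pinMode: PinMode;
  pinnedRepos: string[];
}

const MAX_CHARS = 15_000; // token budget from step 3

async function selectKnowledge(
  items: KnowledgeItem[],
  sessionRepo: string,
  taskDescription: string,
  judgeRelevant: (item: KnowledgeItem, task: string) => Promise<boolean>,
): Promise<KnowledgeItem[]> {
  const selected: KnowledgeItem[] = [];
  let used = 0;
  for (const item of items.filter((i) => i.enabled)) {
    let include = false;
    if (item.pinMode === "all") {
      include = true; // always injected
    } else if (item.pinMode === "repo") {
      include = item.pinnedRepos.includes(sessionRepo);
    } else {
      include = await judgeRelevant(item, taskDescription); // auto mode
    }
    // Apply the character budget greedily to prevent prompt bloat.
    if (include && used + item.content.length <= MAX_CHARS) {
      selected.push(item);
      used += item.content.length;
    }
  }
  return selected;
}
```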

Usage tracking

Track which knowledge items are injected into which sessions:

KnowledgeUsage:
  knowledge_id: string
  session_id: string
  selection_reason: "all" | "repo" | "auto"
  injected_at: timestamp

This enables analytics: which knowledge items are most used, which repos benefit most, and whether auto-selected items correlate with session success.
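As a sketch of the analytics side, the "most used" view is just a count over usage rows. The record shape mirrors `KnowledgeUsage` above; the `usageCounts` helper is a hypothetical illustration, in-memory rather than a D1 query.

```typescript
// Hypothetical usage record and a simple per-item aggregation.
interface KnowledgeUsage {
  knowledgeId: string;
  sessionId: string;
  selectionReason: "all" | "repo" | "auto";
  injectedAt: string; // ISO timestamp
}

// Count how often each knowledge item was injected: the "most used" view.
function usageCounts(rows: KnowledgeUsage[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.knowledgeId, (counts.get(row.knowledgeId) ?? 0) + 1);
  }
  return counts;
}
```

In the proposed architecture this aggregation would more likely live as a SQL query against the `knowledge_usage` table in D1.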

Where it fits in the current architecture

| Layer | Change |
|---|---|
| D1 (session index) | New `knowledge` and `knowledge_usage` tables |
| Control plane | Knowledge CRUD API routes, selection logic at session init |
| Session DO | Inject selected knowledge into the OpenCode system prompt during sandbox spawn |
| Web UI | Settings → Knowledge page: create/edit/toggle items, pin to repos, view usage |
| Slack bot | No change; knowledge selection is transparent to the trigger source |

UI mockup

Settings → Knowledge:

  • List of knowledge items with name, pin mode, status (enabled/disabled), usage count
  • Create/edit form: name, content (markdown), description, pin mode selector, repo multi-select
  • Toggle enabled/disabled without deleting
  • Analytics: most-used items, per-repo breakdown

Example knowledge items

# "PR Standards" (pin_mode: all)
Always create a PR with:
- Title under 70 characters
- Description with ## Summary and ## Test Plan sections
- Run lint and type checks before committing

# "Backend Architecture" (pin_mode: repo, pinned to backend)
Our backend uses:
- FastAPI with async endpoints
- SQLAlchemy 2.0 with asyncpg
- Alembic for migrations
- pytest with async fixtures
Never use synchronous database calls.

# "Stripe Integration" (pin_mode: auto)
When working with Stripe webhooks:
- Always verify webhook signatures
- Use idempotency keys for all API calls
- Handle "payment_intent.succeeded" and "payment_intent.payment_failed" events
- Log all webhook events to the audit table

Why this matters

  • Consistency — team conventions applied to every session without manual repetition
  • Onboarding — new team members' agents automatically know the codebase conventions
  • Efficiency — auto mode avoids prompt bloat by only injecting what's relevant
  • Measurability — usage tracking shows which knowledge is actually helpful

Happy to contribute the implementation.
