| 1 | +# Knowledge Sync |
| 2 | + |
| 3 | +OSA includes a knowledge discovery system that syncs GitHub issues and pull requests, along with academic papers, for HED-related content. The assistant uses this content to point users to relevant discussions and research; it is a discovery aid, not an authoritative knowledge source. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The knowledge database stores: |
| 8 | + |
| 9 | +- **GitHub Issues and PRs** from HED repositories (hed-specification, hed-schemas, hed-javascript) |
| 10 | +- **Academic Papers** from OpenAlex, Semantic Scholar, and PubMed |
| 11 | + |
| 12 | +!!! note "Discovery, Not Answers" |
| 13 | + The knowledge system is for **discovery only**. The assistant links users to relevant discussions ("There's a related issue: [link]") rather than answering from them. |
| 14 | + |
| 15 | +## Quick Start |
| 16 | + |
| 17 | +```bash |
| 18 | +# Initialize the database |
| 19 | +uv run osa sync init |
| 20 | + |
| 21 | +# Sync GitHub issues/PRs |
| 22 | +uv run osa sync github |
| 23 | + |
| 24 | +# Sync academic papers |
| 25 | +uv run osa sync papers |
| 26 | + |
| 27 | +# Sync everything |
| 28 | +uv run osa sync all |
| 29 | + |
| 30 | +# Check sync status |
| 31 | +uv run osa sync status |
| 32 | +``` |
| 33 | + |
| 34 | +## CLI Commands |
| 35 | + |
| 36 | +### `osa sync init` |
| 37 | + |
| 38 | +Initialize the knowledge database. Creates the SQLite database with FTS5 full-text search support. |
| 39 | + |
| 40 | +```bash |
| 41 | +uv run osa sync init |
| 42 | +``` |
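
To make the FTS5 mention above concrete, here is a minimal sketch of a full-text index in SQLite, using Python's built-in `sqlite3` module. The table name `items`, its columns, and the `example.com` URLs are hypothetical and are not taken from OSA's actual schema; the sketch also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# In-memory database for illustration; OSA stores its database on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: title and body are indexed, url is stored but not searched.
conn.execute("CREATE VIRTUAL TABLE items USING fts5(title, body, url UNINDEXED)")
conn.executemany(
    "INSERT INTO items (title, body, url) VALUES (?, ?, ?)",
    [
        ("Validation error for nested tags",
         "The validator reports an error on nested groups.",
         "https://example.com/1"),
        ("Add schema attribute",
         "Proposal to add a new attribute to the schema.",
         "https://example.com/2"),
    ],
)


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list:
    """Return (title, url) rows matching an FTS5 query, best matches first."""
    return conn.execute(
        "SELECT title, url FROM items WHERE items MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


# Two bare terms are combined with an implicit AND by FTS5.
for title, url in search(conn, "validation error"):
    print(f"{title}: {url}")
```

The query returns links rather than answers, which matches the discovery-only role described earlier.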
| 43 | + |
| 44 | +### `osa sync github` |
| 45 | + |
| 46 | +Sync GitHub issues and PRs from HED repositories. |
| 47 | + |
| 48 | +```bash |
| 49 | +# Sync all HED repos |
| 50 | +uv run osa sync github |
| 51 | + |
| 52 | +# Sync specific repo |
| 53 | +uv run osa sync github -r hed-standard/hed-specification |
| 54 | +``` |
| 55 | + |
| 56 | +Options: |
| 57 | + |
| 58 | +- `-r, --repo`: Specific repository to sync (e.g., `hed-standard/hed-specification`) |
| 59 | + |
| 60 | +**Synced repositories:** |
| 61 | + |
| 62 | +- `hed-standard/hed-specification` |
| 63 | +- `hed-standard/hed-schemas` |
| 64 | +- `hed-standard/hed-javascript` |
| 65 | + |
| 66 | +### `osa sync papers` |
| 67 | + |
| 68 | +Sync academic papers from multiple sources. |
| 69 | + |
| 70 | +```bash |
| 71 | +# Sync from all sources |
| 72 | +uv run osa sync papers |
| 73 | + |
| 74 | +# Sync from specific source |
| 75 | +uv run osa sync papers -s openalex |
| 76 | +uv run osa sync papers -s semanticscholar |
| 77 | +uv run osa sync papers -s pubmed |
| 78 | + |
| 79 | +# Custom search query |
| 80 | +uv run osa sync papers -q "BIDS event annotation" |
| 81 | +``` |
| 82 | + |
| 83 | +Options: |
| 84 | + |
| 85 | +- `-s, --source`: Paper source (`openalex`, `semanticscholar`, `pubmed`) |
| 86 | +- `-q, --query`: Custom search query (default: "HED annotation" OR "Hierarchical Event Descriptors") |
| 87 | + |
| 88 | +### `osa sync all` |
| 89 | + |
| 90 | +Sync all knowledge sources (GitHub + papers). |
| 91 | + |
| 92 | +```bash |
| 93 | +uv run osa sync all |
| 94 | +``` |
| 95 | + |
| 96 | +### `osa sync status` |
| 97 | + |
| 98 | +Show sync status and database statistics. |
| 99 | + |
| 100 | +```bash |
| 101 | +uv run osa sync status |
| 102 | +``` |
| 103 | + |
| 104 | +Example output: |
| 105 | + |
| 106 | +``` |
| 107 | +Knowledge Database Status |
| 108 | +───────────────────────── |
| 109 | +Database: ~/.local/share/osa/knowledge/hed.db |
| 110 | + |
| 111 | +GitHub Items: |
| 112 | + hed-standard/hed-specification: 45 issues, 23 PRs |
| 113 | + hed-standard/hed-schemas: 12 issues, 8 PRs |
| 114 | + hed-standard/hed-javascript: 18 issues, 5 PRs |
| 115 | + Last sync: 2026-01-12 02:00:00 UTC |
| 116 | + |
| 117 | +Papers: |
| 118 | + OpenALEX: 42 papers |
| 119 | + Semantic Scholar: 38 papers |
| 120 | + PubMed: 25 papers |
| 121 | + Last sync: 2026-01-05 03:00:00 UTC |
| 122 | +``` |
| 123 | + |
| 124 | +### `osa sync search` |
| 125 | + |
| 126 | +Search the knowledge database (for testing). |
| 127 | + |
| 128 | +```bash |
| 129 | +uv run osa sync search "validation error" |
| 130 | +``` |
| 131 | + |
| 132 | +## Automated Sync (Docker) |
| 133 | + |
| 134 | +When running OSA in Docker, the scheduler automatically syncs knowledge sources: |
| 135 | + |
| 136 | +| Source | Default Schedule | Environment Variable | |
| 137 | +|--------|-----------------|---------------------| |
| 138 | +| GitHub | Daily at 2am UTC | `SYNC_GITHUB_CRON` | |
| 139 | +| Papers | Weekly Sunday 3am UTC | `SYNC_PAPERS_CRON` | |
| 140 | + |
| 141 | +### Configuration |
| 142 | + |
| 143 | +Configure via environment variables in your `.env` file: |
| 144 | + |
| 145 | +```bash |
| 146 | +# Enable/disable automated sync |
| 147 | +SYNC_ENABLED=true |
| 148 | + |
| 149 | +# Sync schedules (cron expressions, UTC timezone) |
| 150 | +SYNC_GITHUB_CRON=0 2 * * * # Daily at 2am |
| 151 | +SYNC_PAPERS_CRON=0 3 * * 0 # Weekly Sunday at 3am |
| 152 | + |
| 153 | +# Optional API keys for higher rate limits |
| 154 | +GITHUB_TOKEN=ghp_... |
| 155 | +SEMANTIC_SCHOLAR_API_KEY=... |
| 156 | +PUBMED_API_KEY=... |
| 157 | +``` |
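
For readers unfamiliar with cron syntax, the sketch below shows how the two default expressions can be read. It is an illustration only, not OSA's scheduler: the function names are hypothetical, and it handles just `*` and plain integers (no ranges, lists, or steps).

```python
from datetime import datetime, timezone


def _field_matches(field: str, value: int) -> bool:
    """A field matches if it is the wildcard or equals the value."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression: minute hour day-of-month month day-of-week."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = when.isoweekday() % 7  # cron convention: Sunday = 0
    if not (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(month, when.month)
    ):
        return False
    dom_ok = _field_matches(dom, when.day)
    dow_ok = _field_matches(dow, cron_dow)
    # Standard cron rule: if both day fields are restricted, either may match.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


monday_2am = datetime(2026, 1, 12, 2, 0, tzinfo=timezone.utc)
sunday_3am = datetime(2026, 1, 4, 3, 0, tzinfo=timezone.utc)

print(cron_matches("0 2 * * *", monday_2am))  # daily schedule
print(cron_matches("0 3 * * 0", sunday_3am))  # weekly Sunday schedule
```

Both schedules are evaluated in UTC, so a local-time deployment should convert before comparing.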
| 158 | + |
| 159 | +### Docker Compose |
| 160 | + |
| 161 | +The included `docker-compose.yml` mounts a volume for database persistence: |
| 162 | + |
| 163 | +```yaml |
| 164 | +services: |
| 165 | + osa: |
| 166 | + volumes: |
| 167 | + - osa-data:/app/data |
| 168 | + |
| 169 | +volumes: |
| 170 | + osa-data: |
| 171 | +``` |
| 172 | + |
| 173 | +This ensures the knowledge database persists across container restarts. |
| 174 | + |
| 175 | +## Manual Sync Trigger |
| 176 | + |
| 177 | +You can manually trigger sync at any time: |
| 178 | + |
| 179 | +```bash |
| 180 | +# Inside Docker container |
| 181 | +docker exec osa uv run osa sync all |
| 182 | + |
| 183 | +# Or from host with CLI |
| 184 | +uv run osa sync all |
| 185 | +``` |
| 186 | + |
| 187 | +## Database Location |
| 188 | + |
| 189 | +| Environment | Location | |
| 190 | +|-------------|----------| |
| 191 | +| Local (macOS) | `~/Library/Application Support/osa/knowledge/hed.db` | |
| 192 | +| Local (Linux) | `~/.local/share/osa/knowledge/hed.db` | |
| 193 | +| Docker | `/app/data/knowledge/hed.db` | |
| 194 | + |
| 195 | +The location can be overridden with the `DATA_DIR` environment variable. |
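
The table above can be summarized as a small lookup. The sketch below is a guess at the resolution order, not OSA's actual code: the function name is hypothetical, and it assumes `DATA_DIR` replaces the base directory with `knowledge/hed.db` appended underneath (which would match the Docker row if `DATA_DIR=/app/data`).

```python
import os
import sys
from pathlib import Path


def resolve_db_path(env=None, platform=None, home=None) -> Path:
    """Return the expected knowledge database path for the given environment."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    override = env.get("DATA_DIR")
    if override:
        base = Path(override)  # assumption: DATA_DIR is the base data directory
    elif platform == "darwin":
        base = home / "Library" / "Application Support" / "osa"
    else:
        base = home / ".local" / "share" / "osa"
    return base / "knowledge" / "hed.db"


print(resolve_db_path())
```

Running it prints the path that applies to the current machine, which is useful when locating the file for backup or deletion.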
| 196 | + |
| 197 | +## API Keys |
| 198 | + |
| 199 | +All API keys are optional but recommended for higher rate limits: |
| 200 | + |
| 201 | +| API Key | Purpose | Get Key | |
| 202 | +|---------|---------|---------| |
| 203 | +| `GITHUB_TOKEN` | GitHub API (issues/PRs) | [GitHub Settings](https://github.com/settings/tokens) | |
| 204 | +| `SEMANTIC_SCHOLAR_API_KEY` | Semantic Scholar API | [S2 API](https://www.semanticscholar.org/product/api) | |
| 205 | +| `PUBMED_API_KEY` | PubMed/NCBI API | [NCBI Settings](https://www.ncbi.nlm.nih.gov/account/settings/) | |
| 206 | + |
| 207 | +## Agent Tools |
| 208 | + |
| 209 | +The HED assistant has access to knowledge discovery tools: |
| 210 | + |
| 211 | +### `search_hed_discussions` |
| 212 | + |
| 213 | +Search GitHub issues and PRs for related discussions. |
| 214 | + |
| 215 | +``` |
| 216 | +"Can you find any discussions about validation errors?" |
| 217 | +→ "There's a related discussion in hed-specification#123: [link]" |
| 218 | +``` |
| 219 | + |
| 220 | +### `search_hed_papers` |
| 221 | + |
| 222 | +Search academic papers related to HED. |
| 223 | + |
| 224 | +``` |
| 225 | +"Are there papers about HED in neuroimaging?" |
| 226 | +→ "I found a relevant paper: 'HED Annotation Best Practices' [link]" |
| 227 | +``` |
| 228 | + |
| 229 | +## Troubleshooting |
| 230 | + |
| 231 | +### Sync fails with "gh: command not found" |
| 232 | + |
| 233 | +The `gh` CLI is required for GitHub sync. Install it: |
| 234 | + |
| 235 | +```bash |
| 236 | +# macOS |
| 237 | +brew install gh |
| 238 | + |
| 239 | +# Ubuntu/Debian |
| 240 | +sudo apt install gh |
| 241 | +``` |
| 242 | + |
| 243 | +### Rate limiting |
| 244 | + |
| 245 | +If you hit rate limits, configure API keys in your `.env` file. Without keys: |
| 246 | + |
| 247 | +- GitHub: 60 requests/hour |
| 248 | +- Semantic Scholar: ~100 requests/5 minutes |
| 249 | +- PubMed: 3 requests/second |
| 250 | + |
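| 251 | +With keys, limits are significantly higher: for example, GitHub allows 5,000 requests/hour for authenticated requests, and NCBI allows 10 requests/second with an API key. |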
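
As a rough illustration of what the unauthenticated limits listed above mean in practice, the sketch below converts each one into the average spacing between requests that stays under it. The function name is hypothetical and not part of OSA; the numbers are the ones from the list.

```python
def min_interval_seconds(max_requests: int, window_seconds: float) -> float:
    """Average delay between requests that keeps usage at or under the limit."""
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return window_seconds / max_requests


# (requests allowed, window length in seconds), taken from the list above.
UNAUTHENTICATED_LIMITS = {
    "GitHub": (60, 3600),
    "Semantic Scholar": (100, 300),
    "PubMed": (3, 1),
}

for source, (requests, window) in UNAUTHENTICATED_LIMITS.items():
    interval = min_interval_seconds(requests, window)
    print(f"{source}: about one request every {interval:.2f}s")
```

At one request per minute, an unauthenticated GitHub sync of several repositories can be slow, which is the main reason to set `GITHUB_TOKEN`.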
| 252 | + |
| 253 | +### Database corruption |
| 254 | + |
| 255 | +If the database becomes corrupted, delete it, reinitialize, and run a full sync: |
| 256 | + |
| 257 | +```bash |
| 258 | +rm ~/.local/share/osa/knowledge/hed.db  # Linux default; adjust the path on macOS or in Docker |
| 259 | +uv run osa sync init |
| 260 | +uv run osa sync all |
| 261 | +``` |
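
Before deleting the file, it may be worth confirming that it is actually damaged. The sketch below uses SQLite's `PRAGMA integrity_check` through Python's `sqlite3` module; the helper name is hypothetical and OSA does not necessarily ship such a check.

```python
import sqlite3
from pathlib import Path


def database_is_healthy(db_path) -> bool:
    """Return True if the file exists and SQLite's integrity check reports 'ok'."""
    path = Path(db_path)
    if not path.is_file():
        return False
    try:
        conn = sqlite3.connect(path)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a SQLite database at all.
        return False
    return row is not None and row[0] == "ok"


if __name__ == "__main__":
    # Linux default location from the table above; adjust for macOS or Docker.
    default_path = Path.home() / ".local" / "share" / "osa" / "knowledge" / "hed.db"
    print(database_is_healthy(default_path))
```

If the check returns `False` for an existing file, deleting and resyncing as shown above is the simplest recovery, since all content can be fetched again from the sources.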