
Feat: Custom API Integrations#52

Open
sgogriff wants to merge 11 commits into onllm-dev:main from sgogriff:feat/api-integrations

Conversation


@sgogriff sgogriff commented Apr 4, 2026

Summary

Adds API Integrations as a new telemetry subsystem for tracking token and cost usage from custom API-driven scripts via local JSONL ingestion.

What Changed

  • added backend JSONL ingestion and SQLite storage for API Integrations events
  • added read-only API endpoints for current usage, history, and ingest health
  • added API Integrations dashboard UI and settings visibility control
  • added Python wrapper examples for Anthropic, OpenAI, Mistral, OpenRouter, and Gemini
  • added setup and README documentation

Notes

  • API Integrations is deliberately separate from subscription/quota tracking, and the codebase reflects that separation
  • it tracks cumulative usage telemetry, not remaining plan percentage
  • ingestion is controlled by ONWATCH_API_INTEGRATIONS_ENABLED
  • source directory is configurable via ONWATCH_API_INTEGRATIONS_DIR
  • Docs currently include only a .py wrapper as an example, but more could easily be added; the same goes for adapting to a wider range of providers.

Testing

  • go test -race ./...
  • go vet ./...
  • manually tested JSONL ingestion and dashboard rendering with seeded data. Seeded data generator script available in examples/api_integrations/python along with .py examples and JSONL wrapper.

Screenshots!

(images: api-integration-dark, api-integration-light)

codecov bot commented Apr 4, 2026


prakersh commented Apr 5, 2026

Thanks a lot for this contribution! This is a substantial PR so we'll need some time to review it properly. Will follow up with detailed feedback once we've gone through everything.


sgogriff commented Apr 5, 2026

No problem! Let me know if there is anything that needs re-thinking. Happy to help :)


prakersh commented Apr 5, 2026

Thanks for the detailed PR! I've done an initial review and the overall structure looks solid - clean separation, good test coverage, and consistent use of parameterized SQL.

A few things I'd like to address before merging:

Query bounds:

  • QueryAPIIntegrationUsageSummary() has no LIMIT - our project guardrails require bounded queries. Would need a cap here.
  • QueryAPIIntegrationUsageBuckets() loads all events into memory before grouping. For large time ranges this could be expensive - ideally the bucketing should happen in SQL.

Minor code issues:

  • Duplicate detection in InsertAPIIntegrationUsageEvent relies on strings.Contains(err.Error(), "unique") which is fragile. Prefer checking SQLite error codes.
  • GetActiveSystemAlertsByProvider double-parses createdAt (RFC3339 then RFC3339Nano) - second parse always overwrites the first. Just RFC3339Nano would suffice.

Typo:

  • README line mentions "accross seprarte" - should be "across separate"

Worth considering (non-blocking):

  • No data retention/pruning for api_integration_usage_events - will grow indefinitely
  • raw_line column stores full JSON alongside parsed columns, doubling storage

Happy to discuss any of these. Nice work on the docs and Python examples.


sgogriff commented Apr 6, 2026

Glad the PR looks good - I've gone through your comments and made these changes:

  • Bounded the current usage summary query to 500 rows.
  • Moved API Integrations history bucketing into SQLite so large time ranges no longer require loading and grouping all raw events in memory.
  • Hardened duplicate detection for ingested events by switching from error-string matching to SQLite unique-constraint codes.
  • Simplified API Integrations alert timestamp parsing to a single RFC3339Nano parse path.
  • Fixed the README typo in the API Integrations description.
  • Added automatic database retention/pruning for api_integration_usage_events via ONWATCH_API_INTEGRATIONS_RETENTION.
  • Default retention is 60 days (1440h); setting the value to 0 disables DB pruning.
  • Source API Integrations JSONL files are not pruned by onWatch (a possible future change - we'd have to think about how pruning interacts with offset_bytes in JSONL tailing). People with lots of integrations still need to rotate files manually.
  • Updated the API Integrations setup docs and README env var reference to document the new retention behaviour.

I didn't change the raw_line column, which still stores the full JSON and doubles per-event storage; that probably needs addressing.

Also had a couple of thoughts on where to take this next. I won't be doing it anytime soon, but any thoughts?

  • Add a settings tab for API integrations, similar to the others, to make things easier for the end user.
  • Add some kind of threshold alerts (i.e. integration exceeds X tokens or $ cost in Y window). This one would be particularly useful.

Let me know if something's not right, or if we should make more changes before merging :)

@sgogriff sgogriff force-pushed the feat/api-integrations branch from 99d54b4 to 6052ff6 Compare April 6, 2026 15:27
@prakersh

Hey @sgogriff, great work on the follow-up! You addressed everything cleanly - the SQL bucketing, SQLite error codes, retention/pruning, and the bounded summary query are all solid. Really nice execution.

I did a deeper pass and found a few more items to tidy up before we merge:

Must fix:

  1. Extra </div> in health summary (app.js ~line 5083)
    There's a stray closing </div> tag in renderAPIIntegrationsHealth that will break the health section layout in some browsers. Just needs removing.

  2. QueryAPIIntegrationUsageBuckets needs a LIMIT (api_integrations_store.go ~line 222)
    The summary query got bounded (500 cap - nice), but the bucket query slipped through. With 10 integrations over 30 days at 1-min granularity, that's potentially 432K rows loaded into memory before downsampling. A cap (e.g., 5000) would keep this safe.

  3. Fingerprint sub-second precision (types.go ~line 159)
    eventFingerprint formats the timestamp with time.RFC3339 (second precision). Two events within the same second that share identical integration/provider/model/tokens would silently dedup. Switching to time.RFC3339Nano would close this gap.

  4. String length validation on JSONL fields (types.go ~lines 65-154)
    integration, model, account, and metadata fields have no max length enforcement. A buggy producer could write very large strings. Simple guards would help - e.g., 256 chars for names, 4KB for metadata.

  5. Missing source_path index (api_integrations_store.go ~line 311)
    The health endpoint's LEFT JOIN on source_path has no index on that column in api_integration_usage_events. As event count grows, this becomes a full table scan per ingest state row. Adding CREATE INDEX IF NOT EXISTS idx_api_integration_usage_events_source ON api_integration_usage_events(source_path) would fix it.

  6. Drop raw_line column (types.go, store.go, api_integrations_store.go)
    Looking at this more closely - raw_line stores the full original JSON line alongside all the parsed columns, but nothing ever reads it. No handler or endpoint surfaces it. The source .jsonl files already serve as the raw audit trail (and they're not pruned by onWatch), so storing the same data again in SQLite is redundant and doubles per-event storage. I'd recommend removing the column, the RawLine struct field, and the related INSERT/SELECT references entirely.

Design questions (non-blocking, just want your thoughts):

  • AvailableProviders() / HasProvider() in config.go don't include api-integrations. Was this intentional? If backend code iterates providers for any reason, API integrations would be silently skipped. Wondering if it should be included there (gated on APIIntegrationsEnabled) or if you see it as fundamentally separate from polling providers.

Merge conflicts:

The PR currently has merge conflicts with main. Could you rebase onto the latest main and resolve those when you push the fixes above? That way we can get this merged smoothly once the changes look good.

On your future ideas - both a settings tab and threshold alerts sound great. The threshold alerts especially would be a killer feature for teams monitoring spend. Happy to discuss those in separate issues when you're ready.

Thanks again for the solid contribution and quick turnaround on feedback!

@sgogriff sgogriff force-pushed the feat/api-integrations branch from eb0aba7 to 2fd1fdf Compare April 13, 2026 09:40
@sgogriff

Thanks for the detailed pass, all fixed up!

Extra </div> - removed the stray closing tag.

Bucket query LIMIT - added apiIntegrationUsageBucketsLimit = 5000 constant as you suggested and capped the query alongside the existing summary limit.

Fingerprint precision - switched to time.RFC3339Nano in eventFingerprint.

String length validation - added maxIntegrationFieldLen = 256 and maxMetadataJSONLen = 4096 constants with guards in ParseUsageEventLine for integration, model, account, and compacted metadata_json. Tests added for each rejection path.

source_path index - added CREATE INDEX IF NOT EXISTS idx_api_integration_usage_events_source to the schema. Since it uses IF NOT EXISTS it runs on startup and covers existing databases automatically.

raw_line removal - dropped the field from the struct, removed it from INSERT/SELECT, and removed it from the schema. Added a DROP COLUMN migration in migrateSchema() for existing databases (with a TODO comment to remove it once everyone has migrated - a very very small number of users on this fork so should be quick, but nicer for us!).

Also rebased onto latest main - one conflict in config.go around the CodexHasProfiles field, resolved by keeping the upstream version.
While running tests on the rebase I also noticed TestRunStatus_PortFallbackDetectsOnwatchProcess is flaky on upstream main too (lsof lag on macOS? - the TCP connect succeeds before lsof can see the process). Included a fix in this PR but happy to pull it out into a separate issue/PR if you'd prefer to keep concerns separate.

Regarding the design: I kept api-integrations out of AvailableProviders() intentionally. The existing entries in that list all represent providers that onWatch polls directly using some kind of configured credential. api-integrations is deliberately credential-free: it reads whatever providers appear in the JSONL data (so it can naturally take input from any provider), meaning the set of providers is determined at runtime by the files, not at config time.
As you mention, there is a silent-skip risk, and the current structure doesn't address it, so I'm happy to change it if you think that's the better call given the reasoning above. One middle-ground option could be a separate AvailableFeatures() covering non-polling capabilities, which would close that gap without conflating the two concepts.

An alert feature is definitely on the roadmap -- it would be a very good addition. I'll get to it at some point, but you're welcome to take it on yourself if you'd rather not wait! :)
Please let me know if you find anything else needing attention on this PR in the meantime.

@prakersh

Hey @sgogriff, all six must-fixes from the last round are verified and look solid. Rebase is clean. Did a security pass - parameterized SQL, XSS escaping, auth on new endpoints, input validation all check out.

Three more items before merge:

  1. Unbounded PartialLine growth (api_integrations_ingest_agent.go ~L160) - A .jsonl file with no newlines grows state.PartialLine by 256KB per scan cycle indefinitely. Add a cap (e.g., 512KB) and discard with a warning when exceeded.

  2. Unbounded alert creation (api_integrations_ingest_agent.go ~L189) - recordInvalidLine creates a DB row per bad line. A garbage file floods the table. Rate-limit to e.g. 10 alerts per file per cycle.

  3. No cap on files scanned (api_integrations_ingest_agent.go ~L98) - Thousands of .jsonl files would all be stat'd + DB-queried each 5s cycle. A soft cap (e.g., 100) with a warning log would help.

The flaky test fix is fine to keep here. Will open an issue for the AvailableFeatures() idea. Almost there!

- Add 512KB cap on PartialLine with discard + warning when exceeded
- Rate-limit invalid line alerts to 10 per file per scan cycle
- Cap file scan to 100 files per cycle with round-robin cursor for coverage

All three prevent memory/DB flooding from malformed input files.
@sgogriff

Hey @prakersh,
I’ve now applied all three fixes in the latest push.
All set for another look if needed, and do let me know if you find any more issues! :)

