Feature/orion publish intent v1 b#62
Conversation
Reviewer's GuideImplements Orion QIKI intent v1 protocol end-to-end: makes operator intents fire-and-forget, enriches them with minimal telemetry/context snapshot, introduces strict Pydantic models for intents and proposals, updates NATS subjects, and adds tests for routing and protocol validation. Sequence diagram for Orion operator QIKI intent v1 publish flowsequenceDiagram
actor Operator
participant OrionOperatorConsole
participant NATS
participant QIKIAgent
Operator->>OrionOperatorConsole: enter command "q: <text>" or "// <text>"
OrionOperatorConsole->>OrionOperatorConsole: _run_command(raw)
OrionOperatorConsole->>OrionOperatorConsole: asyncio.create_task(_publish_qiki_intent(qiki_text))
Note over OrionOperatorConsole: Fire and forget<br/>operator loop continues
OrionOperatorConsole->>OrionOperatorConsole: _publish_qiki_intent(text)
OrionOperatorConsole->>OrionOperatorConsole: TelemetrySnapshotModel.normalize_payload(latest_telemetry)
OrionOperatorConsole->>OrionOperatorConsole: build vitals, selection, incidents_top
OrionOperatorConsole->>OrionOperatorConsole: construct IntentV1
OrionOperatorConsole->>OrionOperatorConsole: intent_payload = intent.model_dump()
OrionOperatorConsole->>OrionOperatorConsole: canonical JSON + sha256 digest
OrionOperatorConsole->>NATS: publish_command(QIKI_INTENT_V1, intent_payload)
NATS-->>QIKIAgent: deliver intent on subject qiki.intent.v1
NATS-->>OrionOperatorConsole: publish_command result
OrionOperatorConsole->>OrionOperatorConsole: log "Intent sent" with hash and ts
Class diagram for Orion QIKI intent v1 protocol modelsclassDiagram
class _StrictModel {
<<pydantic.BaseModel>>
+ConfigDict model_config
}
class LangHint {
<<enum>>
+AUTO
+RU
+EN
}
class EnvironmentMode {
<<enum>>
+FACTORY
+MISSION
}
class SelectionV1 {
<<_StrictModel>>
+kind Literal_event_incident_track_snapshot_none = "none"
+id Optional_str
}
class IntentV1 {
<<_StrictModel>>
+version Literal_1 = 1
+text str
+lang_hint LangHint = LangHint.AUTO
+screen str
+selection SelectionV1
+ts int
+environment_mode EnvironmentMode = EnvironmentMode.FACTORY
+snapshot_min dict_str_Any
}
class ProposalV1 {
<<_StrictModel>>
+proposal_id str
+title str
+justification str
+priority int
+confidence float
+proposed_actions list_Any
+_must_be_empty_in_stage_a(v list_Any) list_Any
}
class ProposalsBatchV1 {
<<_StrictModel>>
+version Literal_1 = 1
+ts int
+proposals list_ProposalV1
+metadata dict_str_Any
}
_StrictModel <|-- SelectionV1
_StrictModel <|-- IntentV1
_StrictModel <|-- ProposalV1
_StrictModel <|-- ProposalsBatchV1
LangHint <.. IntentV1
EnvironmentMode <.. IntentV1
SelectionV1 <.. IntentV1
ProposalV1 <.. ProposalsBatchV1
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Comment |
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- The fire-and-forget task in
_run_commandswallows exceptions by just touchingt.exception(); consider logging the exception in theadd_done_callbackso failures in_publish_qiki_intentare visible in logs. - Parsing
OPERATOR_CONSOLE_INTENT_TOP_INCIDENTSwithint(os.getenv(...))can raiseValueErroron bad input and break intent publishing; consider wrapping this in a safe helper with a default fallback to avoid crashing the operator loop.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The fire-and-forget task in `_run_command` swallows exceptions by just touching `t.exception()`; consider logging the exception in the `add_done_callback` so failures in `_publish_qiki_intent` are visible in logs.
- Parsing `OPERATOR_CONSOLE_INTENT_TOP_INCIDENTS` with `int(os.getenv(...))` can raise `ValueError` on bad input and break intent publishing; consider wrapping this in a safe helper with a default fallback to avoid crashing the operator loop.
## Individual Comments
### Comment 1
<location> `src/qiki/services/operator_console/main_orion.py:3460-3461` </location>
<code_context>
return
- await self._publish_qiki_intent(qiki_text)
+ # Fire-and-forget: ORION must never block the operator loop on agent I/O.
+ task = asyncio.create_task(self._publish_qiki_intent(qiki_text))
+ task.add_done_callback(lambda t: t.exception() if not t.cancelled() else None)
return
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Background task errors are silently dropped; consider logging failures in the done callback.
Because the callback just calls `t.exception()` and ignores the result, any errors in `_publish_qiki_intent` are effectively swallowed, making operational issues hard to detect. Consider replacing the lambda with a small helper that checks `t.exception()` and logs the error (and perhaps the relevant intent text) so failures are visible without blocking the main loop.
Suggested implementation:
```python
from dataclasses import dataclass
from dataclasses import asdict
import asyncio
import functools
import json
import logging
import math
import os
import time
from hashlib import sha256
from typing import Any, Optional
```
```python
from hashlib import sha256
from typing import Any, Optional
def _log_task_error(task: asyncio.Task, intent_text: str) -> None:
"""Log exceptions raised by fire-and-forget background tasks."""
if task.cancelled():
return
exc = task.exception()
if exc is None:
return
logger = logging.getLogger(__name__)
logger.exception(
"Background task failed while publishing qiki intent",
extra={"intent_text": intent_text},
)
```
```python
)
return
# Fire-and-forget: ORION must never block the operator loop on agent I/O.
task = asyncio.create_task(self._publish_qiki_intent(qiki_text))
task.add_done_callback(
functools.partial(_log_task_error, intent_text=qiki_text)
)
```
</issue_to_address>
### Comment 2
<location> `src/qiki/services/operator_console/main_orion.py:3743` </location>
<code_context>
+ -float(getattr(inc, "last_seen", 0.0) or 0.0),
+ )
+ )
+ limit = int(os.getenv("OPERATOR_CONSOLE_INTENT_TOP_INCIDENTS", "10") or 10)
+ now = time.time()
+ for inc in incidents[: max(0, limit)]:
</code_context>
<issue_to_address>
**issue (bug_risk):** Parsing the env var for `limit` can raise and abort intent publishing; consider a safer fallback.
If `OPERATOR_CONSOLE_INTENT_TOP_INCIDENTS` is set to a non-integer value, `int(...)` will raise `ValueError` and block publishing the intent. Consider wrapping this in a `try`/`except` with a default (e.g., 10) and a logged warning so misconfigured envs don’t stop QIKI intents from being sent.
</issue_to_address>
### Comment 3
<location> `src/qiki/services/operator_console/main_orion.py:32-33` </location>
<code_context>
-from qiki.shared.nats_subjects import COMMANDS_CONTROL, QIKI_INTENTS
+from qiki.shared.nats_subjects import COMMANDS_CONTROL, QIKI_INTENT_V1
try:
from qiki.services.operator_console.ui.charts import PpiScopeRenderer
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Broad `except Exception` around telemetry normalization may hide programming errors.
Catching all exceptions from `TelemetrySnapshotModel.normalize_payload` and defaulting to `{}` keeps the operator loop running but can mask real programming or schema issues. Please narrow the `except` to the expected error types (e.g. `ValueError`, `TypeError`) or at least log the exception (ideally rate-limited) so telemetry regressions are visible while preserving robustness.
</issue_to_address>
### Comment 4
<location> `tests/unit/test_orion_qiki_protocol_v1.py:41-50` </location>
<code_context>
+ IntentV1.model_validate({"version": 1, "text": "x"})
+
+
+def test_proposal_v1_actions_must_be_empty_in_stage_a() -> None:
+ ok = ProposalV1(
+ proposal_id="P1",
+ title="Title",
+ justification="Justification",
+ priority=50,
+ confidence=0.6,
+ proposed_actions=[],
+ )
+ assert ok.proposed_actions == []
+
</code_context>
<issue_to_address>
**suggestion (testing):** Also validate ProposalV1 priority and confidence boundary conditions
Since ProposalV1 enforces bounds on `priority` (0–100) and `confidence` (0.0–1.0), please add tests that cover the boundary values (0 and 100 for `priority`, 0.0 and 1.0 for `confidence`) and assert that out-of-range values raise `ValidationError`. This will help catch any future changes that might weaken these constraints.
```suggestion
def test_proposal_v1_actions_must_be_empty_in_stage_a() -> None:
ok = ProposalV1(
proposal_id="P1",
title="Title",
justification="Justification",
priority=50,
confidence=0.6,
proposed_actions=[],
)
assert ok.proposed_actions == []
@pytest.mark.parametrize("priority", [0, 100])
def test_proposal_v1_priority_allows_boundary_values(priority: int) -> None:
ok = ProposalV1(
proposal_id="P2",
title="Title",
justification="Justification",
priority=priority,
confidence=0.5,
proposed_actions=[],
)
assert ok.priority == priority
@pytest.mark.parametrize("priority", [-1, 101])
def test_proposal_v1_priority_rejects_out_of_range(priority: int) -> None:
with pytest.raises(ValidationError):
ProposalV1(
proposal_id="P3",
title="Title",
justification="Justification",
priority=priority,
confidence=0.5,
proposed_actions=[],
)
@pytest.mark.parametrize("confidence", [0.0, 1.0])
def test_proposal_v1_confidence_allows_boundary_values(confidence: float) -> None:
ok = ProposalV1(
proposal_id="P4",
title="Title",
justification="Justification",
priority=50,
confidence=confidence,
proposed_actions=[],
)
assert ok.confidence == confidence
@pytest.mark.parametrize("confidence", [-0.01, 1.01])
def test_proposal_v1_confidence_rejects_out_of_range(confidence: float) -> None:
with pytest.raises(ValidationError):
ProposalV1(
proposal_id="P5",
title="Title",
justification="Justification",
priority=50,
confidence=confidence,
proposed_actions=[],
)
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| task = asyncio.create_task(self._publish_qiki_intent(qiki_text)) | ||
| task.add_done_callback(lambda t: t.exception() if not t.cancelled() else None) |
There was a problem hiding this comment.
suggestion (bug_risk): Background task errors are silently dropped; consider logging failures in the done callback.
Because the callback just calls t.exception() and ignores the result, any errors in _publish_qiki_intent are effectively swallowed, making operational issues hard to detect. Consider replacing the lambda with a small helper that checks t.exception() and logs the error (and perhaps the relevant intent text) so failures are visible without blocking the main loop.
Suggested implementation:
from dataclasses import dataclass
from dataclasses import asdict
import asyncio
import functools
import json
import logging
import math
import os
import time
from hashlib import sha256
from typing import Any, Optionalfrom hashlib import sha256
from typing import Any, Optional
def _log_task_error(task: asyncio.Task, intent_text: str) -> None:
"""Log exceptions raised by fire-and-forget background tasks."""
if task.cancelled():
return
exc = task.exception()
if exc is None:
return
logger = logging.getLogger(__name__)
logger.exception(
"Background task failed while publishing qiki intent",
extra={"intent_text": intent_text},
) )
return
# Fire-and-forget: ORION must never block the operator loop on agent I/O.
task = asyncio.create_task(self._publish_qiki_intent(qiki_text))
task.add_done_callback(
functools.partial(_log_task_error, intent_text=qiki_text)
)| -float(getattr(inc, "last_seen", 0.0) or 0.0), | ||
| ) | ||
| ) | ||
| limit = int(os.getenv("OPERATOR_CONSOLE_INTENT_TOP_INCIDENTS", "10") or 10) |
There was a problem hiding this comment.
issue (bug_risk): Parsing the env var for limit can raise and abort intent publishing; consider a safer fallback.
If OPERATOR_CONSOLE_INTENT_TOP_INCIDENTS is set to a non-integer value, int(...) will raise ValueError and block publishing the intent. Consider wrapping this in a try/except with a default (e.g., 10) and a logged warning so misconfigured envs don’t stop QIKI intents from being sent.
| try: | ||
| from qiki.services.operator_console.ui.charts import PpiScopeRenderer |
There was a problem hiding this comment.
suggestion (bug_risk): Broad except Exception around telemetry normalization may hide programming errors.
Catching all exceptions from TelemetrySnapshotModel.normalize_payload and defaulting to {} keeps the operator loop running but can mask real programming or schema issues. Please narrow the except to the expected error types (e.g. ValueError, TypeError) or at least log the exception (ideally rate-limited) so telemetry regressions are visible while preserving robustness.
This PR publishes intent v1 B features for the Orion system as per Stage 0 plan.
Summary by Sourcery
Publish Orion operator intents using the new strict v1 protocol and enrich them with minimal console and telemetry context.
New Features:
Enhancements:
Tests: