Skip to content

feat: add backend client and inventory attestation call#176

Open
jingxiang-z wants to merge 18 commits intomainfrom
feat/agent-backend-client-skeleton
Open

feat: add backend client and inventory attestation call#176
jingxiang-z wants to merge 18 commits intomainfrom
feat/agent-backend-client-skeleton

Conversation

@jingxiang-z
Copy link
Copy Markdown
Collaborator

@jingxiang-z jingxiang-z commented Apr 15, 2026

Summary

Implements the agent-side client and runtime loops for the new backend-facing inventory and attestation architecture described in the design doc.

This PR:

  • Adds internal/backendclient — typed HTTP client for all five agent endpoints (/v1/agent/enroll, /v1/agent/nodes/{nodeUUID}, /v1/agent/nodes/{nodeUUID}/nonce, /v1/agent/nodes/{nodeUUID}/attestation, /v1/agent/token)
  • Adds internal/agentstate — centralized persistent state access (JWT, SAK, backend base URL, node ID) backed by the existing SQLite metadata database, with legacy endpoint fallback
  • Refactors internal/attestation — replaces the monolithic attestation.go with split manager, collector, backend, nonce, and types packages using backendclient and agentstate
  • Adds internal/inventory — source, sink, and manager for periodic node inventory upsert via PUT /v1/agent/nodes/{nodeUUID}
  • Wires enrollment to use backendclient.Enroll and performs a best-effort inventory sync immediately after enrollment
  • Wires both inventory and attestation loops into the server startup path (startInventoryLoop, startAttestationLoop)
  • Removes machine info fields from OTLP resource attributes — inventory fields now go exclusively through PUT /v1/agent/nodes/{nodeUUID}
  • Moves AttestationConfig out of HealthExporterConfig to top-level Config and adds InventoryConfig with Enabled/Interval/InitialInterval controls and env var overrides
  • Deletes the legacy internal/attestation/attestation.go monolith and its associated tests
  • Bumps Go toolchain to 1.26.2

Validation

go test -race ./internal/backendclient/... ./internal/agentstate/... ./internal/attestation/... ./internal/inventory/... ./internal/enrollment/... ./internal/config/... ./cmd/fleetint/...

Follow-up

  • Attach X-GPUHealth-Agent-Mode: direct-inventory-v1 header on OTLP metrics/logs requests to suppress legacy node extraction on the backend
  • Add serial, vendor, model fields to DiskInfo.BlockDevice per the inventory field classification in the design doc

Summary by CodeRabbit

  • New Features

    • Inventory management: periodic local collection with change-only backend sync.
    • Attestation workflow: modular nonce/evidence providers, CLI evidence collector, periodic attestation manager and backend submitter.
    • Agent state persistence: local SQLite storage for backend URL, JWT, SAK, and node ID.
  • Configuration

    • New loop controls: inventory and attestation enable/interval/initial-interval environment variables.
  • Other

    • Go toolchain bumped to 1.26.2; backend endpoint validation relaxed for localhost/loopback.
    • CSV/UI label updated to "Agent Version".

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds agent-state persistence, a backend HTTP client, inventory collection and mapping, and a modular attestation subsystem; enrollment now uses the backend client and persists metadata to SQLite; server starts inventory/attestation loops; CI, deployment, and config defaults/validation updated.

Changes

Cohort / File(s) Summary
Agent State
internal/agentstate/state.go, internal/agentstate/sqlite.go, internal/agentstate/sqlite_test.go
New exported State interface and SQLite-backed implementation for persisted metadata keys (backend_base_url, sak_token, JWT, node ID) with context-aware getters (value, ok, err) and setters that persist and secure state.
Backend Client
internal/backendclient/client.go, internal/backendclient/types.go, internal/backendclient/errors.go, internal/backendclient/client_test.go, internal/backendclient/errors_test.go
New HTTP backend client and DTOs with constructors, JSON request helper, response-size limiting, redirect-rejection, HTTPStatusError type, error mapping for enroll, and comprehensive tests.
Inventory Subsystem
internal/inventory/types.go, internal/inventory/manager.go, internal/inventory/hash.go, internal/inventory/mapper/backend.go, internal/inventory/sink/backend.go, internal/inventory/*_test.go
New inventory model, Manager for periodic collection with jitter/retry, hash-based export suppression, mapper to backend upsert request, backend sink that reads agent state to create clients, and many tests for scheduling, hashing, mapping, and export behavior.
Attestation (Modularized)
internal/attestation/types.go, internal/attestation/manager.go, internal/attestation/collector.go, internal/attestation/nonce.go, internal/attestation/backend.go, internal/attestation/*_test.go
Replace monolith with modular interfaces (NonceProvider, EvidenceCollector, Submitter, Manager), CLI evidence collector, backend adapters that resolve clients from agent state, manager orchestration and tests; old monolithic implementation and its tests removed.
Enrollment & CLI
internal/enrollment/enrollment.go, internal/enrollment/enrollment_test.go, cmd/fleetint/enroll.go, cmd/fleetint/enroll_test.go
Enrollment workflow now validates/normalizes base endpoint, uses backend client Enroll, persists SAK/JWT/backend_base_url to SQLite (secureing state file) and triggers best-effort inventory sync; CLI wiring updated to call new workflow and tests adjusted.
Server & Background Loops
internal/server/server.go, internal/server/server_test.go, cmd/fleetint/run.go
Server now starts inventory and attestation managers using state-backed providers and CLI evidence collector; new env/config loop handling added and loop-interval helpers introduced.
Exporter & Collector
internal/exporter/collector/collector.go, internal/exporter/collector_test.go, internal/exporter/exporter.go, internal/exporter/*
Removed attestation collection/config entries from health collector; HealthData now includes GPU UUID→index mapping; JWT refresh uses backend client with backend_base_url; tests and converters updated to reflect removed attestation and machine-info changes.
Config API & Defaults
internal/config/*
Moved attestation/inventory loop configs to top-level Config, added Inventory config, updated defaults (inventory 1h, attestation initial 5m/24h), and strengthened validation (interval and initial_interval constraints).
Endpoint Validation
internal/endpoint/endpoint.go, internal/endpoint/endpoint_test.go
ValidateBackendEndpoint permits http only for loopback hosts; added DeriveBackendBaseURL to extract scheme+host from legacy endpoints and used as fallback.
CLI Status / Unenroll
cmd/fleetint/status.go, cmd/fleetint/unenroll.go, cmd/fleetint/unenroll_test.go
Status now derives metrics/logs endpoints from backend_base_url with legacy fallback; unenroll deletes backend metadata using agentstate constants; tests updated.
Machine Info Rename
internal/machineinfo/machineinfo.go, internal/machineinfo/machineinfo_test.go
Renamed FleetintVersionAgentVersion (field, JSON key, display label); tests adapted.
Build & Deployment
.github/workflows/ci.yml, .github/workflows/release.yml, .golangci.yml, deployments/*, docs/*, SECURITY.md
Bumped Go toolchain to 1.26.2 across CI/lint; replaced attestation jitter env with inventory/attestation scheduling envs in deployment values and docs; SECURITY.md attestation paths updated.

Sequence Diagram(s)

sequenceDiagram
    participant Manager as Attestation Manager
    participant State as agentstate.State
    participant Backend as backendclient.Client
    participant Collector as EvidenceCollector

    Manager->>State: GetNodeID(ctx)
    State-->>Manager: nodeID
    Manager->>State: GetJWT(ctx)
    State-->>Manager: jwt
    Manager->>Backend: GetNonce(ctx, nodeID, jwt)
    Backend-->>Manager: nonce, refreshTS, refreshedJWT
    alt refreshedJWT present
        Manager->>State: SetJWT(ctx, refreshedJWT)
        State-->>Manager: ok
    end
    Manager->>Collector: Collect(ctx, nonce)
    Collector-->>Manager: SDKResponse
    Manager->>Backend: SubmitAttestation(ctx, nodeID, mappedRequest, jwt)
    Backend-->>Manager: success / error
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped through state and backend lines,

I hashed the snapshots, saved the signs,
I chased a nonce through HTTP,
Collected evidence from CLI,
New loops now hum beneath the pines. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: add backend client and inventory attestation call' accurately describes the main changes: it adds internal/backendclient for HTTP communication and integrates inventory/attestation workflows. The title captures the primary feature additions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-backend-client-skeleton

Comment @coderabbitai help to get the list of available commands and usage tips.

@jingxiang-z jingxiang-z changed the title chore: add backend client and inventory skeletons feat: add backend client and inventory skeletons Apr 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
internal/agentstate/state.go (1)

21-34: State interface should follow the existing interface segregation pattern or clarify its purpose relative to existing StateStore interfaces.

The State interface is currently unused in the codebase. However, the codebase already demonstrates interface segregation: JWTProvider (GetJWT/SetJWT only) is used by the attestation manager, and package-specific StateStore interfaces exist in both internal/attestationloop/types.go and internal/inventory/types.go. If State is intended as a unified interface, consider splitting it into narrower, focused interfaces (e.g., JWTStore, SAKStore, BackendURLStore, NodeIDStore) to reduce coupling, or clarify how it relates to the existing StateStore pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agentstate/state.go` around lines 21 - 34, The State interface
currently groups unrelated getters/setters (GetBackendBaseURL/SetBackendBaseURL,
GetJWT/SetJWT, GetSAK/SetSAK, GetNodeID/SetNodeID) and conflicts with the
existing interface segregation pattern (see JWTProvider and package StateStore
types in internal/attestationloop/types.go and internal/inventory/types.go); fix
by splitting State into focused interfaces (e.g., JWTStore with GetJWT/SetJWT,
SAKStore with GetSAK/SetSAK, BackendURLStore with
GetBackendBaseURL/SetBackendBaseURL, NodeIDStore with GetNodeID/SetNodeID) and
update any consumers to depend on the narrow interface(s) they actually need, or
alternatively add clear package-level comments explaining State is a unified
facade and ensure it is actually used instead of the existing
JWTProvider/StateStore types.
internal/store/memory.go (1)

49-55: Note: Large struct parameters are dictated by interface contracts.

Static analysis flags snap (496 bytes) and result (136 bytes) as heavy parameters. These signatures are constrained by the inventory.StateStore and attestationloop.StateStore interfaces which define value semantics.

If copy overhead becomes a concern, consider updating the interfaces to use pointer parameters (*inventory.Snapshot, *attestationloop.Result). For this skeleton, the current approach is acceptable.

Also applies to: 77-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/store/memory.go` around lines 49 - 55, The review notes that
large-value parameters (inventory.Snapshot in MemoryStore.PutInventory and
attestationloop.Result elsewhere) are expensive to copy but are required by the
current interfaces; to address this either (A) leave the implementation as-is
for this skeleton, or (B) change the method signatures on the interfaces
inventory.StateStore and attestationloop.StateStore to accept pointer types
(*inventory.Snapshot, *attestationloop.Result) and update all implementations
(e.g., MemoryStore.PutInventory) and callers to use pointers to avoid copies;
locate uses by searching for MemoryStore.PutInventory, inventory.Snapshot,
attestationloop.Result, and the StateStore interfaces and consistently switch
signatures and call sites if you opt for the pointer change.
internal/backendclient/client.go (1)

90-98: Consider: Large request structs passed by value.

Static analysis flags NodeUpsertRequest (456 bytes) and AttestationRequest (96 bytes) as heavy parameters. Since these are immediately marshaled to JSON in doJSON, the copy overhead is negligible compared to serialization and network I/O.

If you prefer pointer semantics for consistency with Go conventions on large structs, update the interface signatures. Otherwise, value semantics are acceptable here.

Also applies to: 118-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 90 - 98, The NodeUpsertRequest
(456 bytes) and AttestationRequest (96 bytes) are large structs—switch the
public method signatures to pointer semantics for consistency: change func (c
*client) UpsertNode(ctx context.Context, nodeID string, req NodeUpsertRequest,
jwt string) error to use *NodeUpsertRequest, and the analogous
Attest/Attestation method (the one referenced at lines 118-126) to use
*AttestationRequest; update all internal callers to pass pointers, add a nil
check for the pointer params (return an error if nil), and pass the pointer
directly into c.doJSON (no other marshaling changes needed). Ensure any
interface or tests that call these methods are updated to use &struct literals
or existing pointers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/agentstate/state.go`:
- Around line 22-34: The getters on the State interface (GetBackendBaseURL,
GetJWT, GetSAK, GetNodeID) currently return (string, error) which makes "unset"
indistinguishable from an empty persisted value; change each getter signature to
return an explicit presence flag, e.g. (string, bool, error) (or equivalent like
(*string, error) if preferred), update the State interface method signatures
accordingly, and update all implementations of State to return the boolean
presence flag (true when a value exists, false when not persisted) so callers
can reliably differentiate unset vs empty string without relying on errors.

In `@internal/attestationloop/sink/backend.go`:
- Around line 40-45: In Export (method backendSink.Export) add nil-check guards
for the JWT and client dependencies: after calling s.jwt(ctx) validate the
returned jwt is not nil and return a clear error if it is; likewise ensure
s.client is not nil before calling s.client.SubmitAttestation and return an
explicit error if missing; keep the existing call to
s.client.SubmitAttestation(result.NodeID, mapper.ToAttestationRequest(result),
jwt) only after both checks succeed to avoid panics.

In `@internal/backendclient/client_test.go`:
- Around line 42-44: The test assigns c via New(server.URL) and then immediately
overwrites it with NewWithHTTPClient, making the first call dead code; either
remove the first assignment to only test NewWithHTTPClient, or split into two
assertions/tests so both constructors are exercised: keep the New(server.URL)
call in a separate subtest or add checks after each constructor (using variable
names like c1/c2 or separate t.Run blocks) to validate New and NewWithHTTPClient
(with mustParseURL and server.Client()) independently.

In `@internal/backendclient/client.go`:
- Around line 33-36: The file containing the constants userAgent and
maxResponseBodyBytes has gofmt formatting issues; run gofmt -w on that source
file to apply standard Go formatting, then re-stage the file and ensure the
constants block (and surrounding code) are correctly formatted before pushing
the change.

In `@internal/backendclient/errors.go`:
- Around line 32-36: The HTTPStatusError.Error method currently returns the
entire e.Body which can leak sensitive data and create huge logs; modify
HTTPStatusError.Error to sanitize and cap the body fragment before including it
(e.g., trim whitespace, replace newlines, limit to a safe length like 200
characters and append "…(truncated)" if longer) and fall back to a status-only
message when the sanitized fragment is empty; update references to the
HTTPStatusError struct/fields (Error(), Body, StatusCode) so the error string
never exposes unbounded raw response content.

In `@internal/inventory/mapper/backend.go`:
- Around line 42-52: The NodeResources construction (backendclient.NodeResources
in the inventory mapper) currently only maps CPUInfo and MemoryInfo and drops
GPUInfo, DiskInfo, and NICInfo from s.Resources; update the mapping in the
function that builds backendclient.NodeResources to include GPUInfo, DiskInfo,
and NICInfo by mapping s.Resources.GPUInfo, s.Resources.DiskInfo and
s.Resources.NICInfo into the corresponding backendclient types (preserving any
slice elements or nested fields), ensuring any necessary conversions for element
types or fields are implemented alongside CPUInfo and MemoryInfo so no resource
inventory is omitted.

In `@internal/inventory/sink/backend.go`:
- Around line 40-45: The Export method may panic when dependencies are nil:
check the result of s.jwt(ctx) for a nil jwt (in addition to err) and return a
descriptive error instead of proceeding, and also guard s.client (and/or
s.client.UpsertNode) before calling UpsertNode; if s.client is nil return a
clear error (e.g., using fmt.Errorf) so Export fails gracefully rather than
panicking. Ensure these checks are added inside backendSink.Export around the
calls to s.jwt and s.client.UpsertNode and keep error messages referencing
Export, jwt, and client for clarity.

In `@internal/store/memory.go`:
- Around line 32-36: The file internal/store/memory.go has gofmt/formatting
issues around the struct fields (inventory, hasInventory, lastInventoryHash,
lastInventorySyncTS); run the formatter to fix whitespace and alignment by
executing `gofmt -w internal/store/memory.go` (or apply equivalent editor Go
formatting) so the struct and surrounding code conform to gofmt standards.

---

Nitpick comments:
In `@internal/agentstate/state.go`:
- Around line 21-34: The State interface currently groups unrelated
getters/setters (GetBackendBaseURL/SetBackendBaseURL, GetJWT/SetJWT,
GetSAK/SetSAK, GetNodeID/SetNodeID) and conflicts with the existing interface
segregation pattern (see JWTProvider and package StateStore types in
internal/attestationloop/types.go and internal/inventory/types.go); fix by
splitting State into focused interfaces (e.g., JWTStore with GetJWT/SetJWT,
SAKStore with GetSAK/SetSAK, BackendURLStore with
GetBackendBaseURL/SetBackendBaseURL, NodeIDStore with GetNodeID/SetNodeID) and
update any consumers to depend on the narrow interface(s) they actually need, or
alternatively add clear package-level comments explaining State is a unified
facade and ensure it is actually used instead of the existing
JWTProvider/StateStore types.

In `@internal/backendclient/client.go`:
- Around line 90-98: The NodeUpsertRequest (456 bytes) and AttestationRequest
(96 bytes) are large structs—switch the public method signatures to pointer
semantics for consistency: change func (c *client) UpsertNode(ctx
context.Context, nodeID string, req NodeUpsertRequest, jwt string) error to use
*NodeUpsertRequest, and the analogous Attest/Attestation method (the one
referenced at lines 118-126) to use *AttestationRequest; update all internal
callers to pass pointers, add a nil check for the pointer params (return an
error if nil), and pass the pointer directly into c.doJSON (no other marshaling
changes needed). Ensure any interface or tests that call these methods are
updated to use &struct literals or existing pointers.

In `@internal/store/memory.go`:
- Around line 49-55: The review notes that large-value parameters
(inventory.Snapshot in MemoryStore.PutInventory and attestationloop.Result
elsewhere) are expensive to copy but are required by the current interfaces; to
address this either (A) leave the implementation as-is for this skeleton, or (B)
change the method signatures on the interfaces inventory.StateStore and
attestationloop.StateStore to accept pointer types (*inventory.Snapshot,
*attestationloop.Result) and update all implementations (e.g.,
MemoryStore.PutInventory) and callers to use pointers to avoid copies; locate
uses by searching for MemoryStore.PutInventory, inventory.Snapshot,
attestationloop.Result, and the StateStore interfaces and consistently switch
signatures and call sites if you opt for the pointer change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c225851a-de17-4aed-b5a7-ab146dd9e0d7

📥 Commits

Reviewing files that changed from the base of the PR and between 07f3216 and c7fdf10.

📒 Files selected for processing (16)
  • internal/agentstate/state.go
  • internal/attestationloop/manager.go
  • internal/attestationloop/mapper/backend.go
  • internal/attestationloop/sink/backend.go
  • internal/attestationloop/source/source.go
  • internal/attestationloop/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/types.go
  • internal/inventory/manager.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/sink/backend.go
  • internal/inventory/source/source.go
  • internal/inventory/types.go
  • internal/store/memory.go

Comment thread internal/agentstate/state.go
Comment thread internal/attestationloop/sink/backend.go Outdated
Comment thread internal/backendclient/client_test.go Outdated
Comment thread internal/backendclient/client.go
Comment thread internal/backendclient/errors.go
Comment thread internal/inventory/mapper/backend.go
Comment thread internal/inventory/sink/backend.go Outdated
Comment thread internal/store/memory.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
cmd/fleetint/unenroll.go (1)

67-78: Consider generating placeholders dynamically to avoid count mismatch.

The addition of "backend_base_url" is correct and aligns with the enrollment changes. However, manually maintaining the placeholder count in the query string is error-prone.

♻️ Proposed refactor to generate placeholders dynamically
 	keysToDelete := []string{
 		pkgmetadata.MetadataKeyToken,
 		"sak_token",
 		"backend_base_url",
 		"enroll_endpoint",
 		"metrics_endpoint",
 		"logs_endpoint",
 		"nonce_endpoint",
 	}

 	// Build batch delete query
-	query := "DELETE FROM gpud_metadata WHERE key IN (?, ?, ?, ?, ?, ?, ?)"
+	placeholders := make([]string, len(keysToDelete))
+	for i := range placeholders {
+		placeholders[i] = "?"
+	}
+	query := "DELETE FROM gpud_metadata WHERE key IN (" + strings.Join(placeholders, ", ") + ")"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/unenroll.go` around lines 67 - 78, The hard-coded query
placeholder count can go out of sync with keysToDelete; instead, generate the
IN-list dynamically from len(keysToDelete) (e.g., build a slice of "?" repeated
len(keysToDelete) and strings.Join it with commas) and then set query =
fmt.Sprintf("DELETE FROM gpud_metadata WHERE key IN (%s)", placeholdersJoined);
update the code around keysToDelete and query to use this generated placeholder
string and import strings/fmt as needed.
internal/agentstate/sqlite.go (1)

56-62: Consider defining a constant for "sak_token".

The string literal "sak_token" is used here and in cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go. A shared constant would prevent typo-related bugs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agentstate/sqlite.go` around lines 56 - 62, Introduce a shared
constant for the metadata key currently hardcoded as "sak_token" and replace the
literal in sqliteState.GetSAK and sqliteState.SetSAK with that constant; update
the other usages in cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go to
import or reference the same constant so all places (including the getMetadata
and setMetadata calls in GetSAK and SetSAK) use the single source of truth
(e.g., define SAKMetadataKey or similar and use it in GetSAK, SetSAK, enroll,
and unenroll).
internal/inventory/manager_run_test.go (1)

57-75: Potential flakiness with time.Sleep for synchronization.

The test relies on time.Sleep(25 * time.Millisecond) to ensure at least one export occurs before cancellation. While this should work given the 10ms interval, sleep-based synchronization can be flaky under CI load. Consider using a channel signal from the sink when an export occurs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/manager_run_test.go` around lines 57 - 75, The test
TestManagerRunStopsOnContextCancel uses time.Sleep for synchronization which can
be flaky; modify the fakeSink used by the test to include a notification channel
(e.g., add a field like exportedCh chan struct{} to the fakeSink and close/send
on it inside the sink's Export method), then in
TestManagerRunStopsOnContextCancel wait on that channel with a timeout instead
of sleeping (select on exportedCh and a time.After), then cancel the context and
assert NewManager(...).Run(ctx) returns context.Canceled and that sink.exported
is not empty; update fakeSink and the test to use exportedCh to
deterministically detect the first export.
cmd/fleetint/enroll.go (1)

169-171: Use a shared constant for "backend_base_url" to avoid duplication across files.

The hardcoded string "backend_base_url" appears in multiple locations (enroll.go:169 and unenroll.go:70) but a matching constant metadataKeyBackendBaseURL already exists in internal/agentstate/sqlite.go:29. Either export this constant from agentstate and use it consistently, or define a shared constant in a more appropriate location.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/enroll.go` around lines 169 - 171, Replace the hardcoded
"backend_base_url" string used in pkgmetadata.SetMetadata calls with a single
shared constant; either export the existing metadataKeyBackendBaseURL from
internal/agentstate (rename to MetadataKeyBackendBaseURL) and import that
constant into cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go, or define a
shared constant in a common package (e.g., pkgmetadata) and update both
enroll.go and unenroll.go to use that constant instead of the literal; update
imports and any references to the old unexported name accordingly.
cmd/fleetint/enroll_test.go (1)

79-90: Assert performInventorySync call behavior explicitly.

These tests stub performInventorySync but don’t verify whether it was invoked. Adding explicit assertions will catch regressions in the post-enroll flow.

✅ Suggested test hardening
 func TestEnrollCommandBlocksOnFailedPrecheck(t *testing.T) {
@@
 	enrollmentCalled := false
+	inventorySyncCalled := false
@@
-	performInventorySync = func(context.Context) error { return nil }
+	performInventorySync = func(context.Context) error {
+		inventorySyncCalled = true
+		return nil
+	}
@@
 	assert.Contains(t, out.String(), "Enrollment skipped: precheck failed")
 	assert.False(t, enrollmentCalled)
+	assert.False(t, inventorySyncCalled)
 }

 func TestEnrollCommandForceBypassesFailedPrecheck(t *testing.T) {
@@
 	enrollmentCalled := false
+	inventorySyncCalled := false
@@
-	performInventorySync = func(context.Context) error { return nil }
+	performInventorySync = func(context.Context) error {
+		inventorySyncCalled = true
+		return nil
+	}
@@
 	require.NoError(t, err)
 	assert.True(t, enrollmentCalled)
+	assert.True(t, inventorySyncCalled)
 }

Also applies to: 120-129

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/enroll_test.go` around lines 79 - 90, The test stubs
performInventorySync but never asserts whether it was called; update the enroll
tests (the block around the existing performInventorySync stub and the similar
block at lines 120-129) to replace the noop stub with a spy that sets a boolean
(e.g., inventorySyncCalled) or increments a counter when invoked, and add
assertions (require.True/False or assert.Equal) after app.Run to explicitly
verify that performInventorySync was (or was not) called as expected; reference
the performInventorySync symbol and the existing enrollmentCalled boolean to
implement the spy and assertions.
internal/inventory/source/source_test.go (1)

37-114: Add explicit error-path tests for source guards.

Happy-path mapping looks good, but this file should also cover: nil collector, collector error, and nil machine-info returns to lock in Collect guard behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/source/source_test.go` around lines 37 - 114, Add tests
exercising Collect's error/guard paths: create three new subtests that call
NewMachineInfoSource with (1) a nil collector and assert Collect returns an
error, (2) a fakeMachineInfoCollector whose Collect returns an error and assert
that error is propagated, and (3) a fakeMachineInfoCollector that returns (nil,
nil) for MachineInfo and assert Collect returns an error or empty snapshot per
the current guard behavior; reference NewMachineInfoSource,
fakeMachineInfoCollector and its Collect method, and the source.Collect call to
locate where to add these assertions.
internal/inventory/sink/backend_test.go (1)

120-135: Verify base URL is passed from state into clientFactory.

TestBackendSinkExportUsesState checks JWT usage but currently does not assert the baseURL argument used to create the backend client.

🔍 Suggested assertion addition
 func TestBackendSinkExportUsesState(t *testing.T) {
 	client := &fakeClient{}
+	var gotBaseURL string
 	s := &backendSink{
 		state: fakeState{
 			baseURL: "https://example.com",
 			jwt:     "jwt-token",
 		},
-		clientFactory: func(string) (backendclient.Client, error) {
+		clientFactory: func(rawBaseURL string) (backendclient.Client, error) {
+			gotBaseURL = rawBaseURL
 			return client, nil
 		},
 	}
@@
 	require.NoError(t, err)
+	require.Equal(t, "https://example.com", gotBaseURL)
 	require.Equal(t, "node-1", client.nodeID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/sink/backend_test.go` around lines 120 - 135, Update the
test TestBackendSinkExportUsesState to verify the baseURL passed into the
clientFactory matches the sink state’s backend endpoint: modify the
clientFactory closure used in the test (the func(string) (backendclient.Client,
error) assigned to clientFactory) to capture its string argument into a local
variable (e.g., gotBaseURL) and after calling s.Export with the
inventory.Snapshot, assert that gotBaseURL equals the expected backend endpoint
from the sink state; keep the existing assertions for client.nodeID, client.jwt
and client.req.
internal/backendclient/client.go (1)

213-239: Consider wrapping errors to preserve diagnostic context.

The mapped errors replace the original HTTPStatusError, discarding potentially useful diagnostic information (e.g., response body). For debugging production issues, consider wrapping instead:

return fmt.Errorf("the token used in the enrollment is not in the correct format: %w", statusErr)

This preserves the ability to inspect the original status code and body via errors.As.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 213 - 239, The mapped errors
in mapEnrollError replace the original HTTPStatusError (statusErr) and lose
diagnostic context; update each returned error to wrap the original statusErr
using the Go %w wrapping (e.g., return fmt.Errorf("<message>: %w", statusErr))
so callers can still inspect the underlying HTTPStatusError (status code, body)
via errors.As; apply this change to every case in mapEnrollError while leaving
the default branch returning err unchanged.
internal/attestationloop/manager.go (3)

115-122: Clarify semantics of lastResult on submission failure.

lastResult is updated before submission (line 117), so it reflects the last attempted collection even if submission fails (line 119). If consumers expect lastResult to represent successfully submitted results only, consider updating it after successful submission instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestationloop/manager.go` around lines 115 - 122, The code sets
m.lastResult before calling m.submitter.Submit so lastResult reflects attempted
collections even when Submit fails; move the clone-and-assign of m.lastResult to
after successful submission (i.e., only assign m.lastResult = &cloned after
m.submitter.Submit(ctx, result, jwt) returns nil), keeping the same clone
behavior and mutex (use m.mu.Lock()/Unlock() around the assignment), and leave
Submit called with the original result; this ensures lastResult represents only
successfully submitted results.

51-68: Consider validating required dependencies in constructor.

Dependencies are validated in CollectOnce (line 78), but deferring validation until runtime delays detection of misconfiguration. For production, consider validating non-nil dependencies in NewManager to fail fast during initialization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestationloop/manager.go` around lines 51 - 68, NewManager should
fail fast by validating required dependencies instead of deferring to
CollectOnce; update the constructor (NewManager) to check that nodeIDProvider,
jwtProvider, nonceProvider, collector, submitter and interval are non-nil/valid
and return an error (or panic consistently with project conventions) when any
required dependency is missing, and update callers to handle the changed
signature if you switch to returning (Manager, error); this ensures
misconfiguration is detected at initialization rather than in CollectOnce.

115-118: Shallow copy shares underlying slice data in SDKResponse.Evidences.

cloned := *result creates a shallow copy of the Result struct. Since SDKResponse contains Evidences []EvidenceItem (a slice, which is a reference type), both the cloned result and the original share the same underlying array. If the original result is modified after this call, m.lastResult will reflect those changes, potentially causing data corruption in concurrent scenarios.

Either implement a deep copy, or document that result must not be modified after this call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestationloop/manager.go` around lines 115 - 118, The shallow copy
using "cloned := *result" leaves SDKResponse.Evidences shared between the
original and m.lastResult; change to a deep copy: allocate a new SDKResponse (or
Result) instance, copy primitive fields, and create a new Evidences slice then
copy or deep-clone each EvidenceItem into it before assigning to m.lastResult
(ensure you deep-copy any reference fields inside EvidenceItem if present);
update the assignment that currently uses "cloned := *result" and "m.lastResult
= &cloned" to store the fully independent copy instead (alternatively document
immutability of result if you prefer not to copy).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/agentstate/sqlite.go`:
- Around line 102-107: The code silently ignores an error from s.stateFileFn(),
which can mask path resolution failures even if openReadWrite() succeeded;
update the block around stateFile, err := s.stateFileFn() to handle err != nil
by returning or propagating it (e.g., return fmt.Errorf("stateFileFn: %w", err))
instead of swallowing it, and then call
config.SecureStateFilePermissions(stateFile) only when err == nil; reference the
s.stateFileFn function and config.SecureStateFilePermissions to locate and fix
the logic.

In `@internal/backendclient/client_test.go`:
- Around line 34-40: The test HTTP handler passed to httptest.NewTLSServer in
client_test.go uses require.* inside the handler goroutine which can call
t.FailNow() and is unsafe; change these checks to either capture request values
(method, path, Authorization, User-Agent, etc.) into local variables and perform
require/assert checks in the main test goroutine after the client request
returns, or replace require.* with assert.* inside the handler (e.g., replace
require.Equal calls for http.MethodPost, "/v1/agent/enroll", headers with
assert.Equal). Update all similar handlers noted (the handler in the
NewTLSServer call and the other handler blocks referenced around lines 62-69,
83-86, 104-107, 120-127) so assertions run safely from the main test goroutine
or use assert.*.

---

Nitpick comments:
In `@cmd/fleetint/enroll_test.go`:
- Around line 79-90: The test stubs performInventorySync but never asserts
whether it was called; update the enroll tests (the block around the existing
performInventorySync stub and the similar block at lines 120-129) to replace the
noop stub with a spy that sets a boolean (e.g., inventorySyncCalled) or
increments a counter when invoked, and add assertions (require.True/False or
assert.Equal) after app.Run to explicitly verify that performInventorySync was
(or was not) called as expected; reference the performInventorySync symbol and
the existing enrollmentCalled boolean to implement the spy and assertions.

In `@cmd/fleetint/enroll.go`:
- Around line 169-171: Replace the hardcoded "backend_base_url" string used in
pkgmetadata.SetMetadata calls with a single shared constant; either export the
existing metadataKeyBackendBaseURL from internal/agentstate (rename to
MetadataKeyBackendBaseURL) and import that constant into cmd/fleetint/enroll.go
and cmd/fleetint/unenroll.go, or define a shared constant in a common package
(e.g., pkgmetadata) and update both enroll.go and unenroll.go to use that
constant instead of the literal; update imports and any references to the old
unexported name accordingly.

In `@cmd/fleetint/unenroll.go`:
- Around line 67-78: The hard-coded query placeholder count can go out of sync
with keysToDelete; instead, generate the IN-list dynamically from
len(keysToDelete) (e.g., build a slice of "?" repeated len(keysToDelete) and
strings.Join it with commas) and then set query = fmt.Sprintf("DELETE FROM
gpud_metadata WHERE key IN (%s)", placeholdersJoined); update the code around
keysToDelete and query to use this generated placeholder string and import
strings/fmt as needed.

In `@internal/agentstate/sqlite.go`:
- Around line 56-62: Introduce a shared constant for the metadata key currently
hardcoded as "sak_token" and replace the literal in sqliteState.GetSAK and
sqliteState.SetSAK with that constant; update the other usages in
cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go to import or reference the
same constant so all places (including the getMetadata and setMetadata calls in
GetSAK and SetSAK) use the single source of truth (e.g., define SAKMetadataKey
or similar and use it in GetSAK, SetSAK, enroll, and unenroll).

In `@internal/attestationloop/manager.go`:
- Around line 115-122: The code sets m.lastResult before calling
m.submitter.Submit so lastResult reflects attempted collections even when Submit
fails; move the clone-and-assign of m.lastResult to after successful submission
(i.e., only assign m.lastResult = &cloned after m.submitter.Submit(ctx, result,
jwt) returns nil), keeping the same clone behavior and mutex (use
m.mu.Lock()/Unlock() around the assignment), and leave Submit called with the
original result; this ensures lastResult represents only successfully submitted
results.
- Around line 51-68: NewManager should fail fast by validating required
dependencies instead of deferring to CollectOnce; update the constructor
(NewManager) to check that nodeIDProvider, jwtProvider, nonceProvider,
collector, submitter and interval are non-nil/valid and return an error (or
panic consistently with project conventions) when any required dependency is
missing, and update callers to handle the changed signature if you switch to
returning (Manager, error); this ensures misconfiguration is detected at
initialization rather than in CollectOnce.
- Around line 115-118: The shallow copy using "cloned := *result" leaves
SDKResponse.Evidences shared between the original and m.lastResult; change to a
deep copy: allocate a new SDKResponse (or Result) instance, copy primitive
fields, and create a new Evidences slice then copy or deep-clone each
EvidenceItem into it before assigning to m.lastResult (ensure you deep-copy any
reference fields inside EvidenceItem if present); update the assignment that
currently uses "cloned := *result" and "m.lastResult = &cloned" to store the
fully independent copy instead (alternatively document immutability of result if
you prefer not to copy).

In `@internal/backendclient/client.go`:
- Around line 213-239: The mapped errors in mapEnrollError replace the original
HTTPStatusError (statusErr) and lose diagnostic context; update each returned
error to wrap the original statusErr using the Go %w wrapping (e.g., return
fmt.Errorf("<message>: %w", statusErr)) so callers can still inspect the
underlying HTTPStatusError (status code, body) via errors.As; apply this change
to every case in mapEnrollError while leaving the default branch returning err
unchanged.

In `@internal/inventory/manager_run_test.go`:
- Around line 57-75: The test TestManagerRunStopsOnContextCancel uses time.Sleep
for synchronization which can be flaky; modify the fakeSink used by the test to
include a notification channel (e.g., add a field like exportedCh chan struct{}
to the fakeSink and close/send on it inside the sink's Export method), then in
TestManagerRunStopsOnContextCancel wait on that channel with a timeout instead
of sleeping (select on exportedCh and a time.After), then cancel the context and
assert NewManager(...).Run(ctx) returns context.Canceled and that sink.exported
is not empty; update fakeSink and the test to use exportedCh to
deterministically detect the first export.

In `@internal/inventory/sink/backend_test.go`:
- Around line 120-135: Update the test TestBackendSinkExportUsesState to verify
the baseURL passed into the clientFactory matches the sink state’s backend
endpoint: modify the clientFactory closure used in the test (the func(string)
(backendclient.Client, error) assigned to clientFactory) to capture its string
argument into a local variable (e.g., gotBaseURL) and after calling s.Export
with the inventory.Snapshot, assert that gotBaseURL equals the expected backend
endpoint from the sink state; keep the existing assertions for client.nodeID,
client.jwt and client.req.

In `@internal/inventory/source/source_test.go`:
- Around line 37-114: Add tests exercising Collect's error/guard paths: create
three new subtests that call NewMachineInfoSource with (1) a nil collector and
assert Collect returns an error, (2) a fakeMachineInfoCollector whose Collect
returns an error and assert that error is propagated, and (3) a
fakeMachineInfoCollector that returns (nil, nil) for MachineInfo and assert
Collect returns an error or empty snapshot per the current guard behavior;
reference NewMachineInfoSource, fakeMachineInfoCollector and its Collect method,
and the source.Collect call to locate where to add these assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e5a983be-90ba-4bca-9be9-31348de6a041

📥 Commits

Reviewing files that changed from the base of the PR and between c7fdf10 and c90aee0.

📒 Files selected for processing (28)
  • .github/workflows/ci.yml
  • .github/workflows/release.yml
  • .golangci.yml
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • cmd/fleetint/unenroll.go
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestationloop/backend.go
  • internal/attestationloop/manager.go
  • internal/attestationloop/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/errors_test.go
  • internal/inventory/hash.go
  • internal/inventory/hash_test.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/sink/backend.go
  • internal/inventory/sink/backend_test.go
  • internal/inventory/source/source.go
  • internal/inventory/source/source_test.go
  • internal/inventory/types.go
✅ Files skipped from review due to trivial changes (7)
  • .golangci.yml
  • .github/workflows/release.yml
  • .github/workflows/ci.yml
  • internal/inventory/mapper/backend_test.go
  • internal/agentstate/state.go
  • internal/inventory/types.go
  • internal/backendclient/errors_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/attestationloop/types.go
  • internal/inventory/manager.go
  • internal/backendclient/errors.go

Comment thread internal/agentstate/sqlite.go
Comment thread internal/backendclient/client_test.go
@jingxiang-z jingxiang-z changed the title feat: add backend client and inventory skeletons feat: add backend client and inventory attestation call Apr 21, 2026
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z force-pushed the feat/agent-backend-client-skeleton branch from b014d75 to 5a8c3a1 Compare April 21, 2026 21:00
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z force-pushed the feat/agent-backend-client-skeleton branch from 5a8c3a1 to 8d55dd9 Compare April 21, 2026 21:01
@jingxiang-z jingxiang-z marked this pull request as ready for review April 21, 2026 21:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d55dd99e0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backendclient/client.go
Comment thread internal/agentstate/sqlite.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (7)
internal/exporter/converter/csv_test.go (1)

180-183: Add an explicit assertion for Agent Version CSV output.

You set AgentVersion in fixtures, but the tests don’t currently verify the corresponding emitted row. Adding that assertion will protect this rename from regressions.

✅ Suggested assertion addition
 	assert.Greater(t, len(records), 5)
 	assert.Equal(t, []string{"attribute_name", "attribute_value"}, records[0])
+	assert.Contains(t, records, []string{"Agent Version", "0.1.5"})
 	assert.Contains(t, records, []string{"DCGM Version", "4.2.3"})

Also applies to: 282-282

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/converter/csv_test.go` around lines 180 - 183, Add an
explicit assertion that the CSV output contains the "Agent Version" row
corresponding to the fixture's AgentVersion ("0.1.5"). Locate the test that
builds the fixtures (uses the AgentVersion field) and the variable holding
produced CSV output (the test's CSV string/rows), then add an assertion that a
row exists with the expected header/label (e.g., "Agent Version") and value
"0.1.5" — mirror existing assertions for DCGMVersion/OSImage/KernelVersion so
the test fails if that mapping is renamed or removed.
internal/attestation/nonce_test.go (1)

28-50: Assert the forwarded nodeID and JWT too.

This test currently passes even if backendNonceProvider.GetNonce swaps or drops the request arguments, because the stub ignores its inputs and only returns a canned response.

Proposed test hardening
 type testNonceClient struct {
-	resp *backendclient.NonceResponse
+	resp      *backendclient.NonceResponse
+	gotNodeID string
+	gotJWT    string
 }
 
-func (c *testNonceClient) GetNonce(context.Context, string, string) (*backendclient.NonceResponse, error) {
+func (c *testNonceClient) GetNonce(_ context.Context, nodeID, jwt string) (*backendclient.NonceResponse, error) {
+	c.gotNodeID = nodeID
+	c.gotJWT = jwt
 	return c.resp, nil
 }
 
 func TestBackendNonceProvider(t *testing.T) {
 	refreshTS := time.Now().UTC()
-	provider := NewBackendNonceProvider(&testNonceClient{
+	client := &testNonceClient{
 		resp: &backendclient.NonceResponse{
 			Nonce:                 "abc123",
 			NonceRefreshTimestamp: refreshTS,
 			JWTAssertion:          "new-jwt",
 		},
-	})
+	}
+	provider := NewBackendNonceProvider(client)
 
 	nonce, ts, jwt, err := provider.GetNonce(context.Background(), "node-1", "jwt-token")
 	require.NoError(t, err)
 	require.Equal(t, "abc123", nonce)
 	require.Equal(t, refreshTS, ts)
 	require.Equal(t, "new-jwt", jwt)
+	require.Equal(t, "node-1", client.gotNodeID)
+	require.Equal(t, "jwt-token", client.gotJWT)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestation/nonce_test.go` around lines 28 - 50, The test uses a
canned testNonceClient that ignores its inputs, so TestBackendNonceProvider
doesn't verify that NewBackendNonceProvider forwards nodeID and JWT; update
testNonceClient (and its GetNonce method) to capture the provided nodeID and jwt
arguments (or validate them inline) and have the test assert the captured values
after calling provider.GetNonce; reference testNonceClient.GetNonce,
TestBackendNonceProvider, and NewBackendNonceProvider to locate and change the
stub so it fails the test if nodeID or JWT are not the expected "node-1" and
"jwt-token".
internal/exporter/converter/otlp_test.go (1)

313-335: Widen this regression check beyond dcgmVersion.

The renamed test only proves one machine-info attribute disappeared. If another removed hardware field starts leaking back into the OTLP resource, this still passes. I'd tighten this with either a small allowlist for resource keys or explicit checks for the other machine-info attributes that were intentionally removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/converter/otlp_test.go` around lines 313 - 335, Update the
TestOTLPConverter_Convert_IgnoresMachineInfoInResource test to assert that all
known MachineInfo fields are absent from the OTLP resource instead of only
checking "dcgmVersion"; locate the test function
TestOTLPConverter_Convert_IgnoresMachineInfoInResource and the loop over
rm.Resource.Attributes produced by NewOTLPConverter().Convert, build a set/list
of the MachineInfo keys that were intentionally removed (e.g., dcgmVersion and
the other MachineInfo field names used in machineinfo.MachineInfo) and assert
that no attribute.Key matches any of those names, or alternatively implement a
tight allowlist of acceptable resource keys and assert every attribute.Key is in
that allowlist; this ensures any leaking hardware fields from
machineinfo.MachineInfo are caught.
internal/enrollment/enrollment.go (1)

80-82: Redundant SecureStateFilePermissions call.

SecureStateFilePermissions is called twice: once on line 80 before metadata operations, and again on lines 96-98 after. The second call appears redundant since the file permissions were already secured.

♻️ Proposed fix to remove redundant call
 	if err := pkgmetadata.SetMetadata(ctx, dbRW, "backend_base_url", baseURL); err != nil {
 		return fmt.Errorf("failed to set backend base URL: %w", err)
 	}
-	if err := config.SecureStateFilePermissions(stateFile); err != nil {
-		return fmt.Errorf("failed to secure state database permissions: %w", err)
-	}
 	return nil
 }

Also applies to: 96-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/enrollment/enrollment.go` around lines 80 - 82, Remove the duplicate
SecureStateFilePermissions(stateFile) invocation: keep the initial call that
already secures the state file and delete the subsequent call and its associated
error handling block (the second SecureStateFilePermissions(stateFile) and its
fmt.Errorf return). Ensure no other logic depends on the second invocation and
preserve existing error handling from the first SecureStateFilePermissions call
so stateFile remains secured exactly once.
internal/inventory/manager_run_test.go (1)

57-75: Time-based synchronization may cause flaky tests in CI.

Using time.Sleep(25 * time.Millisecond) for synchronization before cancellation can be flaky under resource contention. Consider using a synchronization primitive or increasing the margin.

♻️ Proposed fix using a channel to signal readiness
 func TestManagerRunStopsOnContextCancel(t *testing.T) {
 	src := &fakeSource{
 		snapshots: []*Snapshot{{MachineID: "machine-1", Hostname: "host-a"}},
 	}
 	sink := &fakeSink{}
 	ctx, cancel := context.WithCancel(context.Background())
 
 	done := make(chan error, 1)
 	go func() {
 		done <- NewManager(src, sink, InventoryConfig{Interval: 10 * time.Millisecond}).Run(ctx)
 	}()
 
-	time.Sleep(25 * time.Millisecond)
+	// Allow sufficient iterations before canceling
+	time.Sleep(50 * time.Millisecond)
 	cancel()
 
 	err := <-done
 	require.ErrorIs(t, err, context.Canceled)
 	require.NotEmpty(t, sink.exported)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/manager_run_test.go` around lines 57 - 75, The test uses
time.Sleep to wait for the manager to perform work and then cancels, which is
flaky; modify the test to use a synchronization channel instead: add a ready
signal (e.g., a chan struct{}) to the fakeSource or fakeSink that
NewManager/Manager.Run will cause to be closed/sent when the first
snapshot/export occurs, then in TestManagerRunStopsOnContextCancel wait on that
ready channel (with a sensible timeout) before calling cancel; update the
fakeSource/fakeSink helpers to send on the ready channel when they handle the
snapshot so the test deterministically proceeds without time.Sleep.
internal/inventory/source/source.go (1)

140-150: Consider documenting the NIC interface selection criteria.

NetPrivateIP is set from the first private interface in the slice. The selection order depends on the underlying collector's ordering, which may not be deterministic across reboots.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/source/source.go` around lines 140 - 150, NetPrivateIP is
currently assigned from the first element of info.NICInfo.PrivateIPInterfaces
which relies on collector ordering; update the code and/or comments to make the
selection deterministic and explicit: implement a clear selection rule (e.g.,
prefer non-loopback, non-empty IPs, prefer interface named "eth0" or sort by
Interface/MAC and pick the first) in the block that populates
snap.Resources.NICInfo.PrivateIPInterfaces and then set snap.NetPrivateIP using
that deterministic choice, and add a brief comment above this logic describing
the selection criteria; reference the variables/functions NetPrivateIP,
info.NICInfo.PrivateIPInterfaces, and snap.Resources.NICInfo when making the
change.
internal/exporter/collector/collector.go (1)

87-87: Consider removing the unused parameter if this API is internal-only; otherwise, maintain backward compatibility.

The _ any parameter is unused and only has one call site (exporter.go:82) where nil is passed. If collector.New is an internal-only API not exposed to external packages, removing this parameter is safe and clean. However, if external consumers depend on this exported function, the removal would be a breaking change. The current _ any pattern is acceptable for maintaining API compatibility during a transition away from the attestation manager feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/collector/collector.go` at line 87, collector.New currently
declares an unused parameter `_ any`; if this API is internal-only, remove the
`_ any` parameter from the collector.New function signature and from its sole
call site (the call in exporter.go) and update any related docs/tests to reflect
the simplified signature; if the function is part of the public API, keep the
parameter to preserve compatibility but add a clear comment on the collector.New
declaration explaining the parameter is intentionally unused (for attestation
manager compatibility) so future maintainers know why it remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/fleetint/status.go`:
- Around line 173-176: The call to endpoint.ValidateBackendEndpoint(baseURL) in
readEnrollmentStatus currently returns an error that aborts the entire fleetint
status flow; change this so a malformed backend_base_url produces a warning (log
via the same logger used in readEnrollmentStatus or fmt.Fprintf(os.Stderr, ...))
instead of returning, then fall back to the legacy metadata path or treat the
enrollment as "not enrolled" so status still prints; specifically, replace the
current return on err after ValidateBackendEndpoint with a warning that includes
the raw baseURL and err, and ensure the rest of readEnrollmentStatus proceeds
using the legacy keys or a nil/empty validated value to determine enrollment.

In `@internal/attestation/backend.go`:
- Around line 102-112: The wrapper is not providing the node UUID so submissions
are unscoped; update the state-backed path to resolve and include the node
identifier before forwarding. Specifically, in NewStateBackendSubmitter /
stateSubmitter.Submit flow, read the node UUID from the provided
agentstate.State (via the state accessor on stateBackendClientFactory or
directly from the state object) and ensure the backend client you obtain
(stateBackendClientFactory.client) is created or scoped using that node UUID, or
attach the node UUID into the Result payload before calling
NewBackendSubmitter(client).Submit; modify either
stateBackendClientFactory.client to accept/derive the node ID or add a step in
stateSubmitter.Submit to fetch the node ID (e.g., state.NodeID() or equivalent)
and pass it into the backend client or result so the attestation request is
node-scoped.

In `@internal/attestation/collector.go`:
- Around line 63-66: The current logic calls cmd.Run() then only returns the
execution error if stdout is empty, which can mask process failures when partial
stdout exists; change the flow around cmd.Run()/stdout/stderr so that if
cmd.Run() returns a non-nil err you immediately return a descriptive execution
error (including stderr and the wrapped err) — i.e., treat err from cmd.Run() as
authoritative and do not attempt to parse stdout when err != nil; only attempt
JSON parsing of stdout when cmd.Run() succeeded. Refer to the cmd.Run(), stdout,
stderr variables and the existing "attestation CLI execution failed" error
message when implementing this change.

In `@internal/attestation/manager_test.go`:
- Around line 98-115: Update the test
TestCollectOnceCollectorFailureStillSubmitsFailureResult to assert that
manager.CollectOnce returns an error when the evidence collector fails while
still verifying the failure result was submitted by testSubmitter (check
submitter.submitted.result and its Success=false); similarly update the Run
retry test to assert that Collect/Submit were invoked more than once by using
the testSubmitter invocation count (or a counter on testEvidenceCollector) and
verify that Run honors RetryInterval by observing multiple submissions rather
than just a single failed collect then sleep; locate references to CollectOnce,
Run, RetryInterval, Collect, Submit, submitter/testSubmitter and update
assertions accordingly to expect an error return and multiple submits.

In `@internal/attestation/manager.go`:
- Around line 145-168: The collector error (err returned from
m.collector.Collect) is recorded in result but not propagated; change the flow
in Run so that after calling m.submitter.Submit(ctx, result, jwt) you return the
original collection error when present. Concretely, save the collector error
(err) from m.collector.Collect, still populate and submit the Result, then if
that saved error is non-nil return nil and that error (instead of nil),
otherwise return the successful result; refer to m.collector.Collect, result,
err, and m.submitter.Submit to locate the code to adjust.

In `@internal/backendclient/client.go`:
- Around line 63-70: NewWithHTTPClient currently allows baseURL == nil which
later causes a panic when code calls baseURL.JoinPath; change NewWithHTTPClient
to validate baseURL and return a typed error instead of constructing a client
with a nil URL. Update the function signature to return (Client, error), add a
package-level error (e.g., ErrNilBaseURL) or fmt.Errorf("nil baseURL"), check if
baseURL == nil and return nil, ErrNilBaseURL, and otherwise return &client{...},
nil; do the same defensive change for the other constructor that accepts baseURL
(the function around lines 156-160) and update all call sites to handle the new
(Client, error) return.

In `@internal/exporter/exporter.go`:
- Around line 349-367: In refreshJWTToken the code trusts backend_base_url and
treats a malformed legacy endpoint from readLegacyBackendBaseURL as fatal;
change the logic to validate URLs before using them (use net/url parsing) and to
tolerate bad entries: after calling pkgmetadata.ReadMetadata and getting
baseURL, parse and accept it only if valid; when iterating legacy endpoints
inside readLegacyBackendBaseURL (and the similar block around lines 396-409),
skip/log malformed entries instead of returning an error and continue to the
next candidate; only return an error if no valid URL candidates remain; ensure
newBackendClient and client.Enroll are only invoked with a validated, parsed
baseURL.
- Around line 257-283: The current logic leaves metricsEndpoint/logsEndpoint
empty when pkgmetadata.ReadMetadata returns a non-empty but invalid
backend_base_url; update the branch handling validateErr (and per-endpoint
joinErr) to fall back to legacy endpoints by calling
e.readValidatedEndpoint(ctx, "metrics_endpoint") and
e.readValidatedEndpoint(ctx, "logs_endpoint") so that
endpoint.ValidateBackendEndpoint and endpoint.JoinPath failures don't disable
HTTP export; refer to pkgmetadata.ReadMetadata,
endpoint.ValidateBackendEndpoint, endpoint.JoinPath, e.readValidatedEndpoint,
and the metricsEndpoint/logsEndpoint variables when making the change.

In `@internal/inventory/manager.go`:
- Around line 97-113: CollectOnce can trigger duplicate exports when called
concurrently because the shouldExport check occurs outside m.mu and
lastExportedHash is updated later; fix by serializing export operations: add an
export mutex (e.g., m.exportMu) and, when shouldExport is true, acquire
m.exportMu around the call to m.sink.Export(ctx, snap) and the update to
m.lastExportedHash (use m.mu only for snapshot/lastSnapshot access), so two
goroutines cannot export the same hash concurrently; alternatively, you can set
m.lastExportedHash under m.mu before releasing it and then call Export, but if
you choose that approach ensure you revert the hash on non-ErrNotReady
failures—prefer the export mutex to keep semantics simple.

---

Nitpick comments:
In `@internal/attestation/nonce_test.go`:
- Around line 28-50: The test uses a canned testNonceClient that ignores its
inputs, so TestBackendNonceProvider doesn't verify that NewBackendNonceProvider
forwards nodeID and JWT; update testNonceClient (and its GetNonce method) to
capture the provided nodeID and jwt arguments (or validate them inline) and have
the test assert the captured values after calling provider.GetNonce; reference
testNonceClient.GetNonce, TestBackendNonceProvider, and NewBackendNonceProvider
to locate and change the stub so it fails the test if nodeID or JWT are not the
expected "node-1" and "jwt-token".

In `@internal/enrollment/enrollment.go`:
- Around line 80-82: Remove the duplicate SecureStateFilePermissions(stateFile)
invocation: keep the initial call that already secures the state file and delete
the subsequent call and its associated error handling block (the second
SecureStateFilePermissions(stateFile) and its fmt.Errorf return). Ensure no
other logic depends on the second invocation and preserve existing error
handling from the first SecureStateFilePermissions call so stateFile remains
secured exactly once.

In `@internal/exporter/collector/collector.go`:
- Line 87: collector.New currently declares an unused parameter `_ any`; if this
API is internal-only, remove the `_ any` parameter from the collector.New
function signature and from its sole call site (the call in exporter.go) and
update any related docs/tests to reflect the simplified signature; if the
function is part of the public API, keep the parameter to preserve compatibility
but add a clear comment on the collector.New declaration explaining the
parameter is intentionally unused (for attestation manager compatibility) so
future maintainers know why it remains.

In `@internal/exporter/converter/csv_test.go`:
- Around line 180-183: Add an explicit assertion that the CSV output contains
the "Agent Version" row corresponding to the fixture's AgentVersion ("0.1.5").
Locate the test that builds the fixtures (uses the AgentVersion field) and the
variable holding produced CSV output (the test's CSV string/rows), then add an
assertion that a row exists with the expected header/label (e.g., "Agent
Version") and value "0.1.5" — mirror existing assertions for
DCGMVersion/OSImage/KernelVersion so the test fails if that mapping is renamed
or removed.

In `@internal/exporter/converter/otlp_test.go`:
- Around line 313-335: Update the
TestOTLPConverter_Convert_IgnoresMachineInfoInResource test to assert that all
known MachineInfo fields are absent from the OTLP resource instead of only
checking "dcgmVersion"; locate the test function
TestOTLPConverter_Convert_IgnoresMachineInfoInResource and the loop over
rm.Resource.Attributes produced by NewOTLPConverter().Convert, build a set/list
of the MachineInfo keys that were intentionally removed (e.g., dcgmVersion and
the other MachineInfo field names used in machineinfo.MachineInfo) and assert
that no attribute.Key matches any of those names, or alternatively implement a
tight allowlist of acceptable resource keys and assert every attribute.Key is in
that allowlist; this ensures any leaking hardware fields from
machineinfo.MachineInfo are caught.

In `@internal/inventory/manager_run_test.go`:
- Around line 57-75: The test uses time.Sleep to wait for the manager to perform
work and then cancels, which is flaky; modify the test to use a synchronization
channel instead: add a ready signal (e.g., a chan struct{}) to the fakeSource or
fakeSink that NewManager/Manager.Run will cause to be closed/sent when the first
snapshot/export occurs, then in TestManagerRunStopsOnContextCancel wait on that
ready channel (with a sensible timeout) before calling cancel; update the
fakeSource/fakeSink helpers to send on the ready channel when they handle the
snapshot so the test deterministically proceeds without time.Sleep.

In `@internal/inventory/source/source.go`:
- Around line 140-150: NetPrivateIP is currently assigned from the first element
of info.NICInfo.PrivateIPInterfaces which relies on collector ordering; update
the code and/or comments to make the selection deterministic and explicit:
implement a clear selection rule (e.g., prefer non-loopback, non-empty IPs,
prefer interface named "eth0" or sort by Interface/MAC and pick the first) in
the block that populates snap.Resources.NICInfo.PrivateIPInterfaces and then set
snap.NetPrivateIP using that deterministic choice, and add a brief comment above
this logic describing the selection criteria; reference the variables/functions
NetPrivateIP, info.NICInfo.PrivateIPInterfaces, and snap.Resources.NICInfo when
making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9a779fe1-8ff8-4fe1-845c-d5e6cffa8521

📥 Commits

Reviewing files that changed from the base of the PR and between c90aee0 and 8d55dd9.

📒 Files selected for processing (66)
  • .github/workflows/ci.yml
  • .github/workflows/release.yml
  • .golangci.yml
  • SECURITY.md
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • cmd/fleetint/run.go
  • cmd/fleetint/status.go
  • cmd/fleetint/unenroll.go
  • cmd/fleetint/unenroll_test.go
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/values.yaml
  • deployments/packages/systemd/fleetint.env
  • docs/configuration.md
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/attestation.go
  • internal/attestation/attestation_test.go
  • internal/attestation/backend.go
  • internal/attestation/backend_test.go
  • internal/attestation/collector.go
  • internal/attestation/collector_test.go
  • internal/attestation/manager.go
  • internal/attestation/manager_test.go
  • internal/attestation/nonce.go
  • internal/attestation/nonce_test.go
  • internal/attestation/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/errors_test.go
  • internal/backendclient/types.go
  • internal/config/config.go
  • internal/config/config_test.go
  • internal/config/default.go
  • internal/endpoint/endpoint.go
  • internal/endpoint/endpoint_test.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/exporter/collector/collector.go
  • internal/exporter/collector/collector_test.go
  • internal/exporter/converter/csv.go
  • internal/exporter/converter/csv_test.go
  • internal/exporter/converter/otlp.go
  • internal/exporter/converter/otlp_test.go
  • internal/exporter/exporter.go
  • internal/exporter/exporter_test.go
  • internal/exporter/options.go
  • internal/exporter/options_test.go
  • internal/inventory/hash.go
  • internal/inventory/hash_test.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/sink/backend.go
  • internal/inventory/sink/backend_test.go
  • internal/inventory/source/source.go
  • internal/inventory/source/source_test.go
  • internal/inventory/types.go
  • internal/machineinfo/machineinfo.go
  • internal/machineinfo/machineinfo_test.go
  • internal/server/server.go
  • internal/server/server_test.go
💤 Files with no reviewable changes (4)
  • internal/exporter/options_test.go
  • internal/exporter/options.go
  • internal/attestation/attestation_test.go
  • internal/attestation/attestation.go
✅ Files skipped from review due to trivial changes (14)
  • .github/workflows/release.yml
  • .golangci.yml
  • SECURITY.md
  • deployments/helm/fleet-intelligence-agent/README.md
  • internal/inventory/hash.go
  • internal/inventory/hash_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/errors_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/backendclient/client_test.go
  • internal/agentstate/state.go
  • internal/inventory/mapper/backend_test.go
  • internal/backendclient/types.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • .github/workflows/ci.yml
  • cmd/fleetint/enroll.go
  • cmd/fleetint/unenroll.go
  • internal/agentstate/sqlite_test.go
  • cmd/fleetint/enroll_test.go
  • internal/inventory/types.go

Comment thread cmd/fleetint/status.go Outdated
Comment thread internal/attestation/backend.go
Comment thread internal/attestation/collector.go
Comment thread internal/attestation/manager_test.go
Comment thread internal/attestation/manager.go
Comment thread internal/backendclient/client.go
Comment thread internal/exporter/exporter.go
Comment thread internal/exporter/exporter.go
Comment thread internal/inventory/manager.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/backendclient/client.go (1)

66-78: Avoid mutating the caller’s *http.Client.

NewWithHTTPClient rewrites CheckRedirect on the supplied client, so reusing that same client elsewhere will silently change its redirect behavior globally. A shallow copy keeps the policy local to this backend client.

💡 Suggested fix
 func NewWithHTTPClient(baseURL *url.URL, httpClient *http.Client) Client {
 	if httpClient == nil {
 		httpClient = &http.Client{Timeout: 30 * time.Second}
 	}
-	if httpClient.CheckRedirect == nil {
-		httpClient.CheckRedirect = func(*http.Request, []*http.Request) error {
+	clientCopy := *httpClient
+	if clientCopy.CheckRedirect == nil {
+		clientCopy.CheckRedirect = func(*http.Request, []*http.Request) error {
 			return errRedirectNotAllowed
 		}
 	}
 	return &client{
-		httpClient: httpClient,
+		httpClient: &clientCopy,
 		baseURL:    baseURL,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 66 - 78, NewWithHTTPClient
currently mutates the caller-provided *http.Client by overwriting CheckRedirect;
instead, if httpClient != nil make a shallow copy (copy := *httpClient;
httpClient = &copy) and then set CheckRedirect on that copy so only the backend
client's instance is modified; keep the existing behavior when httpClient is nil
(create a new &http.Client{Timeout:...}) and return a client struct that uses
the copied httpClient, referencing NewWithHTTPClient, the httpClient parameter,
CheckRedirect, and errRedirectNotAllowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/enrollment/enrollment.go`:
- Around line 45-60: The code currently uses the full validated URL (which may
include legacy paths) when creating the backend client and persisting config;
update Enroll so after calling endpoint.ValidateBackendEndpoint you derive a
normalized backend base URL containing only scheme and host (and port if
present) — e.g., build a URL from baseURL.Scheme + "://" + baseURL.Host — then
pass that normalized base to newBackendClient(...) and to
storeConfigInMetadata(...) (and use that same normalized base when constructing
the client), keeping the rest of the logic (client.Enroll, jwtToken handling)
unchanged; reference functions/entities: Enroll, ValidateBackendEndpoint,
newBackendClient, storeConfigInMetadata, and the validated baseURL variable.

In `@internal/inventory/manager.go`:
- Around line 56-79: The Run method currently calls m.CollectOnce() and then
sleeps, which causes an immediate second collection and skips the intended
interval; modify manager.Run so the sleep happens before each subsequent
CollectOnce: after the initial collection (and initial jitter when
m.config.JitterEnabled), enter the loop that computes nextInterval (using
m.config.Interval, m.config.RetryInterval and calculateJitter/retryJitterCap
based on the previous error), call sleepWithContext(ctx, nextInterval) first,
then call m.CollectOnce(ctx) and update the error for the next iteration; keep
use of sleepWithContext, calculateJitter, initialJitterCap, retryJitterCap, and
the retry interval logic intact but moved to execute prior to the collection.
- Around line 72-73: The retry backoff always applies
calculateJitter(retryJitterCap(...)) even when JitterEnabled is false, making
m.config.JitterEnabled ineffective; update the conditional around the retry
sleep calculation (the block that assigns nextInterval using
m.config.RetryInterval and calculateJitter/retryJitterCap) to only add
calculateJitter(...) when m.config.JitterEnabled is true, otherwise set
nextInterval deterministically to m.config.RetryInterval (or the capped value)
so the "without jitter" path remains deterministic.

---

Nitpick comments:
In `@internal/backendclient/client.go`:
- Around line 66-78: NewWithHTTPClient currently mutates the caller-provided
*http.Client by overwriting CheckRedirect; instead, if httpClient != nil make a
shallow copy (copy := *httpClient; httpClient = &copy) and then set
CheckRedirect on that copy so only the backend client's instance is modified;
keep the existing behavior when httpClient is nil (create a new
&http.Client{Timeout:...}) and return a client struct that uses the copied
httpClient, referencing NewWithHTTPClient, the httpClient parameter,
CheckRedirect, and errRedirectNotAllowed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f30bca91-cf03-4568-aefa-f7eec7bbb5e9

📥 Commits

Reviewing files that changed from the base of the PR and between 44938f0 and 47c3759.

📒 Files selected for processing (18)
  • cmd/fleetint/status.go
  • cmd/fleetint/unenroll.go
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/backend.go
  • internal/attestation/backend_test.go
  • internal/attestation/manager.go
  • internal/attestation/manager_test.go
  • internal/attestation/nonce_test.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/enrollment/enrollment.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/source/source_test.go
✅ Files skipped from review due to trivial changes (4)
  • internal/inventory/mapper/backend_test.go
  • internal/agentstate/state.go
  • internal/backendclient/client_test.go
  • internal/attestation/manager_test.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • cmd/fleetint/unenroll.go
  • internal/attestation/nonce_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/source/source_test.go
  • internal/agentstate/sqlite.go
  • internal/attestation/backend_test.go
  • cmd/fleetint/status.go
  • internal/attestation/manager.go

Comment thread internal/enrollment/enrollment.go
Comment thread internal/inventory/manager.go
Comment thread internal/inventory/manager.go Outdated
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant