Skip to content

conn error sub-codes for better classification#2889

Open
sawka wants to merge 2 commits intomainfrom
sawka/conn-error-codes
Open

conn error sub-codes for better classification#2889
sawka wants to merge 2 commits intomainfrom
sawka/conn-error-codes

Conversation

@sawka
Copy link
Member

@sawka sawka commented Feb 17, 2026

No description provided.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 17, 2026

Walkthrough

This PR introduces a hierarchical error classification system for remote connection failures. A new SubCode field is added to the CodedError struct with helper functions MakeSubCodedError() and GetErrorSubCode(). The ClassifyConnError() function signature is updated to return two values (code, subCode). New public constants define specific dial error subcodes (DNS, refused, timeout, context-canceled, no-route, host-unreachable, net-unreachable, conn-reset, perm-denied, proxy-jump, other) and auth subcodes. Telemetry structures are extended with ConnSubErrorCode and ConnContextError fields. A .gitignore entry for temporary kilo-format files is also added.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No pull request description was provided by the author, making it impossible to assess whether it relates to the changeset. Add a description explaining the purpose of the sub-codes, how they improve error classification, and what changes were made across the affected files.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'conn error sub-codes for better classification' accurately reflects the main change: introducing sub-codes for connection error classification across multiple files for improved error diagnostics.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sawka/conn-error-codes

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

if err != nil {
blocklogger.Infof(ctx, "[conndebug] ERROR dial error: %v\n", err)
return nil, utilds.MakeCodedError(ConnErrCode_Dial, err)
subCode := DialSubCode_ProxyJump // This is a proxy jump connection error
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Hardcoded subcode loses error classification detail

This hardcodes DialSubCode_ProxyJump for all proxy jump errors, but doesn't classify the actual error type. A proxy jump connection can fail for many reasons (DNS, timeout, refused, etc.).

Consider calling ClassifyDialErrorSubCode(err) to get the actual error type, or use a composite approach that preserves both the proxy-jump context and the underlying error classification.

@kiloconnect
Copy link

kiloconnect bot commented Feb 17, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0
Issue Details (click to expand)

WARNING

File Line Issue
pkg/remote/sshclient.go 856 Hardcoded subcode loses error classification detail for proxy jump connections
Files Reviewed (5 files)
  • .gitignore - No issues
  • pkg/remote/conncontroller/conncontroller.go - No issues
  • pkg/remote/sshclient.go - 1 issue
  • pkg/telemetry/telemetrydata/telemetrydata.go - No issues
  • pkg/utilds/codederror.go - No issues

Fix these issues in Kilo Cloud

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/remote/sshclient.go (2)

178-182: DialSubCode_ContextCanceled groups both cancellation and deadline exceeded.

context.DeadlineExceeded is mapped to DialSubCode_ContextCanceled ("context-canceled"). If downstream telemetry consumers need to distinguish user-initiated cancellation from timeouts, this subcode won't differentiate them. Consider whether a separate DialSubCode_ContextDeadline (or renaming the constant to something like DialSubCode_ContextError) would be more accurate. This is a minor telemetry clarity concern since ConnContextError in telemetry already captures this as a boolean, and the timeout string-pattern checks below would never fire for these errors anyway.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/remote/sshclient.go` around lines 178 - 182, The current mapping treats
both context.Canceled and context.DeadlineExceeded as
DialSubCode_ContextCanceled, which loses distinction between user cancellation
and timeouts; update the handling in the error-to-subcode logic (the block that
checks errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded)) to return a separate subcode for deadline exceeded
(e.g., add DialSubCode_ContextDeadline) while keeping
DialSubCode_ContextCanceled for context.Canceled, or alternatively rename to
DialSubCode_ContextError if you prefer a single bucket; ensure the new constant
is declared alongside other DialSubCode values and update any telemetry code
that consumes DialSubCode_ContextCanceled if it must account for the new
DialSubCode_ContextDeadline.

854-859: Proxy jump errors always get DialSubCode_ProxyJump, losing the underlying cause.

When dialing through a proxy and it fails (e.g., DNS failure on the proxy hop, timeout, connection refused), the subcode is unconditionally set to "proxy-jump". If you want to capture the root cause for telemetry, you could nest or prefix the underlying subcode. That said, this may be intentional to distinguish proxy-hop failures from direct-dial failures—just flagging in case finer granularity was desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/remote/sshclient.go` around lines 854 - 859, The current code
unconditionally sets DialSubCode_ProxyJump for any error from
currentClient.DialContext, losing the underlying cause; modify the error
handling in the clientConn = currentClient.DialContext(ctx, "tcp", networkAddr)
block to detect if err is a utilds-style subcoded error (e.g., type assert to
whatever subcoded error type the project uses or call an extractor like
utilds.ExtractSubCode(err)), extract the inner subcode, and produce a composite
subcode (for example DialSubCode_ProxyJump + ":" + innerSubCode) or otherwise
nest/prefix the inner subcode; log both the proxy-jump context and the inner
subcode via blocklogger.Infof and pass the composite/nested subcode into
utilds.MakeSubCodedError(ConnErrCode_Dial, compositeSubCode, err) so the
original error and its finer-grained subcode are preserved (refer to clientConn,
currentClient.DialContext, DialSubCode_ProxyJump, utilds.MakeSubCodedError,
ConnErrCode_Dial, blocklogger.Infof).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/remote/sshclient.go`:
- Around line 178-182: The current mapping treats both context.Canceled and
context.DeadlineExceeded as DialSubCode_ContextCanceled, which loses distinction
between user cancellation and timeouts; update the handling in the
error-to-subcode logic (the block that checks errors.Is(err, context.Canceled)
|| errors.Is(err, context.DeadlineExceeded)) to return a separate subcode for
deadline exceeded (e.g., add DialSubCode_ContextDeadline) while keeping
DialSubCode_ContextCanceled for context.Canceled, or alternatively rename to
DialSubCode_ContextError if you prefer a single bucket; ensure the new constant
is declared alongside other DialSubCode values and update any telemetry code
that consumes DialSubCode_ContextCanceled if it must account for the new
DialSubCode_ContextDeadline.
- Around line 854-859: The current code unconditionally sets
DialSubCode_ProxyJump for any error from currentClient.DialContext, losing the
underlying cause; modify the error handling in the clientConn =
currentClient.DialContext(ctx, "tcp", networkAddr) block to detect if err is a
utilds-style subcoded error (e.g., type assert to whatever subcoded error type
the project uses or call an extractor like utilds.ExtractSubCode(err)), extract
the inner subcode, and produce a composite subcode (for example
DialSubCode_ProxyJump + ":" + innerSubCode) or otherwise nest/prefix the inner
subcode; log both the proxy-jump context and the inner subcode via
blocklogger.Infof and pass the composite/nested subcode into
utilds.MakeSubCodedError(ConnErrCode_Dial, compositeSubCode, err) so the
original error and its finer-grained subcode are preserved (refer to clientConn,
currentClient.DialContext, DialSubCode_ProxyJump, utilds.MakeSubCodedError,
ConnErrCode_Dial, blocklogger.Infof).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant