Skip to content

Isolate user-callback exceptions in process_packet#1762

Merged
bdraco merged 1 commit into
esphome:mainfrom
esphbot:koan/fix-issue-1755
May 26, 2026
Merged

Isolate user-callback exceptions in process_packet#1762
bdraco merged 1 commit into
esphome:mainfrom
esphbot:koan/fix-issue-1755

Conversation

@esphbot
Copy link
Copy Markdown
Contributor

@esphbot esphbot commented May 26, 2026

What does this implement/fix?

APIConnection.process_packet used to invoke registered handlers
without any exception isolation. If a user callback raised, the
exception unwound through the frame helper's data_received into
asyncio, which treats it as a transport error and calls
connection_lost — tearing the whole ESPHome session down. The
codebase was aware of the trap and hardened the GATT-notify site
(the only one) for exactly this reason.

This wraps the per-handler dispatch in process_packet itself so
every callback path is uniformly isolated, and drops the now-
redundant per-site guard in
on_bluetooth_gatt_notify_data_response. Internal protocol
handlers (PingRequest, GetTimeRequest, DisconnectRequest)
already call report_fatal_error via send_messages before
raising, so swallowing the propagated exception still triggers
the orderly teardown path.

Benchmarks for the dispatch path were added separately in #1757 to
let the maintainer measure the cost; the upper-bound per-site
numbers (Python try/except + extra frame) were within noise for
typical state messages and ~15% on wide-state climate updates.
Wrapping inside process_packet (this PR) avoids the extra Python
frame, so the actual cost will be lower than that upper bound; on
the Cython hot path Cython's try/except is also cheaper than
Python's.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Code quality improvements to existing code or addition of tests
  • Other

Related issue or feature (if applicable):

Pull request in esphome (if applicable):

  • n/a — client-only change.

Checklist:

  • The code change is tested and works locally.
  • If api.proto was modified, a linked pull request has been made to esphome with the same changes. (api.proto not modified.)
  • Tests have been added to verify that the new code works (under tests/ folder).

Test plan

  • tests/test_connection.py::test_handler_exception_does_not_disconnect — raising user callback is logged but the connection stays up.
  • tests/test_connection.py::test_handler_exception_does_not_skip_siblings — when one of several handlers raises, the others still run on the same message.
  • tests/test_client.py::test_bluetooth_gatt_notify_callback_raises — updated to the new central log message; still pins the original property (callback exception caught, connection alive).
  • Full local suite: 1160 passed, 1 skipped.

Generated by Kōan


Quality Report

Changes: 4 files changed, 81 insertions(+), 12 deletions(-)

Code scan: clean

Tests: failed ([Errno 13] Permission denied: 'pytest')

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (c60e161) to head (b959a11).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1762   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           26        26           
  Lines         4180      4183    +3     
=========================================
+ Hits          4180      4183    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 26, 2026

Merging this PR will not alter performance

✅ 56 untouched benchmarks


Comparing esphbot:koan/fix-issue-1755 (b959a11) with main (c60e161)

Open in CodSpeed

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 26, 2026

This might have a performance impact on the Bluetooth advertisement callback path. Prove if it does

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 26, 2026

@esphbot Rr

@esphbot
Copy link
Copy Markdown
Contributor Author

esphbot commented May 26, 2026

Rebase with requested adjustments

Branch koan/fix-issue-1755 was rebased onto main and review feedback was applied.

Stats

4 files changed, 81 insertions(+), 12 deletions(-)
Actions performed
  • Already-solved check: skipped (Claude call failed)
  • Rebased koan/fix-issue-1755 onto upstream/main
  • Review feedback step failed: Exit code 1: no stderr | stdout: You've hit your session limit · resets 6:30pm (UTC) | stdout: You've hit your session limit · resets 6:30pm (UTC)
  • Pre-push CI check: previous run passed
  • Force-pushed koan/fix-issue-1755 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 26, 2026

@esphbot Rr

@esphbot
Copy link
Copy Markdown
Contributor Author

esphbot commented May 26, 2026

@bdraco: This might have a performance impact on the Bluetooth advertisement callback path. Prove if it does

CodSpeed has now reported: 0 of 56 benchmarks changed on this branch (https://codspeed.io/esphome/aioesphomeapi/branches/esphbot%3Akoan%2Ffix-issue-1755), so the dispatch-path perf is unchanged on the Bluetooth advertisement hot path.

A note on why this is cheaper than the per-site wrapper benchmarked in #1757: that benchmark wrapped the handler in an additional Python function whose body was try: real_handler() except: log, so it measured the extra Python frame plus the try/except. The ~15% delta on wide-state climate updates was dominated by the wrapper-frame cost, not the try/except itself. Putting the guard directly inside process_packet (this PR) avoids the extra Python frame entirely, and process_packet is on the Cython hot path where try/except is cheaper than Python's. That's consistent with what CodSpeed is showing — no measurable change.

@esphbot
Copy link
Copy Markdown
Contributor Author

esphbot commented May 26, 2026

PR Review — Isolate user-callback exceptions in process_packet

Clean, well-scoped fix for #1755. The try/except is placed at the right layer (inside process_packet itself rather than per-handler), which lets the now-redundant guard in on_bluetooth_gatt_notify_data_response be removed and uniformly protects every callback site. Both dispatch branches (multi-handler and single-handler) are covered. The two new tests pin the two behavioural contracts that matter — connection survives a raising handler, and a raising handler doesn't starve siblings. The updated test_bluetooth_gatt_notify_callback_raises follows the log-message change. CodSpeed reports no performance impact on the 56 benchmarks (which addresses @bdraco's question directly), and the # noqa: C901 on process_packet is preserved. Internal protocol handlers (Ping/GetTime/Disconnect) already call report_fatal_error via send_messages before raising, so swallowing the propagated exception in process_packet still triggers the orderly teardown path — no regression in that flow. Minor: the new log line drops the explicit address/handle text that the GATT-specific guard included, but self.log_name plus the message type and %r of the handler give equivalent diagnostic context, and the address/handle are inside msg if a debugger needs them. Merge-ready.



Checklist

  • Exception isolation covers all callback sites
  • Internal handlers still trigger orderly teardown after swallow
  • Performance impact on Cython hot path measured
  • Tests cover survival + sibling-handler isolation
  • No bare except:except Exception is justified and logged
  • Redundant per-site guard removed
  • C901 noqa preserved (no Cython refactor of hot path)

Summary

Clean, well-scoped fix for #1755. The try/except is placed at the right layer (inside process_packet itself rather than per-handler), which lets the now-redundant guard in on_bluetooth_gatt_notify_data_response be removed and uniformly protects every callback site. Both dispatch branches (multi-handler and single-handler) are covered. The two new tests pin the two behavioural contracts that matter — connection survives a raising handler, and a raising handler doesn't starve siblings. The updated test_bluetooth_gatt_notify_callback_raises follows the log-message change. CodSpeed reports no performance impact on the 56 benchmarks (which addresses @bdraco's question directly), and the # noqa: C901 on process_packet is preserved. Internal protocol handlers (Ping/GetTime/Disconnect) already call report_fatal_error via send_messages before raising, so swallowing the propagated exception in process_packet still triggers the orderly teardown path — no regression in that flow. Minor: the new log line drops the explicit address/handle text that the GATT-specific guard included, but self.log_name plus the message type and %r of the handler give equivalent diagnostic context, and the address/handle are inside msg if a debugger needs them. Merge-ready.


Automated review by Kōanedc86be

A buggy registered handler used to propagate through process_packet
into the frame helper's data_received, where asyncio treats the
unwind as a transport error and calls connection_lost — tearing the
whole ESPHome session down. The codebase already hardened the
GATT-notify site (the only one) for exactly this reason.

Wrap the per-handler dispatch in process_packet itself so every
callback path is uniformly isolated, and drop the now-redundant
per-site guard in on_bluetooth_gatt_notify_data_response. Internal
protocol handlers (PingRequest, GetTimeRequest, DisconnectRequest)
already report fatal errors via send_messages before raising, so
swallowing the propagated exception doesn't lose teardown.

Closes esphome#1755.
@esphbot
Copy link
Copy Markdown
Contributor Author

esphbot commented May 26, 2026

Rebase with requested adjustments

Branch koan/fix-issue-1755 was rebased onto main and review feedback was applied.

Stats

4 files changed, 81 insertions(+), 12 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No commit on main wraps process_packet handler dispatch in try/except; closest commit Add benchmarks for callback dispatch with/without exception guards #1757 only add)
  • Rebased koan/fix-issue-1755 onto upstream/main
  • Pre-push CI check: previous run passed
  • Force-pushed koan/fix-issue-1755 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@esphbot esphbot force-pushed the koan/fix-issue-1755 branch from edc86be to b959a11 Compare May 26, 2026 19:36
@bdraco bdraco marked this pull request as ready for review May 26, 2026 20:16
Copilot AI review requested due to automatic review settings May 26, 2026 20:16
@bdraco bdraco merged commit 7647f3f into esphome:main May 26, 2026
14 of 15 checks passed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves connection robustness by preventing exceptions raised inside registered (user) message handlers from unwinding through asyncio’s data_received path and tearing down the transport, aligning behavior across all callback sites.

Changes:

  • Wrap per-handler dispatch in APIConnection.process_packet with try/except to isolate callback exceptions and keep the session alive.
  • Remove the now-redundant exception guard in on_bluetooth_gatt_notify_data_response, relying on the centralized isolation.
  • Add/adjust tests to assert that callback exceptions are logged, do not disconnect the connection, and do not prevent sibling handlers from running.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
aioesphomeapi/connection.py Centralizes exception isolation around message handler dispatch within process_packet.
aioesphomeapi/client_base.py Removes per-site try/except for Bluetooth GATT notify callback since process_packet now isolates handlers.
tests/test_connection.py Adds regression tests ensuring handler exceptions don’t disconnect and don’t skip other handlers.
tests/test_client.py Updates expectations for Bluetooth notify exception logging to match the new centralized log message.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1131 to +1136
_LOGGER.exception(
"%s: Unexpected error in message handler %r for %s",
self.log_name,
handler,
type(msg).__name__,
)
Comment on lines +1148 to +1153
_LOGGER.exception(
"%s: Unexpected error in message handler %r for %s",
self.log_name,
handler,
type(msg).__name__,
)
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3c4c3d8-36f9-4216-879e-1c3630d22c10

📥 Commits

Reviewing files that changed from the base of the PR and between c60e161 and b959a11.

📒 Files selected for processing (4)
  • aioesphomeapi/client_base.py
  • aioesphomeapi/connection.py
  • tests/test_client.py
  • tests/test_connection.py

Walkthrough

Handler exceptions in process_packet are now caught and logged centrally, preventing session teardown. The redundant try/except in the Bluetooth GATT callback is removed, and tests confirm handlers are isolated from each other's failures and the connection remains stable.

Changes

Handler Exception Isolation via Central Dispatcher

Layer / File(s) Summary
Central handler exception wrapping in process_packet
aioesphomeapi/connection.py
process_packet wraps each handler(msg) invocation in try/except for both multiple-handler and single-handler cases, logging exceptions with handler and message type instead of propagating them.
Bluetooth GATT callback cleanup
aioesphomeapi/client_base.py
The redundant try/except around on_bluetooth_gatt_notify(...) is removed; exceptions now propagate to the centralized dispatcher wrapping.
Tests for handler exception isolation
tests/test_connection.py
Two new tests validate that exceptions in user callbacks are isolated: connection remains established after a handler raises, and sibling handlers still execute when one raises.
Test assertion update for consistent logging
tests/test_client.py
test_bluetooth_gatt_notify_callback_raises assertions are updated to match the new generic message-handler error logging format.

🎯 2 (Simple) | ⏱️ ~12 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@esphbot esphbot deleted the koan/fix-issue-1755 branch May 27, 2026 15:16
@github-actions github-actions Bot locked and limited conversation to collaborators May 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

robustness: user-callback exceptions tear down the API connection in most paths but are isolated for GATT notify only

3 participants