Disable Reactor Netty auto-retry on HttpClient to prevent duplicate P… by dxichen · Pull Request #488 · linkedin/openhouse

dxichen · 2026-03-05T19:11:58Z

…UT requests

Reactor Netty by default retries idempotent requests (and in some versions non-idempotent ones) when a pooled connection is closed before a response is received (PrematureCloseException). For PUT requests to the OpenHouse catalog this caused a second concurrent commit attempt: the first succeeded (200 OK) while the second received a 409 Conflict, which Iceberg mapped to CommitFailedException and triggered SnapshotProducer.cleanAll() - deleting the manifest list files written by the successful commit and corrupting the table.

Fix: call disableRetry(true) on every HttpClient instance so Reactor Netty never silently retries a request, regardless of connection strategy (NEW or POOLED) or transport (HTTP or HTTPS).

Summary

Issue] Briefly discuss the summary of the changes made in this
pull request in 2-3 lines.

Changes

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

Manually Tested on local docker setup. Please include commands ran, and their output.
Added new tests for the changes made.
Updated existing tests to reflect the changes made.
No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

Breaking Changes
Deprecations
Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

…UT requests Reactor Netty by default retries idempotent requests (and in some versions non-idempotent ones) when a pooled connection is closed before a response is received (PrematureCloseException). For PUT requests to the OpenHouse catalog this caused a second concurrent commit attempt: the first succeeded (200 OK) while the second received a 409 Conflict, which Iceberg mapped to CommitFailedException and triggered SnapshotProducer.cleanAll() - deleting the manifest list files written by the successful commit and corrupting the table. Fix: call disableRetry(true) on every HttpClient instance so Reactor Netty never silently retries a request, regardless of connection strategy (NEW or POOLED) or transport (HTTP or HTTPS). Fixes: https://linkedin.atlassian.net/browse/BDP-72706 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cbb330 · 2026-03-06T01:29:23Z

PUT requests can retry on the network layer too. So disabling retries on netty would not necessarily solve this class of commit conflict.

I think its safer to handle duplicate metadata.json PUT commits on the server. For example, mysql will respond differently if entry already exists, we catch this signal in hts and return 201. We should discuss the status code in this case -- 201 could be ok, since existence of metadata at commit already could be ok but there are implications

abhisheknath2011 · 2026-03-06T04:17:36Z

client/secureclient/src/main/java/com/linkedin/openhouse/client/ssl/WebClientFactory.java

    } else {
      log.info("Using connection pool strategy");
-      client = HttpClient.create(getCustomConnectionProvider());
+      client = HttpClient.create(getCustomConnectionProvider()).disableRetry(true);


This config is for pooled connection and not at the request level. Normally Reactor Netty may retry once automatically when:

A pooled connection becomes stale

Connection closed before request write

TCP reset during acquisition

Connection closed by server before request starts.

Ideally on 200 or 409 Netty should not retry. In which scenario you are observing the issue?

As this is common client side http connection pool config, changing default netty behaviour might have some consequences. Disabling netty retry can increase:

503 errors.

connection reset

PrematureCloseException

ClosedChannelException

This code is common and not only used across openhouse codebase, but also in client side.

abhisheknath2011 reviewed Mar 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable Reactor Netty auto-retry on HttpClient to prevent duplicate P…#488

Disable Reactor Netty auto-retry on HttpClient to prevent duplicate P…#488
dxichen wants to merge 1 commit intolinkedin:mainfrom
dxichen:dchen1/fix-disable-netty-retry-put

dxichen commented Mar 5, 2026

Uh oh!

cbb330 commented Mar 6, 2026 •

edited

Loading

Uh oh!

abhisheknath2011 Mar 6, 2026 •

edited

Loading

Uh oh!

abhisheknath2011 Mar 6, 2026

Uh oh!

abhisheknath2011 Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dxichen commented Mar 5, 2026

Summary

Changes

Testing Done

Additional Information

Uh oh!

cbb330 commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

abhisheknath2011 Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abhisheknath2011 Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

abhisheknath2011 Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cbb330 commented Mar 6, 2026 •

edited

Loading

abhisheknath2011 Mar 6, 2026 •

edited

Loading