Skip to content

Fixes #39368 Fix race during concurrent autosign requests#945

Open
alexjfisher wants to merge 1 commit into
theforeman:developfrom
alexjfisher:issue_39368
Open

Fixes #39368 Fix race during concurrent autosign requests#945
alexjfisher wants to merge 1 commit into
theforeman:developfrom
alexjfisher:issue_39368

Conversation

@alexjfisher
Copy link
Copy Markdown
Contributor

Concurrent autosign requests could race while updating the token whitelist. Although writes were locked, the read and modification happened before the write lock was taken, so one request could observe an empty file or overwrite another request's update.

Concurrent autosign requests could race while updating the token
whitelist.  Although writes were locked, the read and modification
happened before the write lock was taken, so one request could observe
an empty file or overwrite another request's update.
@alexjfisher alexjfisher marked this pull request as ready for review May 28, 2026 14:53
Copy link
Copy Markdown
Contributor

@adamruzicka adamruzicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sane. At what scale did you start seeing those conflicts?

# These helpers must only be called with a locked file handle from #lock.
def unsafe_read(file)
file.rewind
YAML.safe_load(file.read, fallback: [])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're trying to be as defensive as possible, should we also check the type we get and reset to empty array if we get something unexpected?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file is empty, it seemed reasonable to return an empty array, but I'm not too sure about silently fixing corrupt data??

@alexjfisher
Copy link
Copy Markdown
Contributor Author

This looks sane. At what scale did you start seeing those conflicts?

~ 50 hosts. In our use-case the API is being called at the terraform plan phase, with the actual concurrency being governed by terraform's parallelism setting. We were still able to trigger the error with parallelism as low as 4.

(I'm separately considering a follow-up PR that'd do some pruning of expired, but never used tokens).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants