@liuxuezhao liuxuezhao commented Feb 10, 2026

During a 2nd remap, if the spare target is in the DOWN2UP state, the fs_down2up flag must be set so that the shard's po_rebuilding flag can be set at the end.
One example case:
Target A is DOWN; its rebuild completed and its status changed to DOWNOUT.
Target B is DOWN; its rebuild started but had not completed when the admin reintegrated it,
so its status changed to UP with the DOWN2UP flag.

In the object layout calculation, a shard is first located on Target A and is then remapped (1st remap) to Target B, but it still needs a 2nd remap. In this case the fs_down2up flag, which is not set during the 1st remap, must be set. Otherwise the shard's po_rebuilding flag cannot be set, and reads may be served from that shard (an invalid location).

This bug could cause data corruption (most likely by causing shard loss).
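
The scenario above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the DAOS placement code: the names `fs_down2up` and `po_rebuilding` come from this PR, while `tgt`, `tgt_status`, `needs_remap`, `remap_shard`, the `chain` of candidate targets, and the `apply_fix` switch are hypothetical and exist only to contrast the behaviour before and after the change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target model; only the state names come from the PR text. */
enum tgt_status { TGT_UP, TGT_DOWN, TGT_DOWNOUT };

struct tgt {
	enum tgt_status status;
	bool            down2up; /* reintegrated (UP) before its rebuild finished */
};

/* Toy stand-in for the per-shard remap record. */
struct failed_shard {
	bool fs_down2up;
};

/* Toy stand-in for a layout shard. */
struct shard {
	int  tgt_idx;       /* -1 if no usable target was found */
	bool po_rebuilding; /* readers must skip the shard while this is set */
};

/* A target cannot hold valid data if it is not UP or is still DOWN2UP. */
static bool needs_remap(const struct tgt *t)
{
	return t->status != TGT_UP || t->down2up;
}

/*
 * Walk the candidate chain: chain[0] is the original target, later entries
 * are successive spares (assumption of this sketch). With apply_fix == false
 * the DOWN2UP state is recorded only for the original target, which models
 * the behaviour described as buggy; with apply_fix == true it is also
 * recorded when a spare is DOWN2UP.
 */
struct shard remap_shard(const struct tgt *tgts, const int *chain, size_t n,
			 bool apply_fix)
{
	struct failed_shard fs = { .fs_down2up = false };
	struct shard        sh = { .tgt_idx = -1, .po_rebuilding = false };
	size_t              i;

	for (i = 0; i < n; i++) {
		const struct tgt *t = &tgts[chain[i]];

		if (!needs_remap(t)) {
			sh.tgt_idx = chain[i];
			break;
		}
		if (t->down2up && (i == 0 || apply_fix))
			fs.fs_down2up = true;
	}

	/* The flag is applied once, at the end of the remap. */
	sh.po_rebuilding = fs.fs_down2up;
	return sh;
}
```

In this model, with Target A in DOWNOUT, Target B in UP with DOWN2UP, and a healthy third target, the shard ends up on the third target in both modes, but `po_rebuilding` is set only when `apply_fix` is true. Without it the shard looks readable even though its data may not have been rebuilt yet.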

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Xuezhao Liu <xuezhao.liu@hpe.com>
@liuxuezhao liuxuezhao requested review from a team as code owners February 10, 2026 00:36
@github-actions
Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-18487

@liuxuezhao liuxuezhao requested a review from kccain February 10, 2026 10:09
@kccain kccain left a comment

This area of the code is a little new to me, but the changes seem reasonable after some preliminary study of the code.
