Brgndy25/tests/payload retrieval via voting #7
Brgndy25 wants to merge 6 commits into serokell-fix-missing-payload
Conversation
Problem: No way to test voter fallback scenarios in integration tests since the executor always receives the payload directly from the TA. Solution: Introduce VoterRecoverySeed and VoterNegativeSeed test seeds. VoterRecoverySeed stores the payload but blocks direct forwarding to the executor, forcing voter fallback. VoterNegativeSeed skips prompt storage entirely so voters also fail to find it.
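A minimal sketch of the two seeds, assuming they are plain string constants checked in the transfer path; the constant values are illustrative, not the PR's definitions.

const (
	// VoterRecoverySeed stores the payload but blocks direct forwarding
	// to the executor, forcing the voter fallback path.
	VoterRecoverySeed = "voter-recovery"

	// VoterNegativeSeed skips prompt storage entirely, so voters cannot
	// find the payload either and must vote negatively.
	VoterNegativeSeed = "voter-negative"
)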
Problem: When the executor's direct payload retrieval from the TA fails, the inference is stuck, since there is no recovery path via voters. The voting and server/public packages also have an import cycle that prevents clean integration. Solution: Extract shared API types into an apitypes package to break the import cycle. Add GetVoterURLs and VoterFallback to NodePinger so the executor can sample voters when direct retrieval fails. Wire up the fallback in event_listener and expose POST /v1/voting/verify for voters to verify respondent payloads on behalf of challengers.
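A hedged sketch of how the fallback could be wired in event_listener, assuming NodePinger exposes the two methods named above; fetchDirect is a hypothetical stand-in for the existing direct-retrieval path, and the exact signatures are assumptions.

import "context"

// NodePinger's fallback surface as described above; signatures are assumed.
type NodePinger interface {
	GetVoterURLs(ctx context.Context) ([]string, error)
	VoterFallback(ctx context.Context, inferenceId string) ([]byte, error)
}

// retrievePayload tries the TA first and only then falls back to voters,
// so a failed direct retrieval no longer leaves the inference stuck.
func retrievePayload(ctx context.Context, np NodePinger, inferenceId string,
	fetchDirect func(context.Context, string) ([]byte, error)) ([]byte, error) {
	if payload, err := fetchDirect(ctx, inferenceId); err == nil {
		return payload, nil
	}
	return np.VoterFallback(ctx, inferenceId)
}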
Problem: The voter fallback mechanism has no integration test coverage to verify end-to-end behavior in a multi-node cluster. Solution: Add two test cases in VoterTests.kt:
- voter recovers payload when executor direct retrieval is blocked
- all voters cast negative votes when payload does not exist
// Create verification request.
// Note: PromptHash is left empty intentionally. The on-chain hash is computed from the
// canonicalized request body, but prompt storage stores the full ChatRequest (with auth keys,
// timestamps, etc.). Comparing hashes here would always mismatch. The voter's job is to
// check payload availability, not hash correctness.
Is it correct that this is the flow?
- Call RequestVerificationFromVoters
- ... for each voter, call np.requestVerificationFromSingleVoter
- ... which will reach v1/voting/verify, i.e., postVotingVerify
- ... which calls VerifyRespondent
- ... which finally verifies the hash, casting a positive or negative vote
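For concreteness, a sketch of the per-voter hop in that flow, assuming the voter exposes POST /v1/voting/verify with a JSON body; the VerifyRequest type and its fields are illustrative, not the apitypes definitions.

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// VerifyRequest is an assumed shape; the real type lives in apitypes.
type VerifyRequest struct {
	InferenceId string `json:"inference_id"`
	Respondent  string `json:"respondent"`
}

func requestVerificationFromSingleVoter(ctx context.Context, voterURL string, req VerifyRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	httpReq, err := http.NewRequestWithContext(
		ctx, http.MethodPost, voterURL+"/v1/voting/verify", bytes.NewReader(body))
	if err != nil {
		return err
	}
	httpReq.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		// The voter itself casts the on-chain vote via VerifyRespondent;
		// a non-200 here only means this particular hop failed.
		return fmt.Errorf("voter %s: unexpected status %d", voterURL, resp.StatusCode)
	}
	return nil
}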
assertThat(inference.executedBy).isEqualTo(inference.assignedTo)
}

/**
Thank you for the extensive comments detailing the flow.
These tests will be pretty useful when we have MsgFinishInferenceWithMissingPayload. We can then assert all of these votes using these test cases.
Just one thing: in my experience with testermint, even when the tests succeed, you should still check the standard output and error tabs for the systems under test. I'm not implying there's something wrong, and I haven't checked myself, but it's a good sanity check to make sure nothing slipped right under our noses.
Problem: Voters blindly process verification requests without checking that the disputed inference actually exists on chain, that it hasn't already been finished, or that the respondent matches the on-chain TA. Solution: Implement ChainVerifier.QueryInferenceState in postVotingVerify to validate requests before the voter does any work. Reject the request if the inference is not found, is already finished, or the respondent doesn't match. Add a test for the failure case where MsgStartInference does not exist on chain.
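A sketch of those checks; only the QueryInferenceState method name comes from the PR, while InferenceState's fields and the surrounding shapes are assumptions.

import (
	"context"
	"fmt"
)

// InferenceState's fields are assumptions about what the chain query returns.
type InferenceState struct {
	Finished   bool
	AssignedTo string // the on-chain TA
}

type ChainVerifier interface {
	QueryInferenceState(ctx context.Context, inferenceId string) (*InferenceState, error)
}

// validateVerifyRequest rejects bad requests before the voter does any work.
func validateVerifyRequest(ctx context.Context, cv ChainVerifier, inferenceId, respondent string) error {
	state, err := cv.QueryInferenceState(ctx, inferenceId)
	if err != nil || state == nil {
		return fmt.Errorf("inference %s not found on chain", inferenceId)
	}
	if state.Finished {
		return fmt.Errorf("inference %s is already finished", inferenceId)
	}
	if state.AssignedTo != respondent {
		return fmt.Errorf("respondent %s does not match on-chain TA %s", respondent, state.AssignedTo)
	}
	return nil
}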
Problem: The test mode and seed checks were duplicated across multiple if-statements in handleTransferRequest, making the logic harder to follow. Solution: Compute isTestMode, isMissingPayload, isVoterRecovery, and isVoterNegative once at the top and reuse them throughout.
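Roughly, the hoisted checks look like the fragment below; the seed constant names and request shape are assumptions. Evaluating them once keeps each later branch a single readable condition.

// Computed once at the top of handleTransferRequest and reused below
// (fragment, not a complete function).
isTestMode := cfg.TestMode
isMissingPayload := isTestMode && req.Seed == MissingPayloadSeed
isVoterRecovery := isTestMode && req.Seed == VoterRecoverySeed
isVoterNegative := isTestMode && req.Seed == VoterNegativeSeed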
Problem: Voter selection used GetVoterURLs which queried all participants via ParticipantsWithBalances, making it non-deterministic and not replayable by on-chain validators. Solution: Replace GetVoterURLs with SampleVotersForInference which uses epochgroup.MakeRandomMemberReplayableFn, the same deterministic algorithm the chain uses. The EpochGroupData is queried from the chain using the inference's EpochId. Also adds a nil check for empty payloads in RetrievePayloadToRequester to prevent panics.
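A sketch of the deterministic sampling, reusing the replayable-random context API that also appears in the test quoted further down; the helper's name and signature are illustrative, and import paths are omitted.

// sampleVoters draws up to maxVoters members with the chain's replayable
// sampler, seeded by the block hash, so validators can replay the selection.
func sampleVoters(ctx context.Context, eg *epochgroup.EpochGroup, blockHash []byte, maxVoters int) ([]types.Participant, error) {
	rrCtx, err := eg.NewReplayableRandomContext(ctx, blockHash)
	if err != nil {
		return nil, err
	}
	voters := make([]types.Participant, 0, maxVoters)
	for i := 0; i < maxVoters; i++ {
		p, err := eg.GetRandomMemberReplayable(ctx, rrCtx)
		if err != nil {
			break // pool exhausted: fewer voters than requested
		}
		voters = append(voters, *p)
	}
	return voters, nil
}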
blockHash := bytes.HexBytes(blockResp.BlockId.Hash)

// Build an EpochGroup backed by gRPC adapters so we can call
// MakeRandomMemberReplayableFn with the exact same algorithm the chain uses.
You keep mentioning "the exact same algorithm the chain uses", but this algorithm was unused (well, now it isn't). That said, we should also use it for validations.
)

// DefaultMaxVoters is the default number of voters to sample for fallback verification.
// TODO: move this to config?
I think it should be moved.
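A sketch of moving the cap into config; the struct and field names are assumptions, not existing code, and the default value is illustrative.

const DefaultMaxVoters = 3 // illustrative value

type VotingConfig struct {
	// MaxVoters caps how many voters are sampled for fallback verification;
	// zero means "use DefaultMaxVoters".
	MaxVoters int `json:"max_voters"`
}

func (c VotingConfig) EffectiveMaxVoters() int {
	if c.MaxVoters > 0 {
		return c.MaxVoters
	}
	return DefaultMaxVoters
}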
// to surface accidental misuse immediately.

// grpcGroupKeeper implements types.GroupMessageKeeper via gRPC.
type grpcGroupKeeper struct{ client group.QueryClient }
Do you think it might be better to change it to not use EpochGroup?
Also, perhaps we should write some more unit tests; there is currently only one test for the sampling, which might be insufficient. But I think we can leave that for another PR. I have this test (epochgroup/random_test.go) which AFAIR was not passing, and I didn't figure out why before moving on to more important tasks:
func TestCanSampleMany(t *testing.T) {
	k, ctx, mocks := keepertest.InferenceKeeperReturningMocks(t)
	sdkCtx := sdk.UnwrapSDKContext(ctx)
	pocStartHeight := int64(100)
	epochIndex := uint64(1)
	k.SetEpoch(sdkCtx, &types.Epoch{Index: epochIndex, PocStartBlockHeight: pocStartHeight})
	require.NoError(t, k.SetEffectiveEpochIndex(sdkCtx, epochIndex))
	mocks.ExpectCreateGroupWithPolicyCall(ctx, epochIndex)
	eg, err := k.CreateEpochGroup(ctx, epochIndex, epochIndex)
	require.NoError(t, err)
	require.NoError(t, eg.CreateGroup(ctx))

	blockHash := []byte("blockhash")
	numParticipants := 8
	participants := make([]types.Participant, numParticipants)
	for i := range numParticipants {
		addr := testutil.Bech32Addr(i)
		participant := types.Participant{
			Index:             addr,
			Address:           addr,
			Weight:            150 * int32(i+1),
			Status:            types.ParticipantStatus_ACTIVE,
			CurrentEpochStats: types.NewCurrentEpochStats(),
		}
		require.NoError(t, eg.ParticipantKeeper.SetParticipant(ctx, participant))
		participants[i] = participant
		mocks.GroupKeeper.EXPECT().
			GroupMembers(ctx, gomock.Any()).
			Return(
				&group.QueryGroupMembersResponse{
					Members: []*group.GroupMember{
						{
							Member: &group.Member{
								Address:  participant.Address,
								Weight:   strconv.Itoa(int(participant.Weight)),
								Metadata: "",
								AddedAt:  time.Now(),
							},
						},
					},
					Pagination: &query.PageResponse{},
				},
				nil,
			)
	}

	rrCtx, err := eg.NewReplayableRandomContext(ctx, blockHash)
	require.NoError(t, err)
	for _, participant := range participants {
		expectedParticipant, err := eg.GetRandomMemberReplayable(ctx, rrCtx)
		require.NoError(t, err)
		require.Equal(t, *expectedParticipant, participant)
	}
	expectedParticipant, err := eg.GetRandomMemberReplayable(ctx, rrCtx)
	require.Error(t, err) // No participants left to sample
	require.Nil(t, expectedParticipant)
}