Brgndy25/tests/payload retrieval via voting #7

Draft
Brgndy25 wants to merge 6 commits into serokell-fix-missing-payload from
brgndy25/tests/payload-retrieval-via-voting

Conversation

@Brgndy25

No description provided.

Problem: No way to test voter fallback scenarios in integration tests since the executor always receives the payload directly from the TA.

Solution: Introduce VoterRecoverySeed and VoterNegativeSeed test seeds.
VoterRecoverySeed stores the payload but blocks direct forwarding to
the executor, forcing voter fallback. VoterNegativeSeed skips prompt
storage entirely so voters also fail to find it.
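
For illustration, here is a minimal sketch of the gating the two seeds imply; the helper names (storePrompt, forwardToExecutor), the ChatRequest shape, and the seed string values are placeholders, not code from this PR:

package sketch

// Placeholder types, constants, and helpers for the sketch below; only the seed
// names and the behavior they trigger come from the commit message.
type ChatRequest struct{ Prompt string }

const (
	VoterRecoverySeed = "voter-recovery" // placeholder value
	VoterNegativeSeed = "voter-negative" // placeholder value
)

func storePrompt(*ChatRequest) error       { return nil }
func forwardToExecutor(*ChatRequest) error { return nil }

// handleTransferRequestSketch: VoterNegativeSeed skips prompt storage entirely
// (so voters cannot find the payload either), while VoterRecoverySeed stores the
// prompt but blocks direct forwarding, forcing the executor onto the voter
// fallback path.
func handleTransferRequestSketch(seed string, req *ChatRequest) error {
	isVoterRecovery := seed == VoterRecoverySeed
	isVoterNegative := seed == VoterNegativeSeed

	if !isVoterNegative {
		if err := storePrompt(req); err != nil {
			return err
		}
	}
	if isVoterRecovery || isVoterNegative {
		return nil // do not forward the payload directly to the executor
	}
	return forwardToExecutor(req)
}
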
Problem: When the executor's direct payload retrieval from the TA
fails, the inference is stuck, since there is no recovery path via voters.
The voting and server/public packages also have an import cycle that
prevents clean integration.

Solution: Extract shared API types into an apitypes package to break
the import cycle. Add GetVoterURLs and VoterFallback to NodePinger so
the executor can sample voters when direct retrieval fails. Wire up the
fallback in event_listener and expose POST /v1/voting/verify for voters
to verify respondent payloads on behalf of challengers.
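
A rough sketch of how that fallback wiring could look from the executor's side; only GetVoterURLs, VoterFallback, and the /v1/voting/verify route are named in this PR, while the method signatures and RetrievePayloadDirectly are assumptions:

package sketch

import "context"

// NodePinger here is a reduced, assumed shape of the interface extended in this
// commit; only GetVoterURLs and VoterFallback are taken from the PR text.
type NodePinger interface {
	RetrievePayloadDirectly(ctx context.Context, inferenceID string) error
	GetVoterURLs(ctx context.Context, inferenceID string) ([]string, error)
	VoterFallback(ctx context.Context, inferenceID string, voterURLs []string) error
}

// retrievePayloadWithFallback sketches the event_listener wiring: try direct
// retrieval from the TA first, and only on failure sample voters and let them
// verify the respondent's payload via POST /v1/voting/verify.
func retrievePayloadWithFallback(ctx context.Context, np NodePinger, inferenceID string) error {
	if err := np.RetrievePayloadDirectly(ctx, inferenceID); err == nil {
		return nil
	}
	voterURLs, err := np.GetVoterURLs(ctx, inferenceID)
	if err != nil {
		return err
	}
	return np.VoterFallback(ctx, inferenceID, voterURLs)
}
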
Problem: The voter fallback mechanism has no integration test coverage
to verify end-to-end behavior in a multi-node cluster.

Solution: Add two test cases in VoterTests.kt:
- voter recovers payload when executor direct retrieval is blocked
- all voters cast negative votes when payload does not exist
Comment on lines +493 to +497
// Create verification request.
// Note: PromptHash is left empty intentionally. The on-chain hash is computed from the
// canonicalized request body, but prompt storage stores the full ChatRequest (with auth keys,
// timestamps, etc.). Comparing hashes here would always mismatch. The voter's job is to
// check payload availability, not hash correctness.
Member

Is it correct that this is the flow?

  1. Call RequestVerificationFromVoters
  2. ... For each voter, call np.requestVerificationFromSingleVoter
  3. ... which will reach v1/voting/verify, i.e., postVotingVerify
  4. ... which calls VerifyRespondent
  5. ... which finally verifies the hash, casting a positive or negative vote
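
For reference, a minimal sketch of steps 1–3 of the flow described above; the request body, HTTP plumbing, and receiver type are assumptions, and only the function names and the /v1/voting/verify route come from the thread (steps 4–5 happen on the voter's side, in postVotingVerify and VerifyRespondent):

package sketch

import (
	"bytes"
	"context"
	"encoding/json"
	"log"
	"net/http"
)

// nodePinger and verifyRequest are illustrative stand-ins.
type nodePinger struct{ client *http.Client }

type verifyRequest struct {
	InferenceID string `json:"inference_id"`
}

// RequestVerificationFromVoters fans the request out to every sampled voter;
// each voter's postVotingVerify handler then calls VerifyRespondent and casts
// a positive or negative vote.
func (np *nodePinger) RequestVerificationFromVoters(ctx context.Context, voterURLs []string, inferenceID string) {
	for _, url := range voterURLs {
		if err := np.requestVerificationFromSingleVoter(ctx, url, inferenceID); err != nil {
			log.Printf("voter %s: verification request failed: %v", url, err)
		}
	}
}

func (np *nodePinger) requestVerificationFromSingleVoter(ctx context.Context, voterURL, inferenceID string) error {
	body, err := json.Marshal(verifyRequest{InferenceID: inferenceID})
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, voterURL+"/v1/voting/verify", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := np.client.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}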

assertThat(inference.executedBy).isEqualTo(inference.assignedTo)
}

/**
Member

Thank you for the extensive comments detailing the flow.

These tests will be pretty useful when we have MsgFinishInferenceWithMissingPayload. We can then assert all of these votes using these test cases.

Just one thing: in my experience, when dealing with testermint, even when the tests succeed, you should still check the standard output and error tabs for the systems we are testing. I'm not implying there's something wrong, and I haven't checked myself; it's just a sanity check to make sure nothing slipped right under our noses.

Problem: Voters blindly process verification requests without checking
that the disputed inference actually exists on chain, that it hasn't
already been finished, or that the respondent matches the on-chain TA.

Solution: Implement ChainVerifier.QueryInferenceState and call it in
postVotingVerify to validate requests before the voter does any work.
Reject the request if the inference is not found, is already finished, or
the respondent doesn't match. Add a failing test for the case where
MsgStartInference does not exist on chain.
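
A hedged sketch of the validation order described here; the ChainVerifier method name comes from the commit message, but its signature, the state fields, and the error handling are assumptions:

package sketch

import (
	"context"
	"errors"
	"fmt"
)

// InferenceState is an assumed shape for what QueryInferenceState returns.
type InferenceState struct {
	Found      bool
	Finished   bool
	Respondent string // the transfer agent recorded on chain
}

type ChainVerifier interface {
	QueryInferenceState(ctx context.Context, inferenceID string) (InferenceState, error)
}

// validateVerificationRequest runs before the voter does any work: reject
// unknown inferences, already-finished inferences, and respondent mismatches.
func validateVerificationRequest(ctx context.Context, cv ChainVerifier, inferenceID, respondent string) error {
	state, err := cv.QueryInferenceState(ctx, inferenceID)
	if err != nil {
		return fmt.Errorf("query inference state: %w", err)
	}
	switch {
	case !state.Found:
		return errors.New("inference not found on chain")
	case state.Finished:
		return errors.New("inference already finished")
	case state.Respondent != respondent:
		return errors.New("respondent does not match the on-chain TA")
	}
	return nil
}
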
Problem: The test mode and seed checks were duplicated across multiple
if-statements in handleTransferRequest, making the logic harder to
follow.

Solution: Compute isTestMode, isMissingPayload, isVoterRecovery, and
isVoterNegative once at the top and reuse them throughout.
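
For illustration, hoisting those checks could look like the sketch below; only the four flag names and the two voter seed names appear in this PR, while the config/request shapes, the MissingPayloadSeed name, and the seed string values are placeholders:

package sketch

const (
	MissingPayloadSeed = "missing-payload" // placeholder name and value
	VoterRecoverySeed  = "voter-recovery"  // placeholder value
	VoterNegativeSeed  = "voter-negative"  // placeholder value
)

type config struct{ TestMode bool }
type transferRequest struct{ Seed string }

type transferFlags struct {
	isTestMode, isMissingPayload, isVoterRecovery, isVoterNegative bool
}

// computeTransferFlags evaluates the test-mode and seed checks once so that
// handleTransferRequest can branch on plain booleans instead of repeating the
// same comparisons across several if-statements.
func computeTransferFlags(cfg config, req transferRequest) transferFlags {
	isTestMode := cfg.TestMode
	return transferFlags{
		isTestMode:       isTestMode,
		isMissingPayload: isTestMode && req.Seed == MissingPayloadSeed,
		isVoterRecovery:  isTestMode && req.Seed == VoterRecoverySeed,
		isVoterNegative:  isTestMode && req.Seed == VoterNegativeSeed,
	}
}
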
Problem: Voter selection used GetVoterURLs which queried all participants
via ParticipantsWithBalances, making it non-deterministic and not
replayable by on-chain validators.

Solution: Replace GetVoterURLs with SampleVotersForInference, which uses
epochgroup.MakeRandomMemberReplayableFn, the same deterministic algorithm
the chain uses. The EpochGroupData is queried from the chain using the
inference's EpochId.
Also add a nil check for empty payloads in RetrievePayloadToRequester to
prevent panics.
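
A sketch of the sampling shape under those constraints; the real epochgroup.MakeRandomMemberReplayableFn signature is not shown in this excerpt, so it is abstracted here as a closure, and the DefaultMaxVoters value is a placeholder:

package sketch

import "fmt"

const DefaultMaxVoters = 3 // placeholder value; see the config discussion below

// memberSampler stands in for the replayable sampler built from the epoch group
// data (looked up by the inference's EpochId) and the block hash.
type memberSampler func() (memberURL string, err error)

// SampleVotersForInference draws up to maxVoters members deterministically, so
// on-chain validators can replay the same selection.
func SampleVotersForInference(nextMember memberSampler, maxVoters int) ([]string, error) {
	voters := make([]string, 0, maxVoters)
	for len(voters) < maxVoters {
		m, err := nextMember()
		if err != nil {
			break // sampler exhausted: fewer members than maxVoters
		}
		voters = append(voters, m)
	}
	if len(voters) == 0 {
		return nil, fmt.Errorf("no voters could be sampled")
	}
	return voters, nil
}
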
blockHash := bytes.HexBytes(blockResp.BlockId.Hash)

// Build an EpochGroup backed by gRPC adapters so we can call
// MakeRandomMemberReplayableFn with the exact same algorithm the chain uses.
Member

You keep mentioning "the exact same algorithm the chain uses", but this algorithm is unused (well, now it isn't). We should also use it for validations, though.

)

// DefaultMaxVoters is the default number of voters to sample for fallback verification.
// TODO: move this to config?
Member

I think it should be moved.
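
One way it could be moved, as a hedged sketch; the VotingConfig type, field name, and loading mechanism are made up, and only DefaultMaxVoters comes from the diff:

package sketch

const DefaultMaxVoters = 3 // placeholder value for the constant shown in the diff

// VotingConfig is a hypothetical config section; however the project loads its
// configuration, the idea is the same: read MaxVoters there and fall back to
// the old constant when it is unset.
type VotingConfig struct {
	MaxVoters int
}

func (c VotingConfig) MaxVotersOrDefault() int {
	if c.MaxVoters > 0 {
		return c.MaxVoters
	}
	return DefaultMaxVoters
}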

// to surface accidental misuse immediately.

// grpcGroupKeeper implements types.GroupMessageKeeper via gRPC.
type grpcGroupKeeper struct{ client group.QueryClient }
Member

Do you think it might be better to change it to not use EpochGroup?

Also, perhaps we should write some more unit tests; there is currently only one test for those samples, which might be insufficient. But I think we can leave it for another PR. I have this test (epochgroup.random_test.go) which, AFAIR, was not passing, and I didn't figure out why before moving on to more important tasks:

func TestCanSampleMany(t *testing.T) {
	k, ctx, mocks := keepertest.InferenceKeeperReturningMocks(t)
	sdkCtx := sdk.UnwrapSDKContext(ctx)

	pocStartHeight := int64(100)
	epochIndex := uint64(1)
	k.SetEpoch(sdkCtx, &types.Epoch{Index: epochIndex, PocStartBlockHeight: pocStartHeight})
	require.NoError(t, k.SetEffectiveEpochIndex(sdkCtx, epochIndex))
	mocks.ExpectCreateGroupWithPolicyCall(ctx, epochIndex)

	eg, err := k.CreateEpochGroup(ctx, epochIndex, epochIndex)
	require.NoError(t, err)

	require.NoError(t, eg.CreateGroup(ctx))

	blockHash := []byte("blockhash")
	numParticipants := 8
	participants := make([]types.Participant, numParticipants)
	// Seed the participants and stub the group query; each mocked call returns a single member.
	for i := range numParticipants {
		addr := testutil.Bech32Addr(i)
		participant := types.Participant{
			Index:             addr,
			Address:           addr,
			Weight:            150 * int32(i+1),
			Status:            types.ParticipantStatus_ACTIVE,
			CurrentEpochStats: types.NewCurrentEpochStats(),
		}
		require.NoError(t, eg.ParticipantKeeper.SetParticipant(ctx, participant))
		participants[i] = participant
		mocks.GroupKeeper.EXPECT().
			GroupMembers(ctx, gomock.Any()).
			Return(
				&group.QueryGroupMembersResponse{
					Members: []*group.GroupMember{
						&group.GroupMember{
							Member: &group.Member{
								Address:  participant.Address,
								Weight:   strconv.Itoa(int(participant.Weight)),
								Metadata: "",
								AddedAt:  time.Now(),
							},
						},
					},
					Pagination: &query.PageResponse{},
				},
				nil,
			)
	}
	rrCtx, err := eg.NewReplayableRandomContext(ctx, blockHash)
	require.NoError(t, err)

	// Each replayable draw is expected to return the participants in insertion order.
	for _, participant := range participants {
		expectedParticipant, err := eg.GetRandomMemberReplayable(ctx, rrCtx)
		require.NoError(t, err)
		require.Equal(t, *expectedParticipant, participant)
	}

	expectedParticipant, err := eg.GetRandomMemberReplayable(ctx, rrCtx)
	require.Error(t, err) // No participants left to sample
	require.Nil(t, expectedParticipant)
}
