fix: stabilize analytics sample cohort by silentgeckoaudit3801 · Pull Request #5 · sonic-mast/x402-analytics

silentgeckoaudit3801 · 2026-06-11T06:42:02Z

Closes #3.

Sorts agents by verification timestamp and BTC address before selecting the 20-agent sample. This makes the cohort deterministic when the API changes result ordering, preventing cumulative estimates from regressing solely because a different cohort was selected.

The existing public data demonstrates the failure: total_messages fell 651 -> 602 and later 602 -> 553.
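
The ordering effect described above can be sketched as follows. This is a minimal illustration, not the code in scripts/collect-metrics.js: `selectCohort` is a hypothetical helper that wraps the sort from this PR, the field names `verifiedAt` and `btcAddress` come from the diff, and the agent records are made up.

```javascript
// Hypothetical helper wrapping the sort introduced in this PR.
function selectCohort(agents, sampleSize) {
  return [...agents]
    .sort((a, b) =>
      a.verifiedAt.localeCompare(b.verifiedAt) ||
      a.btcAddress.localeCompare(b.btcAddress))
    .slice(0, sampleSize);
}

// Made-up agents for illustration only.
const agents = [
  { btcAddress: 'bc1qexampleaaa', verifiedAt: '2026-01-03T00:00:00Z' },
  { btcAddress: 'bc1qexamplebbb', verifiedAt: '2026-01-01T00:00:00Z' },
  { btcAddress: 'bc1qexampleccc', verifiedAt: '2026-01-02T00:00:00Z' },
];

// Simulate the API returning the same agents in a different order.
const firstRun = selectCohort(agents, 2);
const secondRun = selectCohort([...agents].reverse(), 2);
```

Both runs select the same two agents (the ones verified on 2026-01-01 and 2026-01-02), whereas slicing the unsorted response would pick a different pair each time the API order changed.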

gemini-code-assist

Code Review

This pull request stabilizes the sampled cohort of agents in scripts/collect-metrics.js by sorting them by verification timestamp (verifiedAt) and Bitcoin address before slicing. The reviewer identified a critical selection bias issue with this approach: sorting by verifiedAt ascending permanently limits the sample to the earliest-verified agents. They recommended a deterministic hashed sampling method that orders agents by the SHA-256 hash of the Bitcoin address, so that selection does not depend on agent age.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-11T06:43:13Z

+  // Keep the cohort stable when the API changes its result ordering.
+  const sample = [...agents]
+    .sort((a, b) => a.verifiedAt.localeCompare(b.verifiedAt) || a.btcAddress.localeCompare(b.btcAddress))
+    .slice(0, sampleSize);


Methodological Issue: Selection Bias in Cohort Sampling

Sorting the agents by verifiedAt ascending means the sample cohort is permanently frozen to the 20 earliest-verified agents in the network. As the network grows, newer agents are never sampled.

This introduces a severe selection bias because:

The behavior of the oldest 20 agents is extrapolated to the entire network using scaleFactor = agents.length / sample.length.

If these 20 oldest agents become inactive or have unusually high activity compared to the rest of the network, the extrapolated metrics (total messages, sats received/sent) will be highly inaccurate.
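
A rough sketch of how the extrapolation can go wrong, under stated assumptions: `extrapolateTotal` is a hypothetical stand-in for the scaling step (only the `scaleFactor` formula is taken from the review), the `messages` field name is assumed, and the network below is invented to make the effect obvious.

```javascript
// Hypothetical estimator: scales the sample's total up to the whole population.
function extrapolateTotal(agents, sample, field) {
  const scaleFactor = agents.length / sample.length; // formula quoted in the review
  const sampleTotal = sample.reduce((sum, agent) => sum + agent[field], 0);
  return sampleTotal * scaleFactor;
}

// Made-up network: 20 early agents that have gone quiet, 80 newer active ones.
const oldAgents = Array.from({ length: 20 }, () => ({ messages: 0 }));
const newAgents = Array.from({ length: 80 }, () => ({ messages: 10 }));
const network = [...oldAgents, ...newAgents];

// True total across all agents: 80 * 10 = 800.
const actualTotal = network.reduce((sum, agent) => sum + agent.messages, 0);

// Estimate from the 20 oldest agents only: 0 * (100 / 20) = 0.
const estimate = extrapolateTotal(network, oldAgents, 'messages');
```

In this contrived case the estimate is 0 against a true total of 800. The reverse failure, where unusually active early agents inflate the estimate, follows the same arithmetic.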

Recommended Solution: Deterministic, Unbiased Hashed Sampling

To keep the cohort stable and deterministic without biasing it to the oldest agents, we should sort by a property that is randomly distributed across all agents, such as a hash of their btcAddress.

Using a hash (like SHA-256) is crucial because:

It ensures that new agents have an equal chance of being sampled.

It avoids address format bias (e.g., legacy addresses starting with 1 sorting before Bech32 addresses starting with bc1 lexicographically).
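
The format bias can be seen in a small sketch. The strings below are placeholders rather than real addresses (only their prefixes matter), and `getHash` repeats the helper from the suggested change.

```javascript
const crypto = require('crypto');

// Same hashing helper as in the suggested change.
const getHash = (addr) =>
  crypto.createHash('sha256').update(addr || '').digest('hex');

// Placeholder strings, not real addresses; only the prefixes matter here.
const addresses = ['bc1q-placeholder-a', '1-placeholder-b', '3-placeholder-c'];

// Plain lexicographic order groups addresses by format prefix:
// '1...' sorts first, then '3...', then 'bc1...'.
const byAddress = [...addresses].sort();

// Hash keys are 64 hex characters whatever the address format is,
// so the resulting order does not depend on the prefix.
const byHash = [...addresses].sort((a, b) =>
  getHash(a).localeCompare(getHash(b)));
```

With a raw address sort, a 20-agent slice would be filled from the legacy-prefixed addresses before any Bech32 address is considered. The hashed order is still deterministic but carries no such grouping.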

Suggested change

-  // Keep the cohort stable when the API changes its result ordering.
-  const sample = [...agents]
-    .sort((a, b) => a.verifiedAt.localeCompare(b.verifiedAt) || a.btcAddress.localeCompare(b.btcAddress))
-    .slice(0, sampleSize);
+  // Keep the cohort stable and unbiased by sorting by the hash of the BTC address.
+  const crypto = require('crypto');
+  const getHash = (addr) => crypto.createHash('sha256').update(addr || '').digest('hex');
+  const sample = [...agents]
+    .sort((a, b) => getHash(a.btcAddress).localeCompare(getHash(b.btcAddress)))
+    .slice(0, sampleSize);
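
One possible variant of the suggestion, offered as a sketch rather than as the reviewer's or author's implementation: the comparator above recomputes SHA-256 on every comparison, so this version hashes each address once and sorts on the precomputed key. `selectHashedCohort` and `sha256Hex` are hypothetical names; plain string comparison is used instead of `localeCompare` because the keys are lowercase hex.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 of an address; a missing address hashes as ''.
const sha256Hex = (addr) =>
  crypto.createHash('sha256').update(addr || '').digest('hex');

// Hypothetical helper: hash each address once, sort by that key, take the first N.
function selectHashedCohort(agents, sampleSize) {
  return agents
    .map((agent) => ({ agent, key: sha256Hex(agent.btcAddress) }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .slice(0, sampleSize)
    .map(({ agent }) => agent);
}
```

Note that a hashed cohort is deterministic for a given set of agents but not frozen: a newly verified agent whose hash falls among the lowest `sampleSize` keys will displace an existing cohort member. That is what gives new agents a chance of being sampled, at the cost of occasional small cohort changes as the network grows.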

fix: stabilize analytics sample cohort

67eac8c

silentgeckoaudit3801 wants to merge 1 commit into sonic-mast:main from silentgeckoaudit3801:main
