Contributor

LilianBoulard commented Jan 11, 2026

I noticed a bug when using the filtered random button: the selected posts were always among the first few hundred, and later posts never seemed to be selected at all.

After investigating the issue, it looks like the current method only selects from the cached pages (the first pageSize * cachePages posts), which seems like undesired behavior. This PR fixes it by choosing a random offset over the total number of posts.
From the few tests I've done it works as expected.
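
For illustration, here is a minimal Python sketch of the difference described above. The function names and the values of `page_size` and `cache_pages` are hypothetical and not taken from the project; the offset formula mirrors the `floor(random() * total)` expression used in the SQL query.

```python
import math
import random

def pick_from_cached_window(total, page_size, cache_pages, rng):
    """Behavior described as the bug: only the cached prefix is eligible."""
    window = min(total, page_size * cache_pages)
    return rng.randrange(window)

def pick_by_offset(total, rng):
    """Behavior after the fix: offset = floor(random() * total)."""
    return math.floor(rng.random() * total)

rng = random.Random(0)
total, page_size, cache_pages = 7000, 40, 5  # hypothetical values

old_picks = [pick_from_cached_window(total, page_size, cache_pages, rng)
             for _ in range(10_000)]
new_picks = [pick_by_offset(total, rng) for _ in range(10_000)]

# The first maximum can never reach page_size * cache_pages;
# the second is free to land anywhere below total.
print(max(old_picks), max(new_picks))
```

With these assumed numbers, the first variant can only ever return one of the first 200 positions, which matches the "first few hundred" symptom.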

While I was in this file, I noticed a minor bug with the query: using OFFSET + LIMIT without an ORDER BY can lead to undefined behavior (https://www.postgresql.org/docs/current/queries-order.html#QUERIES-ORDER).
I'm not 100% sure this can cause a bug in practice, but we might as well follow the manual.

Owner

funmaker commented Jan 12, 2026

Thanks.

https://www.postgresql.org/docs/current/queries-limit.html
When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows. You might be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? The ordering is unknown, unless you specified ORDER BY.

From what I'm reading, there is no undefined behavior here. The order is just unspecified and unpredictable, and that's fine: we only do this to choose one record at random, so we do not care about the relative order of records, and we sample uniformly anyway.
I have a suspicion that specifying the order might introduce some performance overhead. Or maybe it might make it faster, idk. Either way, I wouldn't change this part unless there is a supporting benchmark (for large databases) or an actual bug to be fixed here.
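
One way to see why the row order does not matter for this query: assuming the order is fixed within a single query execution (even if it differs between executions), a uniformly chosen offset reaches each row through exactly one offset value, so every row is equally likely. A minimal Python sketch, where the shuffled list is only a stand-in for whatever order the database happens to return and the ids are made up:

```python
import random
from collections import Counter

post_ids = list(range(1, 11))  # hypothetical post ids

# Stand-in for an unspecified scan order: any permutation of the rows.
rng = random.Random(42)
scan_order = post_ids[:]
rng.shuffle(scan_order)

# Enumerate every possible OFFSET value once and record which row it selects.
hits = Counter(scan_order[offset] for offset in range(len(scan_order)))

# Each row is selected by exactly one offset, whatever the permutation is.
print(sorted(hits.items()))
```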

The rest looks fine.

@LilianBoulard
Contributor Author

Yeah, fair enough. I thought this would not impact performance, but it very slightly does (7k-post dataset):

PostgreSQL Query Benchmark
==========================
Iterations: 1000 | Warm-up: 10

=== Benchmarking Query 1 ===
Query: SELECT id FROM posts ORDER BY id OFFSET floor(random() * (SELECT posts FROM global)) LIMIT 1;
Warm-up: 10 runs...
Benchmark: 1000 runs...
  Min: 0.815ms | Max: 3.925ms | Avg: 1.25ms
  P50: 1.260ms | P95: 1.665ms | P99: 2.101ms

=== Benchmarking Query 2 ===
Query: SELECT id FROM posts OFFSET floor(random() * (SELECT posts FROM global)) LIMIT 1;
Warm-up: 10 runs...
Benchmark: 1000 runs...
  Min: 0.736ms | Max: 3.485ms | Avg: 1.16ms
  P50: 1.175ms | P95: 1.570ms | P99: 1.934ms

=== Comparison ===
Query 1 avg: 1.25ms
Query 2 avg: 1.16ms
Query 2 is 7% faster
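
The script that produced this output is not shown, so the following is only a sketch of how such summary figures could be derived from raw timings. The nearest-rank percentile definition, the helper names, and the sample timings are assumptions; the two averages are copied from the output above.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (an assumed definition) for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percent_faster(slow_avg, fast_avg):
    """Relative reduction of fast_avg with respect to slow_avg, in percent."""
    return (slow_avg - fast_avg) / slow_avg * 100

# Made-up timings in milliseconds, only to exercise the helper.
timings_ms = [float(i) for i in range(1, 101)]
p50, p95, p99 = (percentile(timings_ms, p) for p in (50, 95, 99))

# Averages reported in the benchmark output (Query 1 vs. Query 2).
speedup = percent_faster(1.25, 1.16)
print(f"P50={p50} P95={p95} P99={p99} | {speedup:.1f}% faster")
```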
