
@andr-sokolov commented on Jan 11, 2026:

The default value of join_collapse_limit was 20. With this value, when a query
contains about 20 joins (see the added test), the Postgres query optimizer
cannot build a plan for hours and consumes a lot of memory, because the
planner evaluates an enormous number of possible join orders.
With join_collapse_limit set to 13, the query plan is built in reasonable time.
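As a quick orientation for readers, a minimal session-level sketch of the GUCs involved (the `set optimizer to off` setting, which routes the query through the Postgres planner rather than ORCA, appears in the repro later in this thread):

```sql
SET optimizer TO off;           -- use the Postgres planner, the one affected here
SET join_collapse_limit = 13;   -- the default this PR proposes
SHOW join_collapse_limit;
```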

@andr-sokolov commented:

I suggest join_collapse_limit=8 to reduce the difference from PostgreSQL 14, but you can consider slightly larger values. The table below shows the time and memory consumed to process the query from the test.

| join_collapse_limit | Query processing time, ms | Backend memory usage |
|---------------------|---------------------------|----------------------|
| 8  | 96      | 52 MB  |
| 9  | 230     | 88 MB  |
| 10 | 653     | 210 MB |
| 11 | 1269    | 394 MB |
| 12 | 2994    | 889 MB |
| 13 | 9003    | 2.4 GB |
| 14 | 27245   | 7.3 GB |
| 15 | 80904   | 22 GB  |
| 16 | 670578  | 66 GB  |
| 17 | 4158314 | 200 GB |

Query processing time was obtained with `\timing on` in psql. Backend memory usage was measured at the end of the `planner` function.
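A hedged way to reproduce the timing column without patching the server (assuming the `tj*` tables from the test shown later in this thread): plain EXPLAIN plans the query without executing it, and SUMMARY ON prints the planner's own "Planning Time" alongside psql's `\timing` output.

```sql
\timing on
SET optimizer TO off;
SET join_collapse_limit = 13;
-- EXPLAIN plans but does not execute; SUMMARY ON reports "Planning Time".
EXPLAIN (SUMMARY ON)
SELECT *
FROM tj1
  JOIN tj2 ON tj1.id = tj2.id
  JOIN tj3 ON tj2.id = tj3.id;  -- extend the chain to tj20 as in the test
```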

@leborchuk commented:

> I suggest join_collapse_limit=8 to reduce the difference from PostgreSQL 14, but you can consider slightly larger values. […]

Thank you for such interesting research!

I would suggest setting join_collapse_limit to 13 or 14, at your choice.

I agree, 20 is too much, but 8 is also too small. The point of the setting is to trade planning time and free memory (on the master) for execution time. However, the master has its own limitations: join_collapse_limit should be set to a value that prevents a single user from exhausting its resources. Typically, even test/development installations have 8 GB of free memory. I'm not sure about 22 GB; it seems quite large. I have seen many dev/test instances with 10 GB to 16 GB of free memory.

@andr-sokolov commented:

> I would suggest setting join_collapse_limit to 13 or 14, at your choice. I agree, 20 is too much, but 8 is also too small. […]

I have replaced 8 with 13. 14 is too high, because in that case I get `ERROR: Canceling query because of high VMEM usage. Used: 7376MB, available 816MB, red zone: 7372MB (runaway_cleaner.c:189)` on the demo cluster with the default gp_vmem_protect_limit value.
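For readers who hit the same error: the numbers are consistent with the stock demo-cluster configuration, where gp_vmem_protect_limit defaults to 8192 MB and the runaway cleaner fires at roughly 90% of it (the 7372 MB red zone above). A quick way to check:

```sql
-- Inspect the per-segment VMEM ceiling behind the error above.
SHOW gp_vmem_protect_limit;  -- 8192 (MB) on a default demo cluster
```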

@andr-sokolov changed the title from "Set join_collapse_limit default value to the same as used in PostgreSQL 14" to "Set join_collapse_limit default value to 13" on Jan 13, 2026
@Smyatkin-Maxim left a comment:

+1 for anything between 12 and 13

@avamingli commented:

> I suggest join_collapse_limit=8 to reduce the difference from PostgreSQL 14, but you can consider slightly larger values. […]

I'm surprised that the planner consumes so much memory; typically it uses very little, so your test results are unexpected. Could you share the complete test setup so others can reproduce it?

Another point is that this parameter is intended to let the planner generate better query plans; reducing it could lead to worse plans. Memory usage during the planning phase is usually negligible compared to execution memory.
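To make that risk concrete, here is a minimal sketch (hypothetical tables big_a, big_b, tiny_c): per the PostgreSQL documentation, setting join_collapse_limit to 1 prevents any reordering of explicit JOINs, so a selective table written last stays last in the join order.

```sql
SET join_collapse_limit = 1;  -- 1 disables reordering of explicit JOINs
EXPLAIN
SELECT *
FROM big_a                               -- hypothetical large table
  JOIN big_b  ON big_a.id = big_b.id     -- hypothetical large table
  JOIN tiny_c ON big_b.id = tiny_c.id;   -- selective table pinned to the end
```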

We need to verify against standard benchmarks whether lowering this parameter causes any performance regression; typically we test with TPC-DS at 1 TB scale.

@andr-sokolov commented:

> Could you share the complete test setup so others can reproduce it?

I ran the queries on the standard demo cluster.

```sql
do $$
begin
  for i in 1..20 loop
    execute 'create table tj' || i || '(id int)';
  end loop;
end
$$;

set optimizer to off;

select *
from tj1
  join tj2 on tj1.id = tj2.id
  join tj3 on tj2.id = tj3.id
  join tj4 on tj3.id = tj4.id
  join tj5 on tj4.id = tj5.id
  join tj6 on tj5.id = tj6.id
  join tj7 on tj6.id = tj7.id
  join tj8 on tj7.id = tj8.id
  join tj9 on tj8.id = tj9.id
  join tj10 on tj9.id = tj10.id
  join tj11 on tj10.id = tj11.id
  join tj12 on tj11.id = tj12.id
  join tj13 on tj12.id = tj13.id
  join tj14 on tj13.id = tj14.id
  join tj15 on tj14.id = tj15.id
  join tj16 on tj15.id = tj16.id
  join tj17 on tj16.id = tj17.id
  join tj18 on tj17.id = tj18.id
  join tj19 on tj18.id = tj19.id
  join tj20 on tj19.id = tj20.id;
```

OS: Ubuntu 22.04.5
RAM: 32 GB
Processor: Core i7 13700H
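As a side note, the 20-way join can be generated instead of hand-written; a convenience sketch (not from the PR) using psql's `\gexec`, prefixed with EXPLAIN so it only plans:

```sql
SELECT 'explain select * from tj1 '
       || string_agg(format('join tj%1$s on tj%2$s.id = tj%1$s.id', i, i - 1),
                     ' ' ORDER BY i)
FROM generate_series(2, 20) AS i
\gexec
```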

@avamingli commented:

> I ran the queries on the standard demo cluster. […]

Thank you for sharing the test case; I was able to reproduce the issue. Indeed, the planner does become very slow in this scenario. The reason is that with 20 tables joined purely via inner joins and no reordering restrictions (such as LEFT JOIN), the join order can be rearranged arbitrarily. In GPDB and our Cloudberry, join ordering relies on dynamic programming for the search, and with 20 tables the number of possible join orders makes that search extremely costly.
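For scale, a back-of-the-envelope bound from the join-ordering literature (my own numbers, not measurements from this thread): with unrestricted inner joins the search space is all bushy join trees, and even subset-based dynamic programming visits on the order of $3^n$ subproblem pairs in the worst case.

$$
\text{bushy join trees over } n \text{ relations} \;=\; \frac{(2n-2)!}{(n-1)!},
\qquad
3^{20} \approx 3.5 \times 10^{9}.
$$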

This highlights a fundamental limitation: unlike upstream PostgreSQL, which can fall back to genetic-algorithm-based join search (GEQO) when the number of joins exceeds a threshold, we currently do not inherit that mechanism. I recall this topic was previously discussed with @reshke.

I appreciate you raising this. Overall, I’m supportive of exploring adjustments here.

The current default of 20 was inherited from Greenplum and has remained unchanged for years. To my knowledge, no customer has reported this issue before, and since your PR shows no plan differences, the impact may be limited in typical workloads. However, this parameter is quite sensitive. While your example, which uses only inner joins, justifies lowering it to around 12 or 13, we must be cautious: a lower limit could prevent the planner from finding optimal join orders in real-world complex queries. In queries like those in TPC-DS, execution time often far outweighs planning time.

Therefore, before moving forward, I recommend we validate this change with a TPC-DS 1 TB benchmark (using AOCO tables). If no plan changes are observed, I have no objection. If we can identify a better-performing setting through such testing, that would be even more valuable.

@avamingli commented:

> This highlights a fundamental limitation: unlike upstream PostgreSQL, which can fall back to genetic-algorithm-based join search (GEQO) when the number of joins exceeds a threshold, we currently do not inherit that mechanism. I recall this topic was previously discussed with @reshke.

here: #657

@avamingli commented:

> Therefore, before moving forward, I recommend we validate this change with a TPC-DS 1 TB benchmark (using AOCO tables). […]

Based on my manual review of all 99 TPC-DS queries, I can confirm that Query 72 contains the highest number of explicit JOINs: 11 tables with 10 JOIN operations, which is below the new value in your PR. Therefore, this change should have no impact on the planning of explicit joins.

Regarding queries with a higher number of table references, such as Query 64, which lists over 18 tables in its FROM clause: these are written as a FROM list and are governed by the from_collapse_limit GUC, so your change will not affect them either (see the sketch below).
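To illustrate the distinction with a minimal sketch (hypothetical tables t1..t3; the behavior is as documented for both GUCs): explicit JOIN syntax is subject to join_collapse_limit, while a comma-separated FROM list, like Query 64's, is flattened under from_collapse_limit.

```sql
-- Reordering governed by join_collapse_limit (explicit JOIN syntax):
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON t2.id = t3.id;

-- Flattening governed by from_collapse_limit (FROM-list syntax, as in Query 64):
SELECT * FROM t1, t2, t3 WHERE t1.id = t2.id AND t2.id = t3.id;
```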

This change looks good to me, thanks.

@avamingli force-pushed the fix_join_collapse_limit branch from cf638b4 to 6f095e0 on January 22, 2026 05:26
Set join_collapse_limit default value to the same as used in PostgreSQL 14

The default value of join_collapse_limit was 20. With this value, when a query
contains about 20 joins (see the added test), the Postgres query optimizer
cannot build a plan for hours and consumes a lot of memory, because the
planner evaluates an enormous number of possible join orders.
With join_collapse_limit set to 8, the query plan is built in reasonable time.
@avamingli force-pushed the fix_join_collapse_limit branch from 7215bbb to 32061ee on January 22, 2026 12:39
@avamingli merged commit b245e4d into apache:main on Jan 22, 2026
40 checks passed
