(unnamed)
text/html
Filename: (unnamed)
Type: text/html
Part: 0
<div><div><div>Dear PostgreSQL Hackers,</div><div> </div><div>I’ve been testing the proposed v5 plan id work and found out instability of</div><div>computing the plan identifier after feeding different query texts that</div><div>produces the same physical plan trees but with different plan ids. The main</div><div>pattern regards with fields of Plan node structures that depend on positions of</div><div>RTEs in a RTE list.</div><div> </div><div>For your convenience, I’ve attached minimal reproducible examples that</div><div>demonstrate this pattern. Both are based on the v6-0001 patch, which</div><div>incorporates v5-0001..v5-0003 changes and fixes jumbling empty arrays.</div><div> </div><div>```</div><div>create table foo1 (num int);</div><div>create table foo2 (num int);</div><div> </div><div>insert into foo1 (num)</div><div>select *</div><div>from generate_series(1, 1000);</div><div> </div><div>insert into foo2 (num)</div><div>select *</div><div>from generate_series(1, 10);</div><div> </div><div>analyze foo1;</div><div>analyze foo2;</div><div> </div><div>set compute_plan_id to true;</div><div>```</div><div> </div><div>The first example changes join order between two tables:</div><div>```</div><div>explain (costs off, verbose)</div><div>select 1 from foo1 join foo2 on foo1.num = foo2.num;</div><div> </div><div> QUERY PLAN</div><div>-------------------------------------</div><div> Hash Join</div><div> Output: 1</div><div> Hash Cond: (foo1.num = foo2.num)</div><div> -> Seq Scan on public.foo1</div><div> Output: foo1.num</div><div> -> Hash</div><div> Output: foo2.num</div><div> -> Seq Scan on public.foo2</div><div> Output: foo2.num</div><div> Plan Identifier: 538643160186222168</div><div> </div><div>explain (costs off, verbose)</div><div>select 1 from foo2 join foo1 on foo1.num = foo2.num;</div><div> </div><div> QUERY PLAN</div><div>--------------------------------------</div><div> Hash Join</div><div> Output: 1</div><div> Hash Cond: (foo1.num = foo2.num)</div><div> -> Seq Scan on public.foo1</div><div> Output: foo1.num</div><div> -> Hash</div><div> Output: foo2.num</div><div> -> Seq Scan on public.foo2</div><div> Output: foo2.num</div><div> Plan Identifier: -953143034841089498</div><div>```</div><div>Here the reordering of relations in the JOIN operator changes their order in</div><div>the RTE list, which in turn changes RTE position-sensitive fields like `varno`</div><div>in Vars, and those differences leak into jumbling.</div><div> </div><div>The second example assumes transformation of some table expressions to a</div><div>new form like CTE-inlining or completely its eliminating. Let's see example:</div><div>```</div><div>explain (costs off, verbose)</div><div>WITH foo_cte as (</div><div> SELECT num FROM foo1)</div><div>select * from foo_cte;</div><div> </div><div> QUERY PLAN</div><div>--------------------------------------</div><div> Seq Scan on public.foo1</div><div> Output: foo1.num</div><div> Plan Identifier: 3494394630757173099</div><div> </div><div>explain (costs off, verbose)</div><div>select * from foo1;</div><div> </div><div> QUERY PLAN</div><div>--------------------------------------</div><div> Seq Scan on public.foo1</div><div> Output: num</div><div> Plan Identifier: 8116143677260771228</div><div>```</div><div>In the example with a CTE, RTE list contains the outdated subquery item and the</div><div>relation one, whereas without a CTE we have just the relation item. Although</div><div>the CTE is inlined, its RTE is not removed from the rtable; as a result,</div><div>position-sensitive fields (such as `varno` for Vars) differ from the no-CTE</div><div>case.</div><div> </div><div>A possible way to address it during jumbling process:</div><div>1. Remove/skip unused RTE list elements.</div><div>2. Sort active RTE list elements by some stable criteria.</div><div>3. Adjust fields referring to active RTEs.</div><div> </div><div>But the main challenge is identifying all “position-sensitive” fields across</div><div>node types efficiently and maintainably. I’d happy to see your feedback on</div><div>this issue.</div><div> </div><div>Additionally, I noticed that the `location` field is being jumbled for several</div><div>structures (for example, `Var`). Since it’s only a token location, I believe we</div><div>should not include it in the final plan id value.</div><div> </div><div>--</div><div>Sincerely,</div><div>Andrey Kazachkov</div></div></div>