Thread
-
Re: Is there value in having optimizer stats for joins/foreignkeys?
Tom Lane <tgl@sss.pgh.pa.us> — 2026-05-21T20:25:13Z
Alexandra Wang <alexandra.wang.oss@gmail.com> writes: > Here's v7, another attempt to fix the unstable tests. Hi Alexandra, I signed up for an in-person review of this at PGConf.dev, but the schedule doesn't seem to be working in favor of making that happen. If you see this and happen to run into me in the hallway, I'm happy to chat, but in any case here are my rather-hasty review notes. I think it's okay if v1 only handles 2-way joins, as long as the catalog representation is prepared for more. Restricting to cases where we can do index-based sampling seems fine too. Those things could be relaxed later if it seems worthwhile, but we'd have a creditable feature even without. I didn't read the sampling code in any detail. I think you will need to put more thought into what is user-friendly behavior in case the required index doesn't exist or doesn't have the right properties. (I think the tests for that might not be strong enough, either.) I think you could simplify some code noticeably if you included the anchor rel's OID as the first element of stxjoinrels[]. Yeah, it'd be redundant with stxrelid, but so what? It's not like pg_statistic_ext rows are narrow enough that anyone would notice the extra 4 bytes. I think this would simplify some of the relationships within the data structures, too, eg all varnos in the expressions could be considered to reference stxjoinrels[]. I don't love stxkeyrefs[]. I wonder if it's time to throw away stxkeys[], represent all the target columns as regular expression trees in stxexprs, and then special-case columns that are simple Vars where appropriate at execution. (In the same vein, I dislike the grammar's separation of plain columns from expressions; I'd like to replace stats_params with expr_list and sort it all out later. But perhaps that's material for a separate patch.) We will need to put more thought into permissions: I don't think requiring all the tables to have the same owner is workable. (What happens if someone tries to ALTER OWNER later?) However, if they don't all have the same owner, there are potential security problems, so the right restriction is not obvious. This is not necessary to solve now; there are bigger questions to worry about. But we'll need an answer before it's committable. It's not too soon to write some user-facing documentation. CREATE STATISTICS man page obviously needs attention, but also the discussion of extended stats in perform.sgml. And catalogs.sgml. I find that writing that sort of stuff helps to clarify where one's design is weak. regards, tom lane