Thread
-
Re: Skipping schema changes in publication
Shlok Kyal <shlok.kyal.oss@gmail.com> — 2025-12-16T09:20:41Z
On Thu, 11 Dec 2025 at 04:10, Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shlok - > > Here are some review comments for v31-0001 (EXCEPT (tablelist)) > > ====== > Commit message > > 1. > The new syntax allows specifying excluded relations when creating or altering > a publication. For example: > CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE (t1,t2); > > ~ > > In v30, you removed all the ALTER PUBLICATION changes, so the "or > altering" in the message above also needs to be removed. > > ====== > doc/src/sgml/logical-replication.sgml > > 2. > <para> > - To add tables to a publication, the user must have ownership rights on the > - table. To add all tables in schema to a publication, the user must be a > - superuser. To create a publication that publishes all tables, all tables in > - schema, or all sequences automatically, the user must be a superuser. > + To create a publication using <literal>FOR ALL TABLES</literal>, > + <literal>FOR ALL SEQUENCES</literal> or > + <literal>FOR TABLES IN SCHEMA</literal>, the user must be a > superuser. To add > + <literal>ALL TABLES</literal> or <literal>TABLES IN SCHEMA</literal> to a > + publication, the user must be a superuser. To add tables to a publication, > + the user must have ownership rights on the table. > </para> > > This is a good improvement, but I was not sure why it is in this > patch. Should it be a separate thread for a docs improvement? > > ====== > src/backend/catalog/pg_publication.c > > GetTopMostAncestorInPublication: > > 3. > { > Oid ancestor = lfirst_oid(lc); > - List *apubids = GetRelationPublications(ancestor); > - List *aschemaPubids = NIL; > + List *apubids = NIL; > + List *aexceptpubids = NIL; > + List *aschemapubids = NIL; > + bool set_top = false; > + > + GetRelationPublications(ancestor, &apubids, &aexceptpubids); > > level++; > > - if (list_member_oid(apubids, puboid)) > + /* check if member of table publications */ > + set_top = list_member_oid(apubids, puboid); > + if (!set_top) > { > - topmost_relid = ancestor; > + aschemapubids = GetSchemaPublications(get_rel_namespace(ancestor)); > > - if (ancestor_level) > - *ancestor_level = level; > + /* check if member of schema publications */ > + set_top = list_member_oid(aschemapubids, puboid); > + > + /* > + * If the publication is all tables publication and the table is > + * not part of exception tables. > + */ > + if (!set_top && puballtables) > + set_top = !list_member_oid(aexceptpubids, puboid); > } > - else > + > + if (set_top) > { > - aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor)); > - if (list_member_oid(aschemaPubids, puboid)) > - { > - topmost_relid = ancestor; > + topmost_relid = ancestor; > > - if (ancestor_level) > - *ancestor_level = level; > - } > + if (ancestor_level) > + *ancestor_level = level; > } > > list_free(apubids); > - list_free(aschemaPubids); > + list_free(aschemapubids); > + list_free(aexceptpubids); > } > > That 'aschemapubids' can be declared and freed within the if block. > > ~~~ > > publication_add_relation: > > 4. > + /* > + * Check when a partition is excluded via EXCEPT TABLE while the > + * publication has publish_via_partition_root = true. > + */ > + if (pub->alltables && pub->pubviaroot && pri->except && > + targetrel->rd_rel->relispartition) > + ereport(WARNING, > > > This comment doesn't sound quite right: > > SUGGESTION > Handle the case where a partition is excluded by EXCEPT TABLE while > publish_via_partition_root = true. > > ~~~ > > 5. > + /* > + * Check when a partitioned table is excluded via EXCEPT TABLE while the > + * publication has publish_via_partition_root = false. > + */ > + if (pub->alltables && !pub->pubviaroot && pri->except && > + targetrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > + ereport(WARNING, > > Ditto. Reword like suggested in the previous review comment. > > ~~~ > > 6. > +/* > + * Get the list of publication oids associated with a specified relation. > + * pubids is filled with the list of publication oids the relation is part of. > + * except_pubids is filled with the list of publication oids the relation is > + * excluded from. > + * > + * This function returns true if the relation is part of any publication. > + */ > > Maybe putting 'pubids' and 'except_pubids' in single quotes will help > readability of this comment? > > Also, these are already Lists, so they are not filled with lists. > > SUGGESTION > Parameter 'pubids' returns the OIDs of the publications the relation is part of. > Parameter 'except_pubids' returns the OIDs of publications the > relation is excluded from. > > ~~~ > > GetPublicationRelations: > > 7. > /* > - * Gets list of relation oids for a publication. > + * Return the list of relation OIDs for a publication. > + * > + * For a FOR ALL TABLES publication, this returns the list of tables that were > + * explicitly excluded via an EXCEPT TABLE clause. > + * > + * For a FOR TABLE publication, this returns the list of tables explicitly > + * included in the publication. > * > - * This should only be used FOR TABLE publications, the FOR ALL > TABLES/SEQUENCES > - * should use GetAllPublicationRelations(). > + * Publications declared with FOR ALL TABLES or FOR ALL SEQUENCES should use > + * GetAllPublicationRelations() to obtain the complete set of tables covered by > + * the publication. > */ > List * > GetPublicationRelations(Oid pubid, PublicationPartOpt pub_partopt) > > 7a. > The function is called 'GetPublicationRelations', so it seems > unintuitive that it sometimes returns the list of all the tables that > are *excluded* from the publication. If you are going to have one > single function that does everything, then IMO it might be better to > hide that behind some wrapper functions like: > GetPublicationMemberRelations > GetPublicationExcludedRelations > > Consider also that all these assumptions might be OK today but they > won't be OK in the future. e.g. One day, when named FOR SEQUENCE > sq1,sq2 are supported then you will be alble to write a command like > FOR ALL TABLES EXCEPT (t1), FOR SEQUENCE sq1,sq2. That's going to be a > muddle of some included and some excluded relations. So, it is better > to cater for that scenario now, rather than have to rewrite all of > this function again in the future. e.g. Maybe instead of this function > returning one list it is better to return included/excluded Lists or > relations as output parameters? > > ~ > > 7b. > Also, comments like "Publications declared with FOR ALL TABLES or FOR > ALL SEQUENCES should use..." seems like too many assumptions are being > made. It would be better to enforce the calling requirements using > parameter checking and Asserts instead instead of hoping that callers > are going to abide by the comments. > > ~~~ > > GetAllPublicationRelations: > > 8. > + exceptlist = GetPublicationRelations(pubid, pubviaroot ? > + PUBLICATION_PART_ALL : > + PUBLICATION_PART_ROOT); > > This is similar to the above review comment. I'm not sure how you can > just assume that this must be the "except list" -- AFAICT this assumes > that 'GetAllPublicationRelations' can only be called by FOR ALL TABLES > (???). Seems like a lot of assumptions, that would be much better to > be enforced by Asserts in the code. > I agree with comments 7 and 8. I have added two functions 'GetPublicationIncludedRelations' and 'GetPublicationExcludedRelations'. To get Relations which are included or excluded in a publication. Both functions will call 'GetPublicationRelationsInternal' function. I have also reintroduced the 'except_flag' variable > ====== > src/backend/commands/publicationcmds.c > > pub_rf_contains_invalid_column: > > 9. > bool > pub_rf_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, > - bool pubviaroot) > + bool pubviaroot, bool puballtables) > > I felt that 'puballtables' is more "important" than 'pubviaroot' so > maybe it should come earlier in the parameter list. (e.g. make it more > similar to 'pub_contains_invalid_column') > > ====== > src/backend/parser/gram.y > > 10. > + pub_except_obj_list opt_except_clause > > I felt that 'opt_except_clause' should better be called > 'opt_pub_except_clause' or 'pub_opt_except_clause' because without > 'pub' it is a bit vague. > I agree. I prefer 'opt_pub_except_clause'. By looking at other variables it better make sense to start the variable name with 'opt_' as it indicates that it is optional. Made changes for the same. > ~~~ > > 11. > +/* > + * ALL TABLES EXCEPT ( table_name [, ...] ) specification > + */ > > 11a > This comment should be up where all the other CREATE PUBLICATION > syntax is commented. > > ~ > > 11b. > Also, there is a missing optional "[TABLE]" part. > > ~~~ > > 12. > +pub_except_obj_list: PublicationExceptObjSpec > + { $$ = list_make1($1); } > + | pub_except_obj_list ',' PublicationExceptObjSpec > + { $$ = lappend($1, $3); } > + ; > + > +opt_except_clause: > + EXCEPT opt_table '(' pub_except_obj_list ')' { $$ = $4; } > + | /*EMPTY*/ { $$ = NIL; } > + ; > > I felt the clause should be defined before the obj list because that > seems the natural order to read these. > > ====== > src/bin/pg_dump/pg_dump.c > > 13. > +static SimplePtrList exceptinfo = {NULL, NULL}; > > Having this as global seems a bit hacky. It has nothing in common with > all the other nearby lists, which are commented as being based on > "patterns given by command-line switches" > I agree, I have added it in the PublicationInfo struct and made the corresponding code changes. > ~~~ > > dumpPublication: > > 14. > + /* Include exception tables if the publication has EXCEPT TABLEs */ > + for (SimplePtrListCell *cell = exceptinfo.head; cell; cell = cell->next) > + { > + PublicationRelInfo *pubrinfo = (PublicationRelInfo *) cell->ptr; > + TableInfo *tbinfo; > + > + if (pubinfo == pubrinfo->publication) > + { > + tbinfo = pubrinfo->pubtable; > > That 'tbinfo' can be declared within the "if". > > ~~~ > > 15. > + appendPQExpBuffer(query, "ONLY %s", fmtQualifiedDumpable(tbinfo)); > > ONLY is not the default. How did you decide that "ONLY" is the correct > thing to do here? > For pg_dump for publication we use "ONLY" by default while specifying the table For Alter publication we use similar thing: ``` appendPQExpBuffer(query, "ALTER PUBLICATION %s ADD TABLE ONLY", fmtId(pubinfo->dobj.name)); ``` Also if we specify a parent table in a publication(without ONLY) all its child tables are also added to the pg_publication_rel table. So when we dump such a publication we get something like: .... EXCEPT TABLE(ONLY parent_table, ONLY child_table)... > ~~~ > > getPublicationTables: > > 16. > - pubrinfo[j].dobj.objType = DO_PUBLICATION_REL; > + if (prexcept) > + pubrinfo[j].dobj.objType = DO_PUBLICATION_EXCEPT_REL; > + else > + pubrinfo[j].dobj.objType = DO_PUBLICATION_REL; > + > > Would a single assignment (ternary) make this code simpler and easier to read? > > SUGGESTION > pubrinfo[j].dobj.objType = prexcept ? > DO_PUBLICATION_EXCEPT_REL : > DO_PUBLICATION_REL; > > ====== > src/bin/pg_dump/t/002_pg_dump.pl > > 17. > + 'CREATE PUBLICATION pub10' => { > + create_order => 50, > + create_sql => > + 'CREATE PUBLICATION pub10 FOR ALL TABLES EXCEPT TABLE > (dump_test.test_table_generated);', > + regexp => qr/^ > + \QCREATE PUBLICATION pub10 FOR ALL TABLES EXCEPT TABLE (ONLY > dump_test.test_table_generated, ONLY > dump_test.test_table_generated_child2, ONLY > dump_test.test_table_generated_child1) WITH (publish = 'insert, > update, delete, truncate');\E > + /xm, > + like => { %full_runs, section_post_data => 1, }, > + }, > + > > These "generated" names seem unusual. I saw there are some other > tables like 'dump_test.test_inheritance_child' and > 'dump_test.test_inheritance_parent'. Can you use those more normal > table names instead? > > Also curious - does the order of the tests matter? I saw that the > CREATE TABLE tests seem to be coming after the CREATE PUBLICATION > tests that are using them. > I looked into it and came to the conclusion that this is controlled using 'create_order' while specifying the tests. Tests with a lower create_order value are executed earlier. So to ensure 'CREATE PUBLICATION' runs correctly we have to make sure the 'create_order' of these statements is higher than that of the respective 'CREATE TABLE' statement. > ~~~ > 18. > - if (!defined($tests{$test}->{all_runs}) > + if ( !defined($tests{$test}->{all_runs}) > > Why add this whitespace? > pg_perltidy makes this change. I have reverted it. > ====== > src/include/nodes/parsenodes.h > > 19. > AP_SetObjects, /* set list of objects */ > + AP_Reset, /* reset the publication */ > } AlterPublicationAction; > > AFAIK, you removed all ALTER command changes from v30-0001. So this > should not be here. > > ~~~ > > 20. > + bool for_all_tables; /* Special publication for all tables in db */ > AlterPublicationAction action; /* What action to perform with the given > * objects */ > } AlterPublicationStmt; > > AFAIK, you removed all ALTER command changes from v30-0001. So this > should not be here. > > ====== > src/test/regress/sql/publication.sql > > 21. > +SET client_min_messages = 'ERROR'; > +CREATE PUBLICATION testpub_foralltables_excepttable FOR ALL TABLES > EXCEPT TABLE (testpub_tbl1, testpub_tbl2); > +-- specify EXCEPT without TABLE > +CREATE PUBLICATION testpub_foralltables_excepttable1 FOR ALL TABLES > EXCEPT (testpub_tbl1); > > Should be 2 comments here for the 2x CREATE: > > # Exclude tables using FOR ALL TABLES EXCEPT TABLE (tablelist) > > # Exclude tables using FOR ALL TABLES EXCEPT (tablelist) > > ~~~ > > 22. > CREATE TABLE testpub_tbl3 (a int); > CREATE TABLE testpub_tbl3a (b text) INHERITS (testpub_tbl3); > > If you rename these tables like 'testpub_tbl_parent' and > 'testpub_tbl_child', it will be much easier to see what is going on. > > ~~~ > > 23. > +CREATE PUBLICATION testpub5 FOR ALL TABLES EXCEPT TABLE (testpub_tbl3); > > Missing comment -- something like: > # Exclude parent table, omitting both of 'ONLY' and '*' > > ~~~ > > 24. > +-- EXCEPT with wildcard: exclude table and all descendants > +CREATE PUBLICATION testpub6 FOR ALL TABLES EXCEPT TABLE (testpub_tbl3*); > > 24a. > TBH, I don't think this is a "wildcard" -- it is not doing any pattern > matching. IMO just call it an "asterisk" or a "star". > > ~ > > 24b. > And put a space before the '*' here. > > ====== > .../t/037_rep_changes_except_table.pl > > 25. > +# ============================================ > +# EXCEPT TABLE test cases for partition tables > +# ============================================ > +# Check behavior of EXCEPT TABLE together with publish_via_partition_root > +# when applied to a partitioned table and its partitions. > > > Really, that "Check behavior" sentence is generic for all of the > following tests, so it should also be (within the "=======" of the > previous comment) > > ~~~ > > 26. > +$node_publisher->safe_psql( > + 'postgres', qq( > + CREATE TABLE sch1.t1(a int) PARTITION BY RANGE(a); > + CREATE TABLE sch1.part1 PARTITION OF sch1.t1 FOR VALUES FROM (0) TO (5); > + CREATE TABLE sch1.part2 PARTITION OF sch1.t1 FOR VALUES FROM (6) TO (10); > + INSERT INTO sch1.t1 VALUES (1), (6); > +)); > + > +$node_subscriber->safe_psql( > + 'postgres', qq( > + CREATE TABLE sch1.t1(a int); > + CREATE TABLE sch1.part1(a int); > + CREATE TABLE sch1.part2(a int); > +)); > > 26a. > There should be a comment for this part that just says something like > "Setup partition table and partitions on the publisher that map to > normal tables on the subscriber" > > ~ > > 26b. > The INSERT should be done later, after the CREATE PUBLICATION but > before the CREATE SUBSCRIPTION. The pattern will be the same for all > the test cases. > > ~~~ > > 27. > +$node_publisher->safe_psql('postgres', > + "CREATE PUBLICATION tap_pub_part FOR ALL TABLES EXCEPT TABLE (sch1.part1)" > +); > > Even though the publish_via_partition_root is 'false' by default, I > think you should spell it out explicitly here for clarity. > > ~~~ > > 28. > +# EXCEPT TABLE (sch1.t1) with publish_via_partition_root = false > +# Excluding the partitioned table while publish_via_partition_root = false > +# still allows rows inserted into its partitions to be replicated. > > I felt you should word this differently. I don't think you should say > "inserted into its partitions" because actually, you inserted into the > partition table, and the data just ends up in the partitions. > > ~~~ > > 29. > +$node_publisher->safe_psql( > + 'postgres', qq( > + CREATE PUBLICATION tap_pub_part FOR ALL TABLES EXCEPT TABLE (sch1.t1); > + INSERT INTO sch1.t1 VALUES (1), (6); > +)); > > Ditto earlier comment. Better to explicitly say > "publish_via_partition_root=false". > I have also addressed the remaining comments and attached the latest patch. Thanks, Shlok Kyal