Thread
-
Re: Proposal: Conflict log history table for Logical Replication
Nisha Moond <nisha.moond412@gmail.com> — 2026-05-22T04:51:21Z
On Wed, May 20, 2026 at 3:05 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Rest of the comments were fixed.
> The attached v37 version patch has the changes for the same. Also
> Peter's comments on the documentation patch from [1] and Shveta's
> comments from [2] are addressed in the attached patch.
>

Here are a few comments based on v37 testing:

1) Should we consider using TOAST tables for tuple-data columns like
remote_tuple and local_conflicts (the JSON columns)?
This may be a corner case, but if the tuple data becomes too large to
fit into an 8KB heap tuple, then the apply worker keeps failing while
inserting into the CLT with errors like:

ERROR: row is too big: size 19496, maximum size 8160
LOG: background worker "logical replication apply worker" (PID 41226) exited with exit code 1

Noticed that even disable_on_error=true does not disable the
subscription in this case.

We can think about optimizations such as deciding when TOAST tables
should be created, or avoiding the error by trimming/capping the data
size before inserting into the CLT if we don't want TOAST.
~~~

2) Currently, parallel apply workers do not seem to insert conflicts
into the CLT. The parallel worker logs the conflict to the logfile and
then exits with an error without handling CLT insertion.

A small test to reproduce this with a 't1' table subscription using a
CLT table:

-- on publisher
ALTER SYSTEM SET logical_decoding_work_mem = '64kB';
SELECT pg_reload_conf();

-- Create a conflict scenario on subscriber: pre-insert a row that will conflict
INSERT INTO t1 VALUES (99999, 11);

-- on publisher: big transaction that hits the conflict
BEGIN;
INSERT INTO t1 SELECT i, i FROM generate_series(1, 10000) i;
INSERT INTO t1 VALUES (99999, 99); -- this conflicts
COMMIT;

logfile:
ERROR: conflict detected on relation "public.t1": conflict=insert_exists
DETAIL: Could not apply remote change: remote row (99999, 99).
Key already exists in unique index "t1_pkey", modified locally in transaction 842 at 2026-05-21 21:10:51.497681+05:30: key (a)=(99999), local row (99999, 42).
...
ERROR: logical replication parallel apply worker exited due to error
CONTEXT: processing remote data for replication origin "pg_16398" during message type "INSERT" for replication target relation "public.t1" in transaction 720
logical replication parallel apply worker processing remote data for replication origin "pg_16398" during message type "STREAM COMMIT" in transaction 720, finished at 0/01AC9758
LOG: subscription "sub1" has been disabled because of an error
ERROR: lost connection to the logical replication parallel apply worker
LOG: background worker "logical replication parallel worker" (PID 66271) exited with exit code 1
~~~

3) I think somewhere in patch-0005, the remote_tuple and
replica_identity columns may have been swapped.
The replica identity key seems to be written into the remote_tuple
column, while the remote slot row is written into replica_identity,
for example:

postgres=# select relname, conflict_type, remote_xid, remote_tuple, replica_identity from pg_conflict_log_for_subid_16398;
 relname |     conflict_type     | remote_xid | remote_tuple | replica_identity
---------+-----------------------+------------+--------------+------------------
 t1      | insert_exists         |        699 |              | {"a":3,"b":11}
 t1      | update_origin_differs |        700 | {"a":3}      | {"a":3,"b":111}
(2 rows)

--
Thanks,
Nisha