Thread
-
clog segment truncation
Grigory Smolkin <smallkeen@gmail.com> — 2025-11-10T12:29:32Z
Hello, hackers! I`m currently investigating the case of one of our PostgreSQL instance having started issuing error messages about missing clog segment: ERROR: could not access status of transaction 1550558894 DETAIL: Could not open file "pg_xact/05C6": No such file or directory. After examining the WAL records just prior the first sighing of these errors, it became apparent, that clog segment was truncated by autovacuum rmgr: CLOG len (rec/tot): 38/ 38, tx: 0, lsn: 81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact 1581542039 After looking closely at the affected heap pages (there are scores of them) with tuples, created by 1550558894 transaction, and comparing them with pg_visibility state and pg_class.relfrozenxig, it became apparent, that there is may be something awry with either setting all_frozen bit in _vm, or oldestXact calculation by vacuum. According to pg_visibility, all_frozen bit is set: wms-svc-inspecting=# select * from pg_visibility('affected_relation', 5456224) wms-svc-inspecting-# ; all_visible | all_frozen | pd_all_visible -------------+------------+---------------- t | t | f (1 row) So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that is not the case SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('affected_relation', 5456224)) ; -[ RECORD 1 ]-------------+----------------------------------------------------------------------------- lp | 1 lp_off | 7976 lp_flags | 1 lp_len | 216 t_xmin | 1550558894 t_xmax | 0 t_field3 | 0 t_ctid | (5456224,1) t_infomask2 | 12 t_infomask | 2051 t_hoff | 32 t_bits | 1111111011100000 heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{}) -[ RECORD 2 ]-------------+----------------------------------------------------------------------------- lp | 2 lp_off | 7768 lp_flags | 1 lp_len | 208 t_xmin | 1550558894 t_xmax | 0 t_field3 | 0 t_ctid | (5456224,2) t_infomask2 | 12 t_infomask | 2307 t_hoff | 32 t_bits | 1111111111100000 heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{}) -[ RECORD 3 ]-------------+----------------------------------------------------------------------------- lp | 3 lp_off | 7520 lp_flags | 1 lp_len | 248 t_xmin | 1550558894 t_xmax | 0 t_field3 | 0 t_ctid | (5456224,3) t_infomask2 | 12 t_infomask | 2051 t_hoff | 32 t_bits | 1111111111100000 heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{}) -[ RECORD 4 ]-------------+----------------------------------------------------------------------------- .... -[ RECORD 23 ]------------+----------------------------------------------------------------------------- lp | 23 lp_off | 272 lp_flags | 1 lp_len | 312 t_xmin | 1550558894 t_xmax | 0 t_field3 | 0 t_ctid | (5456224,23) t_infomask2 | 12 t_infomask | 2051 t_hoff | 32 t_bits | 1111111111100000 heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{}) Also pg_class.relfrozenxig contains xid which exceed 1550558894: # select relname, relfrozenxid from pg_class where oid = 'affected_relation'::regclass::oid; -[ RECORD 1 ]+----------- relname | affected_relation relfrozenxid | 1609822074 So is it possible that either all_frozen bit was set incorrectly or wansn't unset when tuples were created? I will be happy to provide any additional information if required. PostgreSQL version: 15.6, data_checksums: on -
Re: clog segment truncation
Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-10T13:05:26Z
On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote: > Hello, hackers! > > I`m currently investigating the case of one of our PostgreSQL instance > having started issuing error messages about missing clog segment: > ERROR: could not access status of transaction 1550558894 > DETAIL: Could not open file "pg_xact/05C6": No such file or directory. > Is this error happening with the LISTEN/NOTIFY? There is a discussion going on [1] that aims to fix this issue for the LISTEN/NOTIFY. [1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com -- Matheus Alcantara EDB: http://www.enterprisedb.com
-
Re: clog segment truncation
Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-10T13:43:47Z
On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote: > So is it possible that either all_frozen bit was set incorrectly or wansn't > unset when tuples were created? > I will be happy to provide any additional information if required. > PostgreSQL version: 15.6, data_checksums: on > Actually the 15.6 is quite old. I would also recommend to upgrade to the latest 15 release which is 15.14. I didn't find any specific release note about clog truncation but you are missing some important fixes though. There is this fix [1] on 15.8 but I'm not sure if it's fully related with your issue faced here. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c809a2b2d -- Matheus Alcantara EDB: http://www.enterprisedb.com
-
Re: clog segment truncation
Grigory Smolkin <smallkeen@gmail.com> — 2025-11-10T14:35:02Z
> On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote: > > Hello, hackers! > > > > I`m currently investigating the case of one of our PostgreSQL instance > > having started issuing error messages about missing clog segment: > > ERROR: could not access status of transaction 1550558894 > > DETAIL: Could not open file "pg_xact/05C6": No such file or directory. > > > Is this error happening with the LISTEN/NOTIFY? There is a discussion > going on [1] that aims to fix this issue for the LISTEN/NOTIFY. No, LISTEN/NOTIFY are not used. So do you have a step by step to reproduce the issue? Unfortunately, no. > Or at least some more information about what the server was running that > caused the > error? By the looks of it, nothing special, really, just plain inserts and occasional index-scan reads (tuple (5456224,2) looks like it was read by index-scan, because it has HEAP_XMIN_COMMITTED stamped on). Instance wasn't subjected to power outages, kernel crashes or hardware issues. On Mon, Nov 10, 2025 at 4:43 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote: > > So is it possible that either all_frozen bit was set incorrectly or > wansn't > > unset when tuples were created? > > I will be happy to provide any additional information if required. > > PostgreSQL version: 15.6, data_checksums: on > > > Actually the 15.6 is quite old. > It is, but I didn't see any fixes that can possibly be related to this case, but maybe I've missed something. > > I didn't find any specific release note about clog truncation but you > are missing some important fixes though. > > There is this fix [1] on 15.8 but I'm not sure if it's fully related > with your issue faced here. > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c809a2b2d > > Looks like it is more about multi-xact truncation, so I'm also not sure.