Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()
Andres Freund <andres@anarazel.de>
From: Andres Freund <andres@anarazel.de>
To: "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>
Cc: Masahiko Sawada <sawada.mshk@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, "ocean_li_996@163.com" <ocean_li_996@163.com>, "pgsql-bugs@lists.postgresql.org" <pgsql-bugs@lists.postgresql.org>
Date: 2023-08-20T21:05:45Z
Lists: pgsql-bugs
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Fix uninitialized access to InitialRunningXacts during decoding after ERROR.
- c7256e6564fa 15.5 landed
- f7d25117ba87 14.10 landed
- c570bb4d61b6 13.13 landed
- 7e57208ed51a 12.17 landed
- feb4e218e5f9 11.22 landed
Hi, On 2023-08-18 04:21:53 +0000, Zhijie Hou (Fujitsu) wrote: > From ee1dfccc0306812c356c84bbd7e2558f27d7d348 Mon Sep 17 00:00:00 2001 > From: Hou Zhijie <houzj.fnst@cn.fujitsu.com> > Date: Thu, 17 Aug 2023 19:29:34 +0800 > Subject: [PATCH v4] cleanup decoding context in error cases > > Some of the management functions for logical decoding didn't clean up the > decoding context when an error occurs during decoding. This can > result in accessing unfreed resources and assert failure the next time we call > these functions. Fix it by cleaning up the decoding context and slots in error > cases. I don't think this is the right fix - at all. We shouldn't run arbitrary code after a failure, which we do by calling the shutdown callback. It's not going to expect to be called in a context where you're not allowed to do very much. Consider what would happen if FreeDecodingContext() or such failed - we'd not have executed InvalidateSystemCaches(), and therefore would continue with corrupted system caches. There's also no need to free memory here, that's already done via the memory context cleanup of the "surrounding" memory context (i.e. CurrentMemoryContext when StartupDecodingContext() is called) I think the real problem is that 272248a0c added InitialRunningXacts a global variable. If it just lived in in struct SnapBuild, this whole thing wouldn't be an issue? The commit justified this choice with avoiding an ABI breakage - but isn't that bogus? The struct is private to snapbuild.c. It doesn't live on disk (that's SnapBuildOnDisk). There's no ABI to preserve? Greetings, Andres Freund