Re: RFC: PostgreSQL Storage I/O Transformation Hooks

Tomas Vondra <tomas@vondra.me> — 2025-12-30T01:19:15Z
Please don't top-post. We generally prefer to reply in-line, which makes
it easier to follow the discussion. With top-posting I have to hunt for
what you are responding to.

On 12/29/25 03:35, Henson Choi wrote:
> Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
> 
> Hi Tomas,
> 
> Thank you for this critical feedback. Your concerns go to the heart of
> the proposal's viability, and I appreciate your directness.
> 
> 
> 1. Multiple Extensions and Hook Chaining
> 
> You're right to question this. To be honest, I have significant doubts
> about allowing multiple transformation extensions simultaneously.
> 
> The Transform ID coordination problem is real: without a registry or
> protocol between extensions, they cannot cooperate safely. Hook chaining
> for read/write operations might work (extension A encrypts, extension B
> compresses), but the Transform ID field creates conflicts.
> 
> Perhaps I should be more direct: transformation hook chaining is not
> realistically possible with the current design. TDE extensions would
> need exclusive use of these hooks. This is a fundamental limitation I
> should have stated clearly in the RFC.
> 

Isn't that just another argument against using hooks? Chaining is what
hooks do, and there's no protection against a hook being set by multiple
extensions.
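
To make the chaining point concrete, here is a minimal standalone sketch of
the usual "save the previous hook, install your own, call the previous one"
convention. It is not PostgreSQL code: page_write_hook, the two toy
extensions and transform_byte() are hypothetical names invented for this
illustration, and the XOR / add-one transforms merely stand in for real
encryption or compression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: transform a page buffer in place before a write. */
typedef void (*page_write_hook_type) (unsigned char *page, size_t len);

/* Hypothetical core hook variable; NULL means nobody installed a hook. */
static page_write_hook_type page_write_hook = NULL;

/* Toy extension A: XOR every byte (a stand-in for encryption). */
static page_write_hook_type ext_a_prev_hook = NULL;

static void
ext_a_write(unsigned char *page, size_t len)
{
    if (ext_a_prev_hook)
        ext_a_prev_hook(page, len);     /* chain to the earlier hook first */
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
}

static void
ext_a_init(void)
{
    ext_a_prev_hook = page_write_hook;  /* remember whoever came before */
    page_write_hook = ext_a_write;
}

/* Toy extension B: add one to every byte (a stand-in for compression). */
static page_write_hook_type ext_b_prev_hook = NULL;

static void
ext_b_write(unsigned char *page, size_t len)
{
    if (ext_b_prev_hook)
        ext_b_prev_hook(page, len);
    for (size_t i = 0; i < len; i++)
        page[i] = (unsigned char) (page[i] + 1);
}

static void
ext_b_init(void)
{
    ext_b_prev_hook = page_write_hook;
    page_write_hook = ext_b_write;
}

/* Core-side call site: it cannot tell how many hooks are chained. */
static void
core_write_page(unsigned char *page, size_t len)
{
    assert(page != NULL);
    if (page_write_hook)
        page_write_hook(page, len);
}

/* Load both toy extensions in the given order and transform one byte. */
unsigned char
transform_byte(int a_first, unsigned char in)
{
    page_write_hook = NULL;
    ext_a_prev_hook = NULL;
    ext_b_prev_hook = NULL;
    if (a_first)
    {
        ext_a_init();
        ext_b_init();
    }
    else
    {
        ext_b_init();
        ext_a_init();
    }
    core_write_page(&in, 1);
    return in;
}
```

Nothing in the core-side call stops both extensions from installing
themselves, and the stored byte depends on the load order: in this sketch
transform_byte(1, 0x01) and transform_byte(0, 0x01) return different values.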

> 
> 2. pd_flags Reservation - I Hope You'll Consider This
> 
> I understand your concern about reserving pd_flags bits for extensions.
> However, I'd like to ask you to consider the reasoning behind this choice.
> 
> The 5-bit Transform ID serves a critical purpose: it allows the core to
> identify the page's transformation state without attempting decryption.
> This is important for:
> 
> - Error reporting: "This page is encrypted with transform ID 5, but no
> extension is loaded to handle it"
> - Migration safety: Distinguishing between untransformed pages (ID=0)
> and transformed pages during gradual encryption
> - Crash recovery: The core can detect transformation state inconsistencies
> 
> That said, I recognize pd_flags is precious and limited. Let me propose
> an alternative approach that might better align with core principles:
> 

The information may be crucial, but pd_flags is simply not meant to be
used by extensions to store custom data.
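
For context, a rough standalone sketch of the bit budget being discussed.
The three PD_* flag values and PD_VALID_FLAG_BITS mirror the definitions in
bufpage.h; everything prefixed XF_ is an assumption made up for this
example, including the choice of bits 3..7 for the 5-bit Transform ID (the
RFC's actual placement may differ).

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits core defines in pd_flags today (values as in bufpage.h). */
#define PD_HAS_FREE_LINES   0x0001
#define PD_PAGE_FULL        0x0002
#define PD_ALL_VISIBLE      0x0004
#define PD_VALID_FLAG_BITS  0x0007

/* ASSUMPTION: a hypothetical 5-bit Transform ID placed in bits 3..7. */
#define XF_ID_SHIFT 3
#define XF_ID_BITS  5
#define XF_ID_MASK  (((1u << XF_ID_BITS) - 1u) << XF_ID_SHIFT)

/* Store a transform ID without disturbing the core-owned bits. */
uint16_t
set_transform_id(uint16_t flags, unsigned id)
{
    assert(id < (1u << XF_ID_BITS));
    return (uint16_t) ((flags & ~XF_ID_MASK) | (id << XF_ID_SHIFT));
}

/* Read the transform ID back; 0 would mean "untransformed". */
unsigned
get_transform_id(uint16_t flags)
{
    return (flags & XF_ID_MASK) >> XF_ID_SHIFT;
}

/*
 * Mask test in the style of a header sanity check: any bit outside the
 * known-valid set makes the flags look invalid.
 */
int
flags_look_valid(uint16_t flags)
{
    return (flags & ~PD_VALID_FLAG_BITS) == 0;
}
```

With this layout, a page that is all-visible and carries transform ID 5 has
pd_flags 0x002C, which a check against the current PD_VALID_FLAG_BITS would
treat as invalid unless core widens the mask. The ID would also take 5 of
the 13 bits that are currently undefined.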

> Instead of extension-specific Transform IDs, what if we allow extensions
> to reserve space at pd_upper (similar to how special space works at
> pd_special)?
> 
> The core could manage a small flag (2-3 bits) indicating "N bytes at
> pd_upper are reserved for transformation metadata". By encoding N as
> multiples of 2 or 4 bytes, we maximize the flag's efficiency:
> 
> - 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
> - 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
> - 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
> 
> This approach uses minimal pd_flags bits while providing substantial
> metadata space. It would:
> 
> - Keep the flag in core control (not extension-specific)
> - Allow extensions to store IV, authentication tags, key version, etc.
> in a standardized location
> - Be self-describing (the flag tells you how much space is reserved)
> - Generalize beyond encryption (compression, checksums, etc. could use it)
> 
> In our internal implementation, we actually add opaque bytes to
> PageHeader for encryption metadata. This pd_upper approach could
> formalize that pattern for extensions.
> 
> I believe some form of page-level metadata for transformations is
> necessary. Would either approach (Transform ID or pd_upper reservation)
> be acceptable with the right design, or do you see fundamental issues
> with page-level transformation metadata itself?
> 

AFAICS this is pretty much exactly what this patch aimed to do (also to
allow implementing TDE):

https://commitfest.postgresql.org/patch/3986/

Clearly, it's not as simple as it may seem, otherwise the patch would
not have been WIP for 3 years.
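
For reference, the size encoding from the quoted proposal (a small field
holding N as a multiple of 2 or 4 bytes) works out as in the sketch below.
The function names are hypothetical and this only checks the arithmetic; it
says nothing about how the reserved bytes would actually be laid out on the
page.

```c
#include <assert.h>

/* Largest reservation a field of 'bits' bits can describe, in bytes. */
unsigned
max_reserved_bytes(unsigned bits, unsigned unit)
{
    assert(bits > 0 && bits < 16 && unit > 0);
    return ((1u << bits) - 1u) * unit;
}

/* Decode a field value into the number of reserved bytes. */
unsigned
decode_reserved_bytes(unsigned field, unsigned unit)
{
    return field * unit;
}

/*
 * Encode a requested size, rounding up to the next multiple of 'unit'.
 * Returns -1 if the request does not fit into 'bits' bits.
 */
int
encode_reserved_bytes(unsigned needed, unsigned bits, unsigned unit)
{
    unsigned field = (needed + unit - 1u) / unit;

    if (field > (1u << bits) - 1u)
        return -1;
    return (int) field;
}
```

This reproduces the three ranges in the quoted list: 12, 28 and 14 bytes.
As an illustration, a 12-byte IV plus a 16-byte authentication tag (28
bytes in total) would need the 3-bit, 4-byte-multiple variant and would use
it up completely, leaving no room for a key version.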

> 
> 3. Maintenance Burden and Test Coverage
> 
> I deeply appreciate this concern. Having worked across various DBMS
> implementations, I've seen solution vendors ship without comprehensive
> regression testing - but never a database vendor. DBMS maintenance is
> extraordinarily difficult, and storage errors are catastrophic.
> 
> This is precisely why test_tde exists as a reference implementation. But
> you've identified the real issue: we need much stronger test coverage
> for the hooks themselves.
> 
> The test cases should:
> - Detect when core changes break hook contracts
> - Verify hook behavior under all I/O paths (sync, async, error cases)
> - Validate critical section safety
> - Test interaction with checksums, crash recovery, replication
> 
> I agree the current test coverage is insufficient for core inclusion.
> Would expanding the test suite to cover these scenarios address your
> maintenance concerns, or do you see fundamental fragility beyond what
> testing can solve?
> 

I wasn't talking about test coverage. My point is we'd have to keep this
working forever, even if we choose to change how the SMGR works. Which
is not entirely theoretical.

> 
> 4. Hooks vs Transform Layer - Pragmatic Timeline
> 
> You suggested improving SMGR extensibility rather than adding hooks. I
> think you're architecturally right about the long-term direction.
> 
> However, I want to be pragmatic about timelines:
> 
> The hook and pd_flags approach, despite its limitations, can deliver
> working TDE in the shortest time. Organizations facing regulatory
> deadlines need something that works now, not in 2-3 years.
> 

Others may see it differently, but my opinion is using pd_flags is a
dead end.

I realize users may wish for a solution "soon", but we're not going to
accept a flawed approach because of that. Exchanging short-term benefit
for long-term pain does not seem like a good trade-off.


> That said, your feedback has sparked a better idea: what if we think of
> this not as "SMGR extension" or "hooks" but as a pluggable Transform
> Layer that SMGR and WAL subsystems delegate to?
> 
> Conceptually:
> 
>     Application Layer
>            |
>     Buffer Manager
>            |
>     +------------------+
>     | Transform Layer  | <-- Encryption, etc.
>     +------------------+
>            |
>       SMGR / WAL
>            |
>        File I/O
> 
> This is architecturally cleaner than scattered hooks, and more focused
> than full SMGR extensibility. The Transform Layer would:
> 
> - Provide a unified interface for data transformation
> - Work across backend, frontend tools, and replication
> - Handle metadata management in a standardized way
> - Support encryption, compression, or other transformations
> 
> I think this deserves its own discussion thread rather than conflating
> it with the current hook proposal. Would you be interested in starting a
> separate conversation about designing a Transform Layer interface for
> PostgreSQL?
> 

Maybe. But I'm not convinced it'd be great to have many parallel threads
discussing approaches for the same ultimate end goal.

> In the meantime, the hook approach could serve organizations with
> immediate needs, and extensions could migrate to the Transform Layer
> once it's stabilized.
> 

It's not like there are no alternatives, though. We have FDE/LUKS,
application-level encryption, etc. Now there's also pg_tde.

FWIW the hypothetical migration would be far from trivial.

> 
> 5. Frontend Tool Access
> 
> Both SMGR and hook approaches face a shared limitation: frontend tools
> (pg_checksums, pg_basebackup, etc.) that read files directly.
> 

I'm not a TDE expert, but I don't see why tools like pg_basebackup would
need to be aware of this at all. A basebackup is just a filesystem copy.

> I previously suggested allowing initdb to specify a shared library that
> both backend and frontend can load for transformation. But as I
> reconsider this, it feels like it converges toward the Transform Layer
> idea: a well-defined interface that any PostgreSQL component can use.
> 
> This might be the real architectural question: not "hooks vs SMGR" but
> "how should PostgreSQL provide transformation points that work across
> backend, frontend, and replication boundaries?"
> 

Maybe. I was not proposing a new "transformation" layer, though. My
suggestion was entirely within the current SMGR architecture.


regards


-- 
Tomas Vondra