Thread

  1. Re: RFC: PostgreSQL Storage I/O Transformation Hooks

    Tomas Vondra <tomas@vondra.me> — 2025-12-30T01:19:15Z

    Please don't top-post. We generally prefer to reply in-line, which makes
    it easier to follow the discussion. With top-posting I have to hunt for
    what you are responding to.
    
    On 12/29/25 03:35, Henson Choi wrote:
    > Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
    > 
    > Hi Tomas,
    > 
    > Thank you for this critical feedback. Your concerns go to the heart of
    > the proposal's viability, and I appreciate your directness.
    > 
    > 
    > 1. Multiple Extensions and Hook Chaining
    > 
    > You're right to question this. To be honest, I have significant doubts
    > about allowing multiple transformation extensions simultaneously.
    > 
    > The Transform ID coordination problem is real: without a registry or
    > protocol between extensions, they cannot cooperate safely. Hook chaining
    > for read/write operations might work (extension A encrypts, extension B
    > compresses), but the Transform ID field creates conflicts.
    > 
    > Perhaps I should be more direct: transformation hook chaining is not
    > realistically possible with the current design. TDE extensions would
    > need exclusive use of these hooks. This is a fundamental limitation I
    > should have stated clearly in the RFC.
    > 
    
    Isn't that just another argument against using hooks? Chaining is what
    hooks do, and there's no protection against a hook being set by multiple
    extensions.
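
    To illustrate the point about chaining, here is a minimal standalone
    sketch of the usual "save the previous hook, then call it" convention.
    Everything in it is hypothetical (the hook name page_write_hook, the two
    extensions, and the toy XOR/increment "transforms"); it is not code from
    the RFC or from core. It only shows that nothing stops two extensions
    from installing themselves on the same hook, and that the bytes written
    then depend on load order.

```c
#include <stddef.h>

/* Hypothetical hook type: transforms a page buffer in place before write. */
typedef void (*page_write_hook_type) (unsigned char *page, size_t len);

/* The single global hook pointer, as core would expose it (assumed name). */
static page_write_hook_type page_write_hook = NULL;

/* Previous hook values saved by each hypothetical extension. */
static page_write_hook_type ext_a_prev_hook = NULL;
static page_write_hook_type ext_b_prev_hook = NULL;

/* Extension A: XOR with a constant, standing in for "encrypt". */
static void
ext_a_write(unsigned char *page, size_t len)
{
    if (ext_a_prev_hook)
        ext_a_prev_hook(page, len);     /* chain to the earlier hook */
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
}

/* Extension B: add one to each byte, standing in for "compress". */
static void
ext_b_write(unsigned char *page, size_t len)
{
    if (ext_b_prev_hook)
        ext_b_prev_hook(page, len);     /* chain to the earlier hook */
    for (size_t i = 0; i < len; i++)
        page[i] = (unsigned char) (page[i] + 1);
}

/* What each extension would do at load time: save, then overwrite. */
static void
ext_a_init(void)
{
    ext_a_prev_hook = page_write_hook;
    page_write_hook = ext_a_write;
}

static void
ext_b_init(void)
{
    ext_b_prev_hook = page_write_hook;
    page_write_hook = ext_b_write;
}

/* What the core call site would do. */
static void
core_write_page(unsigned char *page, size_t len)
{
    if (page_write_hook)
        page_write_hook(page, len);
}

/*
 * Load both extensions in the given order, push a one-byte "page" through
 * the hook chain, and return the transformed byte.
 */
unsigned char
run_with_order(int a_first, unsigned char input)
{
    unsigned char page[1] = {input};

    page_write_hook = NULL;             /* fresh "server start" */
    if (a_first)
    {
        ext_a_init();
        ext_b_init();
    }
    else
    {
        ext_b_init();
        ext_a_init();
    }
    core_write_page(page, sizeof(page));
    return page[0];
}
```

    Calling run_with_order(1, 0x01) and run_with_order(0, 0x01) yields
    different bytes, and neither extension has any way to notice that the
    other one is installed.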
    
    > 
    > 2. pd_flags Reservation - I Hope You'll Consider This
    > 
    > I understand your concern about reserving pd_flags bits for extensions.
    > However, I'd like to ask you to consider the reasoning behind this choice.
    > 
    > The 5-bit Transform ID serves a critical purpose: it allows the core to
    > identify the page's transformation state without attempting decryption.
    > This is important for:
    > 
    > - Error reporting: "This page is encrypted with transform ID 5, but no
    > extension is loaded to handle it"
    > - Migration safety: Distinguishing between untransformed pages (ID=0)
    > and transformed pages during gradual encryption
    > - Crash recovery: The core can detect transformation state inconsistencies
    > 
    > That said, I recognize pd_flags is precious and limited. Let me propose
    > an alternative approach that might better align with core principles:
    > 
    
    The information may be crucial, but pd_flags is simply not meant to be
    used by extensions to store custom data.
    
    > Instead of extension-specific Transform IDs, what if we allow extensions
    > to reserve space at pd_upper (similar to how special space works at
    > pd_special)?
    > 
    > The core could manage a small flag (2-3 bits) indicating "N bytes at
    > pd_upper are reserved for transformation metadata". By encoding N as
    > multiples of 2 or 4 bytes, we maximize the flag's efficiency:
    > 
    > - 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
    > - 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
    > - 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
    > 
    > This approach uses minimal pd_flags bits while providing substantial
    > metadata space. It would:
    > 
    > - Keep the flag in core control (not extension-specific)
    > - Allow extensions to store IV, authentication tags, key version, etc.
    > in a standardized location
    > - Be self-describing (the flag tells you how much space is reserved)
    > - Generalize beyond encryption (compression, checksums, etc. could use it)
    > 
    > In our internal implementation, we actually add opaque bytes to
    > PageHeader for encryption metadata. This pd_upper approach could
    > formalize that pattern for extensions.
    > 
    > I believe some form of page-level metadata for transformations is
    > necessary. Would either approach (Transform ID or pd_upper reservation)
    > be acceptable with the right design, or do you see fundamental issues
    > with page-level transformation metadata itself?
    > 
    
    AFAICS this is pretty much exactly what this patch aimed to do (also to
    allow implementing TDE):
    
    https://commitfest.postgresql.org/patch/3986/
    
    Clearly, it's not as simple as it may seem, otherwise the patch would
    not be WIP for 3 years.
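
    For reference, the size encoding in the quoted proposal (a 2-3 bit field
    counting 2- or 4-byte units) is plain arithmetic. The sketch below only
    reproduces the ranges listed there; the function names are hypothetical,
    and it makes no assumption about which pd_flags bits would hold the
    field or how the reserved bytes would be laid out on the page.

```c
/*
 * Hypothetical helpers for the "N bytes reserved" encoding described in
 * the quoted proposal: a small bit field stores a count of fixed-size
 * units, and the reserved size is count * unit.
 */

/* Largest reservation expressible with field_bits bits of unit_bytes each. */
unsigned
max_reserved_bytes(unsigned field_bits, unsigned unit_bytes)
{
    return ((1u << field_bits) - 1u) * unit_bytes;
}

/* Decode a stored field value into a byte count. */
unsigned
reserved_bytes(unsigned field_value, unsigned field_bits, unsigned unit_bytes)
{
    unsigned    mask = (1u << field_bits) - 1u;

    return (field_value & mask) * unit_bytes;
}

/*
 * Encode a byte count into a field value. Returns 1 on success, or 0 if
 * nbytes is not a multiple of the unit or does not fit in the field.
 */
int
encode_reserved_bytes(unsigned nbytes, unsigned field_bits,
                      unsigned unit_bytes, unsigned *field_value)
{
    if (unit_bytes == 0 || nbytes % unit_bytes != 0)
        return 0;
    if (nbytes > max_reserved_bytes(field_bits, unit_bytes))
        return 0;
    *field_value = nbytes / unit_bytes;
    return 1;
}
```

    With these definitions, 2 bits of 4-byte units top out at 12 bytes,
    3 bits of 4-byte units at 28 bytes, and 3 bits of 2-byte units at
    14 bytes, matching the ranges in the quoted list. A 16-byte value cannot
    be encoded in the 2-bit or the 2-byte-unit variants.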
    
    > 
    > 3. Maintenance Burden and Test Coverage
    > 
    > I deeply appreciate this concern. Having worked across various DBMS
    > implementations, I've seen solution vendors ship without comprehensive
    > regression testing - but never a database vendor. DBMS maintenance is
    > extraordinarily difficult, and storage errors are catastrophic.
    > 
    > This is precisely why test_tde exists as a reference implementation. But
    > you've identified the real issue: we need much stronger test coverage
    > for the hooks themselves.
    > 
    > The test cases should:
    > - Detect when core changes break hook contracts
    > - Verify hook behavior under all I/O paths (sync, async, error cases)
    > - Validate critical section safety
    > - Test interaction with checksums, crash recovery, replication
    > 
    > I agree the current test coverage is insufficient for core inclusion.
    > Would expanding the test suite to cover these scenarios address your
    > maintenance concerns, or do you see fundamental fragility beyond what
    > testing can solve?
    > 
    
    I wasn't talking about test coverage. My point is we'd have to keep this
    working forever, even if we choose to change how the SMGR works. Which
    is not entirely theoretical.
    
    > 
    > 4. Hooks vs Transform Layer - Pragmatic Timeline
    > 
    > You suggested improving SMGR extensibility rather than adding hooks. I
    > think you're architecturally right about the long-term direction.
    > 
    > However, I want to be pragmatic about timelines:
    > 
    > The hook and pd_flags approach, despite its limitations, can deliver
    > working TDE in the shortest time. Organizations facing regulatory
    > deadlines need something that works now, not in 2-3 years.
    > 
    
    Others may see it differently, but my opinion is using pd_flags is a
    dead end.
    
    I realize users may wish for a solution "soon", but we're not going to
    accept a flawed approach because of that. Exchanging short-term benefit
    for long-term pain does not seem like a good trade-off.
    
    
    > That said, your feedback has sparked a better idea: what if we think of
    > this not as "SMGR extension" or "hooks" but as a pluggable Transform
    > Layer that SMGR and WAL subsystems delegate to?
    > 
    > Conceptually:
    > 
    >     Application Layer
    >            |
    >     Buffer Manager
    >            |
    >     +------------------+
    >     | Transform Layer  | <-- Encryption, etc.
    >     +------------------+
    >            |
    >       SMGR / WAL
    >            |
    >        File I/O
    > 
    > This is architecturally cleaner than scattered hooks, and more focused
    > than full SMGR extensibility. The Transform Layer would:
    > 
    > - Provide a unified interface for data transformation
    > - Work across backend, frontend tools, and replication
    > - Handle metadata management in a standardized way
    > - Support encryption, compression, or other transformations
    > 
    > I think this deserves its own discussion thread rather than conflating
    > it with the current hook proposal. Would you be interested in starting a
    > separate conversation about designing a Transform Layer interface for
    > PostgreSQL?
    > 
    
    Maybe. But I'm not convinced it'd be great to have many parallel threads
    discussing approaches for the same ultimate end goal.
    
    > In the meantime, the hook approach could serve organizations with
    > immediate needs, and extensions could migrate to the Transform Layer
    > once it's stabilized.
    > 
    
    It's not like there are no alternatives, though. We have FDE/LUKS,
    application-level encryption, etc. Now there's also pg_tde.
    
    FWIW the hypothetical migration would be far from trivial.
    
    > 
    > 5. Frontend Tool Access
    > 
    > Both SMGR and hook approaches face a shared limitation: frontend tools
    > (pg_checksums, pg_basebackup, etc.) that read files directly.
    > 
    
    I'm not a TDE expert, but I don't see why tools like pg_basebackup would
    need to be aware of this at all. A basebackup is just a filesystem copy.
    
    > I previously suggested allowing initdb to specify a shared library that
    > both backend and frontend can load for transformation. But as I
    > reconsider this, it feels like it converges toward the Transform Layer
    > idea: a well-defined interface that any PostgreSQL component can use.
    > 
    > This might be the real architectural question: not "hooks vs SMGR" but
    > "how should PostgreSQL provide transformation points that work across
    > backend, frontend, and replication boundaries?"
    > 
    
    Maybe. I was not proposing a new "transformation" layer, though. My
    suggestion was entirely within the current SMGR architecture.
    
    
    regards
    
    
    -- 
    Tomas Vondra