Thread

  1. Re: RFC: PostgreSQL Storage I/O Transformation Hooks

    Henson Choi <assam258@gmail.com> — 2025-12-28T10:44:33Z

    Updated patches with meson build support:
    
    v2:
    - Added meson.build for test_tde extension
    - Added test_tde to contrib/meson.build
    
    Regards,
    Henson Choi
    
    On Sun, Dec 28, 2025 at 6:47 PM, Henson Choi <assam258@gmail.com> wrote:
    
    > Hello,
    >
    > Following up on the RFC, I am submitting the initial patch set for the
    > proposed infrastructure. These patches introduce a minimal hook-based
    > protocol to allow extensions to handle data transformation, such as TDE,
    > while keeping the PostgreSQL core independent of specific cryptographic
    > implementations.
    >
    > Implementation Details:
    >
    > Hook Points in Storage I/O Path
    > The patch introduces five strategic hook points:
    >
    > mdread_post_hook: Called after blocks are read from disk. The extension
    > can reverse-transform data in place.
    >
    > mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
    > blocks. These hooks return a pointer to transformed buffers.
    >
    > xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for WAL
    > records during insertion and replay.
    >
    > Data Integrity and Checksum Protocol
    > To ensure robust error detection, the hooks follow a specific verification
    > protocol:
    >
    > On Write: The extension transforms the page, sets the Transform ID, then
    > recalculates the checksum on the transformed data.
    >
    > On Read: The extension verifies the on-disk checksum of the transformed
    > data first. After reverse-transformation, it clears the Transform ID and
    > recalculates the checksum for the plaintext data. This ensures corruption
    > is detected regardless of the transformation state.
    >
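The write/read ordering described above can be sketched in standalone C. This is only an illustration under loud assumptions: `ToyPage`, `toy_checksum`, `toy_xor`, `toy_write_pre` and `toy_read_post` are hypothetical names, the checksum is a toy stand-in for the real page checksum, XOR stands in for a real cipher, and the page is transformed in place for brevity (the patch description says the write hooks return a pointer to a transformed buffer).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_DATA 16

/* Hypothetical miniature page: not the real PageHeaderData layout. */
typedef struct ToyPage
{
    uint16_t checksum;
    uint8_t  transform_id;          /* 0 = untransformed */
    uint8_t  data[TOY_PAGE_DATA];
} ToyPage;

/* Toy checksum over transform_id and data; a stand-in, not pg_checksum_page. */
static uint16_t
toy_checksum(const ToyPage *p)
{
    uint16_t sum = p->transform_id;

    for (size_t i = 0; i < TOY_PAGE_DATA; i++)
        sum = (uint16_t) (sum * 31u + p->data[i]);
    return sum;
}

/* XOR stands in for encryption; applying it twice restores the data. */
static void
toy_xor(ToyPage *p, uint8_t key)
{
    for (size_t i = 0; i < TOY_PAGE_DATA; i++)
        p->data[i] ^= key;
}

/* On write: transform, set the Transform ID, then checksum the result. */
static void
toy_write_pre(ToyPage *p, uint8_t id, uint8_t key)
{
    toy_xor(p, key);
    p->transform_id = id;
    p->checksum = toy_checksum(p);
}

/*
 * On read: verify the checksum of the on-disk (transformed) bytes first.
 * Only then reverse the transform, clear the ID, and re-checksum plaintext.
 */
static bool
toy_read_post(ToyPage *p, uint8_t key)
{
    if (p->checksum != toy_checksum(p))
        return false;               /* corruption detected before decrypting */
    if (p->transform_id != 0)
    {
        toy_xor(p, key);
        p->transform_id = 0;
        p->checksum = toy_checksum(p);
    }
    return true;
}
```

Verifying before reverse-transforming means a damaged block is rejected without ever being fed to the cipher, and the recomputed checksum lets the rest of the engine treat the in-memory page like any untransformed page.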
    > WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
    > For WAL records, I have introduced a specific block ID (251) to mark
    > transformed data. If the decryption extension is not loaded, the WAL reader
    > will encounter this unknown block ID and fail fast, preventing the system
    > from incorrectly interpreting encrypted data as valid WAL records.
    >
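The fail-fast behaviour can be pictured with a small standalone classifier. This is a sketch, not the patch's decoder: the `TOY_*` names and `toy_block_id_is_decodable` are hypothetical, the values 252 to 255 and the limit of 32 mirror the special block IDs in PostgreSQL's xlogrecord.h as I understand them, and 251 is the value stated in the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_BLOCK_ID           32   /* ordinary block references: 0..32 */
#define TOY_BLOCK_ID_DATA_SHORT    255
#define TOY_BLOCK_ID_DATA_LONG     254
#define TOY_BLOCK_ID_ORIGIN        253
#define TOY_BLOCK_ID_TOPLEVEL_XID  252
#define TOY_BLOCK_ID_TRANSFORMED   251  /* value taken from the proposal */

/*
 * Returns true if a reader can make sense of this block ID.
 * Without a decode hook, 251 is just another unknown ID, so the
 * record is rejected instead of being parsed as plaintext.
 */
static bool
toy_block_id_is_decodable(uint8_t block_id, bool decode_hook_loaded)
{
    if (block_id <= TOY_MAX_BLOCK_ID)
        return true;

    switch (block_id)
    {
        case TOY_BLOCK_ID_DATA_SHORT:
        case TOY_BLOCK_ID_DATA_LONG:
        case TOY_BLOCK_ID_ORIGIN:
        case TOY_BLOCK_ID_TOPLEVEL_XID:
            return true;
        case TOY_BLOCK_ID_TRANSFORMED:
            return decode_hook_loaded;
        default:
            return false;           /* unknown ID: fail fast */
    }
}
```

The point of the sketch is that safety comes from the existing "reject unknown block ID" path; no extra check is needed for the case where the extension is missing.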
    > PageHeader Transform ID (5-bit)
    > I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform
    > ID. This allows the engine and extensions to identify the transformation
    > state of a page (e.g., key versioning or algorithm type) without attempting
    > decryption. It ensures backward compatibility: pages with Transform ID 0
    > are treated as standard untransformed pages.
    >
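A minimal sketch of how a 5-bit ID could be packed into bits 3-7 of a 16-bit flags word follows. The macro and function names are hypothetical and not taken from the patch; the only facts assumed from the message are the bit range and that ID 0 means "untransformed".

```c
#include <stdint.h>

/* Hypothetical names: bits 3-7 of pd_flags hold the Transform ID. */
#define TOY_PD_TRANSFORM_SHIFT 3
#define TOY_PD_TRANSFORM_MASK  0x00F8u

/* Extract the 5-bit Transform ID (0..31) from a flags word. */
static inline uint8_t
toy_pd_get_transform_id(uint16_t pd_flags)
{
    return (uint8_t) ((pd_flags & TOY_PD_TRANSFORM_MASK) >> TOY_PD_TRANSFORM_SHIFT);
}

/* Return pd_flags with the Transform ID replaced; bits 0-2 and 8-15 are preserved. */
static inline uint16_t
toy_pd_set_transform_id(uint16_t pd_flags, uint8_t id)
{
    uint16_t cleared = (uint16_t) (pd_flags & ~TOY_PD_TRANSFORM_MASK);
    uint16_t field = (uint16_t) (((uint16_t) id << TOY_PD_TRANSFORM_SHIFT) & TOY_PD_TRANSFORM_MASK);

    return (uint16_t) (cleared | field);
}
```

Because the low three bits are left untouched, a page whose existing flag bits are all set (0x0007) still reads back as Transform ID 0, which is what makes the "ID 0 means standard page" rule backward compatible.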
    > Memory and Critical Section Safety
    > As demonstrated in the contrib/test_tde reference implementation, cipher
    > contexts are pre-allocated in _PG_init to avoid memory allocation during
    > critical sections. For WAL transformation,
    > MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
    > within critical sections; if OOM occurs during buffer growth, it results in
    > a controlled PANIC.
    >
    > Performance Considerations
    > When hooks are not set (default), the overhead is limited to a single NULL
    > pointer comparison per I/O operation. This is architecturally consistent
    > with existing PostgreSQL hooks and is designed to have a negligible impact
    > on performance.
    >
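The cost model above follows the usual hook idiom: a global function pointer that is NULL by default and is tested once at the call site. The sketch below is standalone C with hypothetical names (`toy_read_post_hook`, `toy_core_read_done`, `toy_extension_init`); it also shows the save-and-call-previous chaining convention that extensions typically use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook type and global; NULL means "no extension installed". */
typedef void (*toy_read_post_hook_type) (uint8_t *page, size_t len);
static toy_read_post_hook_type toy_read_post_hook = NULL;

/* Core side: the only cost when no hook is set is this NULL test. */
static void
toy_core_read_done(uint8_t *page, size_t len)
{
    if (toy_read_post_hook != NULL)
        toy_read_post_hook(page, len);
}

/* Extension side: remember any hook installed before us. */
static toy_read_post_hook_type toy_prev_read_post_hook = NULL;

static void
toy_xor_read_post(uint8_t *page, size_t len)
{
    /* Chain to the previously installed hook, if any. */
    if (toy_prev_read_post_hook != NULL)
        toy_prev_read_post_hook(page, len);

    /* XOR stands in for a real reverse transformation. */
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
}

/* Plays the role of _PG_init: install our hook, keeping the old one. */
static void
toy_extension_init(void)
{
    toy_prev_read_post_hook = toy_read_post_hook;
    toy_read_post_hook = toy_xor_read_post;
}
```

Before `toy_extension_init()` runs, `toy_core_read_done()` leaves the buffer untouched; afterwards every byte is XORed with 0x5A. The order in which chained hooks should run on the read path versus the write path is a design question this sketch does not settle.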
    > Attached Patches:
    >
    > v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core
    > infrastructure.
    > v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
    > implementation using AES-256-CTR.
    >
    > I look forward to your comments and feedback.
    >
    > Regards,
    >
    > Henson Choi
    >
    > On Sun, Dec 28, 2025 at 4:49 PM, Henson Choi <assam258@gmail.com> wrote:
    >
    >> RFC: PostgreSQL Storage I/O Transformation Hooks
    >>
    >> Infrastructure for a Technical Protocol Between RDBMS Core and Data
    >> Security Experts
    >>
    >> *Author:* Henson Choi <assam258@gmail.com>
    >>
    >> *Date:* 2025-12-28
    >>
    >> *PostgreSQL Version:* master (Development)
    >> ------------------------------
    >> 1. Summary & Motivation
    >>
    >> This RFC proposes the introduction of minimal hooks into the PostgreSQL
    >> storage layer and the addition of a *Transformation ID* field to the
    >> PageHeader.
    >>
    >> A Diplomatic Protocol Between Expert Groups
    >>
    >> The core motivation of this proposal is *“Separation of Concerns and
    >> Mutual Respect.”*
    >>
    >> Historically, discussions around Transparent Data Encryption (TDE) have
    >> often felt like putting security experts on trial in a foreign
    >> court—specifically, the “Court of RDBMS.” It is time to treat them not as
    >> defendants to be judged by database-specific rules, but as an *equal
    >> neighboring community* with their own specialized sovereignty.
    >>
    >> *The issue has never been a failure of technology, but rather a
    >> misplacement of the focal point.* While previous discussions were mired
    >> in the technicalities of “how to hardcode encryption into the core,” this
    >> proposal shifts the debate toward an architectural solution: “what
    >> interface the core should provide to external experts.”
    >>
    >>    - *RDBMS Experts* provide a trusted pipeline responsible for data I/O
    >>    paths and consistency.
    >>    - *Security Experts* take responsibility for the specialized domain
    >>    of encryption algorithms and key management.
    >>
    >> This hook system functions as a *Technical Protocol*—a high-level
    >> agreement that allows these two expert groups to exchange data securely
    >> without encroaching on each other’s territory.
    >> ------------------------------
    >> 2. Design Principles
    >>
    >>    1. *Delegation of Authority:* The core remains independent of
    >>    specific encryption standards, providing a “free territory” where security
    >>    experts can respond to an ever-changing security landscape.
    >>    2. *Diplomatic Convention:* The Transformation ID acts as a
    >>    communication protocol between the engine and the extension. The engine
    >>    uses this ID to identify the state of the data and hands over control to
    >>    the appropriate expert (the extension).
    >>    3. *Minimal Interference:* Overhead is kept near zero when hooks are
    >>    not in use, ensuring the native performance of the PostgreSQL engine.
    >>
    >> ------------------------------
    >> 3. Proposal Specifications
    >>
    >> 3.1 The Interface (Hook Points)
    >>
    >> We allow intervention by security experts through five contact points
    >> along the I/O path:
    >>
    >>    - *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
    >>    (Transformation of the data area)
    >>    - *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
    >>    transaction logs)
    >>
    >> 3.2 The Protocol Identifier (PageHeader Transformation ID)
    >>
    >> We allocate 5 bits of pd_flags to define the “Security State” of a page.
    >> This serves as a *Status Message* sent by the security expert to the
    >> engine, utilized for key versioning and as a migration marker.
    >> ------------------------------
    >> 4. Reference Implementation: contrib/test_tde
    >>
    >> A Standard Code of Conduct for Security Experts
    >>
    >> This reference implementation exists not as a commercial product, but to
    >> define the *Standards of the Diplomatic Protocol* that
    >> encryption/decryption experts must follow when entering the PostgreSQL
    >> domain.
    >>
    >>    1. *Deterministic IV Derivation:* Demonstrates how to achieve
    >>    cryptographic safety by trusting unique values provided by the engine
    >>    (e.g., LSN).
    >>    2. *Critical Section Safety:* Defines memory management regulations
    >>    that security logic must follow within “Critical Sections” to maintain
    >>    system stability.
    >>    3. *Hook Chaining:* Demonstrates a cooperative structure that allows
    >>    peaceful coexistence with other expert tools (e.g., compression, auditing).
    >>
    >> ------------------------------
    >> 5. Scope
    >>
    >>    - *In-Scope:* Backend hook infrastructure, Transformation ID field,
    >>    and reference code demonstrating diplomatic protocol compliance.
    >>    - *Out-of-Scope:* Specific Key Management Systems (KMS), selection of
    >>    specific cryptographic algorithms, and integration with external tools.
    >>
    >> This proposal represents a strategic diplomatic choice: rather than the
    >> PostgreSQL core assuming all security responsibilities, it grants security
    >> experts a *sovereign territory through extensions* where they can
    >> perform at their best.
    >>
    >