Thread

  1. Re: RFC: PostgreSQL Storage I/O Transformation Hooks

    Henson Choi <assam258@gmail.com> — 2025-12-29T04:41:34Z

    Hi hackers,
    
    This is the fourth version of the Storage I/O Transformation Hooks patch
    series for implementing Transparent Data Encryption (TDE) in PostgreSQL.
    
    Changes in v4:
    
    This version fixes cross-platform compatibility issues found in CI testing
    that caused failures on BSD and Windows:
    
    - Fixed BSD regression test warning about tablespace naming conventions
    (renamed to "regress_tde_tblspc")
    - Fixed Windows test failures caused by platform-specific shell commands
    (mkdir -p)
    - Replaced filesystem-based tablespace tests with
    allow_in_place_tablespaces approach for cross-platform compatibility
    
    The core hook infrastructure (patch 0001) and reference TDE implementation
    (patch 0002) remain unchanged from v3. Patch 0003 contains only the test
    compatibility fixes.
    
    Patch series:
    
    0001: Core hook infrastructure for I/O transformation
    0002: Reference TDE implementation using AES-256-CTR
    0003: Cross-platform test fixes for BSD and Windows
    
    Testing:
    
    The test_tde extension demonstrates:
    - Page-level encryption/decryption with AES-256-CTR
    - IV derivation using LSN, block number, and relation file number
    - Tablespace-level encryption configuration
    - WAL encryption support
    
    These fixes resolve the BSD and Windows test failures.
    
    Best regards,
    
    On Sun, Dec 28, 2025 at 11:19 PM, Henson Choi <assam258@gmail.com> wrote:
    
    > Hi,
    >
    > Here is v3 of the Storage I/O Transform Hooks patch.
    >
    > Changes from v2:
    > - Fix -Wincompatible-pointer-types error in bufmgr.c by casting
    >   &bufdata to (void **) for mdread_post_hook call
    >
    > v2 changes were:
    > - Add meson.build test configuration for test_tde extension
    >
    > --
    > Best regards,
    > Sungkyun Park
    >
    > On Sun, Dec 28, 2025 at 7:44 PM, Henson Choi <assam258@gmail.com> wrote:
    >
    >> Updated patches with meson build support:
    >>
    >> v2:
    >> - Added meson.build for test_tde extension
    >> - Added test_tde to contrib/meson.build
    >>
    >> Regards,
    >> Henson Choi
    >>
    >> On Sun, Dec 28, 2025 at 6:47 PM, Henson Choi <assam258@gmail.com> wrote:
    >>
    >>> Hello,
    >>>
    >>> Following up on the RFC, I am submitting the initial patch set for the
    >>> proposed infrastructure. These patches introduce a minimal hook-based
    >>> protocol to allow extensions to handle data transformation, such as TDE,
    >>> while keeping the PostgreSQL core independent of specific cryptographic
    >>> implementations.
    >>>
    >>> Implementation Details:
    >>>
    >>> Hook Points in Storage I/O Path
    >>> The patch introduces five strategic hook points:
    >>>
    >>> mdread_post_hook: Called after blocks are read from disk. The extension
    >>> can reverse-transform data in place.
    >>>
    >>> mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
    >>> blocks. These hooks return a pointer to transformed buffers.
    >>>
    >>> xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for
    >>> WAL records during insertion and replay.
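
    To make the calling convention above concrete, here is a minimal
    standalone sketch of the read and write hooks. The hook signatures, the
    toy_* helpers, and the XOR "cipher" are assumptions for illustration
    only; the prototypes in patch 0001 may differ. It also shows the single
    NULL check that remains on the I/O path when no extension is loaded.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLCKSZ 8192

    /* Hypothetical signatures; the patch's real prototypes may differ. */
    typedef void (*mdread_post_hook_type) (uint32_t blkno, void *page);
    typedef const void *(*mdwrite_pre_hook_type) (uint32_t blkno, const void *page);

    static mdread_post_hook_type mdread_post_hook = NULL;
    static mdwrite_pre_hook_type mdwrite_pre_hook = NULL;

    static unsigned char disk[BLCKSZ];      /* stand-in for one on-disk block */

    /* Write path: the hook, if set, returns a transformed buffer to write. */
    static void
    toy_mdwrite(uint32_t blkno, const void *page)
    {
        const void *out = page;

        if (mdwrite_pre_hook != NULL)       /* only cost when no hook is set */
            out = mdwrite_pre_hook(blkno, page);
        memcpy(disk, out, BLCKSZ);
    }

    /* Read path: the hook, if set, reverses the transformation in place. */
    static void
    toy_mdread(uint32_t blkno, void *page)
    {
        memcpy(page, disk, BLCKSZ);
        if (mdread_post_hook != NULL)
            mdread_post_hook(blkno, page);
    }

    /* Toy extension: XOR with a block-dependent byte (not real encryption). */
    static unsigned char scratch[BLCKSZ];

    static const void *
    toy_write_hook(uint32_t blkno, const void *page)
    {
        const unsigned char *src = page;

        for (size_t i = 0; i < BLCKSZ; i++)
            scratch[i] = src[i] ^ (unsigned char) (0x5A + blkno);
        return scratch;                     /* caller's page stays untouched */
    }

    static void
    toy_read_hook(uint32_t blkno, void *page)
    {
        unsigned char *p = page;

        for (size_t i = 0; i < BLCKSZ; i++)
            p[i] ^= (unsigned char) (0x5A + blkno);
    }
    ```

    Returning a separate buffer from the write hook keeps the shared buffer
    in plaintext, which is why only the read hook works in place.
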
    >>>
    >>> Data Integrity and Checksum Protocol
    >>> To ensure robust error detection, the hooks follow a specific
    >>> verification protocol:
    >>>
    >>> On Write: The extension transforms the page, sets the Transform ID, then
    >>> recalculates the checksum on the transformed data.
    >>>
    >>> On Read: The extension verifies the on-disk checksum of the transformed
    >>> data first. After reverse-transformation, it clears the Transform ID and
    >>> recalculates the checksum for the plaintext data. This ensures corruption
    >>> is detected regardless of the transformation state.
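
    The ordering in the write and read steps above can be sketched as
    follows. This is a toy model, not the patch's code: ToyPage, the
    Fletcher-style toy_checksum, and the XOR transform are all assumed
    stand-ins for the real page layout, PostgreSQL's page checksum, and a
    real cipher.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 64            /* toy size; real pages are BLCKSZ bytes */

    typedef struct ToyPage
    {
        uint16_t    checksum;       /* stand-in for pd_checksum */
        uint8_t     transform_id;   /* stand-in for the Transform ID; 0 = plain */
        uint8_t     data[PAGE_SIZE];
    } ToyPage;

    /* Toy Fletcher-16 style checksum; not PostgreSQL's page checksum. */
    static uint16_t
    toy_checksum(const ToyPage *p)
    {
        uint32_t    s1 = p->transform_id % 255u;
        uint32_t    s2 = s1;

        for (size_t i = 0; i < PAGE_SIZE; i++)
        {
            s1 = (s1 + p->data[i]) % 255u;
            s2 = (s2 + s1) % 255u;
        }
        return (uint16_t) ((s2 << 8) | s1);
    }

    /* Toy reversible transform (XOR); a real extension would use a cipher. */
    static void
    toy_xor(ToyPage *p, uint8_t key)
    {
        for (size_t i = 0; i < PAGE_SIZE; i++)
            p->data[i] ^= key;
    }

    /* On write: transform, set the ID, then checksum the transformed bytes. */
    static void
    toy_page_write(ToyPage *p, uint8_t id, uint8_t key)
    {
        toy_xor(p, key);
        p->transform_id = id;
        p->checksum = toy_checksum(p);
    }

    /* On read: verify first, then reverse, clear the ID, and re-checksum. */
    static bool
    toy_page_read(ToyPage *p, uint8_t key)
    {
        if (p->checksum != toy_checksum(p))
            return false;           /* corruption in the stored bytes */
        if (p->transform_id != 0)
        {
            toy_xor(p, key);
            p->transform_id = 0;
        }
        p->checksum = toy_checksum(p);
        return true;
    }
    ```

    Verifying before reversing means a damaged block is reported as
    corruption rather than being decrypted into plausible-looking garbage.
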
    >>>
    >>> WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
    >>> For WAL records, I have introduced a specific block ID (251) to mark
    >>> transformed data. If the decryption extension is not loaded, the WAL reader
    >>> will encounter this unknown block ID and fail fast, preventing the system
    >>> from incorrectly interpreting encrypted data as valid WAL records.
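
    As a rough illustration of the fail-fast behaviour described above, the
    sketch below dispatches on the block ID the way a WAL decoder might. The
    first five constants match xlogrecord.h; the hook signature, the
    DecodeResult enum, and the toy_* functions are assumptions and do not
    come from the patch.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Existing special block IDs in PostgreSQL's xlogrecord.h */
    #define XLR_MAX_BLOCK_ID            32
    #define XLR_BLOCK_ID_DATA_SHORT     255
    #define XLR_BLOCK_ID_DATA_LONG      254
    #define XLR_BLOCK_ID_ORIGIN         253
    #define XLR_BLOCK_ID_TOPLEVEL_XID   252
    /* Proposed in this thread */
    #define XLR_BLOCK_ID_TRANSFORMED    251

    /* Hypothetical hook type; the patch's real signature may differ. */
    typedef bool (*xlog_decode_pre_hook_type) (uint8_t *payload, size_t len);
    static xlog_decode_pre_hook_type xlog_decode_pre_hook = NULL;

    typedef enum
    {
        DECODE_OK,
        DECODE_UNKNOWN_BLOCK_ID,
        DECODE_TRANSFORM_FAILED
    } DecodeResult;

    static DecodeResult
    toy_decode_block(uint8_t block_id, uint8_t *payload, size_t len)
    {
        if (block_id <= XLR_MAX_BLOCK_ID)
            return DECODE_OK;               /* ordinary block reference */

        switch (block_id)
        {
            case XLR_BLOCK_ID_DATA_SHORT:
            case XLR_BLOCK_ID_DATA_LONG:
            case XLR_BLOCK_ID_ORIGIN:
            case XLR_BLOCK_ID_TOPLEVEL_XID:
                return DECODE_OK;
            case XLR_BLOCK_ID_TRANSFORMED:
                /* No extension loaded: refuse instead of guessing. */
                if (xlog_decode_pre_hook == NULL)
                    return DECODE_UNKNOWN_BLOCK_ID;
                return xlog_decode_pre_hook(payload, len)
                    ? DECODE_OK : DECODE_TRANSFORM_FAILED;
            default:
                return DECODE_UNKNOWN_BLOCK_ID;
        }
    }

    /* Toy reverse transform (bitwise NOT); stands in for decryption. */
    static bool
    toy_reverse(uint8_t *payload, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            payload[i] ^= 0xFF;
        return true;
    }
    ```
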
    >>>
    >>> PageHeader Transform ID (5-bit)
    >>> I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform
    >>> ID. This allows the engine and extensions to identify the transformation
    >>> state of a page (e.g., key versioning or algorithm type) without attempting
    >>> decryption. It ensures backward compatibility: pages with Transform ID 0
    >>> are treated as standard untransformed pages.
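
    For reference, packing a 5-bit ID into bits 3-7 of a 16-bit flags word
    could look like the sketch below. The three PD_* flag values are the
    existing ones from bufpage.h; the PD_TRANSFORM_* names and the accessor
    functions are hypothetical and may not match the patch.

    ```c
    #include <stdint.h>

    /* Existing pd_flags bits in PostgreSQL's bufpage.h */
    #define PD_HAS_FREE_LINES   0x0001
    #define PD_PAGE_FULL        0x0002
    #define PD_ALL_VISIBLE      0x0004

    /* Hypothetical names for the proposed field in bits 3-7 */
    #define PD_TRANSFORM_SHIFT  3
    #define PD_TRANSFORM_MASK   0x00F8      /* 5 bits: IDs 0..31 */

    /* Extract the Transform ID; 0 means an untransformed page. */
    static inline uint8_t
    page_get_transform_id(uint16_t pd_flags)
    {
        return (uint8_t) ((pd_flags & PD_TRANSFORM_MASK) >> PD_TRANSFORM_SHIFT);
    }

    /* Return pd_flags with the Transform ID replaced, other bits preserved. */
    static inline uint16_t
    page_set_transform_id(uint16_t pd_flags, uint8_t id)
    {
        return (uint16_t) ((pd_flags & ~PD_TRANSFORM_MASK) |
                           (((uint16_t) id << PD_TRANSFORM_SHIFT) & PD_TRANSFORM_MASK));
    }
    ```

    Because existing pages have these bits clear, they read back as ID 0
    without any on-disk change.
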
    >>>
    >>> Memory and Critical Section Safety
    >>> As demonstrated in the contrib/test_tde reference implementation, cipher
    >>> contexts are pre-allocated in _PG_init to avoid memory allocation during
    >>> critical sections. For WAL transformation,
    >>> MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
    >>> within critical sections; if OOM occurs during buffer growth, it results in
    >>> a controlled PANIC.
    >>>
    >>> Performance Considerations
    >>> When hooks are not set (default), the overhead is limited to a single
    >>> NULL pointer comparison per I/O operation. This is architecturally
    >>> consistent with existing PostgreSQL hooks and is designed to have a
    >>> negligible impact on performance.
    >>>
    >>> Attached Patches:
    >>>
    >>> v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch:
    >>> Core infrastructure.
    >>> v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
    >>> implementation using AES-256-CTR.
    >>>
    >>> I look forward to your comments and feedback.
    >>>
    >>> Regards,
    >>>
    >>> Henson Choi
    >>>
    >>> 2025년 12월 28일 (일) PM 4:49, Henson Choi <assam258@gmail.com>님이 작성:
    >>>
    >>>> RFC: PostgreSQL Storage I/O Transformation Hooks Infrastructure for a
    >>>> Technical Protocol Between RDBMS Core and Data Security Experts
    >>>>
    >>>> *Author:* Henson Choi <assam258@gmail.com>
    >>>>
    >>>> *Date:* 2025-12-28
    >>>>
    >>>> *PostgreSQL Version:* master (Development)
    >>>> ------------------------------
    >>>> 1. Summary & Motivation
    >>>>
    >>>> This RFC proposes the introduction of minimal hooks into the PostgreSQL
    >>>> storage layer and the addition of a *Transformation ID* field to the
    >>>> PageHeader.
    >>>>
    >>>> A Diplomatic Protocol Between Expert Groups
    >>>>
    >>>> The core motivation of this proposal is *“Separation of Concerns and
    >>>> Mutual Respect.”*
    >>>>
    >>>> Historically, discussions around Transparent Data Encryption (TDE) have
    >>>> often felt like putting security experts on trial in a foreign
    >>>> court—specifically, the “Court of RDBMS.” It is time to treat them not as
    >>>> defendants to be judged by database-specific rules, but as an *equal
    >>>> neighboring community* with their own specialized sovereignty.
    >>>>
    >>>> *The issue has never been a failure of technology, but rather a
    >>>> misplacement of the focal point.* While previous discussions were
    >>>> mired in the technicalities of “how to hardcode encryption into the core,”
    >>>> this proposal shifts the debate toward an architectural solution: “what
    >>>> interface the core should provide to external experts.”
    >>>>
    >>>>    - *RDBMS Experts* provide a trusted pipeline responsible for data
    >>>>    I/O paths and consistency.
    >>>>    - *Security Experts* take responsibility for the specialized domain
    >>>>    of encryption algorithms and key management.
    >>>>
    >>>> This hook system functions as a *Technical Protocol*—a high-level
    >>>> agreement that allows these two expert groups to exchange data securely
    >>>> without encroaching on each other’s territory.
    >>>> ------------------------------
    >>>> 2. Design Principles
    >>>>
    >>>>    1. *Delegation of Authority:* The core remains independent of
    >>>>    specific encryption standards, providing a “free territory” where security
    >>>>    experts can respond to an ever-changing security landscape.
    >>>>    2. *Diplomatic Convention:* The Transformation ID acts as a
    >>>>    communication protocol between the engine and the extension. The engine
    >>>>    uses this ID to identify the state of the data and hands over control to
    >>>>    the appropriate expert (the extension).
    >>>>    3. *Minimal Interference:* Overhead is kept near zero when hooks
    >>>>    are not in use, ensuring the native performance of the PostgreSQL engine.
    >>>>
    >>>> ------------------------------
    >>>> 3. Proposal Specifications
    >>>>
    >>>> 3.1 The Interface (Hook Points)
    >>>>
    >>>> We allow intervention by security experts through five contact points
    >>>> along the I/O path:
    >>>>
    >>>>    - *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
    >>>>    (Transformation of the data area)
    >>>>    - *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
    >>>>    transaction logs)
    >>>>
    >>>> 3.2 The Protocol Identifier (PageHeader Transformation ID)
    >>>>
    >>>> We allocate 5 bits of pd_flags to define the “Security State” of a
    >>>> page. This serves as a *Status Message* sent by the security expert to
    >>>> the engine, utilized for key versioning and as a migration marker.
    >>>> ------------------------------
    >>>> 4. Reference Implementation: contrib/test_tde
    >>>>
    >>>> A Standard Code of Conduct for Security Experts
    >>>>
    >>>> This reference implementation exists not as a commercial product, but
    >>>> to define the *Standards of the Diplomatic Protocol* that
    >>>> encryption/decryption experts must follow when entering the PostgreSQL
    >>>> domain.
    >>>>
    >>>>    1. *Deterministic IV Derivation:* Demonstrates how to achieve
    >>>>    cryptographic safety by trusting unique values provided by the engine
    >>>>    (e.g., LSN).
    >>>>    2. *Critical Section Safety:* Defines memory management regulations
    >>>>    that security logic must follow within “Critical Sections” to maintain
    >>>>    system stability.
    >>>>    3. *Hook Chaining:* Demonstrates a cooperative structure that
    >>>>    allows peaceful coexistence with other expert tools (e.g., compression,
    >>>>    auditing).
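
    The hook chaining mentioned in item 3 presumably follows the usual
    PostgreSQL idiom: save the previous hook value at load time and call it
    from the new hook. A standalone sketch, with a hypothetical hook
    signature and two made-up extensions "A" and "B":

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical signature; stand-in for the core's hook variable. */
    typedef void (*mdread_post_hook_type) (uint32_t blkno, void *page);
    static mdread_post_hook_type mdread_post_hook = NULL;

    /* Records the order in which the hooks ran, for demonstration. */
    static char call_log[8];
    static int  call_count = 0;

    static void
    log_call(char who)
    {
        if (call_count < (int) sizeof(call_log))
            call_log[call_count++] = who;
    }

    /* Extension A: remembers whatever hook was installed before it. */
    static mdread_post_hook_type a_prev_hook = NULL;

    static void
    a_hook(uint32_t blkno, void *page)
    {
        if (a_prev_hook != NULL)
            a_prev_hook(blkno, page);       /* let earlier extensions run */
        log_call('A');
    }

    static void
    a_init(void)                            /* would live in A's _PG_init */
    {
        a_prev_hook = mdread_post_hook;
        mdread_post_hook = a_hook;
    }

    /* Extension B: same pattern, loaded after A. */
    static mdread_post_hook_type b_prev_hook = NULL;

    static void
    b_hook(uint32_t blkno, void *page)
    {
        if (b_prev_hook != NULL)
            b_prev_hook(blkno, page);
        log_call('B');
    }

    static void
    b_init(void)                            /* would live in B's _PG_init */
    {
        b_prev_hook = mdread_post_hook;
        mdread_post_hook = b_hook;
    }
    ```

    For layered transformations the call order matters (the read side has
    to undo steps in the reverse of the write order), so whether a hook
    calls its predecessor before or after its own work is a design decision
    each extension has to make deliberately.
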
    >>>>
    >>>> ------------------------------
    >>>> 5. Scope
    >>>>
    >>>>    - *In-Scope:* Backend hook infrastructure, Transformation ID field,
    >>>>    and reference code demonstrating diplomatic protocol compliance.
    >>>>    - *Out-of-Scope:* Specific Key Management Systems (KMS), selection
    >>>>    of specific cryptographic algorithms, and integration with external tools.
    >>>>
    >>>> This proposal represents a strategic diplomatic choice: rather than the
    >>>> PostgreSQL core assuming all security responsibilities, it grants security
    >>>> experts a *sovereign territory through extensions* where they can
    >>>> perform at their best.
    >>>>
    >>>