Thread

  1. Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

    Dharin Shah <dharinshah95@gmail.com> — 2025-12-31T15:02:24Z

    Thanks Michael,
    
    After looking more closely at your “8‑byte TOAST values / infinite loop”
    thread and patch series, I see this is very much the same direction you
    outlined there: introduce a normalized in-memory representation for
    external pointers (toast_external_data) and keep most call sites from
    having to reason about vartag_external/va_extinfo details directly [1].
    
    For this refactor patch I kept the decoder local to detoast.c to minimize
    scope and avoid committing to a broader API boundary too early. But if the
    consensus heads toward a shared interface closer to the format definitions
    (as in your toast_external approach), I’m happy to respin/rework this patch
    to align with that direction, rather than working on parallel
    abstractions. It should also be straightforward to mold this refactor in
    the direction of the 8‑byte value-id work without changing the overall
    detoast structure.
    
    On HAS_TCINFO flag: the intent is to make payload layout explicit. In the
    current code, “external is compressed” effectively implies “payload begins
    with tcinfo”, which is wired into fetch/slice logic. For a vartag-based
    follow-up (e.g., zstd), we may want compressed payloads without a tcinfo
    prefix, so having an explicit flag keeps detoast paths uniform and avoids
    method-specific shortcuts.
    
    Let me know what you’d prefer for next steps: keep this patch as a
    detoast-local refactor, or respin it to align more directly with a shared
    decoded external-pointer interface in the direction of the 8‑byte work.
    
    [1]
    https://www.postgresql.org/message-id/flat/CAN-LCVNsE4x0k11ZRWvU4ySTbe98fwA16qzV7p8dxogWnD5Jng%40mail.gmail.com#8253d63beab18b73706e3555ce19a0f4
    
    
    Thanks,
    Dharin
    
    On Tue, Dec 30, 2025 at 12:46 AM Michael Paquier <michael@paquier.xyz>
    wrote:
    
    > On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:
    > > The goal is to centralize “how do we interpret an external datum?” so
    > that
    > > detoast code paths don’t have to reason directly about va_extinfo
    > encoding
    > > vs payload layout details. This is intended as groundwork for a follow-up
    > > patch adding a new vartag-based method (e.g., zstd) without scattering
    > > special cases in detoast paths.
    >
    > +static bool
    > +decode_external_toast_pointer(const struct varlena *attr,
    > +                                                  DecodedExternalToast
    > *decoded)
    > [...]
    > +typedef enum ToastDecompressMethod
    > +{
    > +       TOAST_DECOMP_NONE = 0,
    > +       TOAST_DECOMP_PGLZ = 1,
    > +       TOAST_DECOMP_LZ4 = 2
    > +} ToastDecompressMethod;
    > +
    > +typedef struct DecodedExternalToast
    > +{
    > +       Oid                     toastrelid;
    > +       Oid                     valueid;
    > +       uint32          rawsize;                /* Decompressed size; for
    > future methods without tcinfo */
    > +       uint32          extsize;                /* On-disk payload size */
    > +       ToastDecompressMethod method;
    > +       uint8           flags;
    > +} DecodedExternalToast;
    >
    > Yeah, honestly this is a layer I have been thinking about as well as
    > one option, but contrary to you I have been focusing on putting that
    > into varatt.h, with the exception of the value being an Oid8.  I think
    > that you have an interesting point in focusing your implementation to
    > be stored in the detoast part, though.  I'd need to spend a bit more
    > time to see the result this would lead at with the larger 8-byte issue
    > in mind, but this is something that would come at no real cost as it
    > has no function pointer redirection compared to what I was first
    > envisioning on the other thread.  That's especially true if it makes
    > the CompressionId business easier to mold around when adding a new
    > vartag.
    >
    > > Why HAS_TCINFO?
    > > - Previously, “is compressed?” was used as a proxy for whether the
    > external
    > > payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
    > > captures payload layout, which is distinct from whether the value is
    > > compressed. This separation is needed for future methods that may store
    > > external compressed payloads without tcinfo.
    >
    > It is possible to model the on-memory data as we want.  This
    > suggestion would be OK with some flags.
    > --
    > Michael
    >