Thread

  1. Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

    Michael Paquier <michael@paquier.xyz> — 2025-12-29T23:45:47Z

    On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:
    > The goal is to centralize “how do we interpret an external datum?” so that
    > detoast code paths don’t have to reason directly about va_extinfo encoding
    > vs payload layout details. This is intended as groundwork for a follow-up
    > patch adding a new vartag-based method (e.g., zstd) without scattering
    > special cases in detoast paths.
    
    +static bool
    +decode_external_toast_pointer(const struct varlena *attr,
    +						   DecodedExternalToast *decoded)
    [...]
    +typedef enum ToastDecompressMethod
    +{
    +	TOAST_DECOMP_NONE = 0,
    +	TOAST_DECOMP_PGLZ = 1,
    +	TOAST_DECOMP_LZ4 = 2
    +} ToastDecompressMethod;
    +
    +typedef struct DecodedExternalToast
    +{
    +	Oid			toastrelid;
    +	Oid			valueid;
    +	uint32		rawsize;		/* Decompressed size; for future methods without tcinfo */
    +	uint32		extsize;		/* On-disk payload size */
    +	ToastDecompressMethod method;
    +	uint8		flags;
    +} DecodedExternalToast;
    
    Yeah, honestly this is a layer I have been thinking about as well as
    one option, but contrary to you I have been focusing on putting that
    into varatt.h, with the exception of the value being an Oid8.  I think
    that you have an interesting point in focusing your implementation to
    be stored in the detoast part, though.  I'd need to spend a bit more
    time to see the result this would lead at with the larger 8-byte issue
    in mind, but this is something that would come at no real cost as it
    has no function pointer redirection compared to what I was first
    envisioning on the other thread.  That's especially true if it makes
    the CompressionId business easier to mold around when adding a new
    vartag.
    
    > Why HAS_TCINFO?
    > - Previously, “is compressed?” was used as a proxy for whether the external
    > payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
    > captures payload layout, which is distinct from whether the value is
    > compressed. This separation is needed for future methods that may store
    > external compressed payloads without tcinfo.
    
    It is possible to model the on-memory data as we want.  This
    suggestion would be OK with some flags.
    --
    Michael