Thread

  1. Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

    Dharin Shah <dharinshah95@gmail.com> — 2025-12-24T00:47:16Z

    Hello,
    
    Following up on my earlier patch submission, I've reworked the zstd TOAST
    compression implementation based on our discussion here. The new patch now
    avoids the 20-byte extended header.
    
    Current Approach
    - New `VARTAG_ONDISK_ZSTD` (value 19) for ZSTD external storage
    - Maintains existing 16-byte varatt_external structure
    - ZSTD external-only (no inline compression)
    
    Note: Using a dedicated VARTAG_ONDISK_ZSTD keeps the on-disk TOAST pointer
    payload at 16 bytes, but it is not a general extensible metadata carrier.
    If PostgreSQL later adopts a more general extensible TOAST framework, this
    change should not block it; VARTAG_ONDISK_ZSTD would remain as a supported
    legacy encoding, while new toasted values could be written using the newer
    framework and old values rewritten via normal table rewrites.
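    
    To illustrate the idea, here is a minimal sketch of tag-based dispatch. It
    is not the patch's code: the `sketch_` names are hypothetical, the struct
    is a simplified stand-in for varatt_external, and only the value 19 comes
    from this mail. It shows both tags sharing the same 16-byte payload, with
    the tag alone identifying ZSTD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the 16-byte varatt_external TOAST pointer payload. */
typedef struct
{
    int32_t  va_rawsize;     /* original data size, including header */
    uint32_t va_extinfo;     /* external saved size plus compression bits */
    uint32_t va_valueid;     /* OID of the value within the TOAST table */
    uint32_t va_toastrelid;  /* OID of the TOAST table holding the value */
} sketch_varatt_external;

/* The payload must stay at 16 bytes for the pointer format to be unchanged. */
static_assert(sizeof(sketch_varatt_external) == 16, "payload must be 16 bytes");

typedef enum
{
    SKETCH_VARTAG_ONDISK = 18,       /* existing on-disk TOAST pointer tag */
    SKETCH_VARTAG_ONDISK_ZSTD = 19   /* value proposed in this mail */
} sketch_vartag;

/* Payload size for a tag; 0 means the tag is not known to this sketch. */
size_t sketch_vartag_size(int tag)
{
    switch (tag)
    {
        case SKETCH_VARTAG_ONDISK:
        case SKETCH_VARTAG_ONDISK_ZSTD:
            return sizeof(sketch_varatt_external);
        default:
            return 0;
    }
}

/* The tag alone tells a reader whether zstd decompression is needed. */
int sketch_is_zstd(int tag)
{
    return tag == SKETCH_VARTAG_ONDISK_ZSTD;
}
```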
    
    Storage (170 MB uncompressed):
        ZSTD: 22 MB (7.60x) - 38.7% space savings vs LZ4
        PGLZ: 36 MB (4.76x)
        LZ4:  36 MB (4.66x)
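    
    For readers checking the arithmetic: the 38.7% figure follows from the
    compression ratios rather than from the rounded MB values, since
    1 - 4.66/7.60 is roughly 0.387. A small sketch of that calculation, with
    hypothetical helper names:

```c
#include <assert.h>

/* Stored size in MB for a given uncompressed size and compression ratio. */
double stored_mb(double raw_mb, double ratio)
{
    return raw_mb / ratio;
}

/*
 * Fractional space saved by method A relative to method B on the same data:
 * 1 - size_a/size_b, which simplifies to 1 - ratio_b/ratio_a.
 */
double savings_vs(double ratio_a, double ratio_b)
{
    return 1.0 - ratio_b / ratio_a;
}

/* Checks the ZSTD vs LZ4 figure quoted above against the reported ratios. */
void check_reported_figures(void)
{
    double s = savings_vs(7.60, 4.66);

    assert(s > 0.386 && s < 0.388);
}
```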
    
    Key findings:
    - Large values (>50KB): ZSTD 33% better compression than PGLZ (~30% better
    than LZ4)
    - Low-entropy data: ZSTD's entropy coding stage compresses what the
    match-only LZ77 methods (PGLZ, LZ4) cannot
    - Small values: ZSTD pays external overhead vs inline PGLZ/LZ4
    
    While ZSTD uses less space overall, external storage incurs a TOAST fetch
    for small values, which may hurt read performance.
    
    Backwards Compatibility Tests
    - Mixed compression: Rows with PGLZ, LZ4, and ZSTD coexist and decompress
    correctly
    - Lazy recompression: ALTER COLUMN ... SET COMPRESSION zstd affects only
    newly stored values; existing values keep their original compression until
    the value itself is rewritten, for example by an UPDATE that stores a new
    value.
    - Inline vs external: Small values remain inline; large values use
    appropriate external compression.
    - Data integrity: All data decompresses correctly across all methods.
    
    Trade-offs and Design Considerations
    
    - External-only avoids consuming cmid=3 and extended header complexity
    
    - Slice access: no ZSTD-specific optimization (follow-up area)
    
    - Hybrid inline/external for small values: not in this patch (feedback
    welcome)
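    
    As background for the cmid point: va_extinfo packs a 30-bit external size
    with a 2-bit compression method id, and the ids 0 (pglz) and 1 (lz4) are
    already in use. The sketch below shows that packing. The `sketch_` names
    are hypothetical and the code is a simplified illustration, not taken from
    PostgreSQL or the patch. With a dedicated vartag the method is known from
    the tag, so no extra id has to be allocated in this field.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EXTSIZE_BITS 30
#define SKETCH_EXTSIZE_MASK ((1U << SKETCH_EXTSIZE_BITS) - 1)

/* Method ids stored in the top 2 bits of va_extinfo. */
enum
{
    SKETCH_CMID_PGLZ = 0,
    SKETCH_CMID_LZ4 = 1
};

/* Pack an external size (low 30 bits) and a method id (top 2 bits). */
uint32_t sketch_pack_extinfo(uint32_t extsize, uint32_t cmid)
{
    assert(extsize <= SKETCH_EXTSIZE_MASK);
    assert(cmid <= 3);
    return extsize | (cmid << SKETCH_EXTSIZE_BITS);
}

/* Extract the external size from a packed value. */
uint32_t sketch_extsize(uint32_t extinfo)
{
    return extinfo & SKETCH_EXTSIZE_MASK;
}

/* Extract the compression method id from a packed value. */
uint32_t sketch_cmid(uint32_t extinfo)
{
    return extinfo >> SKETCH_EXTSIZE_BITS;
}
```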
    
    Reviewer Questions
    - Is vartag-based external-only acceptable?
    - Should compression level (currently 3) be configurable?
    - Is the external storage overhead for small values acceptable, or is hybrid
    inline/external behavior needed?
    
    Thanks,
    Dharin
    
    On Thu, Dec 18, 2025 at 11:44 PM Michael Paquier <michael@paquier.xyz>
    wrote:
    
    > On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:
    > > I want to make sure I understand your main point: you're OK with a new
    > > `vartag_external`, but prefer we avoid increasing the heap TOAST pointer
    > > from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
    > > the main heap tuple.
    >
    > That would be my choice, yes.  Not sure about the opinion of others on
    > this matter.
    >
    > > I also realize the "compatibility" of the extended header doesn't buy us
    > > much — we'll need to support the existing 16-byte varatt_external forever
    > > for backward compatibility. Adding a 20-byte structure just means two
    > > formats to maintain indefinitely.
    >
    > Yes.  Patches have to maintain on-disk compatibility.
    >
    > > A couple clarifying questions if we go with new vartag (e.g.,
    > > `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
    > > discriminator
    > > 1. How should we handle future methods beyond zstd? One tag per method, or
    > > store a method id elsewhere (e.g., in TOAST chunk header)?
    >
    > My suspicion would be that we could use a new set of vartags in
    > the future for each compression method.  When it comes to zstd there
    > is something that comes into play: we could set some bits related to
    > dictionaries at tuple level.  Not sure if this is the best design or
    > if using an attribute-level option is better suited (for example a
    > JSONB blob could be applied as an attribute with common keys in a
    > dictionary saving a lot of on-disk space even before compression),
    > but keeping some bits free in the 16-byte header leaves this option
    > open with a new vartag_external.  Saying that, zstd is good enough
    > that I strongly suspect that we would not regret it for quite a few
    > years.  One issue that has pushed towards the addition of lz4 as an
    > option for toast compression is that pglz was worse in terms of CPU
    > cost.  zlib is also more expensive than lz4 or zstd, especially at
    > very high compression level for usually little compression gains.
    >
    > > 2. And re: "as long as the TOAST value is 32 bits" — are you referring to
    > > the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
    > > extsize for method encoding)?
    >
    > I mean extending the TOAST value to 8 bytes, as per the following
    > issues:
    > https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us
    > https://commitfest.postgresql.org/patch/5830/
    >
    > > *Key findings (I guess well known at this point):*
    > > - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
    > > - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
    > > - The T4 result showing zstd as "worse" is not about compression quality;
    > > it's about missing inline storage support. ZSTD actually compresses better,
    > > but pays unnecessary TOAST overhead.
    > >
    > > I'll share the detailed benchmark script with the next patch revision. But
    > > also a potential path forward could be that we could just fully replace
    > > pglz (can bring it up later in different thread)
    >
    > I don't think that we will ever be able to remove pglz.  It would be
    > nice, as final result of course, but I also expect that not being able
    > to decompress pglz data is going to lead to a lot of user pain.  That
    > would be also very expensive to check at upgrade for large instances.
    >
    > > *On Testing and Patch Structure*
    > > Agreed on both points:
    > > - I'll use `compression_zstd.sql` following the `compression_lz4.sql`
    > > pattern (removing the test_toast_ext module)
    >
    > Okay.
    >
    > > - I'll split the GUC refactoring into a separate preparatory patch
    >
    > This refactoring, if done nicely, is worth an independent piece.  It's
    > something that I have actually done for the sake of the other thread,
    > though the result was not really much liked by others.  Perhaps I'm
    > just lacking imagination with this abstraction, and I'd surely welcome
    > different ideas.
    > --
    > Michael
    >