Thread
-
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
Dharin Shah <dharinshah95@gmail.com> — 2025-12-24T00:47:16Z
Hello,

Following up on my earlier patch submission, I've reworked the zstd TOAST compression implementation based on our discussion here. The new patch now avoids the 20-byte extended header.

Current Approach

- New `VARTAG_ONDISK_ZSTD` (value 19) for ZSTD external storage
- Maintains existing 16-byte varatt_external structure
- ZSTD external-only (no inline compression)

Note: Using a dedicated VARTAG_ONDISK_ZSTD keeps the on-disk TOAST pointer payload at 16 bytes, but it is not a general extensible metadata carrier. If PostgreSQL later adopts a more general extensible TOAST framework, this change should not block it; VARTAG_ONDISK_ZSTD would remain as a supported legacy encoding, while new toasted values could be written using the newer framework and old values rewritten via normal table rewrites.

Storage (170 MB uncompressed):

ZSTD: 22 MB (7.60x) - 38.7% space savings vs LZ4
PGLZ: 36 MB (4.76x)
LZ4:  36 MB (4.66x)

Key findings:

- Large values (>50KB): ZSTD 33% better compression than PGLZ (~30% better than LZ4)
- Low-entropy data: ZSTD compresses what LZ77 methods cannot
- Small values: ZSTD pays external overhead vs inline PGLZ/LZ4

While ZSTD uses slightly less space overall, the external storage mechanism incurs a TOAST fetch overhead for small values, potentially impacting performance.

Backwards Compatibility Tests

- Mixed compression: Rows with PGLZ, LZ4, and ZSTD coexist and decompress correctly
- Lazy recompression: ALTER COLUMN ... SET COMPRESSION zstd affects new data; existing data is lazily recompressed upon UPDATE or VACUUM FULL.
- Inline vs external: Small values remain inline; large values use appropriate external compression.

Data integrity: All data decompresses correctly across all methods.

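
For readers following the layout discussion, here is a minimal sketch of what "a new vartag over the same 16-byte payload" amounts to. It is not code from the patch. The struct fields and the existing tag values are modeled on PostgreSQL's varatt.h and should be checked against the tree; `VARTAG_ONDISK_ZSTD = 19` is taken from the description above; the `Oid` typedef is a stand-in, and both helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid type */

/* The TOAST pointer payload: four 4-byte fields, 16 bytes in total. */
typedef struct varatt_external
{
    int32_t  va_rawsize;        /* original data size, including header */
    uint32_t va_extinfo;        /* external saved size plus method bits */
    Oid      va_valueid;        /* ID of the value within the TOAST table */
    Oid      va_toastrelid;     /* OID of the TOAST table holding it */
} varatt_external;

/* The payload must stay at 16 bytes for the approach described above. */
_Static_assert(sizeof(varatt_external) == 16,
               "TOAST pointer payload must remain 16 bytes");

typedef enum vartag_external
{
    VARTAG_INDIRECT = 1,
    VARTAG_EXPANDED_RO = 2,
    VARTAG_EXPANDED_RW = 3,
    VARTAG_ONDISK = 18,
    VARTAG_ONDISK_ZSTD = 19     /* value proposed in this patch */
} vartag_external;

/* Hypothetical helper: does this tag carry a varatt_external payload? */
static inline bool
vartag_is_ondisk(vartag_external tag)
{
    return tag == VARTAG_ONDISK || tag == VARTAG_ONDISK_ZSTD;
}

/*
 * Hypothetical helper: the tag alone identifies zstd, so no bits of
 * va_extinfo need to be spent on the compression method.
 */
static inline bool
toast_pointer_is_zstd(vartag_external tag)
{
    return tag == VARTAG_ONDISK_ZSTD;
}
```

Under these assumptions the tag acts purely as a discriminator: code that only needs the payload treats both on-disk tags alike, and only the decompression path has to tell them apart.
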
Trade-offs and Design Considerations

- External-only avoids consuming cmid=3 and extended header complexity
- Slice access: no ZSTD-specific optimization (follow-up area)
- Hybrid inline/external for small values: not in this patch (feedback welcome)

Reviewer Questions

- Is vartag-based external-only acceptable?
- Should compression level (currently 3) be configurable?
- Is the external storage overhead for small values acceptable, or is hybrid inline/external behavior needed?

Thanks,
Dharin

On Thu, Dec 18, 2025 at 11:44 PM Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:
> > I want to make sure I understand your main point: you're OK with a new
> > `vartag_external`, but prefer we avoid increasing the heap TOAST pointer
> > from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
> > the main heap tuple.
>
> That would be my choice, yes. Not sure about the opinion of others on
> this matter.
>
> > I also realize the "compatibility" of the extended header doesn't buy us
> > much - we'll need to support the existing 16-byte varatt_external forever
> > for backward compatibility. Adding a 20-byte structure just means two
> > formats to maintain indefinitely.
>
> Yes. Patches have to maintain on-disk compatibility.
>
> > A couple clarifying questions if we go with new vartag (e.g.,
> > `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
> > discriminator
> > 1. How should we handle future methods beyond zstd? One tag per method, or
> > store a method id elsewhere (e.g., in TOAST chunk header)?
>
> My suspicion would be that we could either use a new set of vartags in
> the future for each compression method. When it comes to zstd there
> is something that comes in play: we could set some bits related to
> dictionaries at tuple level. Not sure if this is the best design or
> if using an attribute-level option is more adapted (for example a
> JSONB blob could be applied as an attribute with common keys in a
> dictionary saving a lot of on-disk space even before compression),
> but keeping some bits free in the 16-byte header leaves this option
> open with a new vartag_external. Saying that, zstd is good enough
> that I strongly suspect that we would not regret it for quite a few
> years. One issue that has pushed towards the addition of lz4 as an
> option for toast compression is that pglz was worse in terms of CPU
> cost. zlib is also more expensive than lz4 or zstd, especially at
> very high compression level for usually little compression gains.
>
> > 2. And re: "as long as the TOAST value is 32 bits" - are you referring to
> > the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
> > extsize for method encoding)?
>
> I mean extending the TOAST value to 8 bytes, as per the following
> issues:
> https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us
> https://commitfest.postgresql.org/patch/5830/
>
> > *Key findings (I guess well known at this point):*
> > - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
> > - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
> > - The T4 result showing zstd as "worse" is not about compression quality -
> > it's about missing inline storage support. ZSTD actually compresses better,
> > but pays unnecessary TOAST overhead.
> >
> > I'll share the detailed benchmark script with the next patch revision. But
> > also a potential path forward could be that we could just fully replace
> > pglz (can bring it up later in different thread)
>
> I don't think that we will ever be able to remove pglz. It would be
> nice, as final result of course, but I also expect that not being able
> to decompress pglz data is going to lead to a lot of user pain. That
> would be also very expensive to check at upgrade for large instances.
>
> > *On Testing and Patch Structure*
> > Agreed on both points:
> > - I'll use `compression_zstd.sql` following the `compression_lz4.sql`
> > pattern (removing the test_toast_ext module)
>
> Okay.
>
> > - I'll split the GUC refactoring into a separate preparatory patch
>
> This refactoring, if done nicely, is worth an independent piece. It's
> something that I have actually done for the sake of the other thread,
> though the result was not really much liked by others. Perhaps I'm
> just lacking imagination with this abstraction, and I'd surely welcome
> different ideas.
> --
> Michael
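
The va_extinfo question in the quoted exchange (item 2) and the "cmid=3" trade-off both come down to how one 32-bit word is split. Below is a minimal sketch, assuming the 30-bit size / 2-bit method split mentioned in the thread. The macro names and method IDs are modeled on PostgreSQL's headers and should be verified against the tree; the three `extinfo_*` helpers are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Low 30 bits of va_extinfo: external saved size. */
#define VARLENA_EXTSIZE_BITS 30
#define VARLENA_EXTSIZE_MASK ((1U << VARLENA_EXTSIZE_BITS) - 1)

/* Top 2 bits of va_extinfo: compression method ID (values 0..3). */
#define TOAST_PGLZ_COMPRESSION_ID    0
#define TOAST_LZ4_COMPRESSION_ID     1
#define TOAST_INVALID_COMPRESSION_ID 2
/* Value 3 is the only remaining slot; the patch leaves it unused. */

/* Hypothetical helper: pack a saved size and a method ID into one word. */
static inline uint32_t
extinfo_pack(uint32_t extsize, uint32_t cmid)
{
    return ((cmid & 3U) << VARLENA_EXTSIZE_BITS) |
           (extsize & VARLENA_EXTSIZE_MASK);
}

/* Hypothetical helper: recover the external saved size. */
static inline uint32_t
extinfo_extsize(uint32_t extinfo)
{
    return extinfo & VARLENA_EXTSIZE_MASK;
}

/* Hypothetical helper: recover the compression method ID. */
static inline uint32_t
extinfo_cmid(uint32_t extinfo)
{
    return extinfo >> VARLENA_EXTSIZE_BITS;
}
```

With only two method bits, a fourth method would take the last free ID, and any further one would have to shrink the size field. Identifying zstd by vartag instead leaves both fields as they are.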