Thread

  1. Fwd: [PATCH] Add zstd compression for TOAST using extended header format

    Dharin Shah <dharinshah95@gmail.com> — 2025-12-16T10:51:26Z

    Hello PG Hackers,
    
    I would like to submit a patch that implements zstd compression for TOAST
    data using a 20-byte TOAST pointer format, directly addressing the concerns
    raised in prior discussions [1][2][3]:

    [1] <https://www.postgresql.org/message-id/flat/CAFAfj_F4qeRCNCYPk1vgH42fDZpjQWKO%2Bufq3FyoVyUa5AviFA%40mail.gmail.com#e41c78674adfa4d16b2fa82e59faf9aa>
    [2] <https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com>
    [3] <https://www.postgresql.org/message-id/flat/YoMiNmkztrslDbNS@paquier.xyz>
    
    For background: the overall suggestion in the 2022 thread [3] was to make
    the TOAST header extensible by assigning the 2-bit compression ID
    something like this:
    00 = PGLZ
    01 = LZ4
    10 = reserved for future emergencies
    11 = extended header with additional type byte
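
    As a rough sketch of how the 2-bit ID could be read, assuming it occupies
    the top two bits of the 32-bit va_extinfo word and the external size the
    low 30 bits (my understanding of the existing layout, not something stated
    in this message). All macro, enum, and function names below are
    hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; assumes a 30-bit size field below a 2-bit method ID. */
#define EXTSIZE_BITS 30u
#define EXTSIZE_MASK ((1u << EXTSIZE_BITS) - 1u)

typedef enum {
    CMID_PGLZ     = 0,  /* 00 */
    CMID_LZ4      = 1,  /* 01 */
    CMID_RESERVED = 2,  /* 10: reserved */
    CMID_EXTENDED = 3   /* 11: extended header carries the real method */
} ToastCmid;

/* Extract the 2-bit compression ID from the top of the info word. */
static inline ToastCmid extinfo_cmid(uint32_t va_extinfo)
{
    return (ToastCmid) (va_extinfo >> EXTSIZE_BITS);
}

/* Extract the external (stored) size from the low 30 bits. */
static inline uint32_t extinfo_extsize(uint32_t va_extinfo)
{
    return va_extinfo & EXTSIZE_MASK;
}

/* Combine a size and a compression ID into one info word. */
static inline uint32_t extinfo_pack(uint32_t extsize, ToastCmid cmid)
{
    assert(extsize <= EXTSIZE_MASK);
    return extsize | ((uint32_t) cmid << EXTSIZE_BITS);
}
```

    A reader that sees CMID_EXTENDED would then look at the extra type byte
    instead of treating the value as pglz or lz4.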
    
    This patch implements that idea. The new 20-byte pointer format is:
    
      struct varatt_external_extended {
          int32   va_rawsize;     /* same as legacy */
          uint32  va_extinfo;     /* cmid=3 signals extended format */
          uint8   va_flags;       /* feature flags */
          uint8   va_data[3];     /* va_data[0] = compression method */
          Oid     va_valueid;     /* same as legacy */
          Oid     va_toastrelid;  /* same as legacy */
      };
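
    The 16-byte and 20-byte figures can be checked with a small standalone
    sketch. It uses stdint types in place of the PostgreSQL typedefs, and the
    legacy struct name and the helper function are hypothetical, added only
    for comparison:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef uint32_t Oid;  /* stand-in: an unsigned 32-bit object identifier */

/* Hypothetical name for the existing 16-byte layout. */
struct varatt_external_legacy {
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    Oid      va_valueid;
    Oid      va_toastrelid;
};

/* Same field order as the struct quoted above. */
struct varatt_external_extended {
    int32_t  va_rawsize;
    uint32_t va_extinfo;     /* cmid=3 signals extended format */
    uint8_t  va_flags;       /* feature flags */
    uint8_t  va_data[3];     /* va_data[0] = compression method */
    Oid      va_valueid;
    Oid      va_toastrelid;
};

/* The four new bytes sit between va_extinfo and va_valueid, with no padding. */
static_assert(sizeof(struct varatt_external_legacy) == 16, "legacy is 16 bytes");
static_assert(sizeof(struct varatt_external_extended) == 20, "extended is 20 bytes");
static_assert(offsetof(struct varatt_external_extended, va_flags) == 8, "flags at 8");
static_assert(offsetof(struct varatt_external_extended, va_valueid) == 12, "valueid at 12");

/* Hypothetical helper: the method byte lives in va_data[0]. */
static inline uint8_t extended_method(const struct varatt_external_extended *p)
{
    return p->va_data[0];
}
```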
    
    *A few notes:*
    
    - Zstd applies only to external TOAST, not to inline compression. The
      2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4
      work fine anyway. Zstd's gains show up mainly on larger values.
    - The GUC use_extended_toast_header controls whether pglz and lz4 also use
      the 20-byte format. It defaults to off for compatibility; enable it if
      you want a consistent pointer format.
    - Legacy 16-byte pointers continue to work: we check the vartag to
      determine which format to read.
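
    The vartag check in the last note could look roughly like the sketch
    below. It is an illustration under assumptions, not code from the patch:
    the tag values, the ToastPointerView type, and both function names are
    hypothetical, and the payload is assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative tag values; the patch's actual vartag numbers are not given here. */
enum { TAG_ONDISK_LEGACY = 18, TAG_ONDISK_EXTENDED = 19 };

/* Hypothetical format-independent view of a TOAST pointer. */
typedef struct {
    int32_t  rawsize;
    uint32_t extinfo;
    uint8_t  flags;        /* 0 for legacy pointers */
    uint8_t  method;       /* va_data[0]; only meaningful when is_extended */
    uint32_t valueid;
    uint32_t toastrelid;
    int      is_extended;
} ToastPointerView;

/* Payload size implied by the tag; 0 means "not an on-disk pointer". */
static size_t toast_pointer_size(uint8_t tag)
{
    switch (tag) {
        case TAG_ONDISK_LEGACY:   return 16;
        case TAG_ONDISK_EXTENDED: return 20;
        default:                  return 0;
    }
}

/* Decode either format into the common view. Returns 0 on success, -1 on a bad tag. */
static int toast_pointer_read(uint8_t tag, const uint8_t *payload,
                              ToastPointerView *out)
{
    size_t off = 0;

    assert(payload != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (toast_pointer_size(tag) == 0)
        return -1;

    out->is_extended = (tag == TAG_ONDISK_EXTENDED);
    memcpy(&out->rawsize, payload + off, 4); off += 4;
    memcpy(&out->extinfo, payload + off, 4); off += 4;
    if (out->is_extended) {
        out->flags  = payload[off];      /* va_flags */
        out->method = payload[off + 1];  /* va_data[0] */
        off += 4;                        /* skip va_flags + va_data[3] */
    }
    memcpy(&out->valueid, payload + off, 4); off += 4;
    memcpy(&out->toastrelid, payload + off, 4);
    return 0;
}
```

    Using memcpy avoids unaligned reads, since a pointer payload is not
    guaranteed to be aligned inside a tuple.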
    
    The 4 extra bytes per pointer are negligible for typical TOAST data sizes,
    and the new format gives us room to grow.
    
    Regards,
    Dharin