Thread

  1. PoC: pg_dump --filter-data <file> (like Oracle Where Clause on RMAN for specific tables)

    Kirk Wolak <wolakk@gmail.com> — 2025-08-15T16:10:58Z

    Hackers,
      I am asking for feedback on this.  We just migrated a client to PG and
    all is well.  Except the developers requested a copy of recent production
    data for testing.
    
      We have a handful of tables that have 90% of the data going back 30 years.
    We NEVER restore this data to Dev or Staging.  We used a special RMAN
    backup where these tables had a "WHERE clause" applied to them during the
    backup/dump process.
    
      It would be awesome if pg_dump offered something similar.  I am willing
    to code this up, but there are questions.  And I want to do it in a way
    that makes the most sense to the PG Community.
    
      My initial thought is a simple filtering file, of the format:
    schema.table = WHERE ...
    "schema"."Table" = WHERE ...
    "schema"."t1" = LIMIT 50000
    
      If the --filter-data <filename> is not specified, the code would not
    change anything.  If it was, it would see if the table was in the file with
    a filter, it would read that filter, and apply it to the resulting COPY
    command.
    
      I don't believe this impacts pg_restore.  But if we wanted to include any
    kind of messaging in the restore process that "pg_dump --filter-data was
    used, this is NOT a complete dump!", then I would appreciate that, and
    include it in pg_restore.
    
      Just to make the point.  The full pg_dump takes 60 minutes (During which,
    we must turn off certain features to avoid throwing errors/locking
    issues).  Excluding these tables takes 2-3 minutes.  (Side dumping limited
    versions of them with \COPY takes 3 minutes).  And frankly we have enough
    backups of the many years of data, we don't need daily snapshots of them,
    or to carry them around.
    
    Thanks in advance.  I am hoping that I am not the only one that would
    benefit from a filtered dump (as opposed to all or nothing).
    
    Finally, I considered using an entire query, which could allow data-masking
    and more complex queries if there are FKs involved.  But that seemed like a
    much bigger ask (and a potential foot-gun).
    
    PS: A Quick Hack feature we could leverage would be to flag the
    ignore-table-data to generate an EMPTY .dat file, and the internal toc.dat
    reference to load that file... THEN simply overwrite that file with our
    manual \COPY command.  This would be almost a trivial change, and would
    work for what we do/need.  BUT it feels "off" a bit.