AW: Assorted improvements in pg_dump
Hans Buschmann <buschmann@nidsa.net>
From: Hans Buschmann <buschmann@nidsa.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>, "Andres
Freund" <andres@anarazel.de>
Date: 2021-10-25T12:23:59Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
pg_dump: avoid unsafe function calls in getPolicies().
- b7333e826955 11.19 landed
- a5b26aaafe4f 13.10 landed
- 1ed6f1b9116c 12.14 landed
- 03ac48549438 14.7 landed
- 3e6e86abca01 15.0 landed
-
Postpone calls of unsafe server-side functions in pg_dump.
- e46e986baef0 13.10 landed
- b1f106420b1a 11.19 landed
- 55f30e6c7640 14.7 landed
- 344b7849200f 12.14 landed
- e3fcbbd623b9 15.0 landed
-
Account for TOAST data while scheduling parallel dumps.
- 65aaed22a849 15.0 landed
-
Use PREPARE/EXECUTE for repetitive per-object queries in pg_dump.
- be85727a3df7 15.0 landed
-
Avoid per-object queries in performance-critical paths in pg_dump.
- 9895961529ef 15.0 landed
-
Rethink pg_dump's handling of object ACLs.
- 0c9d84427f44 15.0 landed
-
Refactor pg_dump's tracking of object components to be dumped.
- 5209c0ba0bfd 15.0 landed
-
pg_dump: fix mis-dumping of non-global default privileges.
- 2acc84c6fd29 15.0 cited
________________________________________ 1. Von: Tom Lane <tgl@sss.pgh.pa.us> >Maybe we should back-patch 0005. OTOH, how likely is it that anyone >is wrangling tables exceeding 16TB on a machine with 32-bit off_t? >Or that poor parallel dump scheduling would be a real problem in >such a case? I tested your patch on Windows x64, pg15_devel_25.10.2021 against the customer database ( 2 core/4 thread NUC,32 GB RAM, 1 NVME-SSD, 4 jobs) pg_dump manually patched with your changes database pg14.0, 20 GB shared buffers. The dump of the database tables took 3min7sec for a 16 GB database resulting in a directory of 31.1 GB with 1628 files! The dump worked like a rush: full cpu-usage, finish. I don't have the old performance data available, but it is a real improvement, so backpatching may be really woth the effort. The former slowing-down table has a ratio from 5169 relpages to 673024 toastpages. Despite of the great disk usage (about dubbling the size from the db) directory mode seems to be by far the fastest mode especially for databases in the range 1TB+. For archiving purposes an extern_to_ postgres tool often fits better for compression and can be applied to the dumped data not withholding the dump process. I am still working on another big speedup in this scenario (comming soon). ----------------------------------------------------------- 2. Another suggestion considering pg_dump With some customer databases I follow a yearly practice of pg_dump/pg_restore to the new major version. This eliminates all bloat and does a full reindex, so the disk data layout is already quite clean. It would be very favorable to dump the pages according to the CLUSTER index when defined for a table. This would only concern the select to retrieve the rows and not harm pg_dump's logic. This would give perfectly reorganized tables in a pg_dump/pg_restore round. If a cluster index is defined by the customer, this expresses the whish to have the table layout in this way and nothing is introduced arbitrarily. I would suggest to have a flag (--cluster) for pg_dump to activate this new behavior. I think this is not immediately part of the current patchset, but should be taken into account for pg_dump ameliorations in PG15. In the moment I have not yet enough knowledge to propose a patch of this kind ( a logic similar to the cluster command itself). Perhaps someone could jump in ... Thanks for the patch and awaiting your thoughts Hans Buschmann