Re: Assorted improvements in pg_dump
From: Hans Buschmann <buschmann@nidsa.net>
To: "tgl@sss.pgh.pa.us" <tgl@sss.pgh.pa.us>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Date: 2021-10-22T16:36:27Z
Lists: pgsql-hackers
Commits
-
pg_dump: avoid unsafe function calls in getPolicies().
- b7333e826955 11.19 landed
- a5b26aaafe4f 13.10 landed
- 1ed6f1b9116c 12.14 landed
- 03ac48549438 14.7 landed
- 3e6e86abca01 15.0 landed
-
Postpone calls of unsafe server-side functions in pg_dump.
- e46e986baef0 13.10 landed
- b1f106420b1a 11.19 landed
- 55f30e6c7640 14.7 landed
- 344b7849200f 12.14 landed
- e3fcbbd623b9 15.0 landed
-
Account for TOAST data while scheduling parallel dumps.
- 65aaed22a849 15.0 landed
-
Use PREPARE/EXECUTE for repetitive per-object queries in pg_dump.
- be85727a3df7 15.0 landed
-
Avoid per-object queries in performance-critical paths in pg_dump.
- 9895961529ef 15.0 landed
-
Rethink pg_dump's handling of object ACLs.
- 0c9d84427f44 15.0 landed
-
Refactor pg_dump's tracking of object components to be dumped.
- 5209c0ba0bfd 15.0 landed
-
pg_dump: fix mis-dumping of non-global default privileges.
- 2acc84c6fd29 15.0 cited
Hello Tom!

I noticed you are improving pg_dump just now.

Some time ago I experimented with a customer database dump in parallel directory mode, -F directory -j (2-4).

I noticed it took quite long to complete.

Further investigation showed that in this mode with multiple jobs the tables are processed in decreasing size order, which makes sense: it avoids a long tail where one big table in one of the jobs prolongs the overall dump time.

Exactly one table took very long, but seemed to be of moderate size.

The size determination, however, fails to consider the size of toast tables, and this table had a big associated toast table for its bytea column(s).
Even with an analyze at loading time there was no size information for the toast table in the catalog tables.

I thought of the following alternatives to ameliorate this:

1. Using the pg_table_size() function in the catalog query
   Pos: This reflects the correct size of every relation
   Neg: This goes out to disk and may have a huge impact on databases with very many tables

2. Teaching vacuum to set the toast-table size like it sets it on normal tables

3. Having a command/function for occasionally setting the (approximate) size of toast tables

I think that with further work under way (not yet ready), pg_dump can really profit from parallel, non-compressing mode, especially considering the huge amount of bytea/blob/string data in many big customer scenarios.

Thoughts?

Hans Buschmann
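
To illustrate the scheduling effect described above, here is a minimal Python sketch. It is not pg_dump's actual code: it assumes a simple model in which tables are handed out in decreasing order of estimated size to whichever job becomes free first, and in which dump time is proportional to heap plus toast size. The table names and sizes are made up for the example.

```python
import heapq
from collections import namedtuple

# Hypothetical tables: heap size and toast size in MB (illustrative numbers only).
Table = namedtuple("Table", ["name", "heap_mb", "toast_mb"])

TABLES = [
    Table("orders", 80, 0),
    Table("events", 60, 0),
    Table("customers", 40, 0),
    Table("log", 30, 0),
    Table("documents", 10, 200),  # small heap, large toast (bytea columns)
]


def makespan(tables, jobs, estimate):
    """Simulate a parallel dump and return the finish time of the last job.

    Tables are ordered by `estimate` (largest first); each table goes to the
    job that becomes free earliest. The real cost of a table is assumed to be
    heap_mb + toast_mb, whatever the estimate says.
    """
    order = sorted(tables, key=estimate, reverse=True)
    finish_times = [0] * jobs  # a list of equal values is already a valid heap
    for table in order:
        free_at = heapq.heappop(finish_times)
        heapq.heappush(finish_times, free_at + table.heap_mb + table.toast_mb)
    return max(finish_times)


def heap_only(table):
    # Estimate that ignores toast data.
    return table.heap_mb


def heap_plus_toast(table):
    # Estimate that includes toast data.
    return table.heap_mb + table.toast_mb


if __name__ == "__main__":
    print("heap-only ordering:   ", makespan(TABLES, 2, heap_only))
    print("heap+toast ordering:  ", makespan(TABLES, 2, heap_plus_toast))
```

With two jobs, the heap-only ordering schedules the toast-heavy table last, so it forms a long tail after everything else has finished. Ordering by heap plus toast size starts it first, and the other tables fill the second job in the meantime.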