Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Add options to control whether VACUUM runs vac_update_datfrozenxid.
- a46a7011b271 16.0 landed
Use catalog query to discover tables to process in vacuumdb
- e0c2933a767c 12.0 cited

BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

PG Bug reporting form <noreply@postgresql.org> — 2022-12-13T10:57:39Z

The following bug has been logged on the website:

Bug reference:      17717
Logged by:          Gunnar L
Email address:      postgresql@taljaren.se
PostgreSQL version: 15.0
Operating system:   Ubuntu Linux
Description:        

We have observed a significant slowdown in vacuumdb performance between
different versions of postgresql. And possibly also a memory issue.

We run a specific data model, where each customer has its own schema with
its own set of tables. Each database server hosts 16 databases, each
containing around 250 customer schemas. Due to postgres creating a new file
for each database object, we end up with around 5 million files on each
database server. This may or may not be related to the issue we're seeing
(new algorithms with new time complexity?)

We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
each night, taking around 2 hours to complete. After the upgrade, we see a
constant 100% CPU utilization during the vacuumdb process (almost no I/O
activity), and vacuumdb cannot complete within a reasonable time. We're able
to vacuum about 3-4 databases each night.

We are able to recreate this issue, using a simple bash script to generate a
similar setup.

From local testing, here are our findings:

Concerning speed:
* Version 9.5, 10, 11 are fast  (9.5 slower than 10 and 11)
* Version 12, 13, 14 are very, very slow
* Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
as 10 or 11.

Concerning memory usage:
* Version 15 is using a lot more shared memory OR it might not be releasing
it properly after vacuuming a db.

These are the timings for vacuuming the 16 dbs.

Version   Seconds   Completed
------------------------------
9.5       412       16/16
10        178       16/16
11        166       16/16
12       8319        1/16 or 2/16 (manually aborted)
13      18853        3/16 or 4/16 (manually aborted)
14      16857        3/16 or 4/16 (manually aborted)
15        617        1/16 (crashed!)
15       4158        6/16 (crashed! --shm-size=256mb)
15       9500       16/16 (--shm-size=4096mb)

The timing of the only successful run for postgres 15 is somewhat flaky,
since the machine was suspended for about 1-1.5 hours so 9500 is only an
estimate, but the first run (1 db completed in 10 minutes) gives that it is
faster than 12-14 but slower than 10 and 11 (3 minutes to complete
everything)


The following describes our setup
This is the script (called  setup.sh)  we’re using to populate the databases
(we give a port number as parameter)

##### start of setup.sh
export PGPASSWORD=mysecretpassword
PORT=$1

echo ""> tables_$PORT.sql
for schema in `seq -w 1 250`; do
    echo "create schema schema$schema;" >> tables_$PORT.sql
    for table in `seq -w 1 500`; do
        echo "create table schema$schema.table$table (id int);" >>
tables_$PORT.sql
    done
done

echo "Setting up db: 01"
createdb -h localhost -U postgres -p $PORT  db01
psql -q -h localhost -U postgres -p $PORT db01 -f tables_$PORT.sql

# This seems to be the fastest way to create the databases
for db in `seq -w 2 16`; do
    echo "Setting up db: $db"
    createdb -h localhost -U postgres -p $PORT --template db01 db$db
done
####### end of setup.sh



To execute a test for a particular postgres version (in this example PG
9.5), we run the following. It will setup PG 9.5 on port 15432.

docker run --rm --name pg95 -e POSTGRES_PASSWORD=mysecretpassword -p
15432:5432 -d postgres:9.5
./setup.sh 15432
date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date

(The date commands are added to keep track of when tasks were started).





Here are complete set of commands and output and  comments 
(We use different ports for different versions of PG)

date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date
(The date commands since it takes some time to run)









time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real	6m52,070s
user	0m0,048s
sys	0m0,029s


time docker exec -it pg10 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real	2m58,354s
user	0m0,043s
sys	0m0,013s





time docker exec -it pg11 bash -c "vacuumdb -a -z -U postgres"
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real	2m46,181s
user	0m0,047s
sys	0m0,012s




date; time docker exec -it pg12 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 18:57:43 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
^CCancel request sent
vacuumdb: error: vacuuming of table "schema241.table177" in database "db02"
failed: ERROR:  canceling statement due to user request

real	138m39,600s
user	0m0,177s
sys	0m0,418s
lör 10 dec 2022 21:16:22 CET




date;time docker exec -it pg13 bash -c "vacuumdb -a -z -U postgres"
lör 10 dec 2022 07:22:32 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: vacuuming database "db03"
vacuumdb: vacuuming database "db04"
^CCancel request sent

real	314m13,172s
user	0m0,551s
sys	0m0,663s
lör 10 dec 2022 12:37:03 CET



date;time docker exec -it pg14 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 14:15:37 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: vacuuming database "db03"
vacuumdb: vacuuming database "db04"
^CCancel request sent

real	280m57,172s
user	0m0,586s
sys	0m0,559s
lör 10 dec 2022 18:56:34 CET



date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
lör 10 dec 2022 12:50:25 CET
vacuumdb: vacuuming database "db01"
vacuumdb: vacuuming database "db02"
vacuumdb: error: processing of database "db02" failed: ERROR:  could not
resize shared memory segment "/PostgreSQL.2952321776" to 27894720 bytes: No
space left on device

real	10m17,913s
user	0m0,030s
sys	0m0,049s
lör 10 dec 2022 13:00:43 CET

# it was faster, but we need  to extend shared memory to make it work


docker run --rm --name pg15 --shm-size=256mb -e
POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15

date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
mån 12 dec 2022 08:56:17 CET
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db07"
vacuumdb: error: processing of database "db07" failed: ERROR:  could not
resize shared memory segment "/PostgreSQL.1003084622" to 27894720 bytes: No
space left on device

real	69m18,345s
user	0m0,217s
sys	0m0,086s
mån 12 dec 2022 10:05:36 CET



docker run --rm --name pg15 --shm-size=4096mb -e
POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15

date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
mån 12 dec 2022 11:16:11 CET
vacuumdb: vacuuming database "db01"
…<snip>...
vacuumdb: vacuuming database "db16"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"

real	232m46,168s
user	0m0,227s
sys	0m0,467s
mån 12 dec 2022 15:08:57 CET



Here is the hardware that was used
AMD Ryzen 7 PRO 5850U with Radeon Graphics
8 Cores, 16 threads

$ free
               total        used        free      shared  buff/cache  
available
Mem:        28562376     5549716      752624     1088488    22260036   
21499752
Swap:         999420      325792      673628

Disk:	NVMe device, Samsung SSD 980 1TB

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-15T18:56:30Z

PG Bug reporting form <noreply@postgresql.org> writes:
> We run a specific data model, where each customer has its own schema with
> its own set of tables. Each database server hosts 16 databases, each
> containing around 250 customer schemas. Due to postgres creating a new file
> for each database object, we end up with around 5 million files on each
> database server. This may or may not be related to the issue we're seeing
> (new algorithms with new time complexity?)

> We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
> slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
> each night, taking around 2 hours to complete. After the upgrade, we see a
> constant 100% CPU utilization during the vacuumdb process (almost no I/O
> activity), and vacuumdb cannot complete within a reasonable time. We're able
> to vacuum about 3-4 databases each night.

I poked into this a little bit.  On HEAD, watching things with "perf"
identifies vac_update_datfrozenxid() as the main time sink.  It's not
hard to see why: that does a seqscan of pg_class, and it's invoked
at the end of each vacuum() call.  So if you try to vacuum each table
in the DB separately, you're going to end up spending O(N^2) time
in often-useless rescans of pg_class.  This isn't a huge problem in
ordinary-sized DBs, but with 125000 small tables in the DB it becomes
the dominant cost.

> Concerning speed:
> * Version 9.5, 10, 11 are fast  (9.5 slower than 10 and 11)
> * Version 12, 13, 14 are very, very slow
> * Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
> as 10 or 11.

The reason for the v12 performance change is that up through v11,
"vacuumdb -a -z" would just issue "VACUUM (ANALYZE);" in each DB.
So vac_update_datfrozenxid only ran once.  Beginning in v12 (commit
e0c2933a7), vacuumdb issues a separate VACUUM command for each
targeted table, which causes the problem.

I'm not sure why there's a performance delta from 14 to 15.
It doesn't look like vacuumdb itself had any material changes,
so we must have done something different on the backend side.
This may indicate that there's another O(N^2) behavior that
we got rid of in v15.  Anyway, that change isn't bad, so I did
not poke into it too much.

Conclusions:

* As a short-term fix, you could try using vacuumdb from v11
with the newer servers.  Or just do "psql -c 'vacuum analyze'"
and not bother with vacuumdb at all.  (On HEAD, with this
example database, 'vacuum analyze' takes about 7 seconds per DB
for me, versus ~10 minutes using vacuumdb.)

* To fix vacuumdb properly, it might be enough to get it to
batch VACUUMs, say by naming up to 1000 tables per command
instead of just one.  I'm not sure how that would interact
with its parallelization logic, though.  It's not really
solving the O(N^2) issue either, just pushing it further out.

* A better idea, though sadly not very back-patchable, could
be to expose a VACUUM option to control whether it runs
vac_update_datfrozenxid, so that vacuumdb can do that just
once at the end.  Considering that vac_update_datfrozenxid
requires an exclusive lock, the current behavior is poison for
parallel vacuuming quite aside from the O(N^2) issue.  This
might tie into some work Peter G. has been pursuing, too.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Peter Geoghegan <pg@bowt.ie> — 2022-12-15T20:06:57Z

On Thu, Dec 15, 2022 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * A better idea, though sadly not very back-patchable, could
> be to expose a VACUUM option to control whether it runs
> vac_update_datfrozenxid, so that vacuumdb can do that just
> once at the end.  Considering that vac_update_datfrozenxid
> requires an exclusive lock, the current behavior is poison for
> parallel vacuuming quite aside from the O(N^2) issue.  This
> might tie into some work Peter G. has been pursuing, too.

That sounds like a good idea to me. But do we actually need a VACUUM
option for this? I wonder if we could get away with having the VACUUM
command never call vac_update_datfrozenxid(), except when run in
single-user mode. It would be nice to make pg_xact/clog truncation
autovacuum's responsibility.

Autovacuum already does things differently to the VACUUM command, and
for reasons that seem related to this complaint about vacuumdb.
Besides, autovacuum is already on the hook to call
vac_update_datfrozenxid() for the benefit of databases that haven't
actually been vacuumed, per the do_autovacuum() comments right above
its vac_update_datfrozenxid() call.

-- 
Peter Geoghegan

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-15T21:57:16Z

Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Dec 15, 2022 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * A better idea, though sadly not very back-patchable, could
>> be to expose a VACUUM option to control whether it runs
>> vac_update_datfrozenxid, so that vacuumdb can do that just
>> once at the end.  Considering that vac_update_datfrozenxid
>> requires an exclusive lock, the current behavior is poison for
>> parallel vacuuming quite aside from the O(N^2) issue.  This
>> might tie into some work Peter G. has been pursuing, too.

> That sounds like a good idea to me. But do we actually need a VACUUM
> option for this? I wonder if we could get away with having the VACUUM
> command never call vac_update_datfrozenxid(), except when run in
> single-user mode. It would be nice to make pg_xact/clog truncation
> autovacuum's responsibility.

I could get behind manual VACUUM not invoking vac_update_datfrozenxid
by default, perhaps.  But if it can never call it, then that is a
fairly important bit of housekeeping that is unreachable except by
autovacuum.  No doubt the people who turn off autovacuum are benighted,
but they're still out there.

Could we get somewhere by saying that manual VACUUM calls
vac_update_datfrozenxid only if it's a full-DB vacuum (ie, no table
was specified)?  That would fix the problem at hand.  However, it'd
mean (since v12) that a vacuumdb run never calls vac_update_datfrozenxid
at all, which would result in horrible problems for any poor sods
who think that a cronjob running "vacuumdb -a" is an adequate substitute
for autovacuum.

Or maybe we could modify things so that "autovacuum = off" doesn't prevent
occasional cycles of vac_update_datfrozenxid-and-nothing-else?

In the end I feel like a manual way to call vac_update_datfrozenxid
would be the least magical way of running this.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Peter Geoghegan <pg@bowt.ie> — 2022-12-16T04:39:54Z

On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I could get behind manual VACUUM not invoking vac_update_datfrozenxid
> by default, perhaps.  But if it can never call it, then that is a
> fairly important bit of housekeeping that is unreachable except by
> autovacuum.  No doubt the people who turn off autovacuum are benighted,
> but they're still out there.

I wouldn't mind adding another option for this to VACUUM. We already
have a couple of VACUUM options that are only really needed as escape
hatches, or perhaps as testing tools used by individual Postgres
hackers. Another one doesn't seem too bad. The VACUUM command should
eventually become totally niche, so I'm not too concerned about going
overboard here.

> Could we get somewhere by saying that manual VACUUM calls
> vac_update_datfrozenxid only if it's a full-DB vacuum (ie, no table
> was specified)?  That would fix the problem at hand.

That definitely seems reasonable.

> Or maybe we could modify things so that "autovacuum = off" doesn't prevent
> occasional cycles of vac_update_datfrozenxid-and-nothing-else?

That's what I was thinking of. It seems like a more natural approach
to me, at least offhand.

I have to imagine that the vast majority of individual calls to
vac_update_datfrozenxid have just about zero chance of updating
datfrozenxid or datminmxid as things stand. There is bound to be some
number of completely static tables in every database (maybe just
system catalogs). Those static tables are bound to be the tables that
hold back datfrozenxid/datminmxid approximately all the time. To me
this suggests that vac_update_datfrozenxid should fully own the fact
that it's supposed to be called out of band, possibly only in
autovacuum.

Separately, I wonder if it would make sense to invent a new fast-path
for the VACUUM command that is designed to inexpensively determine
that it cannot possibly matter if vac_update_datfrozenxid is never
called, given the specifics (the details of the target rel and its
TOAST rel).

-- 
Peter Geoghegan

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-16T14:49:09Z

Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or maybe we could modify things so that "autovacuum = off" doesn't prevent
>> occasional cycles of vac_update_datfrozenxid-and-nothing-else?

> That's what I was thinking of. It seems like a more natural approach
> to me, at least offhand.

Seems worth looking into.  But I suppose the launch frequency would
have to be more often than the current behavior for autovacuum = off,
so it would complicate the logic in that area.

> I have to imagine that the vast majority of individual calls to
> vac_update_datfrozenxid have just about zero chance of updating
> datfrozenxid or datminmxid as things stand.

That is a really good point.  How about teaching VACUUM to track
the oldest original relfrozenxid and relminmxid among the table(s)
it processed, and skip vac_update_datfrozenxid unless at least one
of those matches the database's values?  For extra credit, also
skip if we didn't successfully advance the source rel's value.

This might lead to a fix that solves the OP's problem while not
changing anything fundamental, which would make it reasonable
to back-patch.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Peter Geoghegan <pg@bowt.ie> — 2022-12-16T18:47:07Z

On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I have to imagine that the vast majority of individual calls to
> > vac_update_datfrozenxid have just about zero chance of updating
> > datfrozenxid or datminmxid as things stand.
>
> That is a really good point.  How about teaching VACUUM to track
> the oldest original relfrozenxid and relminmxid among the table(s)
> it processed, and skip vac_update_datfrozenxid unless at least one
> of those matches the database's values?  For extra credit, also
> skip if we didn't successfully advance the source rel's value.

Hmm. I think that that would probably work.

It would certainly work on 15+, because there tends to be "natural
diversity" among the relfrozenxid values seen for each table, due to
the "track oldest extant XID" work; we no longer see many tables that
all have the same relfrozenxid, that advance in lockstep. But even
that factor probably doesn't matter, since we only need one "laggard
relfrozenxid" static table for the scheme to work and work well. That
is probably a safe bet on all versions, though I'd have to check to be
sure.

> This might lead to a fix that solves the OP's problem while not
> changing anything fundamental, which would make it reasonable
> to back-patch.

That's a big plus. This is a nasty regression. I wouldn't call it a
must-fix, but it's bad enough to be worth fixing if we can come up
with a reasonably non-invasive approach.

-- 
Peter Geoghegan

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-16T20:33:33Z

Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That is a really good point.  How about teaching VACUUM to track
>> the oldest original relfrozenxid and relminmxid among the table(s)
>> it processed, and skip vac_update_datfrozenxid unless at least one
>> of those matches the database's values?  For extra credit, also
>> skip if we didn't successfully advance the source rel's value.

> Hmm. I think that that would probably work.

> It would certainly work on 15+, because there tends to be "natural
> diversity" among the relfrozenxid values seen for each table, due to
> the "track oldest extant XID" work; we no longer see many tables that
> all have the same relfrozenxid, that advance in lockstep. But even
> that factor probably doesn't matter, since we only need one "laggard
> relfrozenxid" static table for the scheme to work and work well. That
> is probably a safe bet on all versions, though I'd have to check to be
> sure.

Oh, I see your point: if a whole lot of tables have the same relfrozenxid
and it matches datfrozenxid, this won't help.  Still, we can hope that
that's an uncommon situation.  If we postulate somebody trying to use
scheduled "vacuumdb -z" in place of autovacuum, they shouldn't really have
that situation.  Successively vacuuming many tables should normally
result in the tables' relfrozenxids not being all the same, unless they
were unlucky enough to have a very long-running transaction holding back
the global xmin horizon the whole time.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Michael Paquier <michael@paquier.xyz> — 2022-12-18T02:21:47Z

On Thu, Dec 15, 2022 at 01:56:30PM -0500, Tom Lane wrote:
> * To fix vacuumdb properly, it might be enough to get it to
> batch VACUUMs, say by naming up to 1000 tables per command
> instead of just one.  I'm not sure how that would interact
> with its parallelization logic, though.  It's not really
> solving the O(N^2) issue either, just pushing it further out.

I have been thinking about this part, and using a hardcoded rule for
the batches would be tricky.  The list of relations returned by the
scan of pg_class are ordered by relpages, so depending on the
distribution of the sizes (few tables with a large size and a lot of
table with small sizes, exponential distribution of table sizes), we
may finish with more downsides than upsides in some cases, even if we
use a linear rule based on the number of relations, or even if we
distribute the relations across the slots in a round robin fashion for
example.

In order to control all that, rather than a hardcoded rule, could it
be as simple as introducing an option like vacuumdb --batch=N
defaulting to 1 to let users control the number of relations grouped
in a single command with a round robin distribution for each slot?
--
Michael

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Christophe Pettus <xof@thebuild.com> — 2022-12-18T02:23:27Z


> In order to control all that, rather than a hardcoded rule, could it
> be as simple as introducing an option like vacuumdb --batch=N
> defaulting to 1 to let users control the number of relations grouped
> in a single command with a round robin distribution for each slot?

My first reaction to that is: Is it possible to explain to a DBA what N should be for a particular cluster?

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-18T23:55:00Z

On Thu, Dec 15, 2022 at 08:39:54PM -0800, Peter Geoghegan wrote:
> On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I could get behind manual VACUUM not invoking vac_update_datfrozenxid
>> by default, perhaps.  But if it can never call it, then that is a
>> fairly important bit of housekeeping that is unreachable except by
>> autovacuum.  No doubt the people who turn off autovacuum are benighted,
>> but they're still out there.
> 
> I wouldn't mind adding another option for this to VACUUM. We already
> have a couple of VACUUM options that are only really needed as escape
> hatches, or perhaps as testing tools used by individual Postgres
> hackers. Another one doesn't seem too bad. The VACUUM command should
> eventually become totally niche, so I'm not too concerned about going
> overboard here.

Perhaps there could also be an update-datfrozenxid function that vacuumdb
calls when finished with a database.  Even if vacuum becomes smarter about
calling vac_update_datfrozenxid, this might still be worth doing.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Michael Paquier <michael@paquier.xyz> — 2022-12-19T03:21:26Z

On Sat, Dec 17, 2022 at 06:23:27PM -0800, Christophe Pettus wrote:
> My first reaction to that is: Is it possible to explain to a DBA
> what N should be for a particular cluster?

Assuming that we can come up with a rather straight-forward still
portable rule for the distribution of the relations across of the
slots like something I mentioned above (which is not the best thing
depending on the sizes and the number of tables), that would be quite
tricky IMO.
--
Michael

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T20:13:23Z

[ redirecting to -hackers because patch attached ]

Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That is a really good point.  How about teaching VACUUM to track
>> the oldest original relfrozenxid and relminmxid among the table(s)
>> it processed, and skip vac_update_datfrozenxid unless at least one
>> of those matches the database's values?  For extra credit, also
>> skip if we didn't successfully advance the source rel's value.

> Hmm. I think that that would probably work.

I poked into that idea some more and concluded that getting VACUUM to
manage it behind the user's back is not going to work very reliably.
The key problem is explained by this existing comment in autovacuum.c:

     * Even if we didn't vacuum anything, it may still be important to do
     * this, because one indirect effect of vac_update_datfrozenxid() is to
     * update ShmemVariableCache->xidVacLimit.  That might need to be done
     * even if we haven't vacuumed anything, because relations with older
     * relfrozenxid values or other databases with older datfrozenxid values
     * might have been dropped, allowing xidVacLimit to advance.

That is, if the table that's holding back datfrozenxid gets dropped
between VACUUM runs, VACUUM would never think that it might have
advanced the global minimum.

I'm forced to the conclusion that we have to expose some VACUUM
options if we want this to work well.  Attached is a draft patch
that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
(name bikeshedding welcome) and teaches vacuumdb to use them.

Light testing says that this is a win: even on the regression
database, which isn't all that big, I see a drop in vacuumdb's
runtime from ~260 ms to ~175 ms.  Of course this is a case where
VACUUM doesn't really have anything to do, so it's a best-case
scenario ... but still, I was expecting the effect to be barely
above noise with this many tables, yet it's a good bit more.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-28T21:12:53Z

On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
> I'm forced to the conclusion that we have to expose some VACUUM
> options if we want this to work well.  Attached is a draft patch
> that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
> (name bikeshedding welcome) and teaches vacuumdb to use them.

This is the conclusion I arrived at, too.  In fact, I was just about to
post a similar patch set.  I'm attaching it here anyway, but I'm fine with
proceeding with your version.

I think the main difference between your patch and mine is that I've
exposed vac_update_datfrozenxid() via a function instead of a VACUUM
option.  IMHO that feels a little more natural, but I can't say I feel too
strongly about it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T21:20:19Z

Nathan Bossart <nathandbossart@gmail.com> writes:
> I think the main difference between your patch and mine is that I've
> exposed vac_update_datfrozenxid() via a function instead of a VACUUM
> option.  IMHO that feels a little more natural, but I can't say I feel too
> strongly about it.

I thought about that but it seems fairly unsafe, because that means
that vac_update_datfrozenxid is executing inside a user-controlled
transaction.  I don't think it will hurt us if the user does a
ROLLBACK afterward --- but if he sits on the open transaction,
that would be bad, if only because we're still holding the
LockDatabaseFrozenIds lock which will block other VACUUMs.
There might be more hazards besides that; certainly no one has ever
tried to run vac_update_datfrozenxid that way before.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-28T21:21:50Z

On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
> +	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
> +	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE && !failed)
> +	{
> +		executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo);
> +	}

When I looked at this, I thought it would be better to send the command
through the parallel slot machinery so that failures would use the same
code path as the rest of the VACUUM commands.  However, you also need to
adjust ParallelSlotsWaitCompletion() to mark the slots as idle so that the
slot array can be reused after it is called.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-28T21:23:21Z

On Wed, Dec 28, 2022 at 04:20:19PM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> I think the main difference between your patch and mine is that I've
>> exposed vac_update_datfrozenxid() via a function instead of a VACUUM
>> option.  IMHO that feels a little more natural, but I can't say I feel too
>> strongly about it.
> 
> I thought about that but it seems fairly unsafe, because that means
> that vac_update_datfrozenxid is executing inside a user-controlled
> transaction.  I don't think it will hurt us if the user does a
> ROLLBACK afterward --- but if he sits on the open transaction,
> that would be bad, if only because we're still holding the
> LockDatabaseFrozenIds lock which will block other VACUUMs.
> There might be more hazards besides that; certainly no one has ever
> tried to run vac_update_datfrozenxid that way before.

That's a good point.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T22:05:39Z

Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
>> +		executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo);

> When I looked at this, I thought it would be better to send the command
> through the parallel slot machinery so that failures would use the same
> code path as the rest of the VACUUM commands.  However, you also need to
> adjust ParallelSlotsWaitCompletion() to mark the slots as idle so that the
> slot array can be reused after it is called.

Hm.  I was just copying the way commands are issued further up in the
same function.  But I think you're right: once we've done

	ParallelSlotsAdoptConn(sa, conn);

it's probably not entirely kosher to use the conn directly.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Justin Pryzby <pryzby@telsasoft.com> — 2022-12-29T01:23:54Z

On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> That is a really good point.  How about teaching VACUUM to track
> >> the oldest original relfrozenxid and relminmxid among the table(s)
> >> it processed, and skip vac_update_datfrozenxid unless at least one
> >> of those matches the database's values?  For extra credit, also
> >> skip if we didn't successfully advance the source rel's value.
> 
> > Hmm. I think that that would probably work.
> 
> I'm forced to the conclusion that we have to expose some VACUUM
> options if we want this to work well.  Attached is a draft patch
> that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
> (name bikeshedding welcome) and teaches vacuumdb to use them.

I was surprised to hear that this added *two* options.

I assumed it would look like:

VACUUM (UPDATE_DATABASE_STATS {yes,no,only})

-- 
Justin

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Justin Pryzby <pryzby@telsasoft.com> — 2022-12-29T01:29:10Z

On Sun, Dec 18, 2022 at 11:21:47AM +0900, Michael Paquier wrote:
> On Thu, Dec 15, 2022 at 01:56:30PM -0500, Tom Lane wrote:
> > * To fix vacuumdb properly, it might be enough to get it to
> > batch VACUUMs, say by naming up to 1000 tables per command
> > instead of just one.  I'm not sure how that would interact
> > with its parallelization logic, though.  It's not really
> > solving the O(N^2) issue either, just pushing it further out.
> 
> I have been thinking about this part, and using a hardcoded rule for
> the batches would be tricky.  The list of relations returned by the
> scan of pg_class are ordered by relpages, so depending on the
> distribution of the sizes (few tables with a large size and a lot of
> table with small sizes, exponential distribution of table sizes), we
> may finish with more downsides than upsides in some cases, even if we
> use a linear rule based on the number of relations, or even if we
> distribute the relations across the slots in a round robin fashion for
> example.

I've always found it weird that it uses "ORDER BY relpages".

I'd prefer if it could ORDER BY age(relfrozenxid) or
GREATEST(age(relfrozenxid), age(relminmxid)), at least if you specify
one of the --min-*age parms.  Or something less hardcoded and
unconfigurable.

-- 
Justin

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T02:17:54Z

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
>> I'm forced to the conclusion that we have to expose some VACUUM
>> options if we want this to work well.  Attached is a draft patch
>> that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
>> (name bikeshedding welcome) and teaches vacuumdb to use them.

> I assumed it would look like:
> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})

Meh.  We could do it like that, but I think options that look like
booleans but aren't are messy.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Andres Freund <andres@anarazel.de> — 2022-12-29T03:03:29Z

Hi,

On 2022-12-28 15:13:23 -0500, Tom Lane wrote:
> [ redirecting to -hackers because patch attached ]
> 
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> That is a really good point.  How about teaching VACUUM to track
> >> the oldest original relfrozenxid and relminmxid among the table(s)
> >> it processed, and skip vac_update_datfrozenxid unless at least one
> >> of those matches the database's values?  For extra credit, also
> >> skip if we didn't successfully advance the source rel's value.
> 
> > Hmm. I think that that would probably work.
> 
> I poked into that idea some more and concluded that getting VACUUM to
> manage it behind the user's back is not going to work very reliably.
> The key problem is explained by this existing comment in autovacuum.c:
> 
>      * Even if we didn't vacuum anything, it may still be important to do
>      * this, because one indirect effect of vac_update_datfrozenxid() is to
>      * update ShmemVariableCache->xidVacLimit.  That might need to be done
>      * even if we haven't vacuumed anything, because relations with older
>      * relfrozenxid values or other databases with older datfrozenxid values
>      * might have been dropped, allowing xidVacLimit to advance.
> 
> That is, if the table that's holding back datfrozenxid gets dropped
> between VACUUM runs, VACUUM would never think that it might have
> advanced the global minimum.

I wonder if a less aggressive version of this idea might still work. Perhaps
we could use ShmemVariableCache->latestCompletedXid or
ShmemVariableCache->nextXid to skip at least some updates?

Obviously this isn't going to help if there's a lot of concurrent activity,
but the case of just running vacuumdb -a might be substantially improved.


Separately I wonder if it's worth micro-optimizing vac_update_datfrozenxid() a
bit. I e.g. see a noticable speedup bypassing systable_getnext() and using
heap_getnext().  It's really too bad that we want to check for "in the future"
xids, otherwise we could use a ScanKey to filter at a lower level.

Greetings,

Andres Freund

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T17:22:58Z

I wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> I assumed it would look like:
>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})

> Meh.  We could do it like that, but I think options that look like
> booleans but aren't are messy.

Note that I'm not necessarily objecting to there being just one option,
only to its values being a superset-of-boolean.  Perhaps this'd work:

VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-29T18:26:57Z

On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote:
>> Justin Pryzby <pryzby@telsasoft.com> writes:
>>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})

+1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
little better since it states the action like most of the other options,
but I think both choices are sufficiently clear.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T18:45:49Z

Nathan Bossart <nathandbossart@gmail.com> writes:
> +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
> little better since it states the action like most of the other options,
> but I think both choices are sufficiently clear.

Consistency of VACUUM's options seems like a lost cause already :-(.
Between them, DISABLE_PAGE_SKIPPING, SKIP_LOCKED, and PROCESS_TOAST
cover just about the entire set of possibilities for describing a
boolean option --- we couldn't even manage to be consistent about
whether ON or OFF is the default, let alone where the verb is.
And it's hard to argue that FULL, VERBOSE, or PARALLEL is a verb.

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-29T18:52:14Z

On Wed, Dec 28, 2022 at 07:03:29PM -0800, Andres Freund wrote:
> Separately I wonder if it's worth micro-optimizing vac_update_datfrozenxid() a
> bit. I e.g. see a noticable speedup bypassing systable_getnext() and using
> heap_getnext().  It's really too bad that we want to check for "in the future"
> xids, otherwise we could use a ScanKey to filter at a lower level.

Another thing I'm exploring is looking up the datfrozenxid/datminmxid
before starting the pg_class scan so that the scan can be stopped early if
it sees that we cannot possibly advance the values.  The
overwrite-corrupt-values logic might make this a little more complicated,
but I think it'd be sufficient to force the pg_class scan to complete if we
ever see a value "in the future."  Overwriting the corrupt value might be
delayed, but it would eventually happen once the table ages advance.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T20:29:15Z

Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote:
>> Justin Pryzby <pryzby@telsasoft.com> writes:
>>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
>>>> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})

> +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
> little better since it states the action like most of the other options,
> but I think both choices are sufficiently clear.

I tried to make a patch along these lines, and soon hit a stumbling
block: ONLY is a fully-reserved SQL keyword.  I don't think this
syntax is attractive enough to justify requiring people to
double-quote the option, so we are back to square one.  Anybody
have a different suggestion?

			regards, tom lane

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Nathan Bossart <nathandbossart@gmail.com> — 2022-12-29T21:37:19Z

On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote:
>>> Justin Pryzby <pryzby@telsasoft.com> writes:
>>>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
>>>>> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})
> 
>> +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
>> little better since it states the action like most of the other options,
>> but I think both choices are sufficiently clear.
> 
> I tried to make a patch along these lines, and soon hit a stumbling
> block: ONLY is a fully-reserved SQL keyword.  I don't think this
> syntax is attractive enough to justify requiring people to
> double-quote the option, so we are back to square one.  Anybody
> have a different suggestion?

Hm.  I thought about using PreventInTransactionBlock() for the function,
but that probably won't work for a few reasons.  AFAICT we'd need to
restrict it to only be callable via "SELECT update_database_stats()", which
feels a bit unnatural.

There was some discussion elsewhere [0] about adding a
PROCESS_MAIN_RELATION option or expanding PROCESS_TOAST to simplify
vacuuming the TOAST table directly.  If such an option existed, you could
call

	VACUUM (PROCESS_MAIN_RELATION FALSE, PROCESS_TOAST FALSE, UPDATE_DATABASE_STATES TRUE) pg_class;

to achieve roughly what we need.  I'll admit this is hacky, though.

So, adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the
best bet.

[0] https://postgr.es/m/20221215191246.GA252861%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-06T17:53:07Z

Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote:
>> I tried to make a patch along these lines, and soon hit a stumbling
>> block: ONLY is a fully-reserved SQL keyword.  I don't think this
>> syntax is attractive enough to justify requiring people to
>> double-quote the option, so we are back to square one.  Anybody
>> have a different suggestion?

> ... adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the
> best bet.

Nobody has proposed a different bikeshed color, so I'm going to
proceed with that syntax.  I'll incorporate the parallel-machinery
fix from your patch and push to HEAD only (since it's hard to argue
this isn't a new feature).

This needn't foreclose pursuing the various ideas about making
vac_update_datfrozenxid faster; but none of those would eliminate
the fundamental O(N^2) issue AFAICS.

			regards, tom lane