Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Add options to control whether VACUUM runs vac_update_datfrozenxid.

  2. Use catalog query to discover tables to process in vacuumdb

  1. BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    PG Bug reporting form <noreply@postgresql.org> — 2022-12-13T10:57:39Z

    The following bug has been logged on the website:
    
    Bug reference:      17717
    Logged by:          Gunnar L
    Email address:      postgresql@taljaren.se
    PostgreSQL version: 15.0
    Operating system:   Ubuntu Linux
    Description:        
    
    We have observed a significant slowdown in vacuumdb performance between
    different versions of postgresql. And possibly also a memory issue.
    
    We run a specific data model, where each customer has its own schema with
    its own set of tables. Each database server hosts 16 databases, each
    containing around 250 customer schemas. Due to postgres creating a new file
    for each database object, we end up with around 5 million files on each
    database server. This may or may not be related to the issue we're seeing
    (new algorithms with new time complexity?)
    
    We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
    slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
    each night, taking around 2 hours to complete. After the upgrade, we see a
    constant 100% CPU utilization during the vacuumdb process (almost no I/O
    activity), and vacuumdb cannot complete within a reasonable time. We're able
    to vacuum about 3-4 databases each night.
    
    We are able to recreate this issue, using a simple bash script to generate a
    similar setup.
    
    From local testing, here are our findings:
    
    Concerning speed:
    * Version 9.5, 10, 11 are fast  (9.5 slower than 10 and 11)
    * Version 12, 13, 14 are very, very slow
    * Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
    as 10 or 11.
    
    Concerning memory usage:
    * Version 15 is using a lot more shared memory OR it might not be releasing
    it properly after vacuuming a db.
    
    These are the timings for vacuuming the 16 dbs.
    
    Version   Seconds   Completed
    ------------------------------
    9.5       412       16/16
    10        178       16/16
    11        166       16/16
    12       8319        1/16 or 2/16 (manually aborted)
    13      18853        3/16 or 4/16 (manually aborted)
    14      16857        3/16 or 4/16 (manually aborted)
    15        617        1/16 (crashed!)
    15       4158        6/16 (crashed! --shm-size=256mb)
    15       9500       16/16 (--shm-size=4096mb)
    
    The timing of the only successful run for postgres 15 is somewhat flaky,
    since the machine was suspended for about 1-1.5 hours so 9500 is only an
    estimate, but the first run (1 db completed in 10 minutes) gives that it is
    faster than 12-14 but slower than 10 and 11 (3 minutes to complete
    everything)
    
    
    The following describes our setup
    This is the script (called  setup.sh)  we’re using to populate the databases
    (we give a port number as parameter)
    
    ##### start of setup.sh
    export PGPASSWORD=mysecretpassword
    PORT=$1
    
    echo ""> tables_$PORT.sql
    for schema in `seq -w 1 250`; do
        echo "create schema schema$schema;" >> tables_$PORT.sql
        for table in `seq -w 1 500`; do
            echo "create table schema$schema.table$table (id int);" >>
    tables_$PORT.sql
        done
    done
    
    echo "Setting up db: 01"
    createdb -h localhost -U postgres -p $PORT  db01
    psql -q -h localhost -U postgres -p $PORT db01 -f tables_$PORT.sql
    
    # This seems to be the fastest way to create the databases
    for db in `seq -w 2 16`; do
        echo "Setting up db: $db"
        createdb -h localhost -U postgres -p $PORT --template db01 db$db
    done
    ####### end of setup.sh
    
    
    
    To execute a test for a particular postgres version (in this example PG
    9.5), we run the following. It will setup PG 9.5 on port 15432.
    
    docker run --rm --name pg95 -e POSTGRES_PASSWORD=mysecretpassword -p
    15432:5432 -d postgres:9.5
    ./setup.sh 15432
    date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date
    
    (The date commands are added to keep track of when tasks were started).
    
    
    
    
    
    Here are complete set of commands and output and  comments 
    (We use different ports for different versions of PG)
    
    date; time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"; date
    (The date commands since it takes some time to run)
    
    
    
    
    
    
    
    
    
    time docker exec -it pg95 bash -c "vacuumdb -a -z -U postgres"
    vacuumdb: vacuuming database "db01"
    …<snip>...
    vacuumdb: vacuuming database "db16"
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    
    real	6m52,070s
    user	0m0,048s
    sys	0m0,029s
    
    
    time docker exec -it pg10 bash -c "vacuumdb -a -z -U postgres"
    vacuumdb: vacuuming database "db01"
    …<snip>...
    vacuumdb: vacuuming database "db16"
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    
    real	2m58,354s
    user	0m0,043s
    sys	0m0,013s
    
    
    
    
    
    time docker exec -it pg11 bash -c "vacuumdb -a -z -U postgres"
    vacuumdb: vacuuming database "db01"
    …<snip>...
    vacuumdb: vacuuming database "db16"
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    
    real	2m46,181s
    user	0m0,047s
    sys	0m0,012s
    
    
    
    
    date; time docker exec -it pg12 bash -c "vacuumdb -a -z -U postgres"; date
    lör 10 dec 2022 18:57:43 CET
    vacuumdb: vacuuming database "db01"
    vacuumdb: vacuuming database "db02"
    ^CCancel request sent
    vacuumdb: error: vacuuming of table "schema241.table177" in database "db02"
    failed: ERROR:  canceling statement due to user request
    
    real	138m39,600s
    user	0m0,177s
    sys	0m0,418s
    lör 10 dec 2022 21:16:22 CET
    
    
    
    
    date;time docker exec -it pg13 bash -c "vacuumdb -a -z -U postgres"
    lör 10 dec 2022 07:22:32 CET
    vacuumdb: vacuuming database "db01"
    vacuumdb: vacuuming database "db02"
    vacuumdb: vacuuming database "db03"
    vacuumdb: vacuuming database "db04"
    ^CCancel request sent
    
    real	314m13,172s
    user	0m0,551s
    sys	0m0,663s
    lör 10 dec 2022 12:37:03 CET
    
    
    
    date;time docker exec -it pg14 bash -c "vacuumdb -a -z -U postgres"; date
    lör 10 dec 2022 14:15:37 CET
    vacuumdb: vacuuming database "db01"
    vacuumdb: vacuuming database "db02"
    vacuumdb: vacuuming database "db03"
    vacuumdb: vacuuming database "db04"
    ^CCancel request sent
    
    real	280m57,172s
    user	0m0,586s
    sys	0m0,559s
    lör 10 dec 2022 18:56:34 CET
    
    
    
    date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
    lör 10 dec 2022 12:50:25 CET
    vacuumdb: vacuuming database "db01"
    vacuumdb: vacuuming database "db02"
    vacuumdb: error: processing of database "db02" failed: ERROR:  could not
    resize shared memory segment "/PostgreSQL.2952321776" to 27894720 bytes: No
    space left on device
    
    real	10m17,913s
    user	0m0,030s
    sys	0m0,049s
    lör 10 dec 2022 13:00:43 CET
    
    # it was faster, but we need  to extend shared memory to make it work
    
    
    docker run --rm --name pg15 --shm-size=256mb -e
    POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15
    
    date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
    mån 12 dec 2022 08:56:17 CET
    vacuumdb: vacuuming database "db01"
    …<snip>...
    vacuumdb: vacuuming database "db07"
    vacuumdb: error: processing of database "db07" failed: ERROR:  could not
    resize shared memory segment "/PostgreSQL.1003084622" to 27894720 bytes: No
    space left on device
    
    real	69m18,345s
    user	0m0,217s
    sys	0m0,086s
    mån 12 dec 2022 10:05:36 CET
    
    
    
    docker run --rm --name pg15 --shm-size=4096mb -e
    POSTGRES_PASSWORD=mysecretpassword -p 55555:5432 -d postgres:15
    
    date;time docker exec -it pg15 bash -c "vacuumdb -a -z -U postgres"; date
    mån 12 dec 2022 11:16:11 CET
    vacuumdb: vacuuming database "db01"
    …<snip>...
    vacuumdb: vacuuming database "db16"
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    
    real	232m46,168s
    user	0m0,227s
    sys	0m0,467s
    mån 12 dec 2022 15:08:57 CET
    
    
    
    Here is the hardware that was used
    AMD Ryzen 7 PRO 5850U with Radeon Graphics
    8 Cores, 16 threads
    
    $ free
                   total        used        free      shared  buff/cache  
    available
    Mem:        28562376     5549716      752624     1088488    22260036   
    21499752
    Swap:         999420      325792      673628
    
    Disk:	NVMe device, Samsung SSD 980 1TB
    
    
  2. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-15T18:56:30Z

    PG Bug reporting form <noreply@postgresql.org> writes:
    > We run a specific data model, where each customer has its own schema with
    > its own set of tables. Each database server hosts 16 databases, each
    > containing around 250 customer schemas. Due to postgres creating a new file
    > for each database object, we end up with around 5 million files on each
    > database server. This may or may not be related to the issue we're seeing
    > (new algorithms with new time complexity?)
    
    > We upgraded from postgresql 9.5 to postgresql 13, and noticed a significant
    > slowdown in how vacuumdb performs. Before, we could run a vacuumdb -a -z
    > each night, taking around 2 hours to complete. After the upgrade, we see a
    > constant 100% CPU utilization during the vacuumdb process (almost no I/O
    > activity), and vacuumdb cannot complete within a reasonable time. We're able
    > to vacuum about 3-4 databases each night.
    
    I poked into this a little bit.  On HEAD, watching things with "perf"
    identifies vac_update_datfrozenxid() as the main time sink.  It's not
    hard to see why: that does a seqscan of pg_class, and it's invoked
    at the end of each vacuum() call.  So if you try to vacuum each table
    in the DB separately, you're going to end up spending O(N^2) time
    in often-useless rescans of pg_class.  This isn't a huge problem in
    ordinary-sized DBs, but with 125000 small tables in the DB it becomes
    the dominant cost.
    
    > Concerning speed:
    > * Version 9.5, 10, 11 are fast  (9.5 slower than 10 and 11)
    > * Version 12, 13, 14 are very, very slow
    > * Version 15 is faster (a lot faster than 12,13,14) but not nearly as fast
    > as 10 or 11.
    
    The reason for the v12 performance change is that up through v11,
    "vacuumdb -a -z" would just issue "VACUUM (ANALYZE);" in each DB.
    So vac_update_datfrozenxid only ran once.  Beginning in v12 (commit
    e0c2933a7), vacuumdb issues a separate VACUUM command for each
    targeted table, which causes the problem.
    
    I'm not sure why there's a performance delta from 14 to 15.
    It doesn't look like vacuumdb itself had any material changes,
    so we must have done something different on the backend side.
    This may indicate that there's another O(N^2) behavior that
    we got rid of in v15.  Anyway, that change isn't bad, so I did
    not poke into it too much.
    
    Conclusions:
    
    * As a short-term fix, you could try using vacuumdb from v11
    with the newer servers.  Or just do "psql -c 'vacuum analyze'"
    and not bother with vacuumdb at all.  (On HEAD, with this
    example database, 'vacuum analyze' takes about 7 seconds per DB
    for me, versus ~10 minutes using vacuumdb.)
    
    * To fix vacuumdb properly, it might be enough to get it to
    batch VACUUMs, say by naming up to 1000 tables per command
    instead of just one.  I'm not sure how that would interact
    with its parallelization logic, though.  It's not really
    solving the O(N^2) issue either, just pushing it further out.
    
    * A better idea, though sadly not very back-patchable, could
    be to expose a VACUUM option to control whether it runs
    vac_update_datfrozenxid, so that vacuumdb can do that just
    once at the end.  Considering that vac_update_datfrozenxid
    requires an exclusive lock, the current behavior is poison for
    parallel vacuuming quite aside from the O(N^2) issue.  This
    might tie into some work Peter G. has been pursuing, too.
    
    			regards, tom lane
    
    
    
    
  3. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Peter Geoghegan <pg@bowt.ie> — 2022-12-15T20:06:57Z

    On Thu, Dec 15, 2022 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > * A better idea, though sadly not very back-patchable, could
    > be to expose a VACUUM option to control whether it runs
    > vac_update_datfrozenxid, so that vacuumdb can do that just
    > once at the end.  Considering that vac_update_datfrozenxid
    > requires an exclusive lock, the current behavior is poison for
    > parallel vacuuming quite aside from the O(N^2) issue.  This
    > might tie into some work Peter G. has been pursuing, too.
    
    That sounds like a good idea to me. But do we actually need a VACUUM
    option for this? I wonder if we could get away with having the VACUUM
    command never call vac_update_datfrozenxid(), except when run in
    single-user mode. It would be nice to make pg_xact/clog truncation
    autovacuum's responsibility.
    
    Autovacuum already does things differently to the VACUUM command, and
    for reasons that seem related to this complaint about vacuumdb.
    Besides, autovacuum is already on the hook to call
    vac_update_datfrozenxid() for the benefit of databases that haven't
    actually been vacuumed, per the do_autovacuum() comments right above
    its vac_update_datfrozenxid() call.
    
    -- 
    Peter Geoghegan
    
    
    
    
  4. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-15T21:57:16Z

    Peter Geoghegan <pg@bowt.ie> writes:
    > On Thu, Dec 15, 2022 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> * A better idea, though sadly not very back-patchable, could
    >> be to expose a VACUUM option to control whether it runs
    >> vac_update_datfrozenxid, so that vacuumdb can do that just
    >> once at the end.  Considering that vac_update_datfrozenxid
    >> requires an exclusive lock, the current behavior is poison for
    >> parallel vacuuming quite aside from the O(N^2) issue.  This
    >> might tie into some work Peter G. has been pursuing, too.
    
    > That sounds like a good idea to me. But do we actually need a VACUUM
    > option for this? I wonder if we could get away with having the VACUUM
    > command never call vac_update_datfrozenxid(), except when run in
    > single-user mode. It would be nice to make pg_xact/clog truncation
    > autovacuum's responsibility.
    
    I could get behind manual VACUUM not invoking vac_update_datfrozenxid
    by default, perhaps.  But if it can never call it, then that is a
    fairly important bit of housekeeping that is unreachable except by
    autovacuum.  No doubt the people who turn off autovacuum are benighted,
    but they're still out there.
    
    Could we get somewhere by saying that manual VACUUM calls
    vac_update_datfrozenxid only if it's a full-DB vacuum (ie, no table
    was specified)?  That would fix the problem at hand.  However, it'd
    mean (since v12) that a vacuumdb run never calls vac_update_datfrozenxid
    at all, which would result in horrible problems for any poor sods
    who think that a cronjob running "vacuumdb -a" is an adequate substitute
    for autovacuum.
    
    Or maybe we could modify things so that "autovacuum = off" doesn't prevent
    occasional cycles of vac_update_datfrozenxid-and-nothing-else?
    
    In the end I feel like a manual way to call vac_update_datfrozenxid
    would be the least magical way of running this.
    
    			regards, tom lane
    
    
    
    
  5. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Peter Geoghegan <pg@bowt.ie> — 2022-12-16T04:39:54Z

    On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I could get behind manual VACUUM not invoking vac_update_datfrozenxid
    > by default, perhaps.  But if it can never call it, then that is a
    > fairly important bit of housekeeping that is unreachable except by
    > autovacuum.  No doubt the people who turn off autovacuum are benighted,
    > but they're still out there.
    
    I wouldn't mind adding another option for this to VACUUM. We already
    have a couple of VACUUM options that are only really needed as escape
    hatches, or perhaps as testing tools used by individual Postgres
    hackers. Another one doesn't seem too bad. The VACUUM command should
    eventually become totally niche, so I'm not too concerned about going
    overboard here.
    
    > Could we get somewhere by saying that manual VACUUM calls
    > vac_update_datfrozenxid only if it's a full-DB vacuum (ie, no table
    > was specified)?  That would fix the problem at hand.
    
    That definitely seems reasonable.
    
    > Or maybe we could modify things so that "autovacuum = off" doesn't prevent
    > occasional cycles of vac_update_datfrozenxid-and-nothing-else?
    
    That's what I was thinking of. It seems like a more natural approach
    to me, at least offhand.
    
    I have to imagine that the vast majority of individual calls to
    vac_update_datfrozenxid have just about zero chance of updating
    datfrozenxid or datminmxid as things stand. There is bound to be some
    number of completely static tables in every database (maybe just
    system catalogs). Those static tables are bound to be the tables that
    hold back datfrozenxid/datminmxid approximately all the time. To me
    this suggests that vac_update_datfrozenxid should fully own the fact
    that it's supposed to be called out of band, possibly only in
    autovacuum.
    
    Separately, I wonder if it would make sense to invent a new fast-path
    for the VACUUM command that is designed to inexpensively determine
    that it cannot possibly matter if vac_update_datfrozenxid is never
    called, given the specifics (the details of the target rel and its
    TOAST rel).
    
    -- 
    Peter Geoghegan
    
    
    
    
  6. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-16T14:49:09Z

    Peter Geoghegan <pg@bowt.ie> writes:
    > On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> Or maybe we could modify things so that "autovacuum = off" doesn't prevent
    >> occasional cycles of vac_update_datfrozenxid-and-nothing-else?
    
    > That's what I was thinking of. It seems like a more natural approach
    > to me, at least offhand.
    
    Seems worth looking into.  But I suppose the launch frequency would
    have to be more often than the current behavior for autovacuum = off,
    so it would complicate the logic in that area.
    
    > I have to imagine that the vast majority of individual calls to
    > vac_update_datfrozenxid have just about zero chance of updating
    > datfrozenxid or datminmxid as things stand.
    
    That is a really good point.  How about teaching VACUUM to track
    the oldest original relfrozenxid and relminmxid among the table(s)
    it processed, and skip vac_update_datfrozenxid unless at least one
    of those matches the database's values?  For extra credit, also
    skip if we didn't successfully advance the source rel's value.
    
    This might lead to a fix that solves the OP's problem while not
    changing anything fundamental, which would make it reasonable
    to back-patch.
    
    			regards, tom lane
    
    
    
    
  7. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Peter Geoghegan <pg@bowt.ie> — 2022-12-16T18:47:07Z

    On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > I have to imagine that the vast majority of individual calls to
    > > vac_update_datfrozenxid have just about zero chance of updating
    > > datfrozenxid or datminmxid as things stand.
    >
    > That is a really good point.  How about teaching VACUUM to track
    > the oldest original relfrozenxid and relminmxid among the table(s)
    > it processed, and skip vac_update_datfrozenxid unless at least one
    > of those matches the database's values?  For extra credit, also
    > skip if we didn't successfully advance the source rel's value.
    
    Hmm. I think that that would probably work.
    
    It would certainly work on 15+, because there tends to be "natural
    diversity" among the relfrozenxid values seen for each table, due to
    the "track oldest extant XID" work; we no longer see many tables that
    all have the same relfrozenxid, that advance in lockstep. But even
    that factor probably doesn't matter, since we only need one "laggard
    relfrozenxid" static table for the scheme to work and work well. That
    is probably a safe bet on all versions, though I'd have to check to be
    sure.
    
    > This might lead to a fix that solves the OP's problem while not
    > changing anything fundamental, which would make it reasonable
    > to back-patch.
    
    That's a big plus. This is a nasty regression. I wouldn't call it a
    must-fix, but it's bad enough to be worth fixing if we can come up
    with a reasonably non-invasive approach.
    
    -- 
    Peter Geoghegan
    
    
    
    
  8. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-16T20:33:33Z

    Peter Geoghegan <pg@bowt.ie> writes:
    > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> That is a really good point.  How about teaching VACUUM to track
    >> the oldest original relfrozenxid and relminmxid among the table(s)
    >> it processed, and skip vac_update_datfrozenxid unless at least one
    >> of those matches the database's values?  For extra credit, also
    >> skip if we didn't successfully advance the source rel's value.
    
    > Hmm. I think that that would probably work.
    
    > It would certainly work on 15+, because there tends to be "natural
    > diversity" among the relfrozenxid values seen for each table, due to
    > the "track oldest extant XID" work; we no longer see many tables that
    > all have the same relfrozenxid, that advance in lockstep. But even
    > that factor probably doesn't matter, since we only need one "laggard
    > relfrozenxid" static table for the scheme to work and work well. That
    > is probably a safe bet on all versions, though I'd have to check to be
    > sure.
    
    Oh, I see your point: if a whole lot of tables have the same relfrozenxid
    and it matches datfrozenxid, this won't help.  Still, we can hope that
    that's an uncommon situation.  If we postulate somebody trying to use
    scheduled "vacuumdb -z" in place of autovacuum, they shouldn't really have
    that situation.  Successively vacuuming many tables should normally
    result in the tables' relfrozenxids not being all the same, unless they
    were unlucky enough to have a very long-running transaction holding back
    the global xmin horizon the whole time.
    
    			regards, tom lane
    
    
    
    
  9. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Michael Paquier <michael@paquier.xyz> — 2022-12-18T02:21:47Z

    On Thu, Dec 15, 2022 at 01:56:30PM -0500, Tom Lane wrote:
    > * To fix vacuumdb properly, it might be enough to get it to
    > batch VACUUMs, say by naming up to 1000 tables per command
    > instead of just one.  I'm not sure how that would interact
    > with its parallelization logic, though.  It's not really
    > solving the O(N^2) issue either, just pushing it further out.
    
    I have been thinking about this part, and using a hardcoded rule for
    the batches would be tricky.  The list of relations returned by the
    scan of pg_class are ordered by relpages, so depending on the
    distribution of the sizes (few tables with a large size and a lot of
    table with small sizes, exponential distribution of table sizes), we
    may finish with more downsides than upsides in some cases, even if we
    use a linear rule based on the number of relations, or even if we
    distribute the relations across the slots in a round robin fashion for
    example.
    
    In order to control all that, rather than a hardcoded rule, could it
    be as simple as introducing an option like vacuumdb --batch=N
    defaulting to 1 to let users control the number of relations grouped
    in a single command with a round robin distribution for each slot?
    --
    Michael
    
  10. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Christophe Pettus <xof@thebuild.com> — 2022-12-18T02:23:27Z

    
    > In order to control all that, rather than a hardcoded rule, could it
    > be as simple as introducing an option like vacuumdb --batch=N
    > defaulting to 1 to let users control the number of relations grouped
    > in a single command with a round robin distribution for each slot?
    
    My first reaction to that is: Is it possible to explain to a DBA what N should be for a particular cluster?
    
    
    
  11. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-18T23:55:00Z

    On Thu, Dec 15, 2022 at 08:39:54PM -0800, Peter Geoghegan wrote:
    > On Thu, Dec 15, 2022 at 1:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I could get behind manual VACUUM not invoking vac_update_datfrozenxid
    >> by default, perhaps.  But if it can never call it, then that is a
    >> fairly important bit of housekeeping that is unreachable except by
    >> autovacuum.  No doubt the people who turn off autovacuum are benighted,
    >> but they're still out there.
    > 
    > I wouldn't mind adding another option for this to VACUUM. We already
    > have a couple of VACUUM options that are only really needed as escape
    > hatches, or perhaps as testing tools used by individual Postgres
    > hackers. Another one doesn't seem too bad. The VACUUM command should
    > eventually become totally niche, so I'm not too concerned about going
    > overboard here.
    
    Perhaps there could also be an update-datfrozenxid function that vacuumdb
    calls when finished with a database.  Even if vacuum becomes smarter about
    calling vac_update_datfrozenxid, this might still be worth doing.
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  12. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Michael Paquier <michael@paquier.xyz> — 2022-12-19T03:21:26Z

    On Sat, Dec 17, 2022 at 06:23:27PM -0800, Christophe Pettus wrote:
    > My first reaction to that is: Is it possible to explain to a DBA
    > what N should be for a particular cluster?
    
    Assuming that we can come up with a rather straight-forward still
    portable rule for the distribution of the relations across of the
    slots like something I mentioned above (which is not the best thing
    depending on the sizes and the number of tables), that would be quite
    tricky IMO.
    --
    Michael
    
  13. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T20:13:23Z

    [ redirecting to -hackers because patch attached ]
    
    Peter Geoghegan <pg@bowt.ie> writes:
    > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> That is a really good point.  How about teaching VACUUM to track
    >> the oldest original relfrozenxid and relminmxid among the table(s)
    >> it processed, and skip vac_update_datfrozenxid unless at least one
    >> of those matches the database's values?  For extra credit, also
    >> skip if we didn't successfully advance the source rel's value.
    
    > Hmm. I think that that would probably work.
    
    I poked into that idea some more and concluded that getting VACUUM to
    manage it behind the user's back is not going to work very reliably.
    The key problem is explained by this existing comment in autovacuum.c:
    
         * Even if we didn't vacuum anything, it may still be important to do
         * this, because one indirect effect of vac_update_datfrozenxid() is to
         * update ShmemVariableCache->xidVacLimit.  That might need to be done
         * even if we haven't vacuumed anything, because relations with older
         * relfrozenxid values or other databases with older datfrozenxid values
         * might have been dropped, allowing xidVacLimit to advance.
    
    That is, if the table that's holding back datfrozenxid gets dropped
    between VACUUM runs, VACUUM would never think that it might have
    advanced the global minimum.
    
    I'm forced to the conclusion that we have to expose some VACUUM
    options if we want this to work well.  Attached is a draft patch
    that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
    (name bikeshedding welcome) and teaches vacuumdb to use them.
    
    Light testing says that this is a win: even on the regression
    database, which isn't all that big, I see a drop in vacuumdb's
    runtime from ~260 ms to ~175 ms.  Of course this is a case where
    VACUUM doesn't really have anything to do, so it's a best-case
    scenario ... but still, I was expecting the effect to be barely
    above noise with this many tables, yet it's a good bit more.
    
    			regards, tom lane
    
    
  14. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-28T21:12:53Z

    On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
    > I'm forced to the conclusion that we have to expose some VACUUM
    > options if we want this to work well.  Attached is a draft patch
    > that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
    > (name bikeshedding welcome) and teaches vacuumdb to use them.
    
    This is the conclusion I arrived at, too.  In fact, I was just about to
    post a similar patch set.  I'm attaching it here anyway, but I'm fine with
    proceeding with your version.
    
    I think the main difference between your patch and mine is that I've
    exposed vac_update_datfrozenxid() via a function instead of a VACUUM
    option.  IMHO that feels a little more natural, but I can't say I feel too
    strongly about it.
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
  15. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T21:20:19Z

    Nathan Bossart <nathandbossart@gmail.com> writes:
    > I think the main difference between your patch and mine is that I've
    > exposed vac_update_datfrozenxid() via a function instead of a VACUUM
    > option.  IMHO that feels a little more natural, but I can't say I feel too
    > strongly about it.
    
    I thought about that but it seems fairly unsafe, because that means
    that vac_update_datfrozenxid is executing inside a user-controlled
    transaction.  I don't think it will hurt us if the user does a
    ROLLBACK afterward --- but if he sits on the open transaction,
    that would be bad, if only because we're still holding the
    LockDatabaseFrozenIds lock which will block other VACUUMs.
    There might be more hazards besides that; certainly no one has ever
    tried to run vac_update_datfrozenxid that way before.
    
    			regards, tom lane
    
    
    
    
  16. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-28T21:21:50Z

    On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
    > +	/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
    > +	if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE && !failed)
    > +	{
    > +		executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo);
    > +	}
    
    When I looked at this, I thought it would be better to send the command
    through the parallel slot machinery so that failures would use the same
    code path as the rest of the VACUUM commands.  However, you also need to
    adjust ParallelSlotsWaitCompletion() to mark the slots as idle so that the
    slot array can be reused after it is called.
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  17. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-28T21:23:21Z

    On Wed, Dec 28, 2022 at 04:20:19PM -0500, Tom Lane wrote:
    > Nathan Bossart <nathandbossart@gmail.com> writes:
    >> I think the main difference between your patch and mine is that I've
    >> exposed vac_update_datfrozenxid() via a function instead of a VACUUM
    >> option.  IMHO that feels a little more natural, but I can't say I feel too
    >> strongly about it.
    > 
    > I thought about that but it seems fairly unsafe, because that means
    > that vac_update_datfrozenxid is executing inside a user-controlled
    > transaction.  I don't think it will hurt us if the user does a
    > ROLLBACK afterward --- but if he sits on the open transaction,
    > that would be bad, if only because we're still holding the
    > LockDatabaseFrozenIds lock which will block other VACUUMs.
    > There might be more hazards besides that; certainly no one has ever
    > tried to run vac_update_datfrozenxid that way before.
    
    That's a good point.
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  18. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-28T22:05:39Z

    Nathan Bossart <nathandbossart@gmail.com> writes:
    > On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
    >> +		executeCommand(conn, "VACUUM (ONLY_DATABASE_STATS);", echo);
    
    > When I looked at this, I thought it would be better to send the command
    > through the parallel slot machinery so that failures would use the same
    > code path as the rest of the VACUUM commands.  However, you also need to
    > adjust ParallelSlotsWaitCompletion() to mark the slots as idle so that the
    > slot array can be reused after it is called.
    
    Hm.  I was just copying the way commands are issued further up in the
    same function.  But I think you're right: once we've done
    
    	ParallelSlotsAdoptConn(sa, conn);
    
    it's probably not entirely kosher to use the conn directly.
    
    			regards, tom lane
    
    
    
    
  19. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Justin Pryzby <pryzby@telsasoft.com> — 2022-12-29T01:23:54Z

    On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
    > Peter Geoghegan <pg@bowt.ie> writes:
    > > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > >> That is a really good point.  How about teaching VACUUM to track
    > >> the oldest original relfrozenxid and relminmxid among the table(s)
    > >> it processed, and skip vac_update_datfrozenxid unless at least one
    > >> of those matches the database's values?  For extra credit, also
    > >> skip if we didn't successfully advance the source rel's value.
    > 
    > > Hmm. I think that that would probably work.
    > 
    > I'm forced to the conclusion that we have to expose some VACUUM
    > options if we want this to work well.  Attached is a draft patch
    > that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
    > (name bikeshedding welcome) and teaches vacuumdb to use them.
    
    I was surprised to hear that this added *two* options.
    
    I assumed it would look like:
    
    VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
    
    -- 
    Justin
    
    
    
    
  20. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Justin Pryzby <pryzby@telsasoft.com> — 2022-12-29T01:29:10Z

    On Sun, Dec 18, 2022 at 11:21:47AM +0900, Michael Paquier wrote:
    > On Thu, Dec 15, 2022 at 01:56:30PM -0500, Tom Lane wrote:
    > > * To fix vacuumdb properly, it might be enough to get it to
    > > batch VACUUMs, say by naming up to 1000 tables per command
    > > instead of just one.  I'm not sure how that would interact
    > > with its parallelization logic, though.  It's not really
    > > solving the O(N^2) issue either, just pushing it further out.
    > 
    > I have been thinking about this part, and using a hardcoded rule for
    > the batches would be tricky.  The list of relations returned by the
    > scan of pg_class are ordered by relpages, so depending on the
    > distribution of the sizes (few tables with a large size and a lot of
    > table with small sizes, exponential distribution of table sizes), we
    > may finish with more downsides than upsides in some cases, even if we
    > use a linear rule based on the number of relations, or even if we
    > distribute the relations across the slots in a round robin fashion for
    > example.
    
    I've always found it weird that it uses "ORDER BY relpages".
    
    I'd prefer if it could ORDER BY age(relfrozenxid) or
    GREATEST(age(relfrozenxid), age(relminmxid)), at least if you specify
    one of the --min-*age parms.  Or something less hardcoded and
    unconfigurable.
    
    -- 
    Justin
    
    
    
    
  21. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T02:17:54Z

    Justin Pryzby <pryzby@telsasoft.com> writes:
    > On Wed, Dec 28, 2022 at 03:13:23PM -0500, Tom Lane wrote:
    >> I'm forced to the conclusion that we have to expose some VACUUM
    >> options if we want this to work well.  Attached is a draft patch
    >> that invents SKIP_DATABASE_STATS and ONLY_DATABASE_STATS options
    >> (name bikeshedding welcome) and teaches vacuumdb to use them.
    
    > I assumed it would look like:
    > VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
    
    Meh.  We could do it like that, but I think options that look like
    booleans but aren't are messy.
    
    			regards, tom lane
    
    
    
    
  22. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Andres Freund <andres@anarazel.de> — 2022-12-29T03:03:29Z

    Hi,
    
    On 2022-12-28 15:13:23 -0500, Tom Lane wrote:
    > [ redirecting to -hackers because patch attached ]
    > 
    > Peter Geoghegan <pg@bowt.ie> writes:
    > > On Fri, Dec 16, 2022 at 6:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > >> That is a really good point.  How about teaching VACUUM to track
    > >> the oldest original relfrozenxid and relminmxid among the table(s)
    > >> it processed, and skip vac_update_datfrozenxid unless at least one
    > >> of those matches the database's values?  For extra credit, also
    > >> skip if we didn't successfully advance the source rel's value.
    > 
    > > Hmm. I think that that would probably work.
    > 
    > I poked into that idea some more and concluded that getting VACUUM to
    > manage it behind the user's back is not going to work very reliably.
    > The key problem is explained by this existing comment in autovacuum.c:
    > 
    >      * Even if we didn't vacuum anything, it may still be important to do
    >      * this, because one indirect effect of vac_update_datfrozenxid() is to
    >      * update ShmemVariableCache->xidVacLimit.  That might need to be done
    >      * even if we haven't vacuumed anything, because relations with older
    >      * relfrozenxid values or other databases with older datfrozenxid values
    >      * might have been dropped, allowing xidVacLimit to advance.
    > 
    > That is, if the table that's holding back datfrozenxid gets dropped
    > between VACUUM runs, VACUUM would never think that it might have
    > advanced the global minimum.
    
    I wonder if a less aggressive version of this idea might still work. Perhaps
    we could use ShmemVariableCache->latestCompletedXid or
    ShmemVariableCache->nextXid to skip at least some updates?
    
    Obviously this isn't going to help if there's a lot of concurrent activity,
    but the case of just running vacuumdb -a might be substantially improved.
    
    
    Separately I wonder if it's worth micro-optimizing vac_update_datfrozenxid() a
    bit. I e.g. see a noticable speedup bypassing systable_getnext() and using
    heap_getnext().  It's really too bad that we want to check for "in the future"
    xids, otherwise we could use a ScanKey to filter at a lower level.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  23. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T17:22:58Z

    I wrote:
    > Justin Pryzby <pryzby@telsasoft.com> writes:
    >> I assumed it would look like:
    >> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
    
    > Meh.  We could do it like that, but I think options that look like
    > booleans but aren't are messy.
    
    Note that I'm not necessarily objecting to there being just one option,
    only to its values being a superset-of-boolean.  Perhaps this'd work:
    
    VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})
    
    			regards, tom lane
    
    
    
    
  24. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-29T18:26:57Z

    On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote:
    >> Justin Pryzby <pryzby@telsasoft.com> writes:
    >>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
    > VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})
    
    +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
    little better since it states the action like most of the other options,
    but I think both choices are sufficiently clear.
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  25. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T18:45:49Z

    Nathan Bossart <nathandbossart@gmail.com> writes:
    > +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
    > little better since it states the action like most of the other options,
    > but I think both choices are sufficiently clear.
    
    Consistency of VACUUM's options seems like a lost cause already :-(.
    Between them, DISABLE_PAGE_SKIPPING, SKIP_LOCKED, and PROCESS_TOAST
    cover just about the entire set of possibilities for describing a
    boolean option --- we couldn't even manage to be consistent about
    whether ON or OFF is the default, let alone where the verb is.
    And it's hard to argue that FULL, VERBOSE, or PARALLEL is a verb.
    
    			regards, tom lane
    
    
    
    
  26. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-29T18:52:14Z

    On Wed, Dec 28, 2022 at 07:03:29PM -0800, Andres Freund wrote:
    > Separately I wonder if it's worth micro-optimizing vac_update_datfrozenxid() a
    > bit. I e.g. see a noticable speedup bypassing systable_getnext() and using
    > heap_getnext().  It's really too bad that we want to check for "in the future"
    > xids, otherwise we could use a ScanKey to filter at a lower level.
    
    Another thing I'm exploring is looking up the datfrozenxid/datminmxid
    before starting the pg_class scan so that the scan can be stopped early if
    it sees that we cannot possibly advance the values.  The
    overwrite-corrupt-values logic might make this a little more complicated,
    but I think it'd be sufficient to force the pg_class scan to complete if we
    ever see a value "in the future."  Overwriting the corrupt value might be
    delayed, but it would eventually happen once the table ages advance.
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  27. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2022-12-29T20:29:15Z

    Nathan Bossart <nathandbossart@gmail.com> writes:
    > On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote:
    >> Justin Pryzby <pryzby@telsasoft.com> writes:
    >>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
    >>>> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})
    
    > +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
    > little better since it states the action like most of the other options,
    > but I think both choices are sufficiently clear.
    
    I tried to make a patch along these lines, and soon hit a stumbling
    block: ONLY is a fully-reserved SQL keyword.  I don't think this
    syntax is attractive enough to justify requiring people to
    double-quote the option, so we are back to square one.  Anybody
    have a different suggestion?
    
    			regards, tom lane
    
    
    
    
  28. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Nathan Bossart <nathandbossart@gmail.com> — 2022-12-29T21:37:19Z

    On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote:
    > Nathan Bossart <nathandbossart@gmail.com> writes:
    >> On Thu, Dec 29, 2022 at 12:22:58PM -0500, Tom Lane wrote:
    >>> Justin Pryzby <pryzby@telsasoft.com> writes:
    >>>> VACUUM (UPDATE_DATABASE_STATS {yes,no,only})
    >>>>> VACUUM (DATABASE_STATS {UPDATE,SKIP,ONLY})
    > 
    >> +1 for only introducing one option.  IMHO UPDATE_DATABASE_STATS fits a
    >> little better since it states the action like most of the other options,
    >> but I think both choices are sufficiently clear.
    > 
    > I tried to make a patch along these lines, and soon hit a stumbling
    > block: ONLY is a fully-reserved SQL keyword.  I don't think this
    > syntax is attractive enough to justify requiring people to
    > double-quote the option, so we are back to square one.  Anybody
    > have a different suggestion?
    
    Hm.  I thought about using PreventInTransactionBlock() for the function,
    but that probably won't work for a few reasons.  AFAICT we'd need to
    restrict it to only be callable via "SELECT update_database_stats()", which
    feels a bit unnatural.
    
    There was some discussion elsewhere [0] about adding a
    PROCESS_MAIN_RELATION option or expanding PROCESS_TOAST to simplify
    vacuuming the TOAST table directly.  If such an option existed, you could
    call
    
    	VACUUM (PROCESS_MAIN_RELATION FALSE, PROCESS_TOAST FALSE, UPDATE_DATABASE_STATES TRUE) pg_class;
    
    to achieve roughly what we need.  I'll admit this is hacky, though.
    
    So, adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the
    best bet.
    
    [0] https://postgr.es/m/20221215191246.GA252861%40nathanxps13
    
    -- 
    Nathan Bossart
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  29. Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

    Tom Lane <tgl@sss.pgh.pa.us> — 2023-01-06T17:53:07Z

    Nathan Bossart <nathandbossart@gmail.com> writes:
    > On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote:
    >> I tried to make a patch along these lines, and soon hit a stumbling
    >> block: ONLY is a fully-reserved SQL keyword.  I don't think this
    >> syntax is attractive enough to justify requiring people to
    >> double-quote the option, so we are back to square one.  Anybody
    >> have a different suggestion?
    
    > ... adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the
    > best bet.
    
    Nobody has proposed a different bikeshed color, so I'm going to
    proceed with that syntax.  I'll incorporate the parallel-machinery
    fix from your patch and push to HEAD only (since it's hard to argue
    this isn't a new feature).
    
    This needn't foreclose pursuing the various ideas about making
    vac_update_datfrozenxid faster; but none of those would eliminate
    the fundamental O(N^2) issue AFAICS.
    
    			regards, tom lane