Thread

  1. Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

    Nikolay Samokhvalov <nik@postgres.ai> — 2023-07-10T21:37:24Z

    On Mon, Jul 10, 2023 at 2:02 PM Michael Banck <mbanck@gmx.net> wrote:
    
    > Thanks for that!
    >
    
    Thanks for the fast review.
    
    
    >
    > > I agree with Andrey – without it, we don't have any good way to upgrade
    > > large clusters in short time. Default rsync mode (without "--size-only")
    > > takes a lot of time too, if the load is heavy.
    > >
    > > With these adjustments, can "rsync --size-only" remain in the docs as the
    > > *fast* and safe method to upgrade standbys, or there are still some
    > > concerns related to corruption risks?
    >
    > I hope somebody can answer that definitively, but I read Stephen's mail
    > to indicate that this procedure should be safe in principle (if you know
    > what you are doing).
    >
    
    right, this is my understanding too
    
    
    > > +++ b/doc/src/sgml/ref/pgupgrade.sgml
    > > @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
    > >      </para>
    > >
    > >      <para>
    > > -     Streaming replication and log-shipping standby servers can
    > > +     Streaming replication and log-shipping standby servers must
    > >       remain running until a later step.
    > >      </para>
    > >     </step>
    > >
    > > -   <step>
    > > +   <step id="pgupgrade-step-prepare-standbys">
    > >
    > >      <para>
    > > -     If you are upgrading standby servers using methods outlined in
    > section <xref
    > > -     linkend="pgupgrade-step-replicas"/>, verify that the old standby
    > > -     servers are caught up by running
    > <application>pg_controldata</application>
    > > -     against the old primary and standby clusters.  Verify that the
    > > -     <quote>Latest checkpoint location</quote> values match in all
    > clusters.
    > > -     (There will be a mismatch if old standby servers were shut down
    > > -     before the old primary or if the old standby servers are still
    > running.)
    > > +     If you are upgrading standby servers using methods outlined in
    > > +     <xref linkend="pgupgrade-step-replicas"/>,
    >
    > You dropped the "section" before the xref, I think that should be kept
    > around.
    >
    
    Seems to be a problem in discussing source code that looks quite different
    than the final result.
    
    In the result – the docs – we currently have "section Step 9", looking
    weird. I still think it's good to remove it. We also have "in Step 17
    below" (without the extra word "section") in a different place on the same
    page.
    
    
    >
    > > +                                                ensure that they were
    > running when
    > > +     you shut down the primaries in the previous step, so all the
    > latest changes
    >
    > You talk of primaries in plural here, that is a bit weird for PostgreSQL
    > documentation.
    >
    
    The same docs already discuss two primaries ("8. Stop both primaries"), but
    I agree it might look confusing if you read only a part of the doc, jumping
    into middle of it, like I do all the time when using the docs in "check the
    reference" style.
    
    I agree with this comment, but it tells me we need even more improvements
    of this doc, beyond my original goal – e.g., I don't like section 8 saying
    "Make sure both database servers", it should be "both primaries".
    
    
    >
    > > +     and the shutdown checkpoint record were received. You can verify
    > this by running
    > > +     <application>pg_controldata</application> against the old primary
    > and standby
    > > +     clusters. The <quote>Latest checkpoint location</quote> values
    > must match in all
    > > +     nodes. A mismatch might occur if old standby servers were shut
    > down before
    > > +     the old primary. To fix a mismatch, start all old servers and
    > return to the
    > > +     previous step; proceeding with mismatched
    > > +     <quote>Latest checkpoint location</quote> may lead to standby
    > corruption.
    > > +    </para>
    > > +
    > > +    <para>
    > >       Also, make sure <varname>wal_level</varname> is not set to
    > >       <literal>minimal</literal> in the
    > <filename>postgresql.conf</filename> file on the
    > >       new primary cluster.
    > > @@ -497,7 +503,6 @@ pg_upgrade.exe
    > >       linkend="warm-standby"/>) standby servers, you can follow these
    > steps to
    > >       quickly upgrade them.  You will not be running
    > <application>pg_upgrade</application> on
    > >       the standby servers, but rather <application>rsync</application>
    > on the primary.
    > > -     Do not start any servers yet.
    > >      </para>
    > >
    > >      <para>
    > > @@ -508,6 +513,15 @@ pg_upgrade.exe
    > >       is running.
    > >      </para>
    > >
    > > +    <para>
    > > +     Before running rsync, to avoid standby corruption, it is absolutely
    > > +     critical to ensure that both primaries are shut down and standbys
    > > +     have received the last changes (see <xref
    > linkend="pgupgrade-step-prepare-standbys"/>).
    >
    > I think this should be something like "ensure both that the primary is
    > shut down and that the standbys have received all the changes".
    >
    
    Well, we have two primary servers – old and new. I tried to clarify it in
    the new version.
    
    
    >
    > > +     Standbys can be running at this point or fully stopped.
    >
    > "or be fully stopped." I think.
    >
    > > +                                                             If they
    > > +     are still running, you can stop, upgrade, and start them one by
    > one; this
    > > +     can be useful to keep the cluster open for read-only transactions.
    > > +    </para>
    >
    > Maybe this is clear from the context, but "upgrade" in the above should
    > maybe more explicitly refer to the rsync method or else people might
    > think one can run pg_upgrade on them after all?
    >
    
    Maybe. It will require changes in other parts of this doc.
    Thinking (here:
    https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)
    
    Meanwhile, attached is v2
    
    thanks for the comments