Thread

  1. RE: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE

    Aya Iwata (Fujitsu) <iwata.aya@fujitsu.com> — 2025-10-21T14:14:34Z

    Hi Peter-san, Michael-san,
    
    Thank you for your comments.
    I updated patch to v0009. Please review attached patch.
    
    > -----Original Message-----
    > From: Peter Smith <smithpb2250@gmail.com>
    > Sent: Monday, October 20, 2025 11:02 AM
    
    > Some comments for the latest v8 patch.
    > 
    > ======
    > src/backend/postmaster/bgworker.c
    > 
    > TerminateBgWorkersByBbOid:
    > 
    > 1.
    > +void
    > +TerminateBgWorkersByDbOid(Oid oid)
    > 
    > Now the function name is more explicit, but that is not a good reason
    > to make the parameter name more vague.
    > 
    > IMO the parameter should still be "dbOid" or "databaseId" instead of
    > just "oid". (ditto for the extern in bgworker.h)
    
    I agree with you. I reverted parameter name to databaseId.
    
    
    > ======
    > src/backend/storage/ipc/procarray.c
    > 
    > 
    > CountOtherDBBackends:
    > 
    > 2.
    > + /*
    > + * Usually, we try 50 times with 100ms sleep between tries, making 5 sec
    > + * total wait. If requested, it would be reduced to 10 times to shorten the
    > + * test time.
    > + */
    > 
    > 
    > The comment seemed vague to me. How about more like:
    > 
    > /*
    >  * Retry up to 50 times with 100ms between attempts (max 5s total).
    >  * Can be reduced to 10 attempts (max 1s total) to speed up tests.
    >  */
    
    Thank you. I updated this comment.
    
    > 3.
    > + for (tries = 0; tries < ntries; tries++)
    > 
    > 'tries' can be declared as a for-loop variable.
    
    I have declared "int" within the for-loop.
    
    > ~~~
    > 
    > 4.
    > Something feels strange about this function name
    > (CountOtherDBBackends) which suggests it is just for counting stuff,
    > but in reality is more about exiting/terminating the workers. In fact
    > retuns a boolean, not a count. Compare this with this similarly named
    > "CountUserBackends" which really *is* doing what it says.
    > 
    > Can we give this function a better name, or is that out of scope for this patch?
    
    I think this is out of scope because existing code have terminated autovacuum process by SIGTERM.
    It can be discussed separately.
    I just added a comment to this function "background workers would also be terminated".
    
    > ======
    > src/test/modules/worker_spi/t/002_worker_terminate.pl
    > 
    > 5.
    > +# Firstly register an injection point to make the test faster. Normally, it
    > +# spends more than 5 seconds because the backend retries, counting the
    > number
    > +# of connecting processes 50 times, but now the counting would be done only
    > 10
    > +# times. See CountOtherDBBackends().
    > +$node->safe_psql('postgres', "CREATE EXTENSION injection_points;");
    > +$node->safe_psql('postgres',
    > + "SELECT injection_points_attach('reduce-ncounts', 'error');");
    > +
    > 
    > It seemed overkill to give details about what "normally" happens. I
    > think it is enough to have a simple comment here:
    > 
    > SUGGESTION
    > The injection point 'reduce-ncounts' reduces the number of backend
    > retries, allowing for shorter test runs. See CountOtherDBBackends().
    
    Thank you for your suggestion. I updated this comment.
    
    > -----Original Message-----
    > From: Michael Paquier <michael@paquier.xyz>
    > Sent: Monday, October 20, 2025 1:33 PM
    
    > On Mon, Oct 20, 2025 at 01:01:31PM +1100, Peter Smith wrote:
    > > Some comments for the latest v8 patch.
    > 
    > The comments of Peter apply to comments and parameters.  I am not
    > going down to these details in this message, these can be infinitely
    > tuned.
    > 
    > The injection point integration looks correct.  You are checking the
    > compile flag and if the extension is available in the installation
    > path, which should be enough.
    > 
    > +   if (IS_INJECTION_POINT_ATTACHED("reduce-ncounts"))
    > +       ntries = 10;
    > 
    > 1s is much faster than the default of 5s, still I am wondering if this
    > cannot be brought down a bit more.  Dropping the worker still around
    > after the first test with CREATE DATABASE works here.
    
    Thank you. I updated ntries to 3.
    
    > +# Confirm a background worker is still running
    > +$node->safe_psql(
    > +	"postgres", qq(
    > +        SELECT count(1) FROM pg_stat_activity
    > +		WHERE backend_type = 'worker_spi dynamic';));
    > 
    > This does not check that the worker that does not have the flag set is
    > still running: you are not feeding the output of this query to an is()
    > test.
    > 
    > +	is($result, 't', "dynamic bgworker launched");
    > 
    > In launch_bgworker(), this uses the same test description for all the
    > callers of this subroutine.  Let's prefix it with $testcase.
    
    I added $testcase. Is it same as your expectations?
    
    > +void
    > +TerminateBgWorkersByDbOid(Oid oid)
    > 
    > FWIW, while reading this code, I was wondering about one improvement
    > that could show benefits for more extension code than only what we are
    > discussing here because external code has no access to
    > BackgroundWorkerSlot while holding the LWLock BackgroundWorkerLock in
    > a single loop, by rewriting this new routine with something like that:
    > void TerminateBackgroundWorkerMatchin(
    > bool (*do_terminate) (int pid, BackgroundWorker *, Datum))
    > 
    > Then the per-database termination would be a custom routine, defined
    > also in bgworker.c.  Other extension code could define their own
    > filtering callback routine.  Just an idea in passing, to let extension
    > code take more actions on bgworker slots in use-based on a PGPROC
    > entry, like a role ID for example, or it could be a different factor.
    > Feel free to dislike such a funky idea if you do not like it and say
    > so, of course.
    
    Thank you for your advice.
    I'd like to address that, but I couldn't figure out how to do it on my own. 
    Could you please describe it more?
    
    Regards,
    Aya Iwata
    Fujitsu Limited