Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Add pg_stat_database counters for sessions and session time
- 960869da0803 14.0 cited

Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-06-30T08:01:00Z

Hi hackers,

While commit 960869da08 added some information about connections that 
have been successfully authenticated, there is no metrics for 
connections that have not (or did not reached the authentication stage).

Adding metrics about failed connections attempts could also help, for 
example with proper sampling, to:

  * detect spikes in failed login attempts
  * check if there is a correlation between spikes in successful and
    failed connection attempts

While the number of successful connections could also already been 
tracked with the ClientAuthentication_hook (and also the ones that 
failed the authentication) we are missing metrics about:

  * why the connection failed (could be bad password, bad database, bad
    user, missing CONNECT privilege...)
  * number of times the authentication stage has not been reached
  * why the authentication stage has not been reached (bad startup
    packets, timeout while processing startup packet,...)

Those missing metrics (in addition to the ones that can be already 
gathered) could provide value for:

  * security investigations
  * anomalies detections
  * tracking application misconfigurations

In an attempt to be able to provide those metrics, please find attached 
a patch proposal to add new hooks in the connection path, that would be 
fired if:

  * there is a bad startup packet
  * there is a timeout while processing the startup packet
  * user does not have CONNECT privilege
  * database does not exist

For safety those hooks request the use of a const Port parameter, so 
that they could be used only for reporting purpose (for example, we are 
working on an extension to record detailed login metrics counters).

Another option could be to add those metrics in the engine itself 
(instead of providing new hooks to get them), but the new hooks option 
gives more flexibility on how to render and exploit them (there is a lot 
of information in the Port Struct that one could be interested with).

I’m adding this patch proposal to the commitfest.
Looking forward to your feedback,

Regards,
Bertrand

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-06-30T09:23:33Z

On Thu, Jun 30, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi hackers,
>
> While commit 960869da08 added some information about connections that have been successfully authenticated, there is no metrics for connections that have not (or did not reached the authentication stage).
>
> Adding metrics about failed connections attempts could also help, for example with proper sampling, to:
>
> detect spikes in failed login attempts
> check if there is a correlation between spikes in successful and failed connection attempts
>
> While the number of successful connections could also already been tracked with the ClientAuthentication_hook (and also the ones that failed the authentication) we are missing metrics about:
>
> why the connection failed (could be bad password, bad database, bad user, missing CONNECT privilege...)
> number of times the authentication stage has not been reached
> why the authentication stage has not been reached (bad startup packets, timeout while processing startup packet,...)
>
> Those missing metrics (in addition to the ones that can be already gathered) could provide value for:
>
> security investigations
> anomalies detections
> tracking application misconfigurations
>
> In an attempt to be able to provide those metrics, please find attached a patch proposal to add new hooks in the connection path, that would be fired if:
>
> there is a bad startup packet
> there is a timeout while processing the startup packet
> user does not have CONNECT privilege
> database does not exist
>
> For safety those hooks request the use of a const Port parameter, so that they could be used only for reporting purpose (for example, we are working on an extension to record detailed login metrics counters).
>
> Another option could be to add those metrics in the engine itself (instead of providing new hooks to get them), but the new hooks option gives more flexibility on how to render and exploit them (there is a lot of information in the Port Struct that one could be interested with).
>
> I’m adding this patch proposal to the commitfest.
> Looking forward to your feedback,

+1 for the idea. I've seen numerous cases where the login metrics
(especially failed connections) are handy in analyzing stuff. And I'm
okay with the hook approach than the postgres emitting the necessary
metrics. However, I'm personally not okay with having multiple hooks
as proposed in the v1 patch. Can we think of having a single hook or
enhancing the existing ClientAuthentication_hook where we pass a
PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR
....) tp the hook? With this approach, we don't need to spread out the
postgres code with many hooks and the hook implementers will look at
the PURPOSE parameter and deal with it accordingly.

On the security aspect, we must ensure we don't leak any sensitive
information such as password or SSH key to the new hook - if PGPORT
has this information, maybe we need to mask that structure a bit
before handing it off to the hook.

Regards,
Bharath Rupireddy.

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-01T07:48:40Z

Hi,

On 6/30/22 11:23 AM, Bharath Rupireddy wrote:
> On Thu, Jun 30, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> Hi hackers,
>>
>> While commit 960869da08 added some information about connections that have been successfully authenticated, there is no metrics for connections that have not (or did not reached the authentication stage).
>>
>> Adding metrics about failed connections attempts could also help, for example with proper sampling, to:
>>
>> detect spikes in failed login attempts
>> check if there is a correlation between spikes in successful and failed connection attempts
>>
>> While the number of successful connections could also already been tracked with the ClientAuthentication_hook (and also the ones that failed the authentication) we are missing metrics about:
>>
>> why the connection failed (could be bad password, bad database, bad user, missing CONNECT privilege...)
>> number of times the authentication stage has not been reached
>> why the authentication stage has not been reached (bad startup packets, timeout while processing startup packet,...)
>>
>> Those missing metrics (in addition to the ones that can be already gathered) could provide value for:
>>
>> security investigations
>> anomalies detections
>> tracking application misconfigurations
>>
>> In an attempt to be able to provide those metrics, please find attached a patch proposal to add new hooks in the connection path, that would be fired if:
>>
>> there is a bad startup packet
>> there is a timeout while processing the startup packet
>> user does not have CONNECT privilege
>> database does not exist
>>
>> For safety those hooks request the use of a const Port parameter, so that they could be used only for reporting purpose (for example, we are working on an extension to record detailed login metrics counters).
>>
>> Another option could be to add those metrics in the engine itself (instead of providing new hooks to get them), but the new hooks option gives more flexibility on how to render and exploit them (there is a lot of information in the Port Struct that one could be interested with).
>>
>> I’m adding this patch proposal to the commitfest.
>> Looking forward to your feedback,
> +1 for the idea. I've seen numerous cases where the login metrics
> (especially failed connections) are handy in analyzing stuff. And I'm
> okay with the hook approach than the postgres emitting the necessary
> metrics.

Thanks for looking at it!

> However, I'm personally not okay with having multiple hooks
> as proposed in the v1 patch.

I agree that it would be great to reduce the number of proposed hooks.

But,

>   Can we think of having a single hook

The proposed hooks are triggered during errors (means that the 
connection attempt break) and:

- In the connection paths that will not reach the 
ClientAuthentication_hook at all: those are the ones related to the bad 
startup packet and timeout while processing the startup packet.

or

- After the ClientAuthentication_hook is fired: those are the bad db 
oid, bad db name and bad perm ones.

So, It does look like having only one hook would require refactoring in 
the connection path and I'm not sure if this is worth it.

> or
> enhancing the existing ClientAuthentication_hook where we pass a
> PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR
> ....) tp the hook?

I think one could already "predict" the bad db and bad perm errors 
within the current ClientAuthentication_hook.

But in case of multiple "possible" errors (within the same connection 
attempt) how could we know for sure the one that will be actually 
reported? That's why i think the best way is to put new hooks as close 
as possible to the place where the related errors are reported.

What do you think?

> With this approach, we don't need to spread out the
> postgres code with many hooks and the hook implementers will look at
> the PURPOSE parameter and deal with it accordingly.
>
> On the security aspect, we must ensure we don't leak any sensitive
> information such as password or SSH key to the new hook - if PGPORT
> has this information, maybe we need to mask that structure a bit
> before handing it off to the hook.

Yeah good point, I'll have a closer look if there is any information 
that could be present in the Port struct before the 
ClientAuthentication_hook is fired (means during the ones that are 
related to the startup packet) and that we would want to mask.

Regards,

Bertrand

Re: Patch proposal: New hooks in the connection path

Nathan Bossart <nathandbossart@gmail.com> — 2022-07-01T23:00:27Z

On Fri, Jul 01, 2022 at 09:48:40AM +0200, Drouvot, Bertrand wrote:
>> However, I'm personally not okay with having multiple hooks
>> as proposed in the v1 patch.
> 
> I agree that it would be great to reduce the number of proposed hooks.
> 
> But,
> 
>>   Can we think of having a single hook
> 
> The proposed hooks are triggered during errors (means that the connection
> attempt break) and:
> 
> - In the connection paths that will not reach the ClientAuthentication_hook
> at all: those are the ones related to the bad startup packet and timeout
> while processing the startup packet.
> 
> or
> 
> - After the ClientAuthentication_hook is fired: those are the bad db oid,
> bad db name and bad perm ones.
> 
> So, It does look like having only one hook would require refactoring in the
> connection path and I'm not sure if this is worth it.
> 
>> or
>> enhancing the existing ClientAuthentication_hook where we pass a
>> PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR
>> ....) tp the hook?
> 
> I think one could already "predict" the bad db and bad perm errors within
> the current ClientAuthentication_hook.
> 
> But in case of multiple "possible" errors (within the same connection
> attempt) how could we know for sure the one that will be actually reported?
> That's why i think the best way is to put new hooks as close as possible to
> the place where the related errors are reported.
> 
> What do you think?

Could we model this after fmgr_hook?  The first argument in that hook
indicates where it is being called from.  This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.

That being said, I don't see why this information couldn't be provided in a
system view.  IMO it is generically useful.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Roberto Mello <roberto.mello@gmail.com> — 2022-07-02T00:49:09Z

On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

>
>
> That being said, I don't see why this information couldn't be provided in a
> system view.  IMO it is generically useful.

+1 for a system view with appropriate permissions, in addition to the
hooks.

That would make the information easily accessible to a number or monitoring
systems besides the admin.

Roberto

—
Crunchy Data — passion for open source PostgreSQL

>

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-04T12:53:24Z

Hi,

On 7/2/22 1:00 AM, Nathan Bossart wrote:
> Could we model this after fmgr_hook?  The first argument in that hook
> indicates where it is being called from.  This doesn't alleviate the need
> for several calls to the hook in the authentication logic, but extension
> authors would only need to define one hook.

I like the idea and indeed fmgr.h looks a good place to model it.

Attached a new patch version doing so.

Thanks

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-04T12:58:50Z

Hi,

On 7/2/22 2:49 AM, Roberto Mello wrote:
>
> On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart 
> <nathandbossart@gmail.com> wrote:
>
>
>
>     That being said, I don't see why this information couldn't be
>     provided in a
>     system view.  IMO it is generically useful.
>
>
> +1 for a system view with appropriate permissions, in addition to the 
> hooks.
>
> That would make the information easily accessible to a number or 
> monitoring systems besides the admin.
>
Agree about that.

I'll start another thread and propose a dedicated patch for the 
"internal counters" and how to expose them.

Thanks for the feedback,

-- 
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Zhihong Yu <zyu@yugabyte.com> — 2022-07-04T13:12:42Z

On Mon, Jul 4, 2022 at 5:54 AM Drouvot, Bertrand <bdrouvot@amazon.com>
wrote:

> Hi,
>
> On 7/2/22 1:00 AM, Nathan Bossart wrote:
> > Could we model this after fmgr_hook?  The first argument in that hook
> > indicates where it is being called from.  This doesn't alleviate the need
> > for several calls to the hook in the authentication logic, but extension
> > authors would only need to define one hook.
>
> I like the idea and indeed fmgr.h looks a good place to model it.
>
> Attached a new patch version doing so.
>
> Thanks
>
> --
>
> Bertrand Drouvot
> Amazon Web Services: https://aws.amazon.com

Hi,
+   FCET_SPT,   /* startup packet timeout */
+   FCET_BSP,   /* bad startup packet */

Looking at existing enum type, such as FmgrHookEventType, the part after
underscore is a word.
I think it would be good to follow existing practice and make the enums
more readable.

Cheers

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-07-05T07:07:23Z

On Mon, Jul 4, 2022 at 6:29 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> On 7/2/22 2:49 AM, Roberto Mello wrote:
>
> On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> That being said, I don't see why this information couldn't be provided in a
>> system view.  IMO it is generically useful.
>
> +1 for a system view with appropriate permissions, in addition to the hooks.
>
> That would make the information easily accessible to a number or monitoring systems besides the admin.
>
> Agree about that.

Are we going to have it as a part of shared memory stats? Or a
separate shared memory for connection stats exposing these via a
function and a view can be built on this function like
pg_get_replication_slots and pg_replication_slots?

> I'll start another thread and propose a dedicated patch for the "internal counters" and how to expose them.

IMHO, let's have the discussion here in this thread and the patch can be 0002.

Regards,
Bharath Rupireddy.

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-05T07:17:03Z

Hi,

On 7/4/22 3:12 PM, Zhihong Yu wrote:
> Hi,
> +   FCET_SPT,   /* startup packet timeout */
> +   FCET_BSP,   /* bad startup packet */
>
> Looking at existing enum type, such as FmgrHookEventType, the part 
> after underscore is a word.
> I think it would be good to follow existing practice and make the 
> enums more readable.

Fair point.

Attached a new version which makes the enums more readable.

Thanks for the feedback

Regards

-- 
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-07-05T07:37:24Z

On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi,
>
> On 7/2/22 1:00 AM, Nathan Bossart wrote:
> > Could we model this after fmgr_hook?  The first argument in that hook
> > indicates where it is being called from.  This doesn't alleviate the need
> > for several calls to the hook in the authentication logic, but extension
> > authors would only need to define one hook.
>
> I like the idea and indeed fmgr.h looks a good place to model it.
>
> Attached a new patch version doing so.

Thanks for the patch. Can we think of enhancing
ClientAuthentication_hook_type itself i.e. make it a generic hook for
all sorts of authentication metrics, info etc. with the type parameter
embedded to it instead of new hook FailedConnection_hook?We can either
add a new parameter for the "event" (the existing
ClientAuthentication_hook_type implementers will have problems), or
embed/multiplex the "event" info to existing Port structure or status
variable (macro or enum) (existing implementers will not have
compatibility problems).  IMO, this looks cleaner going forward.

On the v2 patch:

1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h?

2. Timeout Handler is a signal handler, called as part of SIGALRM
signal handler, most of the times, signal handlers ought to be doing
small things, now that we are handing off the control to hook, which
can do any long running work (writing to a remote storage, file,
aggregate etc.), I don't think it's the right thing to do here.
 static void
 StartupPacketTimeoutHandler(void)
 {
+ if (FailedConnection_hook)
+ (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
+ ereport(COMMERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("timeout while processing startup packet")));

3. On "not letting these hooks (ClientAuthentication_hook_type or
FailedConnection_hook_type) expose sensitive info via Port structure"
- it seems like the Port structure has sensitive info like HbaLine,
host address, username, etc. but that's what it is so I think we are
okay with the structure as-is.

Regards,
Bharath Rupireddy.

Re: Patch proposal: New hooks in the connection path

Brindle, Joshua <joshuqbr@amazon.com> — 2022-07-05T13:27:06Z

On 6/30/22 5:23 AM, Bharath Rupireddy wrote:
> <snip>
> On the security aspect, we must ensure we don't leak any sensitive
> information such as password or SSH key to the new hook - if PGPORT
> has this information, maybe we need to mask that structure a bit
> before handing it off to the hook.

Can you elaborate more on why you see this as necessary? Extensions run 
in-process and have no real memory access limits, "masking", which 
really means copying data to another struct, is just extra work and 
overhead with no actual security gain, IMO.

Re: Patch proposal: New hooks in the connection path

Joe Conway <mail@joeconway.com> — 2022-07-05T22:11:12Z

On 7/5/22 03:37, Bharath Rupireddy wrote:
> On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> On 7/2/22 1:00 AM, Nathan Bossart wrote:
>> > Could we model this after fmgr_hook?  The first argument in that hook
>> > indicates where it is being called from.  This doesn't alleviate the need
>> > for several calls to the hook in the authentication logic, but extension
>> > authors would only need to define one hook.
>>
>> I like the idea and indeed fmgr.h looks a good place to model it.
>>
>> Attached a new patch version doing so.

I was thinking along the same lines, so +1 for the general approach

> Thanks for the patch. Can we think of enhancing
> ClientAuthentication_hook_type itself i.e. make it a generic hook for
> all sorts of authentication metrics, info etc. with the type parameter
> embedded to it instead of new hook FailedConnection_hook?We can either
> add a new parameter for the "event" (the existing
> ClientAuthentication_hook_type implementers will have problems), or
> embed/multiplex the "event" info to existing Port structure or status
> variable (macro or enum) (existing implementers will not have
> compatibility problems).  IMO, this looks cleaner going forward.

Not sure I like this though -- I'll have to think about that

> On the v2 patch:
> 
> 1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h?

agreed -- it does not belong in fmgr.h

> 2. Timeout Handler is a signal handler, called as part of SIGALRM
> signal handler, most of the times, signal handlers ought to be doing
> small things, now that we are handing off the control to hook, which
> can do any long running work (writing to a remote storage, file,
> aggregate etc.), I don't think it's the right thing to do here.
>   static void
>   StartupPacketTimeoutHandler(void)
>   {
> + if (FailedConnection_hook)
> + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
> + ereport(COMMERROR,
> + (errcode(ERRCODE_PROTOCOL_VIOLATION),
> + errmsg("timeout while processing startup packet")));

Why add the ereport()?

But more to Bharath's point, perhaps this is a case that is better 
served by incrementing a stat counter and not exposed as a hook?

Also, a teeny nit:
8<--------------
+	if (status != STATUS_OK) {
+		if (FailedConnection_hook)
8<--------------

does not follow usual practice and probably should be:

8<--------------
+	if (status != STATUS_OK)
+	{
+		if (FailedConnection_hook)
8<--------------


-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-06T08:13:51Z

Hi,

On 7/6/22 12:11 AM, Joe Conway wrote:
>
> On 7/5/22 03:37, Bharath Rupireddy wrote:
>> On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand 
>> <bdrouvot@amazon.com> wrote:
>>> On 7/2/22 1:00 AM, Nathan Bossart wrote:
>>> > Could we model this after fmgr_hook?  The first argument in that hook
>>> > indicates where it is being called from.  This doesn't alleviate 
>>> the need
>>> > for several calls to the hook in the authentication logic, but 
>>> extension
>>> > authors would only need to define one hook.
>>>
>>> I like the idea and indeed fmgr.h looks a good place to model it.
>>>
>>> Attached a new patch version doing so.
>
> I was thinking along the same lines, so +1 for the general approach

Thanks for the review!

>
>> Thanks for the patch. Can we think of enhancing
>> ClientAuthentication_hook_type itself i.e. make it a generic hook for
>> all sorts of authentication metrics, info etc. with the type parameter
>> embedded to it instead of new hook FailedConnection_hook?We can either
>> add a new parameter for the "event" (the existing
>> ClientAuthentication_hook_type implementers will have problems), or
>> embed/multiplex the "event" info to existing Port structure or status
>> variable (macro or enum) (existing implementers will not have
>> compatibility problems).  IMO, this looks cleaner going forward.
>
> Not sure I like this though -- I'll have to think about that

Not sure about this one neither.

The "enhanced" ClientAuthentication_hook will have to be fired at the 
same places as the new FailedConnection_hook is, but i think those 
places are not necessary linked to real authentication per say (making 
the name confusing).

>
>> On the v2 patch:
>>
>> 1. Why do we need to place the hook and structures in fmgr.h? Why not 
>> in auth.h?
>
> agreed -- it does not belong in fmgr.h

Moved to auth.h.

>
>> 2. Timeout Handler is a signal handler, called as part of SIGALRM
>> signal handler, most of the times, signal handlers ought to be doing
>> small things, now that we are handing off the control to hook, which
>> can do any long running work (writing to a remote storage, file,
>> aggregate etc.), I don't think it's the right thing to do here.
>>   static void
>>   StartupPacketTimeoutHandler(void)
>>   {
>> + if (FailedConnection_hook)
>> + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
>> + ereport(COMMERROR,
>> + (errcode(ERRCODE_PROTOCOL_VIOLATION),
>> + errmsg("timeout while processing startup packet")));
>
> Why add the ereport()?

removed it.

>
> But more to Bharath's point, perhaps this is a case that is better
> served by incrementing a stat counter and not exposed as a hook?

I think that the advantage of the hook is that it gives the extension 
author the ability/flexibility to aggregate the counter based on 
information available in the Port Struct (say the client addr for 
example) at this stage.

What about to aggregate the stat counter based on the client addr? (Not 
sure if there is more useful information (than the client addr) at this 
stage though)

That said, i agree that having a hook in a time out handler might not be 
the right thing to do (even if at the end that would be to the extension 
author responsibility to do "small things" in it), so it has been 
removed in the new attached version.

>
> Also, a teeny nit:
> 8<--------------
> +       if (status != STATUS_OK) {
> +               if (FailedConnection_hook)
> 8<--------------
>
> does not follow usual practice and probably should be:
>
> 8<--------------
> +       if (status != STATUS_OK)
> +       {
> +               if (FailedConnection_hook)
> 8<--------------
>
>
Thanks!, fixed.

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-06T08:18:08Z

Hi,

On 7/5/22 9:07 AM, Bharath Rupireddy wrote:
> On Mon, Jul 4, 2022 at 6:29 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> On 7/2/22 2:49 AM, Roberto Mello wrote:
>>
>> On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>>> That being said, I don't see why this information couldn't be provided in a
>>> system view.  IMO it is generically useful.
>> +1 for a system view with appropriate permissions, in addition to the hooks.
>>
>> That would make the information easily accessible to a number or monitoring systems besides the admin.
>>
>> Agree about that.
> Are we going to have it as a part of shared memory stats? Or a
> separate shared memory for connection stats exposing these via a
> function and a view can be built on this function like
> pg_get_replication_slots and pg_replication_slots?

Thanks for looking at it.

I don't have any proposal yet, I'll have to look at it.

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Joe Conway <mail@joeconway.com> — 2022-07-07T19:56:10Z

On 7/6/22 04:13, Drouvot, Bertrand wrote:
> On 7/6/22 12:11 AM, Joe Conway wrote:
>> On 7/5/22 03:37, Bharath Rupireddy wrote:
>>> 2. Timeout Handler is a signal handler, called as part of SIGALRM
>>> signal handler, most of the times, signal handlers ought to be doing
>>> small things, now that we are handing off the control to hook, which
>>> can do any long running work (writing to a remote storage, file,
>>> aggregate etc.), I don't think it's the right thing to do here.
>>>   static void
>>>   StartupPacketTimeoutHandler(void)
>>>   {
>>> + if (FailedConnection_hook)
>>> + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);

>> But more to Bharath's point, perhaps this is a case that is better
>> served by incrementing a stat counter and not exposed as a hook?
> 
> I think that the advantage of the hook is that it gives the extension
> author the ability/flexibility to aggregate the counter based on
> information available in the Port Struct (say the client addr for
> example) at this stage.
> 
> What about to aggregate the stat counter based on the client addr? (Not
> sure if there is more useful information (than the client addr) at this
> stage though)
> 
> That said, i agree that having a hook in a time out handler might not be
> the right thing to do (even if at the end that would be to the extension
> author responsibility to do "small things" in it), so it has been
> removed in the new attached version.

It isn't clear to me if having a hook in the timeout handler is a 
nonstarter -- perhaps a comment with suitable warning for prospective 
extension authors is enough? Anyone else want to weigh in on this issue 
specifically?

-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-07T20:10:34Z

Joe Conway <mail@joeconway.com> writes:
> It isn't clear to me if having a hook in the timeout handler is a 
> nonstarter -- perhaps a comment with suitable warning for prospective 
> extension authors is enough? Anyone else want to weigh in on this issue 
> specifically?

It doesn't seem like a great place for a hook, because the list of stuff
you could safely do there would be mighty short, possibly the empty set.
Write to shared memory?  Not too safe.  Write to a file?  Even less.
Write to local memory?  Pointless, because we're about to _exit(1).
Pretty much anything I can think of that you'd want to do is something
we've already decided the core code can't safely do, and putting it
in a hook won't make it safer.

If someone wants to argue for this hook, I'd like to see a credible
example of a *safe* use-case, keeping in mind the points raised in
the comments in BackendInitialize and process_startup_packet_die.

			regards, tom lane

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-07-08T12:24:13Z

On Fri, Jul 8, 2022 at 1:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Joe Conway <mail@joeconway.com> writes:
> > It isn't clear to me if having a hook in the timeout handler is a
> > nonstarter -- perhaps a comment with suitable warning for prospective
> > extension authors is enough? Anyone else want to weigh in on this issue
> > specifically?
>
> It doesn't seem like a great place for a hook, because the list of stuff
> you could safely do there would be mighty short, possibly the empty set.
> Write to shared memory?  Not too safe.  Write to a file?  Even less.
> Write to local memory?  Pointless, because we're about to _exit(1).
> Pretty much anything I can think of that you'd want to do is something
> we've already decided the core code can't safely do, and putting it
> in a hook won't make it safer.

I agree with this. But, all of the areas that v2-0003 touched for
connectivity failures, they typically are emitting
ereport(FATAL,/ereport(COMMERROR, (in ProcessStartupPacket) and we
have emit_log_hook already being exposed and the implementers can,
literally, do anything the hook.

Looking at v2-0003 patch and emit_log_hook, how about we filter out
for those connectivity errors either based on error codes and if they
aren't unique, perhaps passing special flags to ereport API indicating
that it's a connectivity error and in the emit_log_hook we can look
for those connectivity error codes or flags to collect the stats about
the failure connections (with MyProcPort being present in
emit_log_hook)? This way, we don't need a new hook. Thoughts?

Regards,
Bharath Rupireddy.

Re: Patch proposal: New hooks in the connection path

Tom Lane <tgl@sss.pgh.pa.us> — 2022-07-08T13:43:22Z

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Fri, Jul 8, 2022 at 1:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It doesn't seem like a great place for a hook, because the list of stuff
>> you could safely do there would be mighty short, possibly the empty set.

> I agree with this. But, all of the areas that v2-0003 touched for
> connectivity failures, they typically are emitting
> ereport(FATAL,/ereport(COMMERROR, (in ProcessStartupPacket) and we
> have emit_log_hook already being exposed and the implementers can,
> literally, do anything the hook.

This is utterly off-point, because those calls are not inside
signal handlers.

			regards, tom lane

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-11T06:18:46Z

Hi,

On 7/7/22 10:10 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> It isn't clear to me if having a hook in the timeout handler is a
>> nonstarter -- perhaps a comment with suitable warning for prospective
>> extension authors is enough? Anyone else want to weigh in on this issue
>> specifically?
> It doesn't seem like a great place for a hook, because the list of stuff
> you could safely do there would be mighty short, possibly the empty set.
> Write to shared memory?  Not too safe.  Write to a file?  Even less.
> Write to local memory?  Pointless, because we're about to _exit(1).
> Pretty much anything I can think of that you'd want to do is something
> we've already decided the core code can't safely do, and putting it
> in a hook won't make it safer.
>
> If someone wants to argue for this hook, I'd like to see a credible
> example of a *safe* use-case, keeping in mind the points raised in
> the comments in BackendInitialize and process_startup_packet_die.

The use case would be to increment a counter in shared memory (or most 
probably within an hash table) to reflect the number of times a startup 
packet timeout occurred.

Reading the comments in/related to BackendInitialize() I understand 
that's definitely not safe to write in shared memory for the 
EXEC_BACKEND case, but wouldn't it be safe for the non EXEC_BACKEND case?

BTW, it makes me realize that the hook being fired in the bad startup 
packet case:

         /*
          * Stop here if it was bad or a cancel packet. ProcessStartupPacket
          * already did any appropriate error reporting.
          */
         if (status != STATUS_OK)
+       {
+               if (FailedConnection_hook)
+                       (*FailedConnection_hook) 
(FCET_BAD_STARTUP_PACKET, port);
                 proc_exit(0);
+       }

is not safe for the EXEC_BACKEND case.

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-07-12T12:58:27Z

Hi,

On 7/11/22 8:18 AM, Drouvot, Bertrand wrote:
> Hi,
>
> On 7/7/22 10:10 PM, Tom Lane wrote:
>> Joe Conway <mail@joeconway.com> writes:
>>> It isn't clear to me if having a hook in the timeout handler is a
>>> nonstarter -- perhaps a comment with suitable warning for prospective
>>> extension authors is enough? Anyone else want to weigh in on this issue
>>> specifically?
>> It doesn't seem like a great place for a hook, because the list of stuff
>> you could safely do there would be mighty short, possibly the empty set.
>> Write to shared memory?  Not too safe.  Write to a file?  Even less.
>> Write to local memory?  Pointless, because we're about to _exit(1).
>> Pretty much anything I can think of that you'd want to do is something
>> we've already decided the core code can't safely do, and putting it
>> in a hook won't make it safer.
>>
>> If someone wants to argue for this hook, I'd like to see a credible
>> example of a *safe* use-case, keeping in mind the points raised in
>> the comments in BackendInitialize and process_startup_packet_die.
>
> The use case would be to increment a counter in shared memory (or most 
> probably within an hash table) to reflect the number of times a 
> startup packet timeout occurred.
>
> Reading the comments in/related to BackendInitialize() I understand 
> that's definitely not safe to write in shared memory for the 
> EXEC_BACKEND case, but wouldn't it be safe for the non EXEC_BACKEND case?
>
> BTW, it makes me realize that the hook being fired in the bad startup 
> packet case:
>
>         /*
>          * Stop here if it was bad or a cancel packet. 
> ProcessStartupPacket
>          * already did any appropriate error reporting.
>          */
>         if (status != STATUS_OK)
> +       {
> +               if (FailedConnection_hook)
> +                       (*FailedConnection_hook) 
> (FCET_BAD_STARTUP_PACKET, port);
>                 proc_exit(0);
> +       }
>
> is not safe for the EXEC_BACKEND case.
>
What about the idea to trigger the hook for the STARTUP PACKET TIMEOUT 
and BAD STARTUP PACKET only for the non EXEC_BACKEND/Windows cases?

I'm tempted to think it's better to have some cases where one could 
benefit from the hook as opposed to none.

Thoughts?

Regards,

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-07-14T09:43:37Z

On Fri, Jul 8, 2022 at 5:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Looking at v2-0003 patch and emit_log_hook, how about we filter out
> for those connectivity errors either based on error codes and if they
> aren't unique, perhaps passing special flags to ereport API indicating
> that it's a connectivity error and in the emit_log_hook we can look
> for those connectivity error codes or flags to collect the stats about
> the failure connections (with MyProcPort being present in
> emit_log_hook)? This way, we don't need a new hook. Thoughts?

Bertrand and Other Hackers, above comment may have been lost in the
wild - any thoughts on it?

Regards,
Bharath Rupireddy.

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-08-02T13:23:35Z

Hi Bharath,

On 7/14/22 11:43 AM, Bharath Rupireddy wrote:
> On Fri, Jul 8, 2022 at 5:54 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> Looking at v2-0003 patch and emit_log_hook, how about we filter out
>> for those connectivity errors either based on error codes and if they
>> aren't unique, perhaps passing special flags to ereport API indicating
>> that it's a connectivity error and in the emit_log_hook we can look
>> for those connectivity error codes or flags to collect the stats about
>> the failure connections (with MyProcPort being present in
>> emit_log_hook)? This way, we don't need a new hook. Thoughts?
> Bertrand and Other Hackers, above comment may have been lost in the
> wild - any thoughts on it?

Thanks for your feedback!

I can see 2 issues with that approach:

- We’ll not be able to track the “startup timeout case” (well, we may 
not be able to track it anyway depending of what next to [1] will be) as 
it does not emit any log messages.
- We’ll depend of the log_min_messages value (means 
edata->output_to_server needs to be true for the emit_log_hook to be 
triggered).

[1]: 
https://www.postgresql.org/message-id/a1558d12-c1c4-0fe5-f8a5-2b6c2294e55f%40amazon.com

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-08-08T10:51:20Z

Hi,

On 7/12/22 2:58 PM, Drouvot, Bertrand wrote:
> Hi,
>
> On 7/11/22 8:18 AM, Drouvot, Bertrand wrote:
>> Hi,
>>
>> The use case would be to increment a counter in shared memory (or 
>> most probably within an hash table) to reflect the number of times a 
>> startup packet timeout occurred.
>>
>> Reading the comments in/related to BackendInitialize() I understand 
>> that's definitely not safe to write in shared memory for the 
>> EXEC_BACKEND case, but wouldn't it be safe for the non EXEC_BACKEND 
>> case?
>>
>> BTW, it makes me realize that the hook being fired in the bad startup 
>> packet case:
>>
>>         /*
>>          * Stop here if it was bad or a cancel packet. 
>> ProcessStartupPacket
>>          * already did any appropriate error reporting.
>>          */
>>         if (status != STATUS_OK)
>> +       {
>> +               if (FailedConnection_hook)
>> +                       (*FailedConnection_hook) 
>> (FCET_BAD_STARTUP_PACKET, port);
>>                 proc_exit(0);
>> +       }
>>
>> is not safe for the EXEC_BACKEND case.
>>
> What about the idea to trigger the hook for the STARTUP PACKET TIMEOUT 
> and BAD STARTUP PACKET only for the non EXEC_BACKEND/Windows cases?
>
> I'm tempted to think it's better to have some cases where one could 
> benefit from the hook as opposed to none.
>
> Thoughts?

Please find attached v2-0004-connection_hooks.patch as an attempt of 
doing so.

Thanks

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-08-13T08:47:33Z

On Tue, Aug 2, 2022 at 6:55 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi Bharath,
>
> On 7/14/22 11:43 AM, Bharath Rupireddy wrote:
> > On Fri, Jul 8, 2022 at 5:54 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >> Looking at v2-0003 patch and emit_log_hook, how about we filter out
> >> for those connectivity errors either based on error codes and if they
> >> aren't unique, perhaps passing special flags to ereport API indicating
> >> that it's a connectivity error and in the emit_log_hook we can look
> >> for those connectivity error codes or flags to collect the stats about
> >> the failure connections (with MyProcPort being present in
> >> emit_log_hook)? This way, we don't need a new hook. Thoughts?
> > Bertrand and Other Hackers, above comment may have been lost in the
> > wild - any thoughts on it?
>
> Thanks for your feedback!
>
> I can see 2 issues with that approach:
>
> - We’ll not be able to track the “startup timeout case” (well, we may
> not be able to track it anyway depending of what next to [1] will be) as
> it does not emit any log messages.
>
> [1]:
> https://www.postgresql.org/message-id/a1558d12-c1c4-0fe5-f8a5-2b6c2294e55f%40amazon.com

Yes, we wanted to be very quick in StartupPacketTimeoutHandler because
it is a timeout signal handler after all.

> - We’ll depend of the log_min_messages value (means
> edata->output_to_server needs to be true for the emit_log_hook to be
> triggered).

Hm, we can just say that 'log_min_message setting will enable/disable
the feature'.

I agree with your first point of not having an error in
StartupPacketTimeoutHandler hence I don't think using emit log hook
for the connection failure stats helps.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Re: Patch proposal: New hooks in the connection path

Gurjeet Singh <gurjeet@singh.im> — 2022-08-14T05:45:53Z

On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> Please find attached v2-0004-connection_hooks.patch

     /*
      * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
      * already did any appropriate error reporting.
      */
     if (status != STATUS_OK)
+    {
+#ifndef EXEC_BACKEND
+        if (FailedConnection_hook)
+            (*FailedConnection_hook) (FCET_BAD_STARTUP_PACKET, port);
+#endif
         proc_exit(0);
+    }

Per the comment above the if condition, the `status != OK` may
represent a cancel packet, as well. Clearly, a cancel packet is not
the same as a _bad_ packet. So I think here you need to differentiate
between a cancel packet and a genuinely bad packet; I don't see
anything good coming good out of us, or the hook-developer, lumping
those 2 cases together.

I think we can reduce the number of places the hook is called, if we
call the hook from proc_exit(), and all the other places we simply set
a global variable to signify the reason for the failure. The case of
_exit(1) from the signal-handler cannot use such a mechanism, but I
think all the other cases of interest can simply register one of the
FCET_* value, and the hook call from proc_exit() can pass that value
to the hook.

If we can convinces ourselves that we can use proc_exit(1) in
StartupPacketTimeoutHandler(), instead of calling _exit(1), I think we
cal eliminate replace all call sites for this hook with
set-global-variable variant.

> ...
> *     This should be the only function to call exit().
> *     -cim 2/6/90
>...
> proc_exit(int code)

The comment on proc_exit() claims that should be the only place
calling exit(), except the add-on/extension hooks. So there must be a
strong reason why the signal-handler uses _exit() to bypass all
callbacks.

Best regards,
Gurjeet
http://Gurje.et

Re: Patch proposal: New hooks in the connection path

Gurjeet Singh <gurjeet@singh.im> — 2022-08-14T05:52:33Z

(reposting the same review, with many grammatical fixes)

On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> Please find attached v2-0004-connection_hooks.patch

     /*
      * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
      * already did any appropriate error reporting.
      */
     if (status != STATUS_OK)
+    {
+#ifndef EXEC_BACKEND
+        if (FailedConnection_hook)
+            (*FailedConnection_hook) (FCET_BAD_STARTUP_PACKET, port);
+#endif
         proc_exit(0);
+    }

Per the comment above the if condition, the `status != OK` may
represent a cancel packet, as well. Clearly, a cancel packet is not
the same as a _bad_ packet. So I think here you need to differentiate
between a cancel packet and a genuinely bad packet; I don't see
anything good coming out of us, or the hook-developer, lumping
those 2 cases together.

I think we can reduce the number of places the hook is called, if we
call the hook from proc_exit(), and at all the other places we simply set
a global variable to signify the reason for the failure. The case of
_exit(1) from the signal-handler cannot use such a mechanism, but I
think all the other cases of interest can simply register one of the
FCET_* values, and let the call from proc_exit() pass that value
to the hook.

If we can convince ourselves that we can use proc_exit(1) in
StartupPacketTimeoutHandler(), instead of calling _exit(1), I think we
cal replace all call sites for this hook with the
set-global-variable variant.

> ...
> *     This should be the only function to call exit().
> *     -cim 2/6/90
>...
> proc_exit(int code)

The comment on proc_exit() claims that it should be the only place
calling exit(), except that the add-on/extension hooks may ignore this.
So there must be a strong reason why the signal-handler uses _exit()
to bypass all callbacks.

Best regards,
Gurjeet
http://Gurje.et

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-08-16T08:01:13Z

Hi,

On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> Please find attached v2-0004-connection_hooks.patch
>       /*
>        * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
>        * already did any appropriate error reporting.
>        */
>       if (status != STATUS_OK)
> +    {
> +#ifndef EXEC_BACKEND
> +        if (FailedConnection_hook)
> +            (*FailedConnection_hook) (FCET_BAD_STARTUP_PACKET, port);
> +#endif
>           proc_exit(0);
> +    }
>
> Per the comment above the if condition, the `status != OK` may
> represent a cancel packet, as well. Clearly, a cancel packet is not
> the same as a _bad_ packet. So I think here you need to differentiate
> between a cancel packet and a genuinely bad packet; I don't see
> anything good coming out of us, or the hook-developer, lumping
> those 2 cases together.

Thanks for the feedback!

Yeah, good point. I agree that it would be better to make the distinction.

> I think we can reduce the number of places the hook is called, if we
> call the hook from proc_exit(), and at all the other places we simply set
> a global variable to signify the reason for the failure. The case of
> _exit(1) from the signal-handler cannot use such a mechanism, but I
> think all the other cases of interest can simply register one of the
> FCET_* values, and let the call from proc_exit() pass that value
> to the hook.

That looks like a good idea to me. I'm tempted to rewrite the patch that 
way (and addressing the first comment in the same time).

Curious to hear about others hackers thoughts too.

> If we can convince ourselves that we can use proc_exit(1) in
> StartupPacketTimeoutHandler(), instead of calling _exit(1), I think we
> cal replace all call sites for this hook with the
> set-global-variable variant.

I can see why this is not safe in the EXEC_BACKEND case, but it's not 
clear to me why it would not be safe in the non EXEC_BACKEND case.

>> ...
>> *     This should be the only function to call exit().
>> *     -cim 2/6/90
>> ...
>> proc_exit(int code)
> The comment on proc_exit() claims that it should be the only place
> calling exit(), except that the add-on/extension hooks may ignore this.
> So there must be a strong reason why the signal-handler uses _exit()
> to bypass all callbacks.

yeah.

Regards,

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-08-16T08:10:29Z

On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> > On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> > I think we can reduce the number of places the hook is called, if we
> > call the hook from proc_exit(), and at all the other places we simply set
> > a global variable to signify the reason for the failure. The case of
> > _exit(1) from the signal-handler cannot use such a mechanism, but I
> > think all the other cases of interest can simply register one of the
> > FCET_* values, and let the call from proc_exit() pass that value
> > to the hook.
>
> That looks like a good idea to me. I'm tempted to rewrite the patch that
> way (and addressing the first comment in the same time).
>
> Curious to hear about others hackers thoughts too.

IMO, calling the hook from proc_exit() is not a good design as
proc_exit() is a generic code called from many places in the source
code, even the simple code of kind  if(call_failed_conn_hook) {
falied_conn_hook(params);} can come in the way of many exit code paths
which is undesirable, and the likelihood of introducing new bugs may
increase.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Re: Patch proposal: New hooks in the connection path

bdrouvotAWS <bdrouvot@amazon.com> — 2022-08-16T08:25:19Z

Hi,

On 8/16/22 10:10 AM, Bharath Rupireddy wrote:
> On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
>>> On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>>> I think we can reduce the number of places the hook is called, if we
>>> call the hook from proc_exit(), and at all the other places we simply set
>>> a global variable to signify the reason for the failure. The case of
>>> _exit(1) from the signal-handler cannot use such a mechanism, but I
>>> think all the other cases of interest can simply register one of the
>>> FCET_* values, and let the call from proc_exit() pass that value
>>> to the hook.
>> That looks like a good idea to me. I'm tempted to rewrite the patch that
>> way (and addressing the first comment in the same time).
>>
>> Curious to hear about others hackers thoughts too.
> IMO, calling the hook from proc_exit() is not a good design as
> proc_exit() is a generic code called from many places in the source
> code, even the simple code of kind  if(call_failed_conn_hook) {
> falied_conn_hook(params);} can come in the way of many exit code paths
> which is undesirable, and the likelihood of introducing new bugs may
> increase.

Thanks for the feedback.

What do you think about calling the hook only if the new global variable 
is not equal to its default value (which would mean don't trigger the 
hook)?

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: New hooks in the connection path

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> — 2022-08-16T10:16:00Z

On Tue, Aug 16, 2022 at 1:55 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi,
>
> On 8/16/22 10:10 AM, Bharath Rupireddy wrote:
> > On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> >> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> >>> On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> >>> I think we can reduce the number of places the hook is called, if we
> >>> call the hook from proc_exit(), and at all the other places we simply set
> >>> a global variable to signify the reason for the failure. The case of
> >>> _exit(1) from the signal-handler cannot use such a mechanism, but I
> >>> think all the other cases of interest can simply register one of the
> >>> FCET_* values, and let the call from proc_exit() pass that value
> >>> to the hook.
> >> That looks like a good idea to me. I'm tempted to rewrite the patch that
> >> way (and addressing the first comment in the same time).
> >>
> >> Curious to hear about others hackers thoughts too.
> > IMO, calling the hook from proc_exit() is not a good design as
> > proc_exit() is a generic code called from many places in the source
> > code, even the simple code of kind  if(call_failed_conn_hook) {
> > falied_conn_hook(params);} can come in the way of many exit code paths
> > which is undesirable, and the likelihood of introducing new bugs may
> > increase.
>
> Thanks for the feedback.
>
> What do you think about calling the hook only if the new global variable
> is not equal to its default value (which would mean don't trigger the
> hook)?

IMO, that's not a good design as explained above. Why should the
failed connection hook related code get hit for each and every
proc_exit() call? Here, the code duplication i.e. the number of places
the failed connection hook gets called mustn't be the reason to move
that code to proc_exit().

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

Re: Patch proposal: New hooks in the connection path

Gurjeet Singh <gurjeet@singh.im> — 2022-08-16T14:26:33Z

On Tue, Aug 16, 2022 at 3:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Aug 16, 2022 at 1:55 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> >
> > Hi,
> >
> > On 8/16/22 10:10 AM, Bharath Rupireddy wrote:
> > > On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> > >> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> > >>> On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> > >>> I think we can reduce the number of places the hook is called, if we
> > >>> call the hook from proc_exit(), and at all the other places we simply set
> > >>> a global variable to signify the reason for the failure. The case of
> > >>> _exit(1) from the signal-handler cannot use such a mechanism, but I
> > >>> think all the other cases of interest can simply register one of the
> > >>> FCET_* values, and let the call from proc_exit() pass that value
> > >>> to the hook.
> > >> That looks like a good idea to me. I'm tempted to rewrite the patch that
> > >> way (and addressing the first comment in the same time).
> > >>
> > >> Curious to hear about others hackers thoughts too.

I agree that we need feedback from long-timers here,  on the decision
of whether to use proc_exit() for this purpose.

> > > IMO, calling the hook from proc_exit() is not a good design as
> > > proc_exit() is a generic code called from many places in the source
> > > code, even the simple code of kind  if(call_failed_conn_hook) {
> > > falied_conn_hook(params);} can come in the way of many exit code paths
> > > which is undesirable, and the likelihood of introducing new bugs may
> > > increase.
> >
> > Thanks for the feedback.
> >
> > What do you think about calling the hook only if the new global variable
> > is not equal to its default value (which would mean don't trigger the
> > hook)?
>
> IMO, that's not a good design as explained above. Why should the
> failed connection hook related code get hit for each and every
> proc_exit() call? Here, the code duplication i.e. the number of places
> the failed connection hook gets called mustn't be the reason to move
> that code to proc_exit().

I agree, it doesn't feel _clean_, having to maintain a global
variable, pass it to hook at exit, etc. But the alternative feels less
cleaner.

This hook needs to be called when the process has decided to exit, so
it makes sense to place this call in stack above proc_exit(), whose
sole job is to let the process die gracefully, and take care of things
on the way out.

There are quite a few places in core that leverage proc_exit()'s
facilities (by registering on_proc_exit callbacks), so an
extension/hook doing so wouldn't be out of the ordinary; (apparently
contrib/sepgsql has already set the precedent on an extension using
the on_proc_exit callback). Admittedly, in this case the core will be
managing and passing it the additional global variable needed to
record the connection failure reason, FCET_*.

If we agree that proc_exit() is a good place to place this call, then
this hook can be converted into a on_proc_exit callback. If the global
variable is exported, then the extension(s) can access it in the
callback to ascertain why the process is exiting, and proc_exit()
won't have to know anything special about the extension, or hook, or
the global variable.

The on_proc_exit callback method wouldn't work for the _exit() called
in StartupPacketTimeoutHandler(), so that will need to be handled
separately.

Best regards,
Gurjeet
http://Gurje.et

Re: Patch proposal: New hooks in the connection path

Gregory Stark (as CFM) <stark.cfm@gmail.com> — 2023-04-03T22:08:06Z

This looks like it was a good discussion -- last summer. But it
doesn't seem to be a patch under active development now.

It sounds like there were some design constraints that still need some
new ideas to solve and a new patch will be needed to address them.

Should this be marked Returned With Feedback?

-- 
Gregory Stark
As Commitfest Manager

Re: Patch proposal: New hooks in the connection path

Bertrand Drouvot <bertranddrouvot.pg@gmail.com> — 2023-04-04T10:11:55Z

Hi,

On 4/4/23 12:08 AM, Gregory Stark (as CFM) wrote:
> This looks like it was a good discussion -- last summer. But it
> doesn't seem to be a patch under active development now.
> 
> It sounds like there were some design constraints that still need some
> new ideas to solve and a new patch will be needed to address them.
> 
> Should this be marked Returned With Feedback?
> 

I just marked it as Returned With Feedback.

I may re-open it later on to resume the discussion or share
new ideas though.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com