Thread

  1. [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-05T20:05:26Z

    Hi,
    
    At 2024.pgconf.dev, Heikki did a session on multithreading PostgreSQL
    which I was unfortunately unable to attend due my involvement with
    another session, and then we had an unconference discussion which I
    was able to attend and at which I volunteered to have a look at a
    couple of tasks, including "Extension Marking System (marking
    extensions as thread-safe)". So in this email I'd like to (1) say a
    few things about multithreading for PostgreSQL in general, (2) spell
    out my understanding of the extension compatibility problem
    specifically, and then (3) discuss possible solutions to that problem.
    See also https://wiki.postgresql.org/wiki/Multithreading
    
    == Multithreading Generally ==
    
    I believe there is a consensus in the PostgreSQL developer community,
    or at least among committers, that a multi-threaded programming model
    would be superior to a multi-process programming model as we have now.
    I won't be surprised if a few people disagree with that as a general
    statement, and others may view it as better in theory but so difficult
    in practice as to be not worth doing, but I believe that the consensus
    is otherwise. I do understand that switching to threads introduces
    some new stability risks, which are not to be taken lightly, but it
    also opens the door to various performance improvements, and even
    functionality, that are not feasible today. I do not believe that it
    would be necessary, as has been alleged previously, to halt all other
    development for a lengthy period of time while such a conversion is
    undertaken, nor do I believe that the community would or should accept
    such a solution were someone to propose it. I do believe that there
    are some difficult problems to be solved in order to make it work at
    all, and I believe even more strongly that a good deal of follow-up
    work will be necessary to reap the potential benefits of such a
    change. I also believe that it's absolutely necessary that both models
    coexist side by side for a period of time. I think we will eventually
    want to abandon the multi-process model, because I think over time the
    benefits of using threads will accumulate until they are overwhelming
    and the process model will end up appearing to be an obstacle to
    progress. However, I don't think we'll be able to do that particularly
    soon, because I think it's going to take a while to fully stabilize
    the thread model even as far as the core code is concerned, and
    extensions will take even longer to catch up. I realize Heikki in
    particular is hoping for a quick transition; I don't see that as
    feasible, but like everything else about this, opinions are going to
    vary.
    
    Obligatory disclaimer: Everything above (and below) is just a
    statement of what I believe, and everyone is free to dispute it. As
    always, I cannot speak to objective truth, but I can tell you what I
    think.
    
    == The Extension Compatibility Problem ==
    
    I don't know yet whether we're going to end up with a system where the
    same build of PostgreSQL can produce processes or threads depending on
    configuration or whether it's going to be a build option, but I'm
    guessing the latter is more likely. Certainly, if an extension is
    assuming that its global variables are session-local and they suddenly
    become global to the cluster, chaos will ensue. The same is true for
    the core code, and will need to be solved by annotating global
    variables so that the appropriate ones can be made thread-local and
    the others can be given whatever treatment is appropriate considering
    how they are used. The details of how this annotation system will work
    are TBD, but the point for this email is that extension global
    variables, including file-level globals, will need the same kinds of
    annotations that we use in the core code in order to work. Other
    adjustments may also be needed.
    
    I think there are two severable problems here. One is that, if an
    extension is built for use with a non-threaded PostgreSQL, we
    shouldn't permit it to be used with a threaded PostgreSQL, even if the
    major version and other details are compatible. Hence, threading or
    the lack of it must become part of the data set up by PG_MODULE_MAGIC.
    Maybe this problem goes away if we decide that threads-vs-processes is
    a configuration option rather than a build-time option, but even then,
    we might still end up with a build-time option indicating whether
    threads are even a possibility, so I think it's pretty likely we need
    this in some form. If or when the process model eventually dies, then
    we can take this out again.
    
    The other problem is that we probably want a way for extensions to
    signal that they are believed to work with threading. It's a little
    bit debatable whether this is a good idea, because (1) some people are
    going to blindly state that their extension works fine with threading
    even if they haven't actually made the necessary changes and (2) one
    could simply declare that making an extension thread-ready is part of
    supporting whatever PostgreSQL release adds threading as an option and
    (3) one could also declare that extension authors should just document
    what they do or don't support rather than doing anything in code.
    However, I think it makes sense to try to make extensions fail to
    compile against a threaded PostgreSQL unless the extension declares
    that it supports such builds of PostgreSQL. I think that by doing
    this, we'll make it a LOT easier for packagers to find out what
    extensions still need updating. A packager could possibly do light
    testing of an extension and fail to miss the fact that the extension
    doesn't actually work properly against a threaded PostgreSQL, but you
    can't fail to notice a compile failure. There's still going to be some
    chaos because of (1), but I think we can mitigate that with good
    messaging: documentation, wiki pages, and blog posts explaining that
    this is coming and how to adapt to it can help a lot, IMHO.
    
    == Extension Compatibility Solutions ==
    
    The attached patch is a sketch of one possible approach: PostgreSQL
    signals whether it is multithreaded by defining or not defining
    PG_MULTITHREADING in pg_config_manual.h, and an extension signals
    thread-readiness by defining PG_THREADSAFE_EXTENSION before including
    any PostgreSQL headers other than postgres.h. If PostgreSQL is built
    multithreaded and the extension does not signal thread-safety, you get
    something like this:
    
    ../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error:
    static assertion failed due to requirement '1 == 0': must define
    PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
    PG_MODULE_MAGIC;
    
    I'm not entirely happy with this solution because the results are
    confusing if PG_THREADSAFE_EXTENSION is declared after including
    fmgr.h. Perhaps this can be adequately handled by documenting and
    demonstrating the right pattern, or maybe somebody has a better idea.
    
    Another idea I considered was to replace the PG_MODULE_MAGIC;
    declaration with something that allows for arguments, like
    PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on
    further reflection, that seems like the wrong thing. AFAICS, that's
    going to tell you at runtime about something that you really want to
    know at compile time. But this kind of idea might need more thought if
    we decide that the *same* build of PostgreSQL can either launch
    processes or threads per session, because then we'd to know which
    extensions were available in whichever mode applied to the current
    session.
    
    That's all I've got for today.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  2. Re: [multithreading] extension compatibility

    Tristan Partin <tristan@partin.io> — 2024-06-05T20:32:18Z

    On Wed Jun 5, 2024 at 3:05 PM CDT, Robert Haas wrote:
    > ...
    >
    > == Extension Compatibility Solutions ==
    >
    > The attached patch is a sketch of one possible approach: PostgreSQL
    > signals whether it is multithreaded by defining or not defining
    > PG_MULTITHREADING in pg_config_manual.h, and an extension signals
    > thread-readiness by defining PG_THREADSAFE_EXTENSION before including
    > any PostgreSQL headers other than postgres.h. If PostgreSQL is built
    > multithreaded and the extension does not signal thread-safety, you get
    > something like this:
    >
    > ../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error:
    > static assertion failed due to requirement '1 == 0': must define
    > PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
    > PG_MODULE_MAGIC;
    >
    > I'm not entirely happy with this solution because the results are
    > confusing if PG_THREADSAFE_EXTENSION is declared after including
    > fmgr.h. Perhaps this can be adequately handled by documenting and
    > demonstrating the right pattern, or maybe somebody has a better idea.
    >
    > Another idea I considered was to replace the PG_MODULE_MAGIC;
    > declaration with something that allows for arguments, like
    > PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on
    > further reflection, that seems like the wrong thing. AFAICS, that's
    > going to tell you at runtime about something that you really want to
    > know at compile time. But this kind of idea might need more thought if
    > we decide that the *same* build of PostgreSQL can either launch
    > processes or threads per session, because then we'd to know which
    > extensions were available in whichever mode applied to the current
    > session.
    
    Not entirely sure how I feel about the approach you've taken, but here 
    is a patch that Heikki and I put together for extension compatibility. 
    It's not a build time solution, but a runtime solution. Instead of 
    PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There 
    is a GUC called `multithreaded` which controls the variable 
    IsMultithreaded. We operated under the assumption that being able to 
    toggle multithreading and multi-processing without recompiling has 
    value.
    
    -- 
    Tristan Partin
    https://tristan.partin.io
    
  3. Re: [multithreading] extension compatibility

    Jelte Fennema-Nio <postgres@jeltef.nl> — 2024-06-05T20:45:23Z

    On Wed, 5 Jun 2024 at 22:05, Robert Haas <robertmhaas@gmail.com> wrote:
    > The attached patch is a sketch of one possible approach: PostgreSQL
    > signals whether it is multithreaded by defining or not defining
    > PG_MULTITHREADING in pg_config_manual.h, and an extension signals
    > thread-readiness by defining PG_THREADSAFE_EXTENSION before including
    > any PostgreSQL headers other than postgres.h.
    
    My first gut-reaction: It seems kinda annoying to have to do this for
    every c that you use to build your extension, e.g. citus or postgis
    have a ton of those.
    
    PG_MODULE_MAGIC seems like a better fit imho.
    
    If we really want a compile time failure, then I think I'd prefer to
    have a new postgres.h file (e.g. postgres-thread.h) that you would
    include instead of plain postgres.h. Basically this file could then
    contain the below two lines:
    
    #include "postgres.h"
    #define PG_THREADSAFE_EXTENSION    1
    
    
    
    
  4. Re: [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-05T20:55:58Z

    On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin <tristan@partin.io> wrote:
    > Not entirely sure how I feel about the approach you've taken, but here
    > is a patch that Heikki and I put together for extension compatibility.
    > It's not a build time solution, but a runtime solution. Instead of
    > PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
    > is a GUC called `multithreaded` which controls the variable
    > IsMultithreaded. We operated under the assumption that being able to
    > toggle multithreading and multi-processing without recompiling has
    > value.
    
    That's interesting, because I thought Heikki was against having a
    runtime toggle.
    
    I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
    great as long as we only ever need the PG_MODULE_MAGIC line to signal
    one bit of information, but as soon as you get to two bits it's pretty
    questionable, and anything more than two bits is insane. If we want to
    do something with the PG_MODULE_MAGIC line, I think it should involve
    options-passing of some form rather than just having an alternate
    macro name.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  5. Re: [multithreading] extension compatibility

    Tristan Partin <tristan@partin.io> — 2024-06-05T22:21:46Z

    On Wed Jun 5, 2024 at 3:56 PM CDT, Robert Haas wrote:
    > On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin <tristan@partin.io> wrote:
    > > Not entirely sure how I feel about the approach you've taken, but here
    > > is a patch that Heikki and I put together for extension compatibility.
    > > It's not a build time solution, but a runtime solution. Instead of
    > > PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
    > > is a GUC called `multithreaded` which controls the variable
    > > IsMultithreaded. We operated under the assumption that being able to
    > > toggle multithreading and multi-processing without recompiling has
    > > value.
    >
    > That's interesting, because I thought Heikki was against having a
    > runtime toggle.
    >
    > I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
    > great as long as we only ever need the PG_MODULE_MAGIC line to signal
    > one bit of information, but as soon as you get to two bits it's pretty
    > questionable, and anything more than two bits is insane. If we want to
    > do something with the PG_MODULE_MAGIC line, I think it should involve
    > options-passing of some form rather than just having an alternate
    > macro name.
    
    I agree that this method doesn't lend itself to future extensibility.
    
    -- 
    Tristan Partin
    https://tristan.partin.io
    
    
    
    
  6. Re: [multithreading] extension compatibility

    Heikki Linnakangas <hlinnaka@iki.fi> — 2024-06-06T00:01:12Z

    On 05/06/2024 23:55, Robert Haas wrote:
    > On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin <tristan@partin.io> wrote:
    >> Not entirely sure how I feel about the approach you've taken, but here
    >> is a patch that Heikki and I put together for extension compatibility.
    >> It's not a build time solution, but a runtime solution. Instead of
    >> PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
    >> is a GUC called `multithreaded` which controls the variable
    >> IsMultithreaded. We operated under the assumption that being able to
    >> toggle multithreading and multi-processing without recompiling has
    >> value.
    > 
    > That's interesting, because I thought Heikki was against having a
    > runtime toggle.
    
    I'm very much in favor of a runtime toggle. To be precise, a 
    PGC_POSTMASTER setting. We'll get a lot more testing if you can easily 
    turn it on/off, and so far I haven't seen anything that would require it 
    to be a compile time option.
    
    > I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
    > great as long as we only ever need the PG_MODULE_MAGIC line to signal
    > one bit of information, but as soon as you get to two bits it's pretty
    > questionable, and anything more than two bits is insane. If we want to
    > do something with the PG_MODULE_MAGIC line, I think it should involve
    > options-passing of some form rather than just having an alternate
    > macro name.
    
    +1, that would be nicer.
    
    -- 
    Heikki Linnakangas
    Neon (https://neon.tech)
    
    
    
    
    
  7. Re: [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-06T01:10:01Z

    On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    > I'm very much in favor of a runtime toggle. To be precise, a
    > PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
    > turn it on/off, and so far I haven't seen anything that would require it
    > to be a compile time option.
    
    I was thinking about global variable annotations. If someone wants to
    build without multithreading, I think that they won't want to still
    end up with a ton of variables being changed to thread-local. So I
    think there has to be a build-time option controlling whether this
    build supports threading. I suspect there will be other people who
    want to just shut all of this experimental code off, which is probably
    going to be a second driver for a build-time toggle. But even with
    that, we can still have a GUC controlling whether threading is
    actually used. Does that make sense to you?
    
    Supposing it does, then how does the extension-marking system need to
    work? I suppose in this world we don't want any build failures: you're
    allowed to build a non-thread-aware extension against a
    threading-capable PostgreSQL; you're just not allowed to load the
    resulting extension when the server is in threading mode.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  8. Re: [multithreading] extension compatibility

    Andres Freund <andres@anarazel.de> — 2024-06-06T01:50:32Z

    Hi,
    
    On 2024-06-05 21:10:01 -0400, Robert Haas wrote:
    > On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    > > I'm very much in favor of a runtime toggle. To be precise, a
    > > PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
    > > turn it on/off, and so far I haven't seen anything that would require it
    > > to be a compile time option.
    > 
    > I was thinking about global variable annotations. If someone wants to
    > build without multithreading, I think that they won't want to still
    > end up with a ton of variables being changed to thread-local.
    
    Depending on the architecture / ABI / compiler options it's often not
    meaningfully more expensive to access a thread local variable than a "normal"
    variable.
    
    I think these days it's e.g. more expensive on x86-64 windows, but not
    linux. On arm the overhead of TLS is more noticeable, across platforms,
    afaict.
    
    Example compiler output for x86-64 and armv8:
    https://godbolt.org/z/K369eG5MM
    
    Cycle analysis or linux x86-64 output:
    https://godbolt.org/z/KK57vM1of
    
    This shows that for the linux x86-64 case there's no difference in efficiency
    between the tls/non-tls case.
    
    The reason it's so fast on x86-64 linux is that they reused one of the "old"
    segment registers to serve as the index register differing between each
    thread.  For x86-64 code, most code is compiled position independent, and
    *also* uses an indexed mode (but relative to the instruction pointer).
    
    
    I think we might be able to gain some small performance benefits via the
    annotations, which actualy might make it viable to just apply the annotations
    regardless of using threads or not.
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  9. Re: [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-06T01:59:42Z

    On Wed, Jun 5, 2024 at 9:50 PM Andres Freund <andres@anarazel.de> wrote:
    > Depending on the architecture / ABI / compiler options it's often not
    > meaningfully more expensive to access a thread local variable than a "normal"
    > variable.
    >
    > I think these days it's e.g. more expensive on x86-64 windows, but not
    > linux. On arm the overhead of TLS is more noticeable, across platforms,
    > afaict.
    
    I mean, to me, this still sounds like we want multithreading to be a
    build-time option.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  10. Re: [multithreading] extension compatibility

    Andres Freund <andres@anarazel.de> — 2024-06-06T02:09:36Z

    Hi,
    
    On 2024-06-05 21:59:42 -0400, Robert Haas wrote:
    > On Wed, Jun 5, 2024 at 9:50 PM Andres Freund <andres@anarazel.de> wrote:
    > > Depending on the architecture / ABI / compiler options it's often not
    > > meaningfully more expensive to access a thread local variable than a "normal"
    > > variable.
    > >
    > > I think these days it's e.g. more expensive on x86-64 windows, but not
    > > linux. On arm the overhead of TLS is more noticeable, across platforms,
    > > afaict.
    > 
    > I mean, to me, this still sounds like we want multithreading to be a
    > build-time option.
    
    Maybe. I think shipping a mode where users can fairly simply toggle between
    threaded and process mode will allow us to get this stable a *lot* quicker
    than if we distribute two builds. Most users don't build from source, distros
    will have to pick the mode. If they don't choose threaded mode, we'll not find
    problems. If they do choose threaded mode, we can't ask users to switch to a
    process based mode to check if the problem is related.
    
    We have been talking in a bunch of threads about having a mode where the main
    postgres binary chooses a build optimized for the current architecture, to be
    able to use SIMD instructions without a runtime check/dispatch. I guess we
    could add threadedness to such a matrix...
    
    Greetings,
    
    Andres Freund
    
    
    
    
  11. Re: [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-06T02:47:26Z

    On Wed, Jun 5, 2024 at 10:09 PM Andres Freund <andres@anarazel.de> wrote:
    > Maybe. I think shipping a mode where users can fairly simply toggle between
    > threaded and process mode will allow us to get this stable a *lot* quicker
    > than if we distribute two builds. Most users don't build from source, distros
    > will have to pick the mode. If they don't choose threaded mode, we'll not find
    > problems. If they do choose threaded mode, we can't ask users to switch to a
    > process based mode to check if the problem is related.
    
    I don't believe that being coercive here is the right approach. I
    think distros see the value in compiling with as many things turned on
    as possible; when they ship with something turned off, it's because
    it's unstable or introduces weird dependencies or has some other
    disadvantage.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  12. Re: [multithreading] extension compatibility

    Heikki Linnakangas <hlinnaka@iki.fi> — 2024-06-06T09:00:38Z

    On 06/06/2024 05:47, Robert Haas wrote:
    > On Wed, Jun 5, 2024 at 10:09 PM Andres Freund <andres@anarazel.de> wrote:
    >> Maybe. I think shipping a mode where users can fairly simply toggle between
    >> threaded and process mode will allow us to get this stable a *lot* quicker
    >> than if we distribute two builds. Most users don't build from source, distros
    >> will have to pick the mode. If they don't choose threaded mode, we'll not find
    >> problems. If they do choose threaded mode, we can't ask users to switch to a
    >> process based mode to check if the problem is related.
    > 
    > I don't believe that being coercive here is the right approach. I
    > think distros see the value in compiling with as many things turned on
    > as possible; when they ship with something turned off, it's because
    > it's unstable or introduces weird dependencies or has some other
    > disadvantage.
    
    I presume there's no harm in building with multithreading support. If 
    you don't want to use it, put "multithreading=off" in your config file 
    (which will presumably be the default for a while).
    
    If we're worried about the performance impact of thread-local variables 
    in particular, we can try to measure that. I don't think it's material 
    though.
    
    If there is some material harm from compiling with multithreading 
    support even if you're not using it, we should try to fix that. I'm not 
    dead set against having a compile-time option, but I don't see the need 
    for it at the moment.
    
    -- 
    Heikki Linnakangas
    Neon (https://neon.tech)
    
    
    
    
    
  13. Re: [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-06T14:23:46Z

    On Thu, Jun 6, 2024 at 5:00 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    > If there is some material harm from compiling with multithreading
    > support even if you're not using it, we should try to fix that. I'm not
    > dead set against having a compile-time option, but I don't see the need
    > for it at the moment.
    
    Well, OK, so it sounds like I'm outvoted, at least at the moment.
    Maybe that will change as more people vote, but for now, that's where
    we are. Given that, I suppose we want something more like Tristan's
    patch, but with a more extensible syntax. Does that sound right?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  14. Re: [multithreading] extension compatibility

    Heikki Linnakangas <hlinnaka@iki.fi> — 2024-06-06T14:32:12Z

    On 06/06/2024 17:23, Robert Haas wrote:
    > On Thu, Jun 6, 2024 at 5:00 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >> If there is some material harm from compiling with multithreading
    >> support even if you're not using it, we should try to fix that. I'm not
    >> dead set against having a compile-time option, but I don't see the need
    >> for it at the moment.
    > 
    > Well, OK, so it sounds like I'm outvoted, at least at the moment.
    > Maybe that will change as more people vote, but for now, that's where
    > we are. Given that, I suppose we want something more like Tristan's
    > patch, but with a more extensible syntax. Does that sound right?
    
    +1
    
    -- 
    Heikki Linnakangas
    Neon (https://neon.tech)
    
    
    
    
    
  15. Re: [multithreading] extension compatibility

    Robert Haas <robertmhaas@gmail.com> — 2024-06-13T19:23:25Z

    On Thu, Jun 6, 2024 at 10:32 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    > > Well, OK, so it sounds like I'm outvoted, at least at the moment.
    > > Maybe that will change as more people vote, but for now, that's where
    > > we are. Given that, I suppose we want something more like Tristan's
    > > patch, but with a more extensible syntax. Does that sound right?
    >
    > +1
    
    So, how shall we proceed here? Tristan, do you want to update your
    patch based on this feedback?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com