Thread

  1. Fix some ubsan/asan related issues

    Tristan Partin <tristan@neon.tech> — 2024-01-30T15:57:24Z

    Patch 1:
    
    Passing NULL as a second argument to memcpy breaks ubsan, and there 
    didn't seem to be anything preventing that in the LogLogicalMessage() 
    codepath. Here is a preventative measure in LogLogicalMessage() and an 
    Assert() in CopyXLogRecordToWAL().
    
    Patch 2:
    
    Support building with -Db_sanitize=address in Meson. Various executables 
    are leaky which can cause the builds to fail and tests to fail, when we 
    are fine leaking this memory.
    
    Personally, I am a big stickler for always freeing my memory in 
    executables even if it gets released at program exit because it makes 
    the leak sanitizer much more effective. However (!), I am not here to 
    change the philosophy of memory management in one-off executables, so 
    I have just silenced memory leaks in various executables for now.
    
    Patch 3:
    
    THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!
    
    In my effort to try to see if the test suite would pass with asan 
    enabled, I ran into a max_stack_depth issue. I tried maxing it out 
    (hence, the patch), but that still didn't remedy my issue. I tried to 
    look on the list for any relevant emails, but nothing turned up. Maybe 
    others have not attempted this before? Seems doubtful.
    
    Not entirely sure how to fix this issue. I personally find asan 
    extremely effective, even more than valgrind, so it would be great if 
    I could run Postgres with asan enabled to catch various stupid C issues 
    I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres 
    just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't 
    seem to leave enough stack space for Postgres.
    
    ---
    
    I would like to see patch 1 reviewed and committed. Patch 2 honestly 
    doesn't matter unless asan support can be fixed. I can also add a patch 
    that errors out the Meson build if asan support is requested. That way 
    others don't spend time heading down a dead end.
    
    -- 
    Tristan Partin
    Neon (https://neon.tech)
    
    
    
    
  2. Re: Fix some ubsan/asan related issues

    Tristan Partin <tristan@neon.tech> — 2024-01-30T15:59:25Z

    Spend so much time writing out the email, I once again forget 
    attachments...UGH.
    
    -- 
    Tristan Partin
    Neon (https://neon.tech)
    
  3. Re: Fix some ubsan/asan related issues

    Heikki Linnakangas <hlinnaka@iki.fi> — 2024-01-30T20:05:28Z

    On 30/01/2024 17:57, Tristan Partin wrote:
    > Patch 1:
    > 
    > Passing NULL as a second argument to memcpy breaks ubsan, and there
    > didn't seem to be anything preventing that in the LogLogicalMessage()
    > codepath. Here is a preventative measure in LogLogicalMessage() and an
    > Assert() in CopyXLogRecordToWAL().
    
    For the audience: We ran into this one with the neon extension. The 
    solution is to call LogLogicalMessage("", 0, ...) instead of 
    LogLogicalMessage(NULL, 0, ...). . But it's true that it's pointlessfor 
    LogLogicalMessage to call XLogRegisterData() if the message is empty. If 
    we do this, we should check for 'size == 0' rather than 'message == NULL'.
    
    > Patch 2:
    > 
    > Support building with -Db_sanitize=address in Meson. Various executables
    > are leaky which can cause the builds to fail and tests to fail, when we
    > are fine leaking this memory.
    > 
    > Personally, I am a big stickler for always freeing my memory in
    > executables even if it gets released at program exit because it makes
    > the leak sanitizer much more effective. However (!), I am not here to
    > change the philosophy of memory management in one-off executables, so
    > I have just silenced memory leaks in various executables for now.
    > 
    > Patch 3:
    > 
    > THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!
    > 
    > In my effort to try to see if the test suite would pass with asan
    > enabled, I ran into a max_stack_depth issue. I tried maxing it out
    > (hence, the patch), but that still didn't remedy my issue. I tried to
    > look on the list for any relevant emails, but nothing turned up. Maybe
    > others have not attempted this before? Seems doubtful.
    > 
    > Not entirely sure how to fix this issue. I personally find asan
    > extremely effective, even more than valgrind, so it would be great if
    > I could run Postgres with asan enabled to catch various stupid C issues
    > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
    > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
    > seem to leave enough stack space for Postgres.
    
    I'm a bit confused by these. We already run with sanitizer in the cirrus 
    CI. What does this enable that we're not already doing?
    
    -- 
    Heikki Linnakangas
    Neon (https://neon.tech)
    
    
    
    
    
  4. Re: Fix some ubsan/asan related issues

    Andres Freund <andres@anarazel.de> — 2024-01-30T21:23:04Z

    Hi,
    
    On 2024-01-30 22:05:28 +0200, Heikki Linnakangas wrote:
    > On 30/01/2024 17:57, Tristan Partin wrote:
    > > In my effort to try to see if the test suite would pass with asan
    > > enabled, I ran into a max_stack_depth issue. I tried maxing it out
    > > (hence, the patch), but that still didn't remedy my issue. I tried to
    > > look on the list for any relevant emails, but nothing turned up. Maybe
    > > others have not attempted this before? Seems doubtful.
    > > 
    > > Not entirely sure how to fix this issue. I personally find asan
    > > extremely effective, even more than valgrind, so it would be great if
    > > I could run Postgres with asan enabled to catch various stupid C issues
    > > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
    > > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
    > > seem to leave enough stack space for Postgres.
    > 
    > I'm a bit confused by these. We already run with sanitizer in the cirrus CI.
    > What does this enable that we're not already doing?
    
    The reason asan fails is that it uses a "shadow stack" to track stack variable
    lifetimes. These confuse our stack depth check. CI doesn't have the issue
    because the compiler doesn't yet enable the feature, locally I get around it
    by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
    
    The checks are actually quite useful, so making our stack depth check work
    with asan would be worthwhile.
    
    I discussed this in a bit more in
    https://postgr.es/m/20231129193920.4vphw7dqxjzf5v5b%40awork3.anarazel.de
    
    Greetings,
    
    Andres Freund
    
    
    
    
  5. Re: Fix some ubsan/asan related issues

    Andres Freund <andres@anarazel.de> — 2024-01-30T21:58:12Z

    Hi,
    
    On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
    > From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
    > From: Tristan Partin <tristan@neon.tech>
    > Date: Mon, 29 Jan 2024 18:03:39 -0600
    > Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
    >  NULL
    
    > If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
    > the API contract of memcpy in glibc. The two pointer arguments are
    > marked as nonnull, even in the event the amount to copy is 0 bytes.
    
    It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
    that something useful?
    
    
    > From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
    > From: Tristan Partin <tristan@neon.tech>
    > Date: Wed, 24 Jan 2024 17:07:01 -0600
    > Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
    > 
    > The ecpg is parser is extremely leaky, so we need to silence leak
    > detection.
    
    This does stuff beyond epcg...
    
    
    > +if get_option('b_sanitize').contains('address')
    > +  cdata.set('USE_ADDRESS_SANITIZER', 1)
    > +endif
    >  
    >  ###############################################################
    >  # NLS / Gettext
    > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
    > index ac409b0006..e18e716d9c 100644
    > --- a/src/bin/initdb/initdb.c
    > +++ b/src/bin/initdb/initdb.c
    > @@ -338,6 +338,17 @@ do { \
    >  		output_failed = true, output_errno = errno; \
    >  } while (0)
    >  
    > +#ifdef USE_ADDRESS_SANITIZER
    
    When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
    implement this ourselves.
    
    
    > +const char *__asan_default_options(void);
    > +
    > +const char *__asan_default_options(void)
    > +{
    > +	return "detect_leaks=0";
    > +}
    > +
    > +#endif
    
    Wonder if we should move this into some static library and link it into all
    binaries that don't want leak detection? It doesn't seem great to have to
    adjust this in a bunch of files if we want to adjust the options...
    
    Greetings,
    
    Andres Freund
    
    
    
    
  6. Re: Fix some ubsan/asan related issues

    Alexander Lakhin <exclusion@gmail.com> — 2024-01-31T04:00:00Z

    Hello,
    
    30.01.2024 18:57, Tristan Partin wrote:
    > Patch 1:
    >
    > Passing NULL as a second argument to memcpy breaks ubsan, ...
    
    Maybe you would like to fix also one more similar place, reached with:
    create extension xml2;
    select xslt_process('<x/>',
    $$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@*|node()">
    </xsl:template>
    </xsl:stylesheet>$$);
    
    varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null
    
    There is also an issue with pg_bsd_indent, I stumble upon when doing
    `make check-world` with the sanitizers enabled:
    https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com
    
    31.01.2024 00:23, Andres Freund wrote:
    > The reason asan fails is that it uses a "shadow stack" to track stack variable
    > lifetimes. These confuse our stack depth check. CI doesn't have the issue
    > because the compiler doesn't yet enable the feature, locally I get around it
    > by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
    
    Even with detect_stack_use_after_return=0, clang-18's asan makes the test
    012_subtransactions.pl fail:
    2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
    2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
    2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth 
    (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
    
    (All the other tests pass.)
    Though the same test passes when I use clang-16.
    
    Best regards,
    Alexander
  7. Re: Fix some ubsan/asan related issues

    Tristan Partin <tristan@neon.tech> — 2024-01-31T18:16:52Z

    On Tue Jan 30, 2024 at 10:00 PM CST, Alexander Lakhin wrote:
    > Hello,
    >
    > 30.01.2024 18:57, Tristan Partin wrote:
    > > Patch 1:
    > >
    > > Passing NULL as a second argument to memcpy breaks ubsan, ...
    >
    > Maybe you would like to fix also one more similar place, reached with:
    > create extension xml2;
    > select xslt_process('<x/>',
    > $$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    > <xsl:template match="@*|node()">
    > </xsl:template>
    > </xsl:stylesheet>$$);
    >
    > varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null
    >
    > There is also an issue with pg_bsd_indent, I stumble upon when doing
    > `make check-world` with the sanitizers enabled:
    > https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com
    >
    > 31.01.2024 00:23, Andres Freund wrote:
    > > The reason asan fails is that it uses a "shadow stack" to track stack variable
    > > lifetimes. These confuse our stack depth check. CI doesn't have the issue
    > > because the compiler doesn't yet enable the feature, locally I get around it
    > > by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
    >
    > Even with detect_stack_use_after_return=0, clang-18's asan makes the test
    > 012_subtransactions.pl fail:
    > 2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
    > 2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
    > 2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth 
    > (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
    >
    > (All the other tests pass.)
    > Though the same test passes when I use clang-16.
    
    Thanks Alexander! I will try and take a look at these.
    
    -- 
    Tristan Partin
    Neon (https://neon.tech)
    
    
    
    
  8. Re: Fix some ubsan/asan related issues

    Tristan Partin <tristan@neon.tech> — 2024-02-06T03:53:40Z

    On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
    > Hi,
    >
    > On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
    > > From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
    > > From: Tristan Partin <tristan@neon.tech>
    > > Date: Mon, 29 Jan 2024 18:03:39 -0600
    > > Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
    > >  NULL
    >
    > > If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
    > > the API contract of memcpy in glibc. The two pointer arguments are
    > > marked as nonnull, even in the event the amount to copy is 0 bytes.
    >
    > It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
    > that something useful?
    
    Dropped. Will change on the Neon side. Should we add an assert 
    somewhere for good measure? Near the memcpy in question?
    
    > > From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
    > > From: Tristan Partin <tristan@neon.tech>
    > > Date: Wed, 24 Jan 2024 17:07:01 -0600
    > > Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
    > > 
    > > The ecpg is parser is extremely leaky, so we need to silence leak
    > > detection.
    >
    > This does stuff beyond epcg...
    
    Dropped.
    
    > > +if get_option('b_sanitize').contains('address')
    > > +  cdata.set('USE_ADDRESS_SANITIZER', 1)
    > > +endif
    > >  
    > >  ###############################################################
    > >  # NLS / Gettext
    > > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
    > > index ac409b0006..e18e716d9c 100644
    > > --- a/src/bin/initdb/initdb.c
    > > +++ b/src/bin/initdb/initdb.c
    > > @@ -338,6 +338,17 @@ do { \
    > >  		output_failed = true, output_errno = errno; \
    > >  } while (0)
    > >  
    > > +#ifdef USE_ADDRESS_SANITIZER
    >
    > When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
    > implement this ourselves.
    
    Thanks!
    
    > > +const char *__asan_default_options(void);
    > > +
    > > +const char *__asan_default_options(void)
    > > +{
    > > +	return "detect_leaks=0";
    > > +}
    > > +
    > > +#endif
    >
    > Wonder if we should move this into some static library and link it into all
    > binaries that don't want leak detection? It doesn't seem great to have to
    > adjust this in a bunch of files if we want to adjust the options...
    
    See attached patches. Here is what I found to be necessary to get 
    a -Db_sanitize=address,undefined build to successfully make it through 
    tests. I do have a few concerns about the patch.
    
    1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak 
       sanitizer is enabled. So you will see me use this, to make some 
       include directives work. I don't like this as a final solution 
       because someone could use -fsanitize=leak.
    2. I tracked down what seems to be a valid leak in adt/xml.c. Attached 
       (test.sql) is a fairly minimal reproduction of what is needed to show 
       the leak. I didn't spend too much time tracking it down. Might get to 
       it later, who knows. Below you will find the backtrace, and whoever 
       wants to try their hand at fixing it will need to comment out 
       xmlNewNode in the leak.supp file.
    3. I don't love the new library name. Maybe it should be name more lsan 
       specific.
    4. Should pg_attribute_no_asan be renamed to 
       pg_attribute_no_sanitize_address? That would match 
       pg_attribute_no_sanitize_alignment.
    
    I will also attach a Meson test log for good measure. I didn't try 
    testing anything that requires PG_TEST_EXTRA, but I suspect that 
    everything will be fine.
    
    Alexander, I haven't yet gotten to the things you pointed out in the 
    sibling thread.
    
    ==221848==ERROR: LeakSanitizer: detected memory leaks
    
    Direct leak of 120 byte(s) in 1 object(s) allocated from:
        #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
        #1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
        #2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
        #3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
        #4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
        #5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
        #6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
        #7 0x109655d in ExecProject ../src/include/executor/executor.h:389
        #8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
        #9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
        #10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
        #11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
        #12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
        #13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
        #14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
        #15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
        #16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
        #17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
        #18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
        #19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
        #20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
        #21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
        #22 0x11a5f67 in main ../src/backend/main/main.c:198
        #23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
        #24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
        #25 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
    
    Indirect leak of 13 byte(s) in 1 object(s) allocated from:
        #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
        #1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
        #2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
        #3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
        #4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
        #5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
        #6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
        #7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
        #8 0x109655d in ExecProject ../src/include/executor/executor.h:389
        #9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
        #10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
        #11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
        #12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
        #13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
        #14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
        #15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
        #16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
        #17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
        #18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
        #19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
        #20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
        #21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
        #22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
        #23 0x11a5f67 in main ../src/backend/main/main.c:198
        #24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
        #25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
        #26 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
    
    SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).
    
    -- 
    Tristan Partin
    Neon (https://neon.tech)
    
  9. Re: Fix some ubsan/asan related issues

    Junwang Zhao <zhjwpku@gmail.com> — 2024-09-16T13:29:21Z

    Hi Tristan,
    
    On Tue, Feb 6, 2024 at 11:53 AM Tristan Partin <tristan@neon.tech> wrote:
    >
    > On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
    > > Hi,
    > >
    > > On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
    > > > From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
    > > > From: Tristan Partin <tristan@neon.tech>
    > > > Date: Mon, 29 Jan 2024 18:03:39 -0600
    > > > Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
    > > >  NULL
    > >
    > > > If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
    > > > the API contract of memcpy in glibc. The two pointer arguments are
    > > > marked as nonnull, even in the event the amount to copy is 0 bytes.
    > >
    > > It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
    > > that something useful?
    >
    > Dropped. Will change on the Neon side. Should we add an assert
    > somewhere for good measure? Near the memcpy in question?
    >
    > > > From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
    > > > From: Tristan Partin <tristan@neon.tech>
    > > > Date: Wed, 24 Jan 2024 17:07:01 -0600
    > > > Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
    > > >
    > > > The ecpg is parser is extremely leaky, so we need to silence leak
    > > > detection.
    > >
    > > This does stuff beyond epcg...
    >
    > Dropped.
    >
    > > > +if get_option('b_sanitize').contains('address')
    > > > +  cdata.set('USE_ADDRESS_SANITIZER', 1)
    > > > +endif
    > > >
    > > >  ###############################################################
    > > >  # NLS / Gettext
    > > > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
    > > > index ac409b0006..e18e716d9c 100644
    > > > --- a/src/bin/initdb/initdb.c
    > > > +++ b/src/bin/initdb/initdb.c
    > > > @@ -338,6 +338,17 @@ do { \
    > > >             output_failed = true, output_errno = errno; \
    > > >  } while (0)
    > > >
    > > > +#ifdef USE_ADDRESS_SANITIZER
    > >
    > > When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
    > > implement this ourselves.
    >
    > Thanks!
    >
    > > > +const char *__asan_default_options(void);
    > > > +
    > > > +const char *__asan_default_options(void)
    > > > +{
    > > > +   return "detect_leaks=0";
    > > > +}
    > > > +
    > > > +#endif
    > >
    > > Wonder if we should move this into some static library and link it into all
    > > binaries that don't want leak detection? It doesn't seem great to have to
    > > adjust this in a bunch of files if we want to adjust the options...
    >
    > See attached patches. Here is what I found to be necessary to get
    > a -Db_sanitize=address,undefined build to successfully make it through
    > tests. I do have a few concerns about the patch.
    >
    > 1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak
    >    sanitizer is enabled. So you will see me use this, to make some
    >    include directives work. I don't like this as a final solution
    >    because someone could use -fsanitize=leak.
    > 2. I tracked down what seems to be a valid leak in adt/xml.c. Attached
    >    (test.sql) is a fairly minimal reproduction of what is needed to show
    >    the leak. I didn't spend too much time tracking it down. Might get to
    >    it later, who knows. Below you will find the backtrace, and whoever
    >    wants to try their hand at fixing it will need to comment out
    >    xmlNewNode in the leak.supp file.
    > 3. I don't love the new library name. Maybe it should be name more lsan
    >    specific.
    > 4. Should pg_attribute_no_asan be renamed to
    >    pg_attribute_no_sanitize_address? That would match
    >    pg_attribute_no_sanitize_alignment.
    >
    > I will also attach a Meson test log for good measure. I didn't try
    > testing anything that requires PG_TEST_EXTRA, but I suspect that
    > everything will be fine.
    >
    > Alexander, I haven't yet gotten to the things you pointed out in the
    > sibling thread.
    >
    > ==221848==ERROR: LeakSanitizer: detected memory leaks
    >
    > Direct leak of 120 byte(s) in 1 object(s) allocated from:
    >     #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
    >     #1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
    >     #2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
    >     #3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
    >     #4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
    >     #5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
    >     #6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
    >     #7 0x109655d in ExecProject ../src/include/executor/executor.h:389
    >     #8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
    >     #9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
    >     #10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
    >     #11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
    >     #12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
    >     #13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
    >     #14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
    >     #15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
    >     #16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
    >     #17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
    >     #18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
    >     #19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
    >     #20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
    >     #21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
    >     #22 0x11a5f67 in main ../src/backend/main/main.c:198
    >     #23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    >     #24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    >     #25 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
    >
    > Indirect leak of 13 byte(s) in 1 object(s) allocated from:
    >     #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
    >     #1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
    >     #2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
    >     #3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
    >     #4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
    >     #5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
    >     #6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
    >     #7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
    >     #8 0x109655d in ExecProject ../src/include/executor/executor.h:389
    >     #9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
    >     #10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
    >     #11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
    >     #12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
    >     #13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
    >     #14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
    >     #15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
    >     #16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
    >     #17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
    >     #18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
    >     #19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
    >     #20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
    >     #21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
    >     #22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
    >     #23 0x11a5f67 in main ../src/backend/main/main.c:198
    >     #24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    >     #25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    >     #26 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
    >
    > SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).
    >
    > --
    > Tristan Partin
    > Neon (https://neon.tech)
    
    I tried your v1-0002, it works at compile phase but failed to run initdb
    with the following leak detected:
    
    =================================================================
    ==64983==ERROR: LeakSanitizer: detected memory leaks
    
    Direct leak of 248 byte(s) in 1 object(s) allocated from:
        #0 0x7fc7729df9cf in __interceptor_malloc
    ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
        #1 0x55bff5480e8b in save_ps_display_args
    ../postgres/src/backend/utils/misc/ps_status.c:190
        #2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
        #3 0x7fc771924249 in __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58
    
    Indirect leak of 19 byte(s) in 1 object(s) allocated from:
        #0 0x7fc77299777b in __interceptor_strdup
    ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
        #1 0x55bff5480f41 in save_ps_display_args
    ../postgres/src/backend/utils/misc/ps_status.c:198
        #2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
        #3 0x7fc771924249 in __libc_start_call_main
    ../sysdeps/nptl/libc_start_call_main.h:58
    
    I worked around by moving *new_environ* as a global variable, I think this is
    false positive report so maybe this deserves a patch?
    
    I tried to apply your v2 patch set but v2-0004 seems out of date.
    
    -- 
    Regards
    Junwang Zhao
    
    
    
    
  10. Re: Fix some ubsan/asan related issues

    Tristan Partin <tristan@partin.io> — 2024-10-02T04:02:27Z

    On Mon Sep 16, 2024 at 9:29 AM EDT, Junwang Zhao wrote:
    > I tried your v1-0002, it works at compile phase but failed to run initdb
    > with the following leak detected:
    >
    > =================================================================
    > ==64983==ERROR: LeakSanitizer: detected memory leaks
    >
    > Direct leak of 248 byte(s) in 1 object(s) allocated from:
    >     #0 0x7fc7729df9cf in __interceptor_malloc
    > ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    >     #1 0x55bff5480e8b in save_ps_display_args
    > ../postgres/src/backend/utils/misc/ps_status.c:190
    >     #2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
    >     #3 0x7fc771924249 in __libc_start_call_main
    > ../sysdeps/nptl/libc_start_call_main.h:58
    >
    > Indirect leak of 19 byte(s) in 1 object(s) allocated from:
    >     #0 0x7fc77299777b in __interceptor_strdup
    > ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
    >     #1 0x55bff5480f41 in save_ps_display_args
    > ../postgres/src/backend/utils/misc/ps_status.c:198
    >     #2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
    >     #3 0x7fc771924249 in __libc_start_call_main
    > ../sysdeps/nptl/libc_start_call_main.h:58
    >
    > I worked around by moving *new_environ* as a global variable, I think this is
    > false positive report so maybe this deserves a patch?
    >
    > I tried to apply your v2 patch set but v2-0004 seems out of date.
    
    Thanks for giving it a try Junwang. I'll rebase this, hopefully tomorrow 
    and give it another push.
    
    -- 
    Tristan Partin
    Neon (https://neon.tech)