Thread

  1. Re: GNU/Hurd portability patches

    Alexander Lakhin <exclusion@gmail.com> — 2025-10-12T13:00:00Z

    Hi Michael,
    
    12.10.2025 11:31, Michael Banck wrote:
     >
     > Any way to easily reproduce this? It happened only once on fruitcrow so
     > far.
    
    I'd say it happens pretty often when `make check` doesn't hang (so it
    takes an hour or two for me to reproduce).
    
    Though now that you've mentioned MAX_CONNECTIONS => '3', I also tried:
    EXTRA_REGRESS_OPTS="--max-connections=3" make -s check
    and it passed 6 iterations for me. Iteration 7 failed with:
    not ok 213   + partition_aggregate                      1027 ms
    
    --- /home/demo/postgresql/src/test/regress/expected/partition_aggregate.out 2025-10-11 10:04:36.000000000 +0100
    +++ /home/demo/postgresql/src/test/regress/results/partition_aggregate.out 2025-10-12 13:02:05.000000000 +0100
    @@ -1476,14 +1476,14 @@
      (15 rows)
    
      SELECT x, sum(y), avg(y), sum(x+y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
    - x  | sum  |        avg         |  sum  | count
    -----+------+--------------------+-------+-------
    -  0 | 5000 | 5.0000000000000000 |  5000 |  1000
    -  1 | 6000 | 6.0000000000000000 |  7000 |  1000
    - 10 | 5000 | 5.0000000000000000 | 15000 |  1000
    - 11 | 6000 | 6.0000000000000000 | 17000 |  1000
    - 20 | 5000 | 5.0000000000000000 | 25000 |  1000
    - 21 | 6000 | 6.0000000000000000 | 27000 |  1000
    + x  | sum  |            avg             |  sum  | count
    +----+------+----------------------------+-------+-------
    +  0 | 5000 |         5.0000000000000000 |  5000 |  1000
    +  1 | 6000 |         6.0000000000000000 |  7000 |  1000
    + 10 | 5000 | 0.000000052757140846001326 | 15000 |  1000
    + 11 | 6000 |         6.0000000000000000 | 17000 |  1000
    + 20 | 5000 |         5.0000000000000000 | 25000 |  1000
    + 21 | 6000 |         6.0000000000000000 | 27000 |  1000
      (6 rows)
    
    Then another 6 iterations passed, and the seventh one hung. After that,
    10 iterations passed.
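
    For anyone who wants to repeat this, a minimal sketch of such an
    iteration loop follows. The function name repeat_until_fail is
    hypothetical (nothing in the tree provides it), and it only catches
    failures: a hung run still has to be interrupted by hand or wrapped in
    some time limit.

```shell
# Run a command up to MAX times and stop at the first failing iteration.
# Prints which iteration failed and returns that iteration's exit status.
repeat_until_fail() {
    max=$1; shift
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@"; then
            :   # this iteration passed, keep going
        else
            status=$?
            echo "iteration $i failed with status $status"
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "all $max iterations passed"
}

# Example invocation (the iteration count 20 is arbitrary):
#   EXTRA_REGRESS_OPTS="--max-connections=3" repeat_until_fail 20 make -s check
```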
    
    With EXTRA_REGRESS_OPTS="--max-connections=10" make -s check, I got:
    2025-10-12 13:52:58.559 BST client backend[15475] pg_regress/constraints STATEMENT:  ALTER TABLE notnull_tbl2 ALTER a DROP NOT NULL;
    !!!wrapper_handler[15479]| postgres_signal_arg: 30, PG_NSIG: 33
    !!!wrapper_handler[15476]| postgres_signal_arg: 30, PG_NSIG: 33
    !!!wrapper_handler[15476]| postgres_signal_arg: 28481392, PG_NSIG: 33
    TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 15476
    postgres(ExceptionalCondition+0x5a) [0x1006af78a]
    postgres(+0x70f59a) [0x10070f59a]
    /lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
    /lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]
    
    on iteration 5.
    
    So we can conclude that the issue with signals reproduces more readily
    at higher concurrency.
    
    28481392 (0x1b29770) is pretty close to 28476608 (0x1b284c0), which I
    showed before, so numbers are apparently not random.
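
    The arithmetic behind that comparison can be checked with a few lines
    of shell. The helper names to_hex and delta are made up for this
    illustration; the two inputs are the bogus postgres_signal_arg values
    quoted above.

```shell
# Print a decimal value in hexadecimal.
to_hex() { printf '0x%x\n' "$1"; }

# Print the difference between two decimal values.
delta() { echo $(( $1 - $2 )); }

to_hex 28481392            # prints 0x1b29770 (value from this run)
to_hex 28476608            # prints 0x1b284c0 (value from the earlier report)
delta 28481392 28476608    # prints 4784, i.e. 0x12b0
```

    The two values differ by only 4784, which fits the observation that
    they are not random.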
    
     > I had to reboot fruitcrow last night because it had crashed, but that
     > was the first time in literally weeks. I tend to reboot it once a week,
     > but otherwise it ran pretty stable.
    
    Today I also tried to test my machine with stress-ng:
    stress-ng -v --class os --sequential 20 --timeout 120s
    
    It hung or crashed on the access, brk, close, and enosys tests and never
    reached the end. Some tests may pass after a restart; others fail
    consistently.
    For example:
    Fatal glibc error: ../sysdeps/mach/hurd/mig-reply.c:73 (__mig_dealloc_reply_port): assertion failed: port == arg
    stress-ng: info:  [9395] stressor terminated with unexpected signal 6 'SIGABRT'
    backtrace:
       stress-ng-enosys [run](+0xace81) [0x1000ace81]
       stress-ng-enosys [run](+0x927b6c) [0x100927b6c]
       /lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x1029c8fee]
       /lib/x86_64-gnu/libc.so.0.3(+0x21aec) [0x1029b0aec]
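
    To tell a hang from a crash when running stressors one at a time, a
    wrapper along these lines might help. This is only a sketch:
    classify_run is a hypothetical name, it assumes GNU coreutils
    `timeout` (which exits with 124 when the limit is hit), and I have not
    checked how well `timeout` itself behaves on the Hurd.

```shell
# Run a command under a time limit and classify how it ended.
# Assumes GNU coreutils `timeout`, which exits with 124 on timeout.
classify_run() {
    limit=$1; shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "pass"
    elif [ "$status" -eq 124 ]; then
        echo "timeout"                      # probably a hang
    elif [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"     # e.g. 6 for SIGABRT
    else
        echo "fail $status"
    fi
}

# Example invocation (per-stressor options are an assumption on my side):
#   classify_run 150 stress-ng --enosys 1 --timeout 120s
```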
    
     > It took me a while to get there though before I applied for it to be a
     > buildfarm animal, here is what I did:
     >
     > 1. (buildfarm client specific): removed "HEAD => ['debug_parallel_query =
     > regress']," and set "MAX_CONNECTIONS => '3'," in build-farm.conf, to
     > reduce concurrency.
    
    Thank you for the info! I didn't specify debug_parallel_query for
    `make check`, but the number of connections really makes a difference.
    
     > 2. Gave the VM 4G of memory via KVM. Also set -M q35, but I guess
     > you are already doing that as it does not boot properly otherwise IME.
    
    Mine has 4GB too.
    
     > 3. Removed swap (this is already the case for the x86-64 2025 Debian
     > image, but it was not the case for the earlier 2023 i386 image when I
     > started this project). Paging to disk has been problematic and prone to
     > issues (critical parts getting paged out accidentally), but this has been
     > fixed over the summer so in principle running a current gnumach/hurd
     > package combination from unstable should be fine again.
    
    Yes, I have no swap enabled.
    
     > 4. Removed tmpfs translators (so that the default-pager is not used
     > anywhere, in conjunction with not setting swap, see above), by setting
     > RAMLOCK=no and RAMTMP=no in /etc/default/tmpfs, as well as commenting
     > out 'mount_run mount_noupdate'/'mount_tmp mount_noupdate' in
     > /etc/init.d/mountall.sh and 'mount_run "$MNTMODE"' in
     > /etc/init.d/mountkernfs.sh (maybe there is a more minimal change, but
     > that is what I have right now).
    
    I have RAMLOCK=no and RAMTMP=no in my /etc/default/tmpfs and can't see any
    tmpfs mounts.
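
    For reference, the two settings can be read back with something like
    the sketch below. conf_value is a hypothetical helper, and it only
    understands plain KEY=value lines, so commented-out assignments are
    ignored.

```shell
# Print the value assigned to KEY in a shell-style defaults file.
# If KEY is assigned more than once, the last assignment wins.
conf_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1
}

# Example invocation:
#   conf_value RAMLOCK /etc/default/tmpfs
#   conf_value RAMTMP  /etc/default/tmpfs
```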
    
    Thank you for your help!
    
    Best regards,
    Alexander