Thread

  1. Re: Heads Up: cirrus-ci is shutting down June 1st

    Nazir Bilal Yavuz <byavuz81@gmail.com> — 2026-05-28T17:06:22Z

    Hi,
    
    Thank you for looking into this!
    
    On Wed, 27 May 2026 at 21:10, Andres Freund <andres@anarazel.de> wrote:
    >
    > > Here is the v2, I took Jelte's patch and reviewed & merged it with my
    > > patch. Updates and questions are:
    > >
    > > 1- I continued to use Jelte's container method (Linux tasks only for
    > > now, BSD tasks will be included in the future) because I think that is
    > > the future-proof way since we might want to generate our container
    > > images in the future. Also, up-to-date Debian images can be tested
    > > with this way; otherwise we would need to use Ubuntu 24.04.
    >
    > Good.
    >
    >
    > > 2- io_uring tests work on the Linux Meson task.
    >
    > Is there a reason to not just do that for all the tasks?
    
    I might have worded it incorrectly. I meant that the Linux meson tests use:
    
    PG_TEST_INITDB_EXTRA_OPTS: >-
      -c io_method=io_uring
    
    and that wasn't working before; now it works. I guess we have this
    only on the Linux meson task because we wanted to test io_method=worker
    in the other tasks.
    
    
    > > 3- I didn't put commands to helper scripts for now. I think it is a
    > > good thing to have a helper script but it would be better to have this
    > > helper script after the first version is committed since it can extend
    > > the timeline. Also, I found that having all commands in one file makes
    > > debugging easier.
    >
    > Hm. I'm a bit worried about this getting pretty unmaintainable, due to the
    > repetition.  I think at least we need to use yaml anchors to deduplicate some
    > steps.
    
    GitHub Actions added support for YAML anchors last year, but
    unfortunately it doesn't support merge keys. Related information: [1].
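    
    To illustrate the difference, here is a minimal sketch. The job
    names, the anchor name and the values are made up for illustration and
    are not from the patch. Reusing an anchored node unchanged works, but
    overriding one field of it needs a merge key, which is rejected:
    
    ```yaml
    jobs:
      job-a:
        runs-on: ubuntu-latest
        # Anchor the whole env mapping.
        env: &common_env
          BUILD_JOBS: 4
          TEST_JOBS: 8
        steps:
          - run: echo "building with ${BUILD_JOBS} jobs"

      job-b:
        runs-on: ubuntu-latest
        # Works: the alias reuses the anchored mapping as-is.
        env: *common_env
        steps:
          - run: echo "building with ${BUILD_JOBS} jobs"

      job-c:
        runs-on: ubuntu-latest
        # Does NOT work on GitHub Actions: "<<" is a merge key.
        env:
          <<: *common_env
          TEST_JOBS: 4
        steps:
          - run: echo "building with ${BUILD_JOBS} jobs"
    ```
    
    So anything that is identical across jobs can be shared, but anything
    that differs in even one field has to be repeated.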
    
    
    > > 4- FreeBSD task has these options:
    > >
    > >       PG_TEST_INITDB_EXTRA_OPTS: >-
    > >         -c debug_copy_parse_plan_trees=on
    > >         -c debug_write_read_parse_plan_trees=on
    > >         -c debug_raw_expression_coverage_test=on
    > >         -c debug_parallel_query=regress
    > >
    > > Since we won't have FreeBSD for the first version. I put these options
    > > to the MacOS task but I couldn't decide where to put
    > > 'PG_TEST_PG_UPGRADE_MODE: --link'.
    >
    > Makes sense.
    >
    >
    > > Also, I am planning to work on back patches when we agree on the
    > > upstream one. Does that sound good?
    >
    > Yep.
    >
    >
    >
    > > diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
    > > new file mode 100644
    > > index 00000000000..6d20068727c
    > > --- /dev/null
    > > +++ b/.github/workflows/ci.yml
    > > @@ -0,0 +1,1125 @@
    > > +# GitHub Actions CI configuration for PostgreSQL
    > > +
    > > +name: Github Actions CI
    > > +
    > > +on:
    > > +  push:
    > > +    branches: [ "*" ]
    > > +
    > > +# Default to the minimum privilege the jobs need (just reading the repo
    > > +# contents during checkout). Individual jobs override this when they need
    > > +# more, e.g. `cancel-previous` needs `actions: write` to cancel runs.
    > > +permissions:
    > > +  contents: read
    >
    > I'm not sure I like that we ever need more than that. I'd expect that
    > postgresql-cfbot will explicitly disable write permissions for runs.
    
    Done. Updated the comment and removed the 'Cancel previous runs' step.
    
    
    > > +# NB: intentionally NO workflow-level `concurrency:` block. The native
    > > +# concurrency mechanism makes a new run wait for the previous one to fully
    > > +# cancel before it starts — which can take a while. Instead the
    > > +# `cancel-previous` job below fires a cancel API call asynchronously,
    > > +# so the new run gets going immediately. On master the cancel job is skipped,
    > > +# so every push runs to completion.
    >
    > Is this really worth having our own code? Seems like it'd not be that frequent
    > to push if there are already running runs?  What kind of delays are we talking
    > about?
    
    Jelte already answered this in [2]. The 'Cancel previous runs' step is
    removed and the native concurrency mechanism is used instead.
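    
    For reference, a workflow-level concurrency block could look roughly
    like the sketch below. The group expression is an assumption for
    illustration, not necessarily what v3 uses. It gives every run on
    master its own group, so pushes to master are never cancelled or
    queued behind each other:
    
    ```yaml
    concurrency:
      # On master each run gets a unique group (its run id), so nothing
      # is superseded. On other branches the group is the ref, so a new
      # push replaces the run that is still in progress for that branch.
      group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/master' && github.run_id || github.ref }}
      cancel-in-progress: true
    ```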
    
    
    > > +  # To avoid unnecessarily spinning up a lot of VMs / containers for entirely
    > > +  # broken commits, have a minimal task that all others depend on.
    > > +  #
    > > +  # SPECIAL:
    > > +  # - Builds with --auto-features=disabled and thus almost no enabled
    > > +  #   dependencies
    > > +  sanity-check:
    > > +    name: SanityCheck
    > > +    needs: setup
    > > +    if: needs.setup.outputs.sanitycheck == 'true'
    > > +    runs-on: ubuntu-latest
    > > +    timeout-minutes: 15
    > > +    container:
    > > +      image: ${{ needs.setup.outputs.linux_ci_image }}
    > > +    env:
    > > +      BUILD_JOBS: 8
    > > +      TEST_JOBS: 8
    > > +      CCACHE_DIR: ${{ github.workspace }}/ccache_dir
    > > +      # no options enabled, should be small
    > > +      CCACHE_MAXSIZE: "150M"
    > > +    steps:
    > > +      - uses: actions/checkout@v6
    > > +        with:
    > > +          fetch-depth: ${{ env.CLONE_DEPTH }}
    > > +
    > > +      - name: Restore ccache
    > > +        uses: actions/cache@v5
    >
    > Seems like this is used by every task. Can we move this into a yaml anchor or
    > such, by using a variable representing the job name?
    
    GitHub Actions doesn't support merge keys, so we can't really
    deduplicate them. I used a YAML anchor for the checkout step since it is
    exactly the same for all jobs.
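    
    Roughly like this (a simplified sketch; the anchor name and the
    second job are hypothetical):
    
    ```yaml
    jobs:
      sanity-check:
        runs-on: ubuntu-latest
        steps:
          # Define the step once, in the first job that uses it ...
          - &checkout
            uses: actions/checkout@v6
            with:
              fetch-depth: ${{ env.CLONE_DEPTH }}

      linux:
        runs-on: ubuntu-latest
        steps:
          # ... and reuse it unchanged in every other job.
          - *checkout
    ```
    
    The ccache step can't be shared the same way because its key differs
    per job, and changing one field of an aliased step would need a merge
    key.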
    
    
    > > +        with:
    > > +          path: ${{ env.CCACHE_DIR }}
    > > +          key: ccache-sanitycheck-${{ github.run_id }}
    > > +          restore-keys: ccache-sanitycheck-
    >
    > Why is the key here the run id? Doesn't that mean that we will never have a
    > precise cache match and that we will keep multiple versions of the cache
    > around? That seems like a waste of cache space?
    >
    > For efficiency, particularly on cfbot, it seems like it could be useful to
    > populate the cache of branches with the cache of the master branch. For that
    > we'd need the branch name in the key. Which I think would also good for
    > postgres/postgres, as we currently have a lot of interference between runs on
    > the main and the REL_XY_STABLE branches.
    
    I think that is the default way. If there is an exact cache hit,
    actions/cache doesn't save the cache again at the end of the job. So,
    having ${{ github.run_id }} in the key makes sure we never have an
    exact hit and the cache is always refreshed. This sounds bad but that
    is what I understood :(
    
    I can implement something like this:
    
          - name: Restore ccache
            uses: actions/cache/restore@v5
            with:
              path: ${{ env.CCACHE_DIR }}
              key: ccache-sanitycheck-master
              restore-keys: |
                ccache-sanitycheck-${{ github.ref_name }}
                ccache-sanitycheck-
    
          - name: Save ccache
            if: always()
            uses: actions/cache/save@v5
            with:
              path: ${{ env.CCACHE_DIR }}
              key: ccache-sanitycheck-${{ github.ref_name }}-${{ github.run_id }}
    
    So, it will first look for master's cache, then the current branch's
    cache and lastly whatever cache is available. Do you prefer that?
    
    
    > > +      - name: Prepare workspace
    > > +        run: |
    > > +          whoami
    > > +          useradd -m postgres
    > > +          chown -R postgres:postgres .
    > > +          mkdir -p "$CCACHE_DIR"
    > > +          chown -R postgres:postgres "$CCACHE_DIR"
    > > +          # Can't change the container's kernel.core_pattern; the postgres
    > > +          # user can't write to / normally. Make / writable.
    > > +          chown root:postgres /
    > > +          chmod g+rwx /
    >
    > Why not just always use a privileged container?
    
    Done.
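    
    With a privileged container the sanity check can set
    kernel.core_pattern instead of making / writable. A sketch, adapted
    from the linux-autoconf job quoted further down rather than copied
    from v3:
    
    ```yaml
    jobs:
      sanity-check:
        runs-on: ubuntu-latest
        container:
          image: ${{ needs.setup.outputs.linux_ci_image }}
          # Needed to write sysctls under /proc/sys.
          options: --privileged
        steps:
          - name: Prepare workspace
            run: |
              useradd -m postgres
              chown -R postgres:postgres .
              mkdir -p "$CCACHE_DIR"
              chown -R postgres:postgres "$CCACHE_DIR"
              # Collect core files in a directory the postgres user can
              # write to, instead of in /.
              mkdir -m 770 /tmp/cores
              chown root:postgres /tmp/cores
              sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
    ```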
    
    
    > > +      - name: Configure
    > > +        run: |
    > > +          su postgres <<-'EOF'
    > > +            set -e
    > > +            meson setup \
    > > +              --buildtype=debug \
    > > +              --auto-features=disabled \
    > > +              -Ddefault_library=shared \
    > > +              -Dtap_tests=enabled \
    > > +              build
    > > +          EOF
    > > +
    > > +      - name: Build
    > > +        run: |
    > > +          su postgres <<EOF
    > > +            set -e
    > > +            ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}
    > > +          EOF
    >
    > Should we have an explicit cache upload step here? Or are upload steps run
    > unconditionally?
    
    As I explained above, that is done by having ${{ github.run_id }} in
    the cache key.
    
    
    > > +      # Run a minimal set of tests. The main regression tests take too long
    > > +      # for this purpose. For now this is a random quick pg_regress style
    > > +      # test, and a tap test that exercises both a frontend binary and the
    > > +      # backend.
    > > +      - name: Test
    > > +        run: |
    > > +          su postgres <<EOF
    > > +            set -e
    > > +            ulimit -c unlimited
    > > +            meson test ${MTEST_ARGS} --suite setup
    > > +            meson test ${MTEST_ARGS} --num-processes ${TEST_JOBS} \
    > > +              cube/regress pg_ctl/001_start_stop
    > > +          EOF
    > > +
    > > +      - name: Core backtraces
    > > +        if: failure()
    > > +        run: |
    > > +          mkdir -m 770 /tmp/cores
    > > +          find / -maxdepth 1 -type f -name 'core*' -exec mv '{}' /tmp/cores/ \;
    > > +          src/tools/ci/cores_backtrace.sh linux /tmp/cores
    > > +
    > > +      - name: Upload logs
    > > +        if: failure()
    > > +        uses: actions/upload-artifact@v7
    > > +        with:
    > > +          name: sanitycheck-logs-${{ github.run_id }}
    > > +          path: |
    > > +            build*/testrun/**/*.log
    > > +            build*/testrun/**/*.diffs
    > > +            build*/testrun/**/regress_log_*
    > > +            build*/meson-logs/*.txt
    > > +          if-no-files-found: ignore
    >
    > I think this really should be in a yaml anchor, we have a few somewhat
    > different versions of this now.
    
    Same thing, we can't use a YAML anchor for the whole step because merge
    keys are not supported. I created this variable:
    
    _LOG_PATHS: &log_paths |
      build*/testrun/**/*.log
      build*/testrun/**/*.diffs
      build*/testrun/**/regress_log_*
      build*/meson-logs/*.txt
    
    and used it in the 'Upload logs' step's path.
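    
    In context it looks roughly like the sketch below. Putting the
    variable in the workflow-level env is an assumption here; the only
    requirement is that the anchor appears in the file before the alias
    that uses it:
    
    ```yaml
    env:
      _LOG_PATHS: &log_paths |
        build*/testrun/**/*.log
        build*/testrun/**/*.diffs
        build*/testrun/**/regress_log_*
        build*/meson-logs/*.txt

    jobs:
      sanity-check:
        runs-on: ubuntu-latest
        steps:
          - name: Upload logs
            if: failure()
            uses: actions/upload-artifact@v7
            with:
              name: sanitycheck-logs-${{ github.run_id }}
              # The alias expands to the multi-line string anchored above.
              path: *log_paths
              if-no-files-found: ignore
    ```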
    
    
    > It's pretty annoying that the output of the failures isn't visible in the UI.
    > Maybe we ought to print a few of the failures out or something?
    
    We already have '--print-errorlogs', do you mean something different?
    
    
    > > +
    > > +  # SPECIAL:
    > > +  # - Uses address sanitizer (sanitizer failures are typically printed in
    > > +  #   the server log)
    > > +  # - Configures postgres with a small segment size
    > > +  #
    > > +  # Enable a reasonable set of sanitizers. Use the linux task for that, as
    > > +  # it's one of the fastest tasks (without sanitizers). Also several of the
    > > +  # sanitizers work best on linux.
    > > +  #
    > > +  # The overhead of alignment sanitizer is low, undefined behaviour has
    > > +  # moderate overhead. Test alignment sanitizer in the meson task, as it
    > > +  # does both 32 and 64 bit builds and is thus more likely to expose
    > > +  # alignment bugs.
    > > +  #
    > > +  # Address sanitizer in contrast is somewhat expensive. Enable it in the
    > > +  # autoconf task, as the meson task tests both 32 and 64bit.
    >
    > I wonder if we should split the meson task into two, one for 32bit and one for
    > 64bit. The concurrency limits for public repos are high enough for that to
    > seem like a reasonable tradeoff? There's no work, other than the repo
    > checkout, shared between them.
    
    Done.
    
    
    > > +  # disable_coredump=0, abort_on_error=1: for useful backtraces in case of crashes
    > > +  # print_stacktraces=1,verbosity=2, duh
    > > +  # detect_leaks=0: too many uninteresting leak errors in short-lived binaries
    > > +  linux-autoconf:
    > > +    name: Linux - Debian Trixie - Autoconf
    > > +    needs: [setup, sanity-check]
    > > +    if: |
    > > +      !cancelled() &&
    > > +      needs.setup.outputs.linux == 'true' &&
    > > +      needs.sanity-check.result != 'failure'
    > > +    runs-on: ubuntu-latest
    > > +    timeout-minutes: 60
    > > +    container:
    > > +      image: ${{ needs.setup.outputs.linux_ci_image }}
    > > +      # Share the host PID + IPC namespaces. 017_shm.pl rapidly creates,
    > > +      # kill9's, and restarts postgres; with the container's small PID
    > > +      # space a new postgres can recycle the dead postmaster's PID before
    > > +      # pg_ctl's postmaster.pid check notices, producing spurious "node X
    > > +      # is already running" failures. SysV shm in the test also relies on
    > > +      # host-like IPC behavior.
    > > +      #
    > > +      # --ulimit raises memlock and core dump size. Memlock is needed for
    > > +      # running the AIO tests.
    > > +      #
    > > +      # --privileged is needed so the prepare step can write to sysctls
    > > +      # under /proc/sys (it's mounted read-only without it). We use it to
    > > +      # set kernel.core_pattern.
    > > +      options: --pid=host --ipc=host --ulimit memlock=-1:-1 --privileged
    > > +    env:
    > > +      BUILD_JOBS: 4
    > > +      TEST_JOBS: 8
    > > +      CCACHE_DIR: /tmp/ccache_dir
    > > +      DEBUGINFOD_URLS: "https://debuginfod.debian.net"
    > > +
    > > +      SANITIZER_FLAGS: -fsanitize=address
    > > +      UBSAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2
    > > +      ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0:detect_stack_use_after_return=0
    > > +      CFLAGS: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
    > > +      CXXFLAGS: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
    > > +      LDFLAGS: -fsanitize=address
    > > +      CC: ccache gcc
    > > +      CXX: ccache g++
    >
    > There's a fair bit of stuff shared between the meson/autoconf linux
    > tasks. Previously they used a matrix to reduce that a *bit*. But now it's
    > entirely duplicated, including stuff that doesn't apply to the current job
    > (e.g. UBSAN_OPTIONS/ASAN_OPTIONS).  And blocks like the following:
    >
    >
    > > +      - name: Prepare workspace
    > > +        run: |
    > > +          useradd -m postgres
    > > +          chown -R postgres:postgres .
    > > +          mkdir -p "$CCACHE_DIR"
    > > +          chown -R postgres:postgres "$CCACHE_DIR"
    > > +          mkdir -m 770 /tmp/cores
    > > +          chown root:postgres /tmp/cores
    > > +          sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
    > > +
    > > +          # Hosts for the load balance test
    > > +          cat >> /etc/hosts <<-EOF
    > > +            127.0.0.1 pg-loadbalancetest
    > > +            127.0.0.2 pg-loadbalancetest
    > > +            127.0.0.3 pg-loadbalancetest
    > > +          EOF
    
    
    I found that we can use matrices and merged all Linux tasks. I am not
    sure that is better since it is a bit harder to read now.
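    
    The general shape is something like the sketch below. It is heavily
    simplified: the matrix field names are hypothetical, and the flags
    that actually make the 32-bit build 32-bit are left out.
    
    ```yaml
    jobs:
      linux:
        name: Linux - Debian Trixie - ${{ matrix.name }}
        runs-on: ubuntu-latest
        strategy:
          # Let the other variants finish when one of them fails.
          fail-fast: false
          matrix:
            include:
              - name: Autoconf
                buildsystem: autoconf
                bits: 64
              - name: Meson 64-bit
                buildsystem: meson
                bits: 64
              - name: Meson 32-bit
                buildsystem: meson
                bits: 32
        steps:
          - name: Configure (autoconf)
            if: matrix.buildsystem == 'autoconf'
            run: ./configure --enable-debug --enable-cassert

          - name: Configure (meson)
            if: matrix.buildsystem == 'meson'
            run: meson setup --buildtype=debug build-${{ matrix.bits }}
    ```
    
    The per-step conditions are what make it harder to read compared to
    separate jobs.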
    
    
    > > +      # Install dependencies via Homebrew rather than Macports. On stock
    > > +      # GH runners macports requires a heavy bootstrap, and the relevant
    > > +      # Postgres deps are all available in brew.
    >
    > What does "heavy bootstrap" mean?
    
    I used MacPorts in my first version. It took ~10 minutes to download
    MacPorts. I think that if we could use caching like we did on
    Cirrus, it would make sense to use MacPorts. I will spend some time on
    that.
    
    And after spending some time, I was able to make it work. Now the first
    run's dependency install takes ~10 minutes since there is no
    MacPorts cache, but subsequent runs' installs only take ~5 seconds.
    
    
    > > +      - name: Install dependencies
    > > +        run: |
    > > +          brew update
    > > +          brew install \
    > > +            ccache meson openldap python@3.12 tcl-tk
    > > +          # IPC::Run via cpanm (system perl)
    > > +          sudo cpan -T -i IPC::Run IO::Tty
    >
    > We do spend ~95s on this every run, that's not nothing. And it puts a bunch of
    > load onto the brew's mirrors to do that every run.
    
    You are right. MacPorts is used now.
    
    
    > > +      - name: Test world
    > > +        run: |
    > > +          ulimit -c unlimited
    > > +          ulimit -n 1024
    > > +          meson test ${MTEST_ARGS} --num-processes ${TEST_JOBS}
    >
    > I'd re-add the comments that were in .cirrus.yml about this.
    
    Done.
    
    
    > > +  windows-vs:
    > > +    name: Windows - Server 2022, VS 2022 - Meson & ninja
    > > +    needs: [setup, sanity-check]
    > > +    if: |
    > > +      !cancelled() &&
    > > +      needs.setup.outputs.windows == 'true' &&
    > > +      needs.sanity-check.result != 'failure'
    > > +    runs-on: windows-2022
    > > +    timeout-minutes: 60
    > > +    env:
    > > +      TEST_JOBS: 8
    > > +      # Avoid port conflicts between concurrent tap tests
    > > +      PG_TEST_USE_UNIX_SOCKETS: 1
    > > +      PG_REGRESS_SOCK_DIR: 'c:\pgsock\'
    >
    > At least my editor gets confused by the \', thinking it's escaping the '. As
    > everything just works without the trailing \, I'd go that way.
    
    Done.
    
    
    > > +      # The TAP tests build an initdb template under build/tmp_install and
    > > +      # then `robocopy` it into per-test data directories. Robocopy with the
    > > +      # default /COPY:DAT flag doesn't copy ACLs — destinations inherit from
    > > +      # their parent dir. On GitHub-hosted Windows runners the workspace's
    > > +      # inherited ACL grants Administrators:(F) and Users:(RX) but does NOT
    > > +      # grant the runner user (runneradmin) directly. That matters because
    > > +      # pg_ctl on Windows uses CreateRestrictedProcess to drop admin
    > > +      # privileges from postmaster, so the postmaster process has the user
    > > +      # SID in its token but no longer the Administrators group — leaving it
    > > +      # with only "Users:(RX)" on pg_control and friends, which causes
    > > +      # "PANIC: could not open file global/pg_control: Permission denied".
    > > +      #
    > > +      # Fix it once on the workspace dir with (OI)(CI) inheritance flags so
    > > +      # every file/dir created underneath gets an explicit grant for the
    > > +      # current user.
    > > +      - name: Grant workspace ACL to runner user
    > > +        shell: pwsh
    > > +        run: |
    > > +          icacls "${{ github.workspace }}" /grant "${env:USERNAME}:(OI)(CI)F" /Q | Out-Null
    > > +          Write-Host "Granted Full Control to $env:USERNAME on ${{ github.workspace }}"
    >
    > Perhaps this would be better to fix by changing the robocopy flags?
    
    I couldn't fix this by using robocopy flags. I used /COPYALL and
    /SECFIX together but they didn't work.
    
    
    > > +      # postgres' plpython3u loads python3.dll (the stable-ABI forwarder)
    > > +      # which in turn loads whichever python3NN.dll the Windows loader finds
    > > +      # first on PATH. On windows-2022 `C:\Program Files\Mercurial\` ships
    > > +      # its own python3.dll + python39.dll and appears on PATH *before* the
    > > +      # hostedtoolcache Python 3.12 — so without intervention the backend
    > > +      # ends up running Python 3.9 while postgres' stdlib search uses 3.12,
    > > +      # producing `ImportError: cannot import name 'text_encoding' from
    > > +      # 'io'` (the 3.12 `io.py` calling into 3.9's `_io`).
    > > +      #
    > > +      # Pin PYTHONHOME to the Python 3.12 prefix, and prepend that prefix
    > > +      # to PATH so its python3.dll wins the DLL search.
    > > +      - name: Pin Python prefix on PATH and PYTHONHOME
    > > +        shell: pwsh
    > > +        run: |
    > > +          $prefix = (python -c "import sys; print(sys.prefix)").Trim()
    > > +          Add-Content $env:GITHUB_ENV "PYTHONHOME=$prefix"
    > > +          Add-Content $env:GITHUB_PATH $prefix
    > > +          Write-Host "PYTHONHOME=$prefix"
    > > +          Write-Host "Prepended $prefix to PATH"
    >
    > GRJGJKLJKJDFJKDF.
    
    I re-checked this since Jelte wasn't completely sure about this [2]
    but this is unfortunately correct :(
    
    
    > > +      - name: Install dependencies
    > > +        shell: pwsh
    > > +        run: |
    > > +          choco install -y --no-progress --limitoutput diffutils winflexbison
    > > +          # meson + ninja aren't preinstalled on windows-2022. Install via pip
    > > +          python -m pip install --upgrade meson ninja
    > > +
    > > +          # OpenSSL 1.1 via the slproweb installer (pinned to match the
    > > +          # version used elsewhere in postgres CI).
    > > +          curl.exe -fsSL -o openssl-setup.exe https://slproweb.com/download/Win64OpenSSL-1_1_1w.exe
    > > +          Start-Process -Wait -FilePath ./openssl-setup.exe `
    > > +            -ArgumentList '/DIR=c:\openssl\1.1\ /VERYSILENT /SP- /SUPPRESSMSGBOXES'
    > > +          # The slproweb installer puts libcrypto-1_1-x64.dll / libssl-1_1-x64.dll
    > > +          # in c:\openssl\1.1\bin\ and updates the system PATH. GH Actions
    > > +          # snapshots PATH at job start though, so the running job won't
    > > +          # see those DLLs and initdb.exe would crash silently at runtime.
    > > +          # Push the bin dir onto GITHUB_PATH so it persists for later steps.
    > > +          Add-Content $env:GITHUB_PATH "c:\openssl\1.1\bin"
    >
    > I don't like that much, but I'm not sure we have a better alternative
    > short-term.
    
    Using Chocolatey would be a nice alternative, but you already said
    that Chocolatey sometimes takes too much time. I am planning to spend
    time on it unless we are planning to use our own Windows containers.
    
    
    > > +  windows-mingw:
    > > +    name: Windows - Server 2022, MinGW64 - Meson
    > > +    needs: [setup, sanity-check]
    > > +    if: |
    > > +      !cancelled() &&
    > > +      needs.setup.outputs.mingw == 'true' &&
    > > +      needs.sanity-check.result != 'failure'
    > > +    runs-on: windows-2022
    > > +    timeout-minutes: 60
    > > +    env:
    > > +      TEST_JOBS: 4  # higher concurrency causes occasional failures
    > > +      PG_TEST_USE_UNIX_SOCKETS: 1
    > > +      PG_REGRESS_SOCK_DIR: 'c:\pgsock\'
    > > +      TAR: "c:/windows/system32/tar.exe"
    > > +      # for mingw plpython to find its installation
    > > +      PYTHONHOME: D:/a/_temp/msys64/ucrt64
    > > +
    > > +      MSYS: winjitdebug
    > > +      CHERE_INVOKING: 1
    > > +      MESON_FEATURES: >-
    > > +        -Dnls=disabled
    >
    > Missing comments from .cirrus.tasks.yml
    
    Done.
    
    v3 is attached. Just a quick note, v3 includes Zsolt's [3] and Peter's
    [4] reviews & feedback too. I will reply to them after sending this.
    
    GA run after v3 is applied:
    https://github.com/nbyavuz/postgres/actions/runs/26587973538
    
    
    [1]
    https://github.com/actions/runner/issues/1182
    https://github.com/orgs/community/discussions/185877
    [2] https://postgr.es/m/CAGECzQQBCF%3DHSk4eCc1fEYTpCt59rgpcwWp47%2B6M-CDMYEaM2A%40mail.gmail.com
    [3] https://postgr.es/m/CAN4CZFO4usEzFQoYzEywvOgoagW%3DU4yhpB4Oq-a7bUCR53djHA%40mail.gmail.com
    [4] https://postgr.es/m/3daa29a4-6a08-41c1-8a6a-53ba8cd3c7fb%40eisentraut.org
    
    
    --
    Regards,
    Nazir Bilal Yavuz
    Microsoft