Thread

  1. BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

    PG Bug reporting form <noreply@postgresql.org> — 2025-06-28T07:17:54Z

    The following bug has been logged on the website:
    
    Bug reference:      18971
    Logged by:          Dmitry Kovalenko
    Email address:      d.kovalenko@postgrespro.ru
    PostgreSQL version: 18beta1
    Operating system:   Ubuntu 2024.04
    Description:        
    
    Hello,
    PG version is 18beta1 (bbccf7ec)
    OS is Ubunty 24.04
    ---------- SHORT DESCRIPTION
    Server passes an invalid (indirect) path in PGDATA to the external program.
    Server must provide to the external programs a right, absolute path in
    PGDATA.
    ---------- FULL DESCRIPTION / STEPS
    1) Our SPECIAL test starts PG with an indirect path to the database cluster
    files:
    CWD is:
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01"
    COMMAND is:
    pg_ctl start -D
    "../../../../../../../../../../../..//home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
    -l
    "../../../../../../../../../../../..//home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/logs/postgres.log"
    postgresql.auto.conf of this cluster has the following lines:
    ------ [begin of postgresql.auto.conf]
    archive_mode =  'on'
    archive_command =  'exec
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3"
    archive-push -B
    /home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup
    --instance=node --compress-algorithm=zlib --log-level-console=trace
    --no-sync --wal-file-path=%p --wal-file-name=%f'
    recovery_target_timeline =  'current'
    recovery_target_action =  'pause'
    restore_command =
    '"/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3"
    archive-get -B
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup"
    --instance "node" --wal-file-path=%p --wal-file-name=%f'
    ------ [/end of postgresql.auto.conf]
    2) pc_ctl starts the postmaster with this PWD:
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01"
    So, the path to the cluster files in "-D" is valid and is:
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
    3) During start Postgres runs the external program pg_probackup3 with the
    following commandline (I got it with "ps -eo args | grep probackup"):
    sh -c --
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3"
    archive-get -B
    "/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup"
    --instance "node" --wal-file-path=pg_wal/RECOVERYXLOG
    --wal-file-name=000000010000000000000002
    This command runs the next process:
    /home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/builddir/src/pg_probackup3
    archive-get -B
    /home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/backup
    --instance node --wal-file-path=pg_wal/RECOVERYXLOG
    --wal-file-name=000000010000000000000002
    4) pg_probackup3 process has the following environment variables:
    PWD="/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
    PGDATA="../../../../../../../../../../../../home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
    ---------- PROBLEM
    PGDATA (4) is invalid for pg_probackup3. Because PWD of probackup (4) != PWD
    of server (2).
    When probackup is trying to use this PGDATA, it fails.
    Server must provide to the external programs a right, absolute path in
    PGDATA.
    Thanks&Regards,
    Dmitry Kovalenko,
    Postgres Professional, Russia
    
    
  2. Re: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-06-28T19:13:25Z

    PG Bug reporting form <noreply@postgresql.org> writes:
    > 4) pg_probackup3 process has the following environment variables:
    > PWD="/home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
    > PGDATA="../../../../../../../../../../../../home/dima/MY/Work/CurrentTasks/2025/PBCKP-2289/_Iter003/001--Vanilla-master/Probackup3/work01/tests/tmp_dirs/ArchiveTest/test_archive_get_relative_path/restored/data"
    > ---------- PROBLEM
    > PGDATA (4) is invalid for pg_probackup3.
    
    I do not think this is a Postgres bug.  The PGDATA environment
    variable is not canonical, it is just the default to be used if
    you don't specify a -D switch on the postmaster/pg_ctl command line.
    In fact, it might not be set at all.  (I see that pg_ctl does set
    it, but pg_ctl is not the only way to start the postmaster.)
    Therefore, relying on PGDATA in a restore_command script isn't safe.
    You should be relying on the current working directory (PWD) instead.
    
    			regards, tom lane
    
    
    
    
  3. Re: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

    Dmitry Kovalenko <d.kovalenko@postgrespro.ru> — 2025-06-28T19:32:59Z

    Hello Tom,
    
    > I do not think this is a Postgres bug.  The PGDATA environment
    > variable is not canonical, it is just the default to be used if
    > you don't specify a -D switch on the postmaster/pg_ctl command line.
    > In fact, it might not be set at all.  (I see that pg_ctl does set
    > it, but pg_ctl is not the only way to start the postmaster.)
    > Therefore, relying on PGDATA in a restore_command script isn't safe.
    > You should be relying on the current working directory (PWD) instead.
    A test executes pg_ctl with the correct path to cluster and the correct CWD.
    
    pg_ctl (and so on) executes an external program with inconsistent PGDATA 
    and CWD.
    
    This is clearly a problem on your side )
    
    I wonder that it was not detected earlier.
    
    For my point of view, if you can't provider a valid path in PGDATA, do 
    not do it at all.
    
    With Best Regards,
    
    Dmitry Kovalenko
    
    
    
    
    
  4. Re: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-06-28T20:19:13Z

    Dmitry Kovalenko <d.kovalenko@postgrespro.ru> writes:
    > For my point of view, if you can't provider a valid path in PGDATA, do 
    > not do it at all.
    
    I'm not sure which part of "don't rely on PGDATA" you didn't get.
    Under normal circumstances, Postgres itself doesn't set that at all.
    
    If you want to establish a site convention that PGDATA is valid,
    fine, but then it's on you to set it correctly.
    
    			regards, tom lane
    
    
    
    
  5. Re: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

    David G. Johnston <david.g.johnston@gmail.com> — 2025-06-28T21:12:02Z

    On Sat, Jun 28, 2025 at 1:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > Dmitry Kovalenko <d.kovalenko@postgrespro.ru> writes:
    > > For my point of view, if you can't provider a valid path in PGDATA, do
    > > not do it at all.
    >
    > I'm not sure which part of "don't rely on PGDATA" you didn't get.
    > Under normal circumstances, Postgres itself doesn't set that at all.
    >
    >
    If we have to tell someone to not rely on a value that we've set we've done
    something wrong.  Either haven't explained how to use it correctly, we've
    set it in unusual circumstances when we should not have, or we are choosing
    a bad value to set it in the first place.  Setting it to an absolute path
    would seem like the least problematic approach to take to make our system
    more usable.
    
    David J.
    
  6. Re: BUG #18971: Server passes an invalid (indirect) path in PGDATA to the external program

    Laurenz Albe <laurenz.albe@cybertec.at> — 2025-06-29T06:00:11Z

    On Sat, 2025-06-28 at 14:12 -0700, David G. Johnston wrote:
    > On Sat, Jun 28, 2025 at 1:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > I'm not sure which part of "don't rely on PGDATA" you didn't get.
    > > Under normal circumstances, Postgres itself doesn't set that at all.
    > 
    > If we have to tell someone to not rely on a value that we've set we've
    > done something wrong.
    
    I am not sure that that is an accurate description.  Look at the code
    in pg_ctl.c:
    
        /* process command-line options */
        while ((c = getopt_long(argc, argv, "cD:e:l:m:N:o:p:P:sS:t:U:wW",
                                long_options, &option_index)) != -1)
        {
            switch (c)
            {
                case 'D':
                    {
                        char       *pgdata_D;
    
                        pgdata_D = pg_strdup(optarg);
                        canonicalize_path(pgdata_D);
                        setenv("PGDATA", pgdata_D, 1);
    
                        /*
                         * We could pass PGDATA just in an environment variable
                         * but we do -D too for clearer postmaster 'ps' display
                         */
                        pgdata_opt = psprintf("-D \"%s\" ", pgdata_D);
                        free(pgdata_D);
                        break;
                    }
    
    So we take the value specified with the -D option, canonicalize_path() it
    and set it in the environment.  canonicalize_path() does the following:
    
    /*
     * canonicalize_path()
     *
     *  Clean up path by:
     *      o  make Win32 path use Unix slashes
     *      o  remove trailing quote on Win32
     *      o  remove trailing slash
     *      o  remove duplicate (adjacent) separators
     *      o  remove '.' (unless path reduces to only '.')
     *      o  process '..' ourselves, removing it if possible
     *  Modifies path in-place.
    
    So, essentially, Dmitry feeds a weird path to the -D option and then
    complains that PGDATA is set to a weird value.
    
    Yours,
    Laurenz Albe