Thread

  1. Re: [Patch] Windows relation extension failure at 2GB and 4GB

    Bryan Green <dbryan.green@gmail.com> — 2025-11-06T14:56:14Z

    On 11/6/2025 3:20 AM, Thomas Munro wrote:
    > On Wed, Oct 29, 2025 at 3:42 AM Bryan Green <dbryan.green@gmail.com> wrote:
    >> That said, I'm finding off_t used in many other places throughout the
    >> codebase - buffile.c, various other file utilities such as backup and
    >> archive, probably more. This is likely causing latent bugs elsewhere on
    >> Windows, though most are masked by the 1GB default segment size. I'm
    >> investigating the full scope, but I think this needs to be broken up
    >> into multiple patches. The core file I/O layer (fd.c, md.c,
    >> pg_pwrite/pg_pread) should probably go first since that's what's
    >> actively breaking file extension.
    > 
    > The way I understand this situation, there are two kinds of file I/O,
    > with respect to large files:
    > 
    > 1.  Some places *have* to deal with large files (eg navigating in a
    > potentially large tar file), and there we should already be using
    > pgoff_t and the relevant system call wrappers should be using the
    > int64_t stuff Windows provides.  These are primarily frontend code.
    > 2.  Some places use segmentation *specifically because* there are
    > systems with 32 bit off_t.  These are mostly backend code dealing with
    > relation data files.  The only system left with narrow off_t is
    > Windows.
    > 
    > In reality the stuff in category 1 has been developed through a
    > process of bug reports and patches (970b97e and 970b97e^ spring to
    > mind as the most recent case I had something to do with, but see also
    > stat()-related stuff, and see aa5518304 where we addressed the one
    > spot in buffile.c that had to consider multiple segments).  But the
    > fact that Windows can't use segments > 2GB because the fd.c and
    > smgr.c/md.c layers work with off_t is certainly a well known
    > limitation, ie specifically that relation and temporary/buf files are
    > special in this way.  I'm mostly baffled by the fact that --relsegsize
    > actually *lets* you set it higher than 2 on that platform.  Perhaps we
    > should at least backpatch a configure check or static assertion to
    > block that?  It's not good if it compiles but doesn't actually work.
    > 
    
    I agree that the backpatch should just block setting --relsegsize > 2GB
    on Windows.
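
    As a rough sketch of the kind of check meant here (not the actual
    patch): every SKETCH_* name and segment_size_allowed() is hypothetical,
    and the 2GB ceiling simply mirrors the wording above. Whether a segment
    of exactly 2GB is safe with a 32-bit off_t is a separate question that
    this sketch does not settle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the build-time block and segment sizes. */
#define SKETCH_BLCKSZ            8192      /* bytes per block */
#define SKETCH_RELSEG_SIZE       131072    /* blocks per segment (1GB) */

/* Assumed ceiling for platforms whose off_t is only 32 bits wide. */
#define SKETCH_MAX_SEGMENT_BYTES (INT64_C(2) * 1024 * 1024 * 1024)

/*
 * Return true if a segment of relseg_blocks blocks, each block_size bytes,
 * stays at or below the assumed ceiling. Division is used instead of
 * multiplication so the comparison itself cannot overflow.
 */
static bool
segment_size_allowed(int64_t relseg_blocks, int64_t block_size)
{
    if (relseg_blocks <= 0 || block_size <= 0)
        return false;
    return relseg_blocks <= SKETCH_MAX_SEGMENT_BYTES / block_size;
}

/* The same rule as a compile-time check on the configured constants. */
_Static_assert((int64_t) SKETCH_RELSEG_SIZE * SKETCH_BLCKSZ
               <= SKETCH_MAX_SEGMENT_BYTES,
               "segment size too large for a 32-bit off_t");
```

    With a check of this shape, a build configured with a 4GB segment
    would fail to compile instead of failing at run time.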
    
    > For master I think it makes sense to clean this up, as you say,
    > because the fuzzy boundary between the two categories of file I/O is
    > bound to cause more problems, it's just unfinished business that has
    > been tackled piecemeal as required by bug reports...  In fact, on a
    > thread[1] where I explored making the segment size a runtime option
    > specified at initdb time, I even posted patches much like yours in the
    > first version, spreading pgoff_t into more places, and then in a later
    > version it was suggested that it might be better to just block
    > settings that are too big for your off_t, so I did that.  I probably
    > thought that we already did that somewhere for the current
    > compile-time constant...
    > 
    
    For master, I'd like to proceed with the cleanup approach - spreading
    pgoff_t into the core I/O layer (fd.c, md.c, pg_pread/pg_pwrite
    wrappers, etc). That would let us eliminate the artificial 2GB ceiling
    on Windows and clean up the file I/O category boundary.
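
    To illustrate why the offset type matters, here is a small model of
    the arithmetic (a sketch only, not code from fd.c or md.c).
    block_offset() and truncate_to_32bit_offset() are hypothetical names;
    the second one imitates storing a byte offset in a 32-bit signed type,
    using explicit wraparound so the example has no undefined behavior.

```c
#include <stdint.h>

/* Byte offset of a block within a segment, computed in 64 bits. */
static int64_t
block_offset(uint32_t block_in_segment, uint32_t block_size)
{
    return (int64_t) block_in_segment * (int64_t) block_size;
}

/*
 * Model of what remains when a 64-bit offset is stored in a 32-bit signed
 * type: keep the low 32 bits, then reinterpret them as two's complement.
 */
static int64_t
truncate_to_32bit_offset(int64_t wide_offset)
{
    uint32_t low = (uint32_t) wide_offset;   /* well-defined modular wrap */

    if (low >= UINT32_C(0x80000000))
        return (int64_t) low - (INT64_C(1) << 32);
    return (int64_t) low;
}
```

    In this model, with 8192-byte blocks, block 262144 sits at byte offset
    2147483648 (2GB), which becomes -2147483648 after truncation, and
    block 524288 sits at 4GB, which wraps to 0. Offsets below 2GB pass
    through unchanged, which would be consistent with the problem staying
    hidden at the 1GB default segment size.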
    
    >> Not urgent since few people hit this in practice, but it's clearly wrong
    >> code.
    > 
    > Yeah.  In my experience dealing with bug reports, the Windows user
    > community skews very heavily towards just consuming EDB's ready-built
    > installer.  We rarely hear about configuration-level problems, so I
    > suppose it's not surprising that no one has ever complained that it
    > lets you configure it in a way that we hackers all know is certainly
    > going to break.
    > 
    > [1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com
    
    Thanks for the feedback.
    
    -- 
    Bryan Green
    EDB: https://www.enterprisedb.com