Thread

  1. Re: [Patch] Windows relation extension failure at 2GB and 4GB

    Bryan Green <dbryan.green@gmail.com> — 2025-11-06T17:17:52Z

    On 11/5/2025 11:05 PM, Michael Paquier wrote:
    > On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
    >> I found two related bugs in PostgreSQL's Windows port that prevent files
    >> from exceeding 2GB. While unlikely to affect most installations (default 1GB
    >> segments), the code is objectively wrong and worth fixing.
    >>
    >> The first bug is a pervasive use of off_t where pgoff_t should be used. On
    >> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
    >> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
    >> purpose and some function declarations in headers already use it, but the
    >> implementations weren't updated to match.
    > 
    > Ugh.  That's the same problem as "long", which is 8 bytes wide
    > everywhere except WIN32.  Removing traces of "long" from the code has
    > been a continuous effort over the years because of these silent
    > overflow issues.
    > 
    >> After fixing all those off_t issues, there's a second bug at 4GB in the
    >> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
    >> win32pread.c. The current implementation uses an OVERLAPPED structure for
    >> positioned I/O, but only sets the Offset field (low 32 bits), leaving
    >> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
    >> offsets wrap around.
    >>
    >> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
    >> exactly 2GB and fails with "could not extend file: Invalid argument" despite
    >> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
    >> and hits the OVERLAPPED bug. Both are independently verifiable.
    > 
    > The most popular option in terms of installation on Windows is the EDB
    > installer, where I bet that a file segment size of 1GB is what's
    > embedded in the code compiled.  This argument would not hold with WAL
    > segment sizes if we begin to support even higher sizes than 1GB at
    > some point, and we use pwrite() in the WAL insert code.  That should
    > not be a problem even in the near future.
    > 
    >> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
    >> is already 64-bit. Only Windows behavior changes.
    > 
    > win32_port.h and port.h say so, yeah.
    > 
    >> Not urgent since few people hit this in practice, but it's clearly wrong
    >> code.
    >> Someone building with larger segments would see failures at 2GB and
    >> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
    >> good reason to limit PostgreSQL to 2GB.
    > 
    > The same kind of limitations with 4GB files existed with stat() and
    > fstat(), but it was much more complicated than what you are doing
    > here, where COPY was not able to work with files larger than 4GB on
    > WIN32.  See the saga from bed90759fcbc.
    > 
    >> I have attached the patch to fix the relation extension problems for Windows
    >> to this email.
    >>
    >> Can provide the other patches that changes off_t for pgoff_t in the rest of
    >> the code if there's interest in fixing this.
    > 
    > Yeah, I think that we should rip out these issues, and move to the
    > more portable pgoff_t across the board.  I doubt that any of this
    > could be backpatched due to the potential ABI breakages these
    > signatures changes would cause.  Implementing things in incremental 
    > steps is more sensible when it comes to such changes, as a revert
    > blast can be reduced if a portion is incorrectly handled.
    > 
    > I'm seeing as well the things you are pointing in buffile.c.  These
    > should be fixed as well.  The WAL code is less annoying due to the 1GB
    > WAL segment size limit, still consistency across the board makes the
    > code easier to reason about, at least.
    > 
    > Among the files you have mentioned, there is also copydir.c.
    > 
    > pg_rewind seems also broken with files larger than 4GB, from what I
    > can see in libpq_source.c and filemap.c..  Worse.  Oops.
    > 
    > -	/* Note that this changes the file position, despite not using it. */
    > -	overlapped.Offset = offset;
    > +	overlapped.Offset = (DWORD) offset;
    > +	overlapped.OffsetHigh = (DWORD) (offset >> 32);
    > 
    > Based on the docs at [1], yes, this change makes sense.
    > 
    > It seems to me that a couple of extra code paths should be handled in
    > the first patch, and I have spotted three of them.  None of them are
    > critical as they are related to WAL segments, just become masked and
    > inconsistent:
    > - xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
    > segments have a max size of 1GB, meaning that we're OK.
    > - xlogreader.c, pg_pread() with a cast to off_t.
    > - walreceiver.c, pg_pwrite().
    > 
    > Except for these three spots, the first patch looks like a cut good
    > enough on its own.
    > 
    
    Latest patch attached that includes these code paths.
    
    > Glad to see someone who takes time to spend time on this kind of
    > stuff with portability in mind, by the way.
    > 
    > [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
    > --
    > Michael
    
    Thanks for the quick reviewing.
    
    -- 
    Bryan Green
    EDB: https://www.enterprisedb.com