v2-0003-Add-file_extend_method-ftruncate-chsize-options.patch
text/x-patch
From cfeffb032d96e96016e5b840b07bcf8c04262860 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 15 Dec 2025 16:39:56 +1300
Subject: [PATCH v2 3/3] Add file_extend_method=ftruncate,chsize options.
Since COW file systems can't reserve space for future writes by any
means, provide an alternative that should at least be more efficient:
like posix_fallocate, it delays kernel buffer allocation and skips
copying zeros around.
"ftruncate" isn't a concept on Windows, so provide a different
surface-level option "chsize". It actually differs in a crucially
relevant way on the most common file system, NTFS: it reserves disk
blocks immediately rather than creating a sparse file. On the other
hand, it surely can't do that on ReFS, so it seems inappropriate to
pretend that Windows has "posix_fallocate". Exposing the true
operation's name makes it the user's problem to figure out what the
filesystem does when we call it.
Tested-by: Dimitrios Apostolou <jimis@gmx.net>
Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
---
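Not part of the patch: a minimal standalone sketch of the Unix behavior the
commit message describes, namely that ftruncate() grows a file's logical size
without necessarily reserving blocks. The helper names (open_scratch_file,
extend_with_ftruncate, file_size, file_allocated_bytes) are hypothetical and
do not exist in PostgreSQL, the scratch path under /tmp is an assumption, and
treating st_blocks as 512-byte units is an assumption that holds on Linux but
is not guaranteed by POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous scratch file and return its
 * descriptor, or -1 on failure.  The /tmp location is an assumption.
 */
static int
open_scratch_file(void)
{
	char		path[] = "/tmp/extend_sketch_XXXXXX";
	int			fd = mkstemp(path);

	if (fd >= 0)
		unlink(path);			/* file disappears when fd is closed */
	return fd;
}

/*
 * Hypothetical helper: grow the file to new_size bytes without asking the
 * file system to reserve blocks.  Returns 0 on success, -1 with errno set.
 */
static int
extend_with_ftruncate(int fd, off_t new_size)
{
	assert(new_size >= 0);
	return ftruncate(fd, new_size);
}

/* Logical file size in bytes, or -1 on error. */
static off_t
file_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_size;
}

/*
 * Bytes actually allocated, assuming st_blocks counts 512-byte units (true
 * on Linux; other systems may differ).  Returns -1 on error.
 */
static off_t
file_allocated_bytes(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return (off_t) st.st_blocks * 512;
}
```

After extend_with_ftruncate(fd, 4 * 8192), file_size(fd) reports 32768, while
file_allocated_bytes(fd) is typically much smaller on file systems that
support sparse files. The exact allocated figure is file-system dependent,
which is the reason out-of-space errors can surface only at writeback time.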
doc/src/sgml/config.sgml | 20 ++++++++++++++
src/backend/storage/smgr/md.c | 26 ++++++++++++-------
src/backend/utils/misc/guc_tables.c | 1 +
src/backend/utils/misc/postgresql.conf.sample | 2 ++
src/include/storage/fd.h | 25 ++++++++++++++++++
5 files changed, 65 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5a298646100..ff8b66f52cf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2440,6 +2440,26 @@ include_dir 'conf.d'
function <function>posix_fallocate</function>.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>ftruncate</literal> (Unix) extends files without
+ allocating space. Out-of-space errors are deferred until PostgreSQL
+ writes data out later, potentially preventing checkpoints from
+ completing, so it is not recommended for traditional "overwrite"
+ file systems. It is provided as an option for copy-on-write file
+ systems where <literal>posix_fallocate</literal> and
+ <literal>write_zeros</literal> can't reserve space eagerly, and
+ <literal>ftruncate</literal> might be more efficient.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>chsize</literal> (Windows) allocates space and reports
+ out-of-space errors immediately on NTFS (like
+ <literal>posix_fallocate</literal>), but defers allocation on
+ ReFS (like <literal>ftruncate</literal>).
+ </para>
+ </listitem>
</itemizedlist>
The <literal>write_zeros</literal> method is always used when data
files are extended by <literal>file_extend_method_threshold</literal>
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index f893687814b..b65cd308fd3 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -595,7 +595,12 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
* If available and useful, use posix_fallocate() (via
* FileFallocate()) to extend the relation. That's often more
* efficient than using write(), as it commonly won't cause the kernel
- * to allocate page cache space for the extended pages.
+ * to allocate page cache space for the extended pages. COW
+ * filesystems can't really reserve disk space for future writeback
+ * (possibly moving the ENOSPC error into the checkpointer), but
+ * ftruncate() can still be used to defer the kernel cache
+ * overheads until then. Note that on Windows, ftruncate() is really
+ * _chsize_s(), which *does* allocate blocks, at least on NTFS.
*
* However, we don't use FileFallocate() for small extensions, as it
* defeats delayed allocation on some filesystems.
@@ -605,25 +610,28 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
{
int ret = 0;
+ if (file_extend_method == FILE_EXTEND_METHOD_FTRUNCATE)
+ ret = FileTruncate(v->mdfd_vfd,
+ seekpos + (pgoff_t) BLCKSZ * numblocks,
+ WAIT_EVENT_DATA_FILE_EXTEND);
#ifdef HAVE_POSIX_FALLOCATE
- if (file_extend_method == FILE_EXTEND_METHOD_POSIX_FALLOCATE)
- {
+ else if (file_extend_method == FILE_EXTEND_METHOD_POSIX_FALLOCATE)
ret = FileFallocate(v->mdfd_vfd,
seekpos, (pgoff_t) BLCKSZ * numblocks,
WAIT_EVENT_DATA_FILE_EXTEND);
- }
- else
#endif
- {
+ else
elog(ERROR, "unsupported file_extend_method: %d",
file_extend_method);
- }
+
if (ret != 0)
{
ereport(ERROR,
errcode_for_file_access(),
- errmsg("could not extend file \"%s\" with FileFallocate(): %m",
- FilePathName(v->mdfd_vfd)),
+ errmsg("could not extend file \"%s\" with %s(): %m",
+ FilePathName(v->mdfd_vfd),
+ file_extend_method == FILE_EXTEND_METHOD_FTRUNCATE ?
+ "FileTruncate" : "FileFallocate"),
errhint("Check free disk space."));
}
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6c65a47a88d..63712c9e465 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -497,6 +497,7 @@ static const struct config_enum_entry file_extend_method_options[] = {
{"posix_fallocate", FILE_EXTEND_METHOD_POSIX_FALLOCATE, false},
#endif
{"write_zeros", FILE_EXTEND_METHOD_WRITE_ZEROS, false},
+ {FILE_EXTEND_METHOD_FTRUNCATE_NAME, FILE_EXTEND_METHOD_FTRUNCATE, false},
{NULL, 0, false}
};
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b745e31a38d..18ed8a6a549 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -184,6 +184,8 @@
# by the operating system:
# posix_fallocate (most Unix-like systems)
# write_zeros
+ # ftruncate (Unix)
+ # chsize (Windows)
#max_notify_queue_pages = 1048576 # limits the number of SLRU pages allocated
# for NOTIFY / LISTEN queue
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 7074c3f118b..bb1729a41d1 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -61,11 +61,36 @@ enum FileExtendMethod
FILE_EXTEND_METHOD_POSIX_FALLOCATE,
#endif
FILE_EXTEND_METHOD_WRITE_ZEROS,
+ FILE_EXTEND_METHOD_FTRUNCATE,
};
/* Default to the first available file_extend_method. */
#define DEFAULT_FILE_EXTEND_METHOD 0
+#ifdef WIN32
+
+ /*
+ * Even though file_extend_method=chsize uses the same code path as
+ * file_extend_method=ftruncate, our ftruncate() macro for Windows expands to
+ * _chsize_s(), whose filesystem-dependent behavior might not match
+ * ftruncate() in a relevant way:
+ *
+ * 1. NTFS allocates physical blocks so that overwriting them later can't
+ * fail with ENOSPC. It would be confusing and misleading to label it
+ * "ftruncate", as it sounds like a recipe for sparse files.
+ *
+ * 2. ReFS doesn't, being a COW system, and nor is allocation in the
+ * function's contract, so it would also be misleading to label it
+ * "posix_fallocate".
+ *
+ * We don't know what the file system does, and Unix terminology would only
+ * obfuscate matters, so we expose the name of the real OS function.
+ */
+#define FILE_EXTEND_METHOD_FTRUNCATE_NAME "chsize"
+#else
+#define FILE_EXTEND_METHOD_FTRUNCATE_NAME "ftruncate"
+#endif
+
/*
* Values 4-8 were experimentally determined to avoid interference between
* posix_fallocate() and delayed allocation on common Linux file systems, but
--
2.51.2