basebackup.c's sendFile() ignores read errors
Robert Haas <robertmhaas@gmail.com>
From: Robert Haas <robertmhaas@gmail.com>
To: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Date: 2019-08-27T16:42:03Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
When performing a base backup, check for read errors.
- fbe897134801 9.4.25 landed
- f697c6396d73 9.5.20 landed
- 23df8822601d 9.6.16 landed
- 62fb12d7e138 10.11 landed
- 61c65cce40e6 11.6 landed
- ce5542d40253 12.0 landed
- 286af0ce1211 13.0 landed
While reviewing a proposed patch to basebackup.c this morning, I found myself a bit underwhelmed by the quality of the code and comments in basebackup.c's sendFile(). I believe it's already been pointed out that the the retry logic here is not particularly correct, and the comments demonstrate a pretty clear lack of understanding of how the actual race conditions that are possible here might operate. However, I then noticed another problem which I think is significantly more serious: this code totally ignores the possibility of a failure in fread(). It just assumes that if fread() fails to return a positive value, it's reached the end of the file, and if that happens, it just pads out the rest of the file with zeroes. So it looks to me like if fread() encountered, say, an I/O error while trying to read a data file, and if that error were on the first data block, then the entire contents of that file would be replaced with zero bytes in the backup, and no error or warning of any kind would be given to the user. If it hit the error later, everything from the point of the error onward would just get replaced with zero bytes. To be clear, I think it is fine for basebackup.c to fill out the rest of the expected length with zeroes if in fact the file has been truncated during the backup; recovery should fix it. But recovery is not going to fix data that got eaten because EIO was encountered while copying a file. The logic that rereads a block appears to have the opposite problem. Here, it will detect an error, but it will also fail in the face of a concurrent truncation, which it shouldn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company