Thread
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
When performing a base backup, check for read errors.
- fbe897134801 9.4.25 landed
- f697c6396d73 9.5.20 landed
- 23df8822601d 9.6.16 landed
- 62fb12d7e138 10.11 landed
- 61c65cce40e6 11.6 landed
- ce5542d40253 12.0 landed
- 286af0ce1211 13.0 landed
-
basebackup.c's sendFile() ignores read errors
Robert Haas <robertmhaas@gmail.com> — 2019-08-27T16:42:03Z
While reviewing a proposed patch to basebackup.c this morning, I found myself a bit underwhelmed by the quality of the code and comments in basebackup.c's sendFile(). I believe it's already been pointed out that the the retry logic here is not particularly correct, and the comments demonstrate a pretty clear lack of understanding of how the actual race conditions that are possible here might operate. However, I then noticed another problem which I think is significantly more serious: this code totally ignores the possibility of a failure in fread(). It just assumes that if fread() fails to return a positive value, it's reached the end of the file, and if that happens, it just pads out the rest of the file with zeroes. So it looks to me like if fread() encountered, say, an I/O error while trying to read a data file, and if that error were on the first data block, then the entire contents of that file would be replaced with zero bytes in the backup, and no error or warning of any kind would be given to the user. If it hit the error later, everything from the point of the error onward would just get replaced with zero bytes. To be clear, I think it is fine for basebackup.c to fill out the rest of the expected length with zeroes if in fact the file has been truncated during the backup; recovery should fix it. But recovery is not going to fix data that got eaten because EIO was encountered while copying a file. The logic that rereads a block appears to have the opposite problem. Here, it will detect an error, but it will also fail in the face of a concurrent truncation, which it shouldn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-
Re: basebackup.c's sendFile() ignores read errors
Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-08-28T10:31:06Z
On Tue, Aug 27, 2019 at 10:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > While reviewing a proposed patch to basebackup.c this morning, I found > myself a bit underwhelmed by the quality of the code and comments in > basebackup.c's sendFile(). I believe it's already been pointed out > that the the retry logic here is not particularly correct, and the > comments demonstrate a pretty clear lack of understanding of how the > actual race conditions that are possible here might operate. > > However, I then noticed another problem which I think is significantly > more serious: this code totally ignores the possibility of a failure > in fread(). It just assumes that if fread() fails to return a > positive value, it's reached the end of the file, and if that happens, > it just pads out the rest of the file with zeroes. So it looks to me > like if fread() encountered, say, an I/O error while trying to read a > data file, and if that error were on the first data block, then the > entire contents of that file would be replaced with zero bytes in the > backup, and no error or warning of any kind would be given to the > user. If it hit the error later, everything from the point of the > error onward would just get replaced with zero bytes. To be clear, I > think it is fine for basebackup.c to fill out the rest of the expected > length with zeroes if in fact the file has been truncated during the > backup; recovery should fix it. But recovery is not going to fix data > that got eaten because EIO was encountered while copying a file. > Per fread(3) manpage, we cannot distinguish between an error or EOF. So to check for any error we must use ferror() after fread(). Attached patch which tests ferror() and throws an error if it returns true. However, I think fread()/ferror() both do not set errno (per some web reference) and thus throwing a generic error here. I have manually tested it. > The logic that rereads a block appears to have the opposite problem. > Here, it will detect an error, but it will also fail in the face of a > concurrent truncation, which it shouldn't. > For this, I have checked for feof() assuming that when the file gets truncated we reach EOF. And if yes, getting out of the loop instead of throwing an error. I may be wrong as I couldn't reproduce this issue. Please have a look over the patch and let me know your views. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
-
Re: basebackup.c's sendFile() ignores read errors
Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> — 2019-08-29T09:47:31Z
Hi Jeevan, On Wed, Aug 28, 2019 at 10:26 PM Jeevan Chalke < jeevan.chalke@enterprisedb.com> wrote: > > > On Tue, Aug 27, 2019 at 10:33 PM Robert Haas <robertmhaas@gmail.com> > wrote: > >> While reviewing a proposed patch to basebackup.c this morning, I found >> myself a bit underwhelmed by the quality of the code and comments in >> basebackup.c's sendFile(). I believe it's already been pointed out >> that the the retry logic here is not particularly correct, and the >> comments demonstrate a pretty clear lack of understanding of how the >> actual race conditions that are possible here might operate. >> >> However, I then noticed another problem which I think is significantly >> more serious: this code totally ignores the possibility of a failure >> in fread(). It just assumes that if fread() fails to return a >> positive value, it's reached the end of the file, and if that happens, >> it just pads out the rest of the file with zeroes. So it looks to me >> like if fread() encountered, say, an I/O error while trying to read a >> data file, and if that error were on the first data block, then the >> entire contents of that file would be replaced with zero bytes in the >> backup, and no error or warning of any kind would be given to the >> user. If it hit the error later, everything from the point of the >> error onward would just get replaced with zero bytes. To be clear, I >> think it is fine for basebackup.c to fill out the rest of the expected >> length with zeroes if in fact the file has been truncated during the >> backup; recovery should fix it. But recovery is not going to fix data >> that got eaten because EIO was encountered while copying a file. >> > > Per fread(3) manpage, we cannot distinguish between an error or EOF. So to > check for any error we must use ferror() after fread(). Attached patch > which > tests ferror() and throws an error if it returns true. > You are right. This seems the right approach to me. I can see at least couple of places already using ferror() to guard errors of fread() like CopyGetData(), read_backup_label() etc. > However, I think > fread()/ferror() both do not set errno (per some web reference) and thus > throwing a generic error here. I have manually tested it. > If we are interested in getting the errno, then instead of fread(), we can use read(). But, here, in particular, we are not decision making anything depending on errno so I think we should be fine with your current patch. > >> The logic that rereads a block appears to have the opposite problem. >> Here, it will detect an error, but it will also fail in the face of a >> concurrent truncation, which it shouldn't. >> > > For this, I have checked for feof() assuming that when the file gets > truncated > we reach EOF. And if yes, getting out of the loop instead of throwing an > error. > I may be wrong as I couldn't reproduce this issue. > I had a look at the patch and this seems correct to me. Few minor comments: + /* Check fread() error. */ + CHECK_FREAD_ERROR(fp, pathbuf); + The comments above the macro call at both the places are not necessary as your macro name itself is self-explanatory. ---------- + /* + * If file is truncated, then we will hit + * end-of-file error in which case we don't + * want to error out, instead just pad it with + * zeros. + */ + if (feof(fp)) The if block does not do the truncation right away, so I think the comment above can be reworded to explain why we reset cnt? Regards, Jeevan Ladhe
-
Re: basebackup.c's sendFile() ignores read errors
Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-08-30T10:34:43Z
On Thu, Aug 29, 2019 at 3:17 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: > Hi Jeevan > > I had a look at the patch and this seems correct to me. > Thanks, Jeevan Ladhe. > > Few minor comments: > > + /* Check fread() error. */ > + CHECK_FREAD_ERROR(fp, pathbuf); > + > > The comments above the macro call at both the places are not necessary as > your macro name itself is self-explanatory. > > ---------- > + /* > + * If file is truncated, then we will hit > + * end-of-file error in which case we don't > + * want to error out, instead just pad it with > + * zeros. > + */ > + if (feof(fp)) > > The if block does not do the truncation right away, so I think the comment > above can be reworded to explain why we reset cnt? > Fixed both comments in the attached patch. -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
-
Re: basebackup.c's sendFile() ignores read errors
Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> — 2019-08-30T11:04:53Z
> > Fixed both comments in the attached patch. > Thanks, the patch looks good to me. Regards, Jeevan Ladhe
-
Re: basebackup.c's sendFile() ignores read errors
Robert Haas <robertmhaas@gmail.com> — 2019-09-05T18:10:37Z
On Fri, Aug 30, 2019 at 7:05 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote: >> Fixed both comments in the attached patch. > > Thanks, the patch looks good to me. Here's a version of the patch with a further change to the wording of the comment. I hope this is clearer. I think this needs to be back-patched all the way back to 9.4, and it doesn't seem to apply cleanly before v11. Any chance you could prepare a version for the older branches? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-
Re: basebackup.c's sendFile() ignores read errors
Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-09-06T06:08:12Z
On Thu, Sep 5, 2019 at 11:40 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Aug 30, 2019 at 7:05 AM Jeevan Ladhe > <jeevan.ladhe@enterprisedb.com> wrote: > >> Fixed both comments in the attached patch. > > > > Thanks, the patch looks good to me. > > Here's a version of the patch with a further change to the wording of > the comment. I hope this is clearer. > Yep. > > I think this needs to be back-patched all the way back to 9.4, and it > doesn't seem to apply cleanly before v11. Any chance you could > prepare a version for the older branches? > Attached patch for v10 and pre. The same v10 patch applies cleanly. Changes related to the page checksum verification is not present on v10 and pre, and thus those changes are not applicable, so removed those. Also, wal_segment_size is XLogSegSize over there, adjusted that. > > Thanks, > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
-
Re: basebackup.c's sendFile() ignores read errors
Robert Haas <robertmhaas@gmail.com> — 2019-09-06T13:18:03Z
On Fri, Sep 6, 2019 at 2:08 AM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > Attached patch for v10 and pre. The same v10 patch applies cleanly. > > Changes related to the page checksum verification is not present on v10 and > pre, and thus those changes are not applicable, so removed those. Also, > wal_segment_size is XLogSegSize over there, adjusted that. Thanks. Committed the v3 patch to v11+ and this version to the older branches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company