Thread
-
Re: BUG #17619: AllocSizeIsValid violation in parallel hash join
Dmytro Astapov <dastapov@gmail.com> — 2022-09-22T16:38:00Z
And I think that if on line 322 should be ">" and not ">=" as well. On Thu, 22 Sep 2022, 17:11 Dmitry Astapov, <dastapov@gmail.com> wrote: > Thomas, I think I know what is going on, and it is happening in > sts_puttuple when it needs to write 32760 bytes exacly. All references > below a taken in the tag REL_13_8. > > Lets say that we enter sts_puttuple and on line 321 we compute a size > <https://github.com/postgres/postgres/blob/4bc493d14409857090928ea51c02a20aba8db364/src/backend/utils/sort/sharedtuplestore.c#L321> > of 32760 bytes. Lets say that buffer is currently partially filled so a > write of this size would not fit. > > We proceed to flush chunk on line 338, which resets the write pointer to 8 > bytes past the start of SharedTuplestoreChunk, leaving exactly 32760 bytes > free. > > We take the if on line 342 > <https://github.com/postgres/postgres/blob/4bc493d14409857090928ea51c02a20aba8db364/src/backend/utils/sort/sharedtuplestore.c#L342> (the > one with the comment "It still might not be enough space", and then proceed > to write whole 32760 bytes and reach the line 373 with size == 0. > > It looks like this codepath is written with assumption that size >0 here > and sts_flush_chunk would be called immediately after inside the loop on > line 375. But because size == 0, loop will not execute, sts_flush_chunk > will not be called, and since we haven't updated write pointer so far, it > will stay at the beginning of the chunk. At this point we exit > sts_puttuple, and next call of sts_putttuple would overwrite the tuple we > just memcopied into the chank. If this new tuple is shorter, then remaining > part of the chunk would be garbage from the overwritten tuple. > > I think >= on line 342 should be just > . I tried this change locally, and > it fixed the issue for me. > > Do you agree with my analysis? > > On Thu, Sep 22, 2022 at 1:51 PM Dmitry Astapov <dastapov@gmail.com> wrote: > >> >> >> On Thu, Sep 22, 2022 at 9:44 AM Thomas Munro <thomas.munro@gmail.com> >> wrote: >> >>> On Thu, Sep 22, 2022 at 7:46 PM PG Bug reporting form >>> <noreply@postgresql.org> wrote: >>> > (gdb) p size >>> > $2 = 1702125924 >>> >>> Thanks for the detailed report. Hmm. That size, on a little-endian >>> system, is equivalent to the byte sequence "date\0\0\0\0", which looks >>> pretty suspiciously like the inside of a tuple, and not its size. We >>> must have got out of sync somehow. >>> >> >> Great catch! I added some debugging output to the BufFileRead and >> sts_read_tuple and it looks to me that the sequence of events immediately >> before the problem happens is like this: >> >> >> 1. sts_read_tuple is called multiple times to read from a particular >> file, and a bunch of tuples are read this way >> 2. sts_read_tuple is called once more. In my experiments, I always >> had read_ntuples_available - 2 == read_ntuples at this point, but maybe >> this is just a coincidence. >> 1. inside this sts_read_tuple_call, BufFileRead is called to read >> metadata, and succeeds (and this chunk of the file looks -- to my naive eye >> -- exactly like any other tuple start) >> 2. Second BufFileRead call happens to get the size, and reads the >> correct size (I see what's in the file, I could find corresponding values >> in the database tables, and it all lines up). >> 3. Third BufFileRead happens to read the actual tuple. So far so >> good, and sts_read_tuple finishes. >> 3. At this point, the position in the file (buf_file -> currOffset + >> post) points right after the last tuple read, but what's there does not >> look like the start of the next tuple to me. It seems like there was >> another, longer tuple written there before (with quite long rhs.payload >> value) and somehow the tuple that was just read was written over it, so we >> have "<tuple read on step 2 above><current read position here><what looks >> like leftovers from another tuple><a bunch of zeroes><new tuple start>" >> 4. So another sts_read_tuple call happens, and it reads 4 bytes of >> this garbage as metadata, and the next 4 bytes as length, and this is when >> everything derails... >> >> I also double-checked at the logs of the (test) database where this issue >> was first spotted (and where a lot of queries similar to the one I am using >> to trigger this issue are happening) and found out that along with "invalid >> memory alloc request size" we also got a couple of occasions of "Unexpected >> chunk in shared tuplestore temporary file. Expected overflow chunk", which >> looks like another error that could be potentially explained by >> BufFileRead's inside sts_read_tuple reading "garbage". >> >> I'm happy to build a patched version of 13.8 (or another revision if you >> want me to) to collect more debug information for you if this could help. >> >> -- >> D. Astapov >> > > > -- > D. Astapov >