Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Remove unnecessary #include references, per pgrminclude script.

  2. Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo

  3. Allow more include files to be compiled in their own by adding missing

  4. Add support for #elif to pgrminclude.

  1. pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-01T11:22:43Z

    Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
    builfarm member.
    
    Branch
    ------
    master
    
    Details
    -------
    http://git.postgresql.org/pg/commitdiff/d5321842528dfb73f8254a48556b4adb1b6d1c5a
    
    Modified Files
    --------------
    contrib/cube/cubedata.h |    2 --
    1 files changed, 0 insertions(+), 2 deletions(-)
    
    
  2. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-01T14:10:12Z

    Bruce Momjian <bruce@momjian.us> writes:
    > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
    > builfarm member.
    
    mongoose is still crashing, so it must have been some other aspect of
    commit 4bd7333 that caused this.
    
    			regards, tom lane
    
    
  3. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-01T14:49:36Z

    Tom Lane wrote:
    > Bruce Momjian <bruce@momjian.us> writes:
    > > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
    > > builfarm member.
    > 
    > mongoose is still crashing, so it must have been some other aspect of
    > commit 4bd7333 that caused this.
    
    Agreed.  Let me look some more.  Odd this succeeds:
    
    	okapi 	  	Gentoo 1.12.14 icc 11.1.072 x86_64
    
    but this fails:
    
    	mongoose 	Gentoo 1.6.14 icc 9.0.032 i686
    
    The backtrace:
    
    	http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01
    
    shows it failing on this line:
    
    	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;
    
    so I wonder if this is some compiler bug.  offsetof is:
    
    	((long) &((type *)0)->field)
    
    and the struct is:
    
    	typedef struct NDBOX
    	{
    	    int32       vl_len_;        /* varlena header (do not touch directly!) */
    	    unsigned int dim;
    	    double      x[1];
    	} NDBOX;
    
    That "x" is quite a common symbol.  Is there any way to get access to
    this machine?  Should I just revert it all and see what happens?
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  4. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Jeremy Drake <pgsql@jdrake.com> — 2011-09-01T15:42:46Z

    On Thu, 1 Sep 2011, Bruce Momjian wrote:
    
    > Tom Lane wrote:
    > > Bruce Momjian <bruce@momjian.us> writes:
    > > > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
    > > > builfarm member.
    > >
    > > mongoose is still crashing, so it must have been some other aspect of
    > > commit 4bd7333 that caused this.
    >
    > Agreed.  Let me look some more.  Odd this succeeds:
    >
    > 	okapi 	  	Gentoo 1.12.14 icc 11.1.072 x86_64
    >
    > but this fails:
    >
    > 	mongoose 	Gentoo 1.6.14 icc 9.0.032 i686
    >
    > The backtrace:
    >
    > 	http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01
    >
    > shows it failing on this line:
    >
    > 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;
    >
    > so I wonder if this is some compiler bug.  offsetof is:
    >
    > 	((long) &((type *)0)->field)
    >
    > and the struct is:
    >
    > 	typedef struct NDBOX
    > 	{
    > 	    int32       vl_len_;        /* varlena header (do not touch directly!) */
    > 	    unsigned int dim;
    > 	    double      x[1];
    > 	} NDBOX;
    >
    > That "x" is quite a common symbol.  Is there any way to get access to
    > this machine?  Should I just revert it all and see what happens?
    >
    >
    
    I am the owner of both mongoose and okapi.  Let me know if there's
    anything you want me to try.
    
    
    
  5. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-01T18:07:11Z

    Jeremy Drake wrote:
    > On Thu, 1 Sep 2011, Bruce Momjian wrote:
    > 
    > > Tom Lane wrote:
    > > > Bruce Momjian <bruce@momjian.us> writes:
    > > > > Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
    > > > > builfarm member.
    > > >
    > > > mongoose is still crashing, so it must have been some other aspect of
    > > > commit 4bd7333 that caused this.
    > >
    > > Agreed.  Let me look some more.  Odd this succeeds:
    > >
    > > 	okapi 	  	Gentoo 1.12.14 icc 11.1.072 x86_64
    > >
    > > but this fails:
    > >
    > > 	mongoose 	Gentoo 1.6.14 icc 9.0.032 i686
    > >
    > > The backtrace:
    > >
    > > 	http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01
    > >
    > > shows it failing on this line:
    > >
    > > 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;
    > >
    > > so I wonder if this is some compiler bug.  offsetof is:
    > >
    > > 	((long) &((type *)0)->field)
    > >
    > > and the struct is:
    > >
    > > 	typedef struct NDBOX
    > > 	{
    > > 	    int32       vl_len_;        /* varlena header (do not touch directly!) */
    > > 	    unsigned int dim;
    > > 	    double      x[1];
    > > 	} NDBOX;
    > >
    > > That "x" is quite a common symbol.  Is there any way to get access to
    > > this machine?  Should I just revert it all and see what happens?
    > >
    > >
    > 
    > I am the owner of both mongoose and okapi.  Let me know if there's
    > anything you want me to try.
    
    Thanks.  I would either like to email you patches to test or get ssh
    access so I can compile it myself.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  6. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Jeremy Drake <pgsql@jdrake.com> — 2011-09-02T06:48:31Z

    On Thu, 1 Sep 2011, Bruce Momjian wrote:
    
    > Jeremy Drake wrote:
    >
    > > I am the owner of both mongoose and okapi.  Let me know if there's
    > > anything you want me to try.
    >
    > Thanks.  I would either like to email you patches to test or get ssh
    > access so I can compile it myself.
    
    You can send me patches if you want, but I spent a little time with it
    tonight and it seems to be the change to src/include/access/xlog.h:
    http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/include/access/xlog.h;h=1fd60fb98d7362b677112517a20a41e32227a34f;hp=cdbf63fa76e0e7b154c084191d0df6138e1cbfcc;hb=4bd7333;hpb=d010391ac8f706e17998671534ca1230f68d2f38
    
    Unfortunately, I also had to revert commit
    6416a82a62db4e66b2edb0fa8fc83a580c3f1931 to fix compile errors.  I expect
    you would be able to do something a little more surgical than that...
    
    
    
  7. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-02T15:20:50Z

    Jeremy Drake wrote:
    > On Thu, 1 Sep 2011, Bruce Momjian wrote:
    > 
    > > Jeremy Drake wrote:
    > >
    > > > I am the owner of both mongoose and okapi.  Let me know if there's
    > > > anything you want me to try.
    > >
    > > Thanks.  I would either like to email you patches to test or get ssh
    > > access so I can compile it myself.
    > 
    > You can send me patches if you want, but I spent a little time with it
    > tonight and it seems to be the change to src/include/access/xlog.h:
    > http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/include/access/xlog.h;h=1fd60fb98d7362b677112517a20a41e32227a34f;hp=cdbf63fa76e0e7b154c084191d0df6138e1cbfcc;hb=4bd7333;hpb=d010391ac8f706e17998671534ca1230f68d2f38
    > 
    > Unfortunately, I also had to revert commit
    > 6416a82a62db4e66b2edb0fa8fc83a580c3f1931 to fix compile errors.  I expect
    > you would be able to do something a little more surgical than that...
    
    Wow, that is interesting.  So the problem is the inclusion of
    replication/walsender.h in xlog.h.  Hard to see how that could cause the
    cube regression tests to fail, but of course, it is.
    
    I noticed you are using these compile flags:
    
    	'CFLAGS' => '-O3 -xN -parallel -ip',
    	'CC' => 'icc'
    
    Can you test it with lower optimizations?
    
    I looked at the contrib/cube compile messages and didn't see anything
    unusual. 
    
    The only other idea I have is to try the attached patch which changes
    the offsetof() call to mention a struct field name, and not the first
    element of the field.  However, I see other uses of accessing the
    element of a struct field, so I might be wrong here.
    
    I will say that our buildfarm is great at giving developers information
    to diagnose the cause of failures!  It just isn't helping me find the
    cause in this particular case.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
  8. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Alvaro Herrera <alvherre@commandprompt.com> — 2011-09-02T15:38:28Z

    Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:
    
    > The only other idea I have is to try the attached patch which changes
    > the offsetof() call to mention a struct field name, and not the first
    > element of the field.  However, I see other uses of accessing the
    > element of a struct field, so I might be wrong here.
    
    I wonder if this would be the right time to start using the
    FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube.  Note pg_config.h.in says
    
    /* Define to nothing if C supports flexible array members, and to 1 if it does
       not. That way, with a declaration like `struct s { int n; double
       d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
       compilers. When computing the size of such an object, don't use 'sizeof
       (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
       instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
       MSVC and with C++ compilers. */
    #undef FLEXIBLE_ARRAY_MEMBER
    
    -- 
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
    
    
  9. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-02T16:11:00Z

    Alvaro Herrera <alvherre@commandprompt.com> writes:
    > I wonder if this would be the right time to start using the
    > FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube.  Note pg_config.h.in says
    
    > /* Define to nothing if C supports flexible array members, and to 1 if it does
    >    not. That way, with a declaration like `struct s { int n; double
    >    d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
    >    compilers. When computing the size of such an object, don't use 'sizeof
    >    (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
    >    instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
    >    MSVC and with C++ compilers. */
    
    D'oh ... I bet that last sentence is pointing us at the problem.  cube
    is using exactly that construct, and for some reason it's crashing.
    The most likely explanation for why it's crashing is that the compiler
    is trying to dereference NULL instead of successfully reducing
    offsetof() to a compile-time constant.  It's still not too clear to me
    why the inclusion changes cause that, but certainly walsender.h is
    pulling in a crapload of other stuff that was not previously included
    there ... which connects back to my previous complaints that I think
    Bruce was way too aggressive in adding #includes to headers.
    
    Jeremy, could you look at the preprocessor output for cube.c (ie,
    use -E instead of -c in the gcc call) and see how the relevant line
    of cube_f8_f8 looks in both broken and non-broken cases?  What I see
    on a Fedora box is
    
     size = __builtin_offsetof (NDBOX, x[0]) +sizeof(double) * 2;
    
    but I'm thinking you might be getting something different.
    
    			regards, tom lane
    
    
  10. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-02T16:44:28Z

    Tom Lane wrote:
    > Alvaro Herrera <alvherre@commandprompt.com> writes:
    > > I wonder if this would be the right time to start using the
    > > FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube.  Note pg_config.h.in says
    > 
    > > /* Define to nothing if C supports flexible array members, and to 1 if it does
    > >    not. That way, with a declaration like `struct s { int n; double
    > >    d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
    > >    compilers. When computing the size of such an object, don't use 'sizeof
    > >    (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
    > >    instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
    > >    MSVC and with C++ compilers. */
    > 
    > D'oh ... I bet that last sentence is pointing us at the problem.  cube
    > is using exactly that construct, and for some reason it's crashing.
    > The most likely explanation for why it's crashing is that the compiler
    > is trying to dereference NULL instead of successfully reducing
    > offsetof() to a compile-time constant.  It's still not too clear to me
    > why the inclusion changes cause that, but certainly walsender.h is
    > pulling in a crapload of other stuff that was not previously included
    > there ... which connects back to my previous complaints that I think
    > Bruce was way too aggressive in adding #includes to headers.
    > 
    > Jeremy, could you look at the preprocessor output for cube.c (ie,
    > use -E instead of -c in the gcc call) and see how the relevant line
    > of cube_f8_f8 looks in both broken and non-broken cases?  What I see
    > on a Fedora box is
    > 
    >  size = __builtin_offsetof (NDBOX, x[0]) +sizeof(double) * 2;
    > 
    > but I'm thinking you might be getting something different.
    
    I see 35 instances of this coding, and only 12 are in contrib/cube; 
    examples attached.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
  11. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-02T16:52:05Z

    Bruce Momjian <bruce@momjian.us> writes:
    > Tom Lane wrote:
    >> Alvaro Herrera <alvherre@commandprompt.com> writes:
    >>> instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
    >>> MSVC and with C++ compilers. */
    
    >> D'oh ... I bet that last sentence is pointing us at the problem.  cube
    >> is using exactly that construct, and for some reason it's crashing.
    
    > I see 35 instances of this coding, and only 12 are in contrib/cube; 
    > examples attached.
    
    Yeah, so the next question would be why those other ones aren't showing
    problems.  But at least now we have a potential mechanism for getting
    from "the include list changed" to "cube is crashing on an offsetof",
    namely that something is affecting the expansion of the offsetof macro.
    Up to now it's been black magic, and I don't like patching around
    problems we don't understand any better than that.
    
    			regards, tom lane
    
    
  12. Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Alvaro Herrera <alvherre@commandprompt.com> — 2011-09-02T16:53:12Z

    Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:
    
    > Wow, that is interesting.  So the problem is the inclusion of
    > replication/walsender.h in xlog.h.  Hard to see how that could cause the
    > cube regression tests to fail, but of course, it is.
    
    Hmm, so you included walsender.h into xlog.h?  That seems a bit funny
    considering that walsender.h already includes xlog.h.  It seems the
    reason for this is only the AllowCascadeReplication() definition.  Maybe
    that should go elsewhere instead, for example walsender.h?
    
    I wonder if there should be a new header, something like
    walsender_internal.h, for stuff like WalSnd and WalSndCtlData struct
    defs.
    
    -- 
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
    
    
  13. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Jeremy Drake <pgsql@jdrake.com> — 2011-09-03T05:54:33Z

    On Fri, 2 Sep 2011, Tom Lane wrote:
    
    > Yeah, so the next question would be why those other ones aren't showing
    > problems.  But at least now we have a potential mechanism for getting
    > from "the include list changed" to "cube is crashing on an offsetof",
    > namely that something is affecting the expansion of the offsetof macro.
    > Up to now it's been black magic, and I don't like patching around
    > problems we don't understand any better than that.
    
    
    I'm going to try answering about 3 emails at once here, so I apologize for
    the confusion.
    
    Checking preprocessor output on cube_f8_f8 between working and broken, the
    output was identical for that function.
    
    Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in
    behavior.
    
    Then I started investigating the disassembly.  The PIC disassembly was
    making my head hurt (why is the compiler calling the next instruction,
    popping the return address into a register, and using it as a pointer?
    Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
    attempting to access the global external variable CurrentMemoryContext.
    The odd thing is, that the disassembly code between the working and
    non-working was the same, except for the offsets.
    
    Here's the disassembly output from the core dump:
    Program terminated with signal 11, Segmentation fault.
    #0  cube_f8_f8 () at cube.c:1435
    1435		size = offsetof(NDBOX, x) +sizeof(double) * 2;
    (gdb) set disassembly-flavor intel
    (gdb) disass
    Dump of assembler code for function cube_f8_f8:
       0xb776ab50 <+0>:	push   ebp
       0xb776ab51 <+1>:	mov    ebp,esp
       0xb776ab53 <+3>:	and    esp,0xfffffff8
       0xb776ab56 <+6>:	push   ebx
       0xb776ab57 <+7>:	sub    esp,0x14
       0xb776ab5a <+10>:	mov    ecx,DWORD PTR [ebp+0x8]
       0xb776ab5d <+13>:	mov    DWORD PTR [esp+0x10],ebx
       0xb776ab61 <+17>:	mov    edx,DWORD PTR [ecx+0x14]
       0xb776ab64 <+20>:	movsd  xmm0,QWORD PTR [edx]
       0xb776ab68 <+24>:	mov    edx,DWORD PTR [ecx+0x18]
       0xb776ab6b <+27>:	movsd  xmm1,QWORD PTR [edx]
       0xb776ab6f <+31>:	movsd  QWORD PTR [esp],xmm0
       0xb776ab74 <+36>:	movsd  QWORD PTR [esp+0x8],xmm1
       0xb776ab7a <+42>:	push   0x18
       0xb776ab7c <+44>:	call   0xb776ab81 <cube_f8_f8+49>
       0xb776ab81 <+49>:	pop    eax
       0xb776ab82 <+50>:	add    eax,0x9472
       0xb776ab87 <+55>:	mov    edx,DWORD PTR [eax-0x58]
    => 0xb776ab8d <+61>:	push   DWORD PTR [edx]
       0xb776ab8f <+63>:	mov    DWORD PTR [esp+0x18],ebx
       0xb776ab93 <+67>:	mov    ebx,eax
       0xb776ab95 <+69>:	call   0xb776745c <MemoryContextAllocZero@plt>
    
    
    
    With this knowledge, I thought maybe cube wasn't the actual problem, but
    just happened to be early in the list of contrib modules being tested.  So
    I arbitrarily picked another contrib module to test, intarray.
    
    ============== running regression test queries        ==============
    test _int                     ... FAILED (test process exited with exit
    code 2)
    
    Program terminated with signal 11, Segmentation fault.
    #0  0xb785a1fa in g_intbig_union ()
       from
    /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
    (gdb) set disassembly-flavor intel
    (gdb) disass
    Dump of assembler code for function g_intbig_union:
    <SNIP UNRELATED ASSEMBLY CODE>
       0xb785a087 <+31>:	call   0xb785a08c <g_intbig_union+36>
       0xb785a08c <+36>:	pop    eax
       0xb785a08d <+37>:	add    eax,0x6f67
       0xb785a092 <+42>:	mov    DWORD PTR [esp+0x104],eax
    <SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
       0xb785a1e5 <+381>:	mov    eax,DWORD PTR [esp+0x104]
       0xb785a1ec <+388>:	mov    esi,DWORD PTR [eax-0x24]
       0xb785a1f2 <+394>:	mov    DWORD PTR [esp+0x108],ebx
       0xb785a1f9 <+401>:	push   ebx
    => 0xb785a1fa <+402>:	push   DWORD PTR [esi]
       0xb785a1fc <+404>:	mov    DWORD PTR [esp+0x104],ebx
       0xb785a203 <+411>:	mov    ebx,eax
       0xb785a205 <+413>:	call   0xb7853ef0 <MemoryContextAlloc@plt>
    
    
    Looks pretty familiar, right?
    
    Still, I have no idea why adding an include would cause issues accessing
    CurrentMemoryContext.  But at least we're not blaming offsetof or cube
    anymore...
    
    
    
    
  14. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-03T14:23:09Z

    Jeremy Drake <pgsql@jdrake.com> writes:
    > ... I'm pretty sure that what it crashed on was
    > attempting to access the global external variable CurrentMemoryContext.
    
    Ah-hah, good insight!
    
    > The odd thing is, that the disassembly code between the working and
    > non-working was the same, except for the offsets.
    
    The code seems to be fetching a pointer to CurrentMemoryContext from a
    PC-relative location; presumably that's a literal that the dynamic
    linker is supposed to update at shlib load time.  I guess that pointer
    is not correctly computed in the other case, or else it's fetching the
    wrong pointer value.
    
    > Still, I have no idea why adding an include would cause issues accessing
    > CurrentMemoryContext.
    
    Me either, but at least it's something to work from.  You might try
    diffing the working and non-working -E output from cube.c to see if
    there are any changes that obviously affect CurrentMemoryContext.
    
    			regards, tom lane
    
    
  15. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-03T14:35:41Z

    Did you test with a lower optimization level?  Gentoo is notorious for
    buggy/bleeding edge tools.
    
    ---------------------------------------------------------------------------
    
    Jeremy Drake wrote:
    > On Fri, 2 Sep 2011, Tom Lane wrote:
    > 
    > > Yeah, so the next question would be why those other ones aren't showing
    > > problems.  But at least now we have a potential mechanism for getting
    > > from "the include list changed" to "cube is crashing on an offsetof",
    > > namely that something is affecting the expansion of the offsetof macro.
    > > Up to now it's been black magic, and I don't like patching around
    > > problems we don't understand any better than that.
    > 
    > 
    > I'm going to try answering about 3 emails at once here, so I apologize for
    > the confusion.
    > 
    > Checking preprocessor output on cube_f8_f8 between working and broken, the
    > output was identical for that function.
    > 
    > Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in
    > behavior.
    > 
    > Then I started investigating the disassembly.  The PIC disassembly was
    > making my head hurt (why is the compiler calling the next instruction,
    > popping the return address into a register, and using it as a pointer?
    > Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
    > attempting to access the global external variable CurrentMemoryContext.
    > The odd thing is, that the disassembly code between the working and
    > non-working was the same, except for the offsets.
    > 
    > Here's the disassembly output from the core dump:
    > Program terminated with signal 11, Segmentation fault.
    > #0  cube_f8_f8 () at cube.c:1435
    > 1435		size = offsetof(NDBOX, x) +sizeof(double) * 2;
    > (gdb) set disassembly-flavor intel
    > (gdb) disass
    > Dump of assembler code for function cube_f8_f8:
    >    0xb776ab50 <+0>:	push   ebp
    >    0xb776ab51 <+1>:	mov    ebp,esp
    >    0xb776ab53 <+3>:	and    esp,0xfffffff8
    >    0xb776ab56 <+6>:	push   ebx
    >    0xb776ab57 <+7>:	sub    esp,0x14
    >    0xb776ab5a <+10>:	mov    ecx,DWORD PTR [ebp+0x8]
    >    0xb776ab5d <+13>:	mov    DWORD PTR [esp+0x10],ebx
    >    0xb776ab61 <+17>:	mov    edx,DWORD PTR [ecx+0x14]
    >    0xb776ab64 <+20>:	movsd  xmm0,QWORD PTR [edx]
    >    0xb776ab68 <+24>:	mov    edx,DWORD PTR [ecx+0x18]
    >    0xb776ab6b <+27>:	movsd  xmm1,QWORD PTR [edx]
    >    0xb776ab6f <+31>:	movsd  QWORD PTR [esp],xmm0
    >    0xb776ab74 <+36>:	movsd  QWORD PTR [esp+0x8],xmm1
    >    0xb776ab7a <+42>:	push   0x18
    >    0xb776ab7c <+44>:	call   0xb776ab81 <cube_f8_f8+49>
    >    0xb776ab81 <+49>:	pop    eax
    >    0xb776ab82 <+50>:	add    eax,0x9472
    >    0xb776ab87 <+55>:	mov    edx,DWORD PTR [eax-0x58]
    > => 0xb776ab8d <+61>:	push   DWORD PTR [edx]
    >    0xb776ab8f <+63>:	mov    DWORD PTR [esp+0x18],ebx
    >    0xb776ab93 <+67>:	mov    ebx,eax
    >    0xb776ab95 <+69>:	call   0xb776745c <MemoryContextAllocZero@plt>
    > 
    > 
    > 
    > With this knowledge, I thought maybe cube wasn't the actual problem, but
    > just happened to be early in the list of contrib modules being tested.  So
    > I arbitrarily picked another contrib module to test, intarray.
    > 
    > ============== running regression test queries        ==============
    > test _int                     ... FAILED (test process exited with exit
    > code 2)
    > 
    > Program terminated with signal 11, Segmentation fault.
    > #0  0xb785a1fa in g_intbig_union ()
    >    from
    > /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
    > (gdb) set disassembly-flavor intel
    > (gdb) disass
    > Dump of assembler code for function g_intbig_union:
    > <SNIP UNRELATED ASSEMBLY CODE>
    >    0xb785a087 <+31>:	call   0xb785a08c <g_intbig_union+36>
    >    0xb785a08c <+36>:	pop    eax
    >    0xb785a08d <+37>:	add    eax,0x6f67
    >    0xb785a092 <+42>:	mov    DWORD PTR [esp+0x104],eax
    > <SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
    >    0xb785a1e5 <+381>:	mov    eax,DWORD PTR [esp+0x104]
    >    0xb785a1ec <+388>:	mov    esi,DWORD PTR [eax-0x24]
    >    0xb785a1f2 <+394>:	mov    DWORD PTR [esp+0x108],ebx
    >    0xb785a1f9 <+401>:	push   ebx
    > => 0xb785a1fa <+402>:	push   DWORD PTR [esi]
    >    0xb785a1fc <+404>:	mov    DWORD PTR [esp+0x104],ebx
    >    0xb785a203 <+411>:	mov    ebx,eax
    >    0xb785a205 <+413>:	call   0xb7853ef0 <MemoryContextAlloc@plt>
    > 
    > 
    > Looks pretty familiar, right?
    > 
    > Still, I have no idea why adding an include would cause issues accessing
    > CurrentMemoryContext.  But at least we're not blaming offsetof or cube
    > anymore...
    > 
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  16. Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-03T14:39:15Z

    Tom Lane wrote:
    > Jeremy Drake <pgsql@jdrake.com> writes:
    > > ... I'm pretty sure that what it crashed on was
    > > attempting to access the global external variable CurrentMemoryContext.
    > 
    > Ah-hah, good insight!
    > 
    > > The odd thing is, that the disassembly code between the working and
    > > non-working was the same, except for the offsets.
    > 
    > The code seems to be fetching a pointer to CurrentMemoryContext from a
    > PC-relative location; presumably that's a literal that the dynamic
    > linker is supposed to update at shlib load time.  I guess that pointer
    > is not correctly computed in the other case, or else it's fetching the
    > wrong pointer value.
    
    This would explain why the regular regression test work but the /contrib
    modules, which do dynamic loading, do not.   Good to know the problem is
    more the contrib/cube.  FYI, I noticed in the contrib/cube failure that
    palloc0() was the next line after the reported crash line.   Are the
    contrib's crashing on the first access of any backend/DLL function?
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  17. Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-04T00:51:13Z

    Alvaro Herrera wrote:
    > Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:
    > 
    > > Wow, that is interesting.  So the problem is the inclusion of
    > > replication/walsender.h in xlog.h.  Hard to see how that could cause the
    > > cube regression tests to fail, but of course, it is.
    > 
    > Hmm, so you included walsender.h into xlog.h?  That seems a bit funny
    > considering that walsender.h already includes xlog.h.  It seems the
    > reason for this is only the AllowCascadeReplication() definition.  Maybe
    > that should go elsewhere instead, for example walsender.h?
    
    Moved to walsender.h. Thanks.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  18. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-04T01:08:41Z

    Bruce Momjian <bruce@momjian.us> writes:
    > Alvaro Herrera wrote:
    >> Hmm, so you included walsender.h into xlog.h?  That seems a bit funny
    >> considering that walsender.h already includes xlog.h.  It seems the
    >> reason for this is only the AllowCascadeReplication() definition.  Maybe
    >> that should go elsewhere instead, for example walsender.h?
    
    > Moved to walsender.h. Thanks.
    
    You seem to have entirely missed the point of Alvaro's remark, which is
    that you've got xlog.h including walsender.h (still) as well as
    walsender.h including xlog.h.  That's broken.
    
    			regards, tom lane
    
    
  19. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-04T01:25:17Z

    Tom Lane wrote:
    > Bruce Momjian <bruce@momjian.us> writes:
    > > Alvaro Herrera wrote:
    > >> Hmm, so you included walsender.h into xlog.h?  That seems a bit funny
    > >> considering that walsender.h already includes xlog.h.  It seems the
    > >> reason for this is only the AllowCascadeReplication() definition.  Maybe
    > >> that should go elsewhere instead, for example walsender.h?
    > 
    > > Moved to walsender.h. Thanks.
    > 
    > You seem to have entirely missed the point of Alvaro's remark, which is
    > that you've got xlog.h including walsender.h (still) as well as
    > walsender.h including xlog.h.  That's broken.
    
    Oh, OK, done.  xlog.h removed from walsender.h and tested.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  20. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-04T01:48:38Z

    Bruce Momjian <bruce@momjian.us> writes:
    > Tom Lane wrote:
    >> You seem to have entirely missed the point of Alvaro's remark, which is
    >> that you've got xlog.h including walsender.h (still) as well as
    >> walsender.h including xlog.h.  That's broken.
    
    > Oh, OK, done.  xlog.h removed from walsender.h and tested.
    
    I would have gone the other way on that one, if possible.  Seems like
    xlog.h ought to be the lower-level file.
    
    Some quick mechanical analysis shows that's not the only circular
    inclusion path in our headers, either: we have storage/proc.h including
    replication/syncrep.h which includes storage/proc.h.  That one appears
    to be Simon's fault not yours though.
    
    			regards, tom lane
    
    
  21. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-04T04:30:43Z

    Tom Lane wrote:
    > Bruce Momjian <bruce@momjian.us> writes:
    > > Tom Lane wrote:
    > >> You seem to have entirely missed the point of Alvaro's remark, which is
    > >> that you've got xlog.h including walsender.h (still) as well as
    > >> walsender.h including xlog.h.  That's broken.
    > 
    > > Oh, OK, done.  xlog.h removed from walsender.h and tested.
    > 
    > I would have gone the other way on that one, if possible.  Seems like
    > xlog.h ought to be the lower-level file.
    
    Agreed.  Let me work on that.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  22. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-04T05:18:01Z

    Bruce Momjian <bruce@momjian.us> writes:
    > Tom Lane wrote:
    >> I would have gone the other way on that one, if possible.  Seems like
    >> xlog.h ought to be the lower-level file.
    
    > Agreed.  Let me work on that.
    
    Uh, I just did it.  Painful.  It would have been a lot easier before
    the pgrminclude run, because that baked-in a whole lot of bad decisions.
    
    			regards, tom lane
    
    
  23. Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Jeremy Drake <pgsql@jdrake.com> — 2011-09-04T07:13:53Z

    On Sun, 4 Sep 2011, Tom Lane wrote:
    
    > Jeremy Drake <jeremyd@jdrake.com> writes:
    > > I didn't see any changes that looked like they affected
    > > CurrentMemoryContext, but I attached the compressed context diff in case
    > > you want to look at it.
    >
    > Right now I have a feeling that this is a compiler bug.
    
    That's my feeling, also.
    
    > Don't know
    > whether you have the interest/energy to try to reduce it to a reportable
    > test case.
    
    If you mean reporting it to the compiler vendor (Intel), I doubt that
    would be worthwhile.  The version of the compiler on this machine is very
    out of date.  It is version 9.0 20060222.  I would bet that if I did track
    down and report an issue in a 5-year-old compiler version, their first
    question would be, does the issue duplicate in the current version.  Given
    that my other buildfarm member is running 11.1 20100414 (albeit for the
    x64 platform instead of the x86) and had no issue, I would expect that the
    current x86 version would also have no problem.
    
    I have intentionally been keeping the compiler versions on my buildfarm
    members pretty much fixed, for the benefit of reproducable results.
    However, I would be interested in hearing any guidelines on "how old is
    too old" for buildfarm member versions.
    
    
    
  24. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-04T07:29:00Z

    Jeremy Drake <pgsql@jdrake.com> writes:
    > On Sun, 4 Sep 2011, Tom Lane wrote:
    >> Right now I have a feeling that this is a compiler bug.
    
    > That's my feeling, also.
    
    >> Don't know
    >> whether you have the interest/energy to try to reduce it to a reportable
    >> test case.
    
    > If you mean reporting it to the compiler vendor (Intel), I doubt that
    > would be worthwhile.  The version of the compiler on this machine is very
    > out of date.  It is version 9.0 20060222.  I would bet that if I did track
    > down and report an issue in a 5-year-old compiler version, their first
    > question would be, does the issue duplicate in the current version.  Given
    > that my other buildfarm member is running 11.1 20100414 (albeit for the
    > x64 platform instead of the x86) and had no issue, I would expect that the
    > current x86 version would also have no problem.
    
    > I have intentionally been keeping the compiler versions on my buildfarm
    > members pretty much fixed, for the benefit of reproducable results.
    > However, I would be interested in hearing any guidelines on "how old is
    > too old" for buildfarm member versions.
    
    Well, I'm still running this on one development machine:
    
    $ gcc -v
    Reading specs from /usr/local/lib/gcc-lib/hppa2.0-hp-hpux10.20/2.95.3/specs
    gcc version 2.95.3 20010315 (release)
    
    so I'm not in the "it's too old" camp.  My perspective on tool bugs
    is whether there is something (a) well defined and (b) not costly
    that we can do to work around it.  So far, this issue is failing
    test (a) ... we know that one header inclusion order triggers the
    crash and another not, but there is no theory as to why.
    
    What I would suggest is to see whether a more recent x86 version shows
    the problem or not.  If not, let's just write it off as an already-fixed
    compiler bug.
    
    			regards, tom lane
    
    
  25. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Jeremy Drake <pgsql@jdrake.com> — 2011-09-05T07:07:49Z

    On Sun, 4 Sep 2011, Tom Lane wrote:
    
    > What I would suggest is to see whether a more recent x86 version shows
    > the problem or not.  If not, let's just write it off as an already-fixed
    > compiler bug.
    
    I have installed the most recent version in the home directory of a
    purpose-made user on that machine.
    
    configure:3252: icc --version >&5
    icc (ICC) 12.0.5 20110719
    Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.
    
    I did
    git checkout 6416a82a62db4e66b2edb0fa8fc83a580c3f1931
    to get a revision I knew was right in the broken range for mongoose.
    
    Apparently they deprecated one of my compiler flags: -xN (N is for
    Nocona), seems they renamed it to -xSSE2.  Since this is a one-off run, I
    ignored that warning.
    
    The result is no crash in the cube test.
    
    I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
    and if the issue duplicates there, I can see about setting up SSH access
    if anyone is still interested in investigating this further.
    
    
    
  26. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-05T13:45:09Z

    Jeremy Drake wrote:
    > On Sun, 4 Sep 2011, Tom Lane wrote:
    > 
    > > What I would suggest is to see whether a more recent x86 version shows
    > > the problem or not.  If not, let's just write it off as an already-fixed
    > > compiler bug.
    > 
    > I have installed the most recent version in the home directory of a
    > purpose-made user on that machine.
    > 
    > configure:3252: icc --version >&5
    > icc (ICC) 12.0.5 20110719
    > Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.
    > 
    > I did
    > git checkout 6416a82a62db4e66b2edb0fa8fc83a580c3f1931
    > to get a revision I knew was right in the broken range for mongoose.
    > 
    > Apparently they deprecated one of my compiler flags: -xN (N is for
    > Nocona), seems they renamed it to -xSSE2.  Since this is a one-off run, I
    > ignored that warning.
    > 
    > The result is no crash in the cube test.
    > 
    > I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
    > and if the issue duplicates there, I can see about setting up SSH access
    > if anyone is still interested in investigating this further.
    
    What would we investigate except a compiler bug?
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  27. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Jeremy Drake <pgsql@jdrake.com> — 2011-09-05T19:53:55Z

    On Mon, 5 Sep 2011, Bruce Momjian wrote:
    
    > Jeremy Drake wrote:
    > > I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
    > > and if the issue duplicates there, I can see about setting up SSH access
    > > if anyone is still interested in investigating this further.
    >
    > What would we investigate except a compiler bug?
    
    To me, simply chalking it up to some uncharacterized compiler bug is still
    quite a bit of black magic.
    
    But, if that explanation is good enough for you, I've certainly got
    better things to do with my holiday than spending time on this :)
    
    
    
  28. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Bruce Momjian <bruce@momjian.us> — 2011-09-05T19:55:51Z

    Jeremy Drake wrote:
    > On Mon, 5 Sep 2011, Bruce Momjian wrote:
    > 
    > > Jeremy Drake wrote:
    > > > I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
    > > > and if the issue duplicates there, I can see about setting up SSH access
    > > > if anyone is still interested in investigating this further.
    > >
    > > What would we investigate except a compiler bug?
    > 
    > To me, simply chalking it up to some uncharacterized compiler bug is still
    > quite a bit of black magic.
    > 
    > But, if that explanation is good enough for you, I've certainly got
    > better things to do with my holiday than spending time on this :)
    
    If the underlying tools are buggy, the system can't be reliable.   We
    can't invest time to track down every compiler bug, especially when
    there are later compiler versions available.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  29. Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-05T20:19:19Z

    Jeremy Drake <pgsql@jdrake.com> writes:
    > On Mon, 5 Sep 2011, Bruce Momjian wrote:
    >> What would we investigate except a compiler bug?
    
    > To me, simply chalking it up to some uncharacterized compiler bug is still
    > quite a bit of black magic.
    
    If there were some reason to believe either that it wasn't a compiler
    bug, or that there would be something reasonable we could do to work
    around it, then I'd be interested in pressing further.  But on the
    strength of what we have now, neither of those things seem real likely.
    I'm with Bruce on thinking that it's probably not going to repay further
    effort.
    
    			regards, tom lane
    
    
  30. Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

    Alvaro Herrera <alvherre@commandprompt.com> — 2011-09-12T19:11:21Z

    Excerpts from Alvaro Herrera's message of vie sep 02 13:53:12 -0300 2011:
    
    > I wonder if there should be a new header, something like
    > walsender_internal.h, for stuff like WalSnd and WalSndCtlData struct
    > defs.
    
    ... as in the attached patch.
    
    -- 
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support