Thread

  1. Re: [HACKERS] Problem after removal of exec(), help

    Goran Thyni <goran@bildbasen.se> — 1998-06-23T11:11:26Z

    Bruce Momjian wrote:
    > 
    > Since the removal of exec(), Thomas has seen, and I have confirmed that
    > if a backend crashes, and the postmaster must reset the shared memory,
    > no backends can connect anymore.  One way to reproduce it is to run the
    > regression tests, which on their last test will crash for an unrelated
    > reason.  However, it will not allow you to restart any more backends.
    > 
    > The error it gets is:
    > 
    > Failed Assertion("!((((unsigned long)nextElem) > ShmemBase)):", File: "shmqueue.c", Line: 83)
    > !((((unsigned long)nextElem) > ShmemBase)) (0) [No such file or directory]
    > 
    > In this case nextElem = ShmemBase, so it is not greater.  Removing the
    > Assert() still does not make things work, so there must be something
    > else.
    > 
    > Now, the problem is probably not at that exact spot, but somewhere
    > deeper.  There are two differences between the old non-exec() behavior
    > and new behavior.  In the old setup, the backend had all its global
    > variables initialized, while in the new no-exec case, they take the
    > global variable values from the postmaster.  Second, the old setup had
    > each backend attaching to the shared memory, while the new setup has
    > them inheriting the shared memory from the fork().
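    The two differences Bruce lists can be seen in a small stand-alone program. This is an illustrative sketch only, not PostgreSQL code: `backend_counter`, `child_view_of_global()` and `shared_write_seen_by_parent()` are hypothetical names, and an anonymous `MAP_SHARED` mapping stands in for the System V segment the postmaster actually uses. A child created by fork() without exec() starts with a copy of the parent's globals as they were at fork time rather than their initializers, and it shares any mapping the parent had already attached.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global with a compile-time initializer. After exec() a new
 * process would see 0; after a bare fork() it sees the parent's value. */
static int backend_counter = 0;

/* Sets the parent's global, forks, and returns the value the child saw
 * (reported through its exit status), or -1 on error. */
int child_view_of_global(int parent_value)
{
    assert(parent_value >= 0 && parent_value <= 255); /* must fit an exit status */
    backend_counter = parent_value;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(backend_counter);        /* child: inherited copy, no exec() */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Maps a shared region before fork(); the child writes into it and the
 * parent reads the value back. Returns that value, or -1 on error. */
int shared_write_seen_by_parent(int value)
{
    int *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;
    *shm = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shm, sizeof *shm);
        return -1;
    }
    if (pid == 0) {
        *shm = value;                  /* same shared pages as the parent */
        _exit(0);
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid && WIFEXITED(status);
    int seen = *shm;
    munmap(shm, sizeof *shm);
    return ok ? seen : -1;
}
```

    Under exec() the child would start from `backend_counter = 0` and would have to attach the segment itself. Without exec(), any postmaster-side global that is stale after a shared memory reset is carried unchanged into every new backend, which is one plausible place to look for the failure above.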
    
    Bruce,
    I have not looked into the specifics yet,
    but I suggest looking at what is done when
    the child process exits.
    This code (pg_exit() et al.) caused some bugs
    when we introduced Unix domain sockets, and
    it is not the first place one looks. :-(
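    A sketch of the kind of pitfall meant here, assuming a cleanup list kept in ordinary globals roughly like pg_exit()'s. The names `register_exit_cb()`, `reset_exit_cbs()`, `run_exit_cbs_and_exit()` and `socket_file_survives_child_exit()` are hypothetical, not the real PostgreSQL functions, and a temporary file stands in for the postmaster's socket file. Handlers the parent registers are inherited by a fork()ed child, so when the child exits it can run the parent's cleanup, such as unlinking the socket file.

```c
#define _DEFAULT_SOURCE            /* for mkstemp() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_EXIT_CBS 8

typedef void (*exit_cb)(void);

/* Hypothetical cleanup list kept in plain globals. A fork()ed child gets
 * a copy of it, including handlers meant only for the parent. */
static exit_cb exit_cbs[MAX_EXIT_CBS];
static int n_exit_cbs = 0;

void register_exit_cb(exit_cb cb)
{
    assert(n_exit_cbs < MAX_EXIT_CBS);
    exit_cbs[n_exit_cbs++] = cb;
}

/* Forget every registered handler (e.g. in a child right after fork()). */
void reset_exit_cbs(void)
{
    n_exit_cbs = 0;
}

/* Run handlers in reverse registration order, then terminate. */
void run_exit_cbs_and_exit(int code)
{
    while (n_exit_cbs > 0)
        exit_cbs[--n_exit_cbs]();
    _exit(code);
}

/* Stand-in for the postmaster's Unix socket file. */
static char socket_path[64];

static void remove_socket_file(void)
{
    unlink(socket_path);
}

/* Returns 1 if the parent's file still exists after a child exits through
 * run_exit_cbs_and_exit(), 0 if the child removed it, -1 on error. */
int socket_file_survives_child_exit(int reset_in_child)
{
    strcpy(socket_path, "/tmp/pgsock_demo_XXXXXX");
    int fd = mkstemp(socket_path);
    if (fd < 0)
        return -1;
    close(fd);

    reset_exit_cbs();
    register_exit_cb(remove_socket_file);   /* parent-only cleanup */

    pid_t pid = fork();
    if (pid < 0) {
        unlink(socket_path);
        return -1;
    }
    if (pid == 0) {
        if (reset_in_child)
            reset_exit_cbs();               /* drop inherited handlers */
        run_exit_cbs_and_exit(0);           /* the "backend" exits */
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid && WIFEXITED(status);
    int survived = access(socket_path, F_OK) == 0;
    reset_exit_cbs();                       /* parent: drop the demo handler */
    unlink(socket_path);
    return ok ? survived : -1;
}
```

    `socket_file_survives_child_exit(0)` returns 0 because the child unlinks the parent's file on its way out; with `1`, the child clears the inherited list first and the file survives. Clearing the list in the child right after fork() is one possible fix; having each handler check that it is running in the process that registered it is another.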
    
    	regards,
    -- 
    ---------------------------------------------
    Göran Thyni, sysadm, JMS Bildbasen, Kiruna