Thread
-
Re: [HACKERS] Problem after removal of exec(), help
Goran Thyni <goran@bildbasen.se> — 1998-06-23T11:11:26Z
Bruce Momjian wrote:
>
> Since the removal of exec(), Thomas has seen, and I have confirmed that
> if a backend crashes, and the postmaster must reset the shared memory,
> no backends can connect anymore. One way to reproduce it is to run the
> regression tests, which on their last test will crash for an un-related
> reason. However, it will not allow you to restart any more backends.
>
> The error it gets is:
>
> Failed Assertion("!((((unsigned long)nextElem) > ShmemBase)):", File: "shmqueue.c", Line: 83)
> !((((unsigned long)nextElem) > ShmemBase)) (0) [No such file or directory]
>
> In this case nextElem = ShmemBase, so it is not greater. Removing the
> Assert() still does not make things work, so there must be something
> else.
>
> Now, the problem is probably not at that exact spot, but somewhere
> deeper. There are two differences between the old non-exec() behavior
> and new behavior. In the old setup, the backend had all its global
> variables initialized, while in the new no-exec case, they take the
> global variable values from the postmaster. Second, the old setup had
> each backend attaching to the shared memory, while the new setup has
> them inheriting the shared memory from the fork().

Bruce,
I have not looked into the specifics yet, but I suggest looking into
what is done when the child process exits. This (the pg_exit() et al.)
caused some bugs when we introduced unix domain sockets, and it is not
the first place one looks. :-(

regards,
--
---------------------------------------------
Göran Thyni, sysadm, JMS Bildbasen, Kiruna
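
To illustrate why exit handling deserves a look once exec() is gone: a child created by fork() alone keeps copies of the parent's global variables and of its registered exit handlers, whereas a successful exec() would have discarded both. The sketch below is a generic POSIX illustration, not PostgreSQL code. The names `parent_state`, `exit_hook` and `fork_without_exec` are hypothetical, and the standard atexit() stands in for the backend's own exit machinery, which may behave differently.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a global that the parent initializes. */
static int parent_state = 0;

/* Write end of a pipe; only set in the child so the hook can report back. */
static int hook_fd = -1;
static int hook_registered = 0;

/* Hypothetical exit hook, registered in the parent before fork(). */
static void exit_hook(void)
{
    if (hook_fd >= 0 && write(hook_fd, "x", 1) != 1) {
        /* nothing useful to do on failure in this sketch */
    }
}

/*
 * Fork a child without exec().  The child exits with the value of
 * parent_state as it sees it.  Returns the child's exit status, or -1
 * on error.  *hook_ran is set to 1 if the handler registered in the
 * parent ran inside the child.
 */
int fork_without_exec(int value, int *hook_ran)
{
    int fds[2];
    int status;
    char byte;
    pid_t pid;

    parent_state = value;
    if (!hook_registered) {
        if (atexit(exit_hook) != 0)
            return -1;
        hook_registered = 1;
    }
    if (pipe(fds) < 0)
        return -1;

    fflush(NULL);               /* avoid duplicated stdio buffers */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: no exec(), so globals and exit handlers are inherited. */
        close(fds[0]);
        hook_fd = fds[1];
        exit(parent_state);     /* exit() runs the inherited handler */
    }

    /* Parent: see whether the child's exit handler reported in. */
    close(fds[1]);
    *hook_ran = (read(fds[0], &byte, 1) == 1);
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `fork_without_exec(42, &ran)` from a small `main()` should show the child exiting with the parent's value and the parent's handler having run in the child. Any cleanup registered in the parent therefore also fires in each forked child unless it is explicitly reset after the fork.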