Thread
-
Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)
Tom Lane <tgl@sss.pgh.pa.us> — 1999-05-09T16:58:13Z
Wayne Piekarski <wayne@senet.com.au> writes: > Currently, I start up postmaster with -B 192, which I guess puts it below > the value of 256 which causes problems. Apart from when I got past 256 > buffers, does the patch fix anything else that might be causing problems? Yes: if you have more than 256 active locks then you also will have problems with the unpatched code. I don't know how to relate that to any easily-measured external information, but since you have a bunch of concurrently running backends it seems possible that you are running into locktable problems. Anyway I do urge you to apply the patch and see if things get better. > however we have 5 tables which get vacuum at midnight each day, we drop > all the indexes, vacuum, then recreate. If we don't do the index thing, > the vacuum can take tens of minutes, which is not acceptable Yup, it takes an unreasonable amount of time to vacuum an index for a table in which a lot of rows have been deleted. The drop/recreate hack is a good workaround for now. (Has anyone looked into why vacuum is so slow in this case?) > For large tables, when I perform joins, I repeatedly get hash table out of > memory errors. So I have two tables, one called unix, with 20000 rows, and > another called services, with 80000 rows - I am producing a result which > contains about 20000 rows in it as well, so there is lots of data moving > around. Yes, the hashtable code needs work. As a short-term workaround, you might try disabling hashjoins entirely, which you can do by passing the debug option "-fh" ("forbid hash"). For example, setenv PGOPTIONS "-fh" psql dbase The optimizer will then usually choose mergejoins, which are reasonable in performance if you have indexes on the columns being joined by. (There ought to be a SET variable that controls this, but there isn't.) > In most cases, the problem occurs when the optimiser mistakenly choses to > use seq scan rather than index scan. To get around these problems, we > initially tried increasing the -B value to larger values -B doesn't have any direct influence on the optimizer's choices, AFAIR. > set COST_INDEX = '0'; set COST_HEAP = '99999999'; > The optimiser then used index scan for almost anything where possible, the > explain output looked really expensive, but the queries actually executed > properly even with small -B values. So this is what I do to make these big > queries work. There are a few cases where the above set statements > actually cause hash table out of memory as well, so you set them back to > the defaults and then it usually works ok :) Again, that doesn't directly prevent the optimizer from using hash joins, it just skews the cost estimates so that index scans will be used in preference to sequential scans, whereupon you get silly plans like the one you quoted: > Hash Join (cost=30000000.00 size=43361 width=20) > -> Index Scan using unix_snum_inv_index on unix > (cost=20311001006080.00 size=20311 width=16) > -> Hash (cost=0.00 size=0 width=0) > -> Index Scan using services_snum_inv_index on services > (cost=43360999964672.00 size=43361 width=4) This is silly because hash join doesn't care whether its inputs are sorted or not --- the extra cost of scanning the index is just being wasted here. (Unless you have WHERE conditions that can be combined with the index to allow not scanning the whole table. From the size estimates it looks like that might be happening for services, but not for the unix table, so the index scan on unix is definitely a waste of time.) Using an index scan *is* a good idea for merge join, on the other hand, because merge join requires sorted inputs. (So if you don't have a suitable index, the plan will have to include an explicit sort step before the join.) > One other problem related to the shared memory buffers is every so often, > the postmaster will die with shared memory errors, and device full. Er, could you quote the exact error reports? This is too vague to allow any conclusions. > Quick question: when you do a query with a join why does the hash table > code need to use shared memory? Can't it do the join within its own memory > space? It doesn't use shared memory. It's just that for historical reasons, the amount of private memory allocated for a hashjoin table is the same as the amount of shared memory allocated for buffers (ie, the -B switch). I've been thinking of changing it to be driven by the -S switch instead, since that seems to make more sense. regards, tom lane -
Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)
Wayne Piekarski <wayne@senet.com.au> — 1999-05-10T12:30:44Z
Hi everyone! Tom Lane Writes: > Wayne Piekarski <wayne@senet.com.au> writes: > > Currently, I start up postmaster with -B 192, which I guess puts it below > > the value of 256 which causes problems. Apart from when I got past 256 > > buffers, does the patch fix anything else that might be causing problems? > > Yes: if you have more than 256 active locks then you also will have > problems with the unpatched code. I don't know how to relate that to > any easily-measured external information, but since you have a bunch of > concurrently running backends it seems possible that you are running > into locktable problems. Anyway I do urge you to apply the patch and > see if things get better. Ok, that is cool. I will try the patch out in the next few days and get back with my results. I'll try to stress the machine by bogging it down a bit with I/O, to see if I can cause the problem to happen on demand, so I should hopefully be able to give you a definite answer on if it is the solution or not. > > however we have 5 tables which get vacuum at midnight each day, we drop > > all the indexes, vacuum, then recreate. If we don't do the index thing, > > the vacuum can take tens of minutes, which is not acceptable > > Yup, it takes an unreasonable amount of time to vacuum an index for a > table in which a lot of rows have been deleted. The drop/recreate hack > is a good workaround for now. (Has anyone looked into why vacuum is so > slow in this case?) I don't understand the source code, but I've done some experiments and found out the following (with 6.4.2 - no comment about 6.5): I have this table where each row gets updated about 2-3 times. So my picture of the disk file is this massive file with 80000 rows, of which only 20000 of them are the real ones, the rest of them being holes that can't get used. This takes *ages* to vacuum (with indexes that is) - BTW, my impression of ages is a couple of minutes, so this isn't taking an hour to run. However, I do have 5 tables to vacuum, so it ends up taking near 10 minutes! The performance of vacuum was a real worry as sometimes things were taking ages, so I added some debugging code to src/backend/commands/vacuum.c to do the following: * I had problems with the elapsed times from getrusage() not truly indicating how long the vacuum took so I added calls to time(NULL) to produce stats for real life seconds. * I wanted to know where it was spending most of its time, so I added elog statements each time I entered or left the scan, index, etc functions in vacuum.c as well. I used this to work out where I was losing all my time. Here is the output: DEBUG: ===> VACUUM non-verbose analyze test_table STARTS <=== DEBUG: Table test_table - Vacuum analyze begins DEBUG: Table test_table - Initial stats complete at 0 secs DEBUG: Table test_table - vc_scanheap starts DEBUG: Table test_table - vc_scanheap ends - Pages 791: Changed 82, Reapped 787, Empty 0, New 0; Tup 20111: Vac 64425, Crash 0, UnUsed 0, MinLen 72, Max Len 72; Re-using: Free/Avail. Space 4679524/4679524; EndEmpty/Avail. Pages 0/787. Elapsed 0/0 sec. Real 1 sec. DEBUG: Table test_table - Heap scan complete at 1 secs DEBUG: Index test_table_seid_index: vc_vaconeind begins DEBUG: Index test_table_seid_index: vc_vaconeind ends - Pages 252; Tuples 20111: Deleted 64425. Elapsed 0/26 sec. Real 37 sec. DEBUG: Index test_table_id_index: vc_vaconeind begins DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: Index test_table_id_index: vc_vaconeind ends - Pages 252; Tuples 20111: Deleted 64425. Elapsed 0/27 sec. Real 31 sec. DEBUG: Index test_table_bid_seid_index: vc_vaconeind begins DEBUG: Index test_table_bid_seid_index: vc_vaconeind ends - Pages 332; Tuples 20111: Deleted 64425. Elapsed 0/2 sec. Real 5 sec. DEBUG: Table test_table - Index clean/scan complete at 74 secs DEBUG: Rel test_table: Pages: 791 --> 188; Tuple(s) moved: 19655. Elapsed 2/5 sec. Real 36 sec. DEBUG: Index test_table_seid_index: vc_vaconeind begins DEBUG: Index test_table_seid_index: vc_vaconeind ends - Pages 253; Tuples 20111: Deleted 19655. Elapsed 0/1 sec. Real 1 sec. DEBUG: Index test_table_id_index: vc_vaconeind begins DEBUG: Index test_table_id_index: vc_vaconeind ends - Pages 252; Tuples 20111: Deleted 19655. Elapsed 0/0 sec. Real 1 sec. DEBUG: Index test_table_bid_seid_index: vc_vaconeind begins DEBUG: Index test_table_bid_seid_index: vc_vaconeind ends - Pages 332; Tuples 20111: Deleted 19655. Elapsed 0/1 sec. Real 1 sec. DEBUG: Table test_table - Processing complete at 114 secs DEBUG: ===> VACUUM non-verbose analyze test_table COMPLETE (115 sec) <=== So vc_vaconeind is taking a while to run, but the wierd part is when processing test_table_bid_seid_index, it took only 5 seconds compared to test_table_seid_index or test_table_id_index. These indexes are all normal ones except for id_index, which is a unique index. So I don't know why they take differing amounts of time, I don't think its a caching thing, because it would have affected the second call for test_table_id_index as well. I've seen cases of where the first index vaconeind is fast and the remaining ones are slower. The wierd part for me is that it processes each index one by one, then the relation, and then it is done. I would have thought it did everything row by row, updating the index as required. Hmmm, this is way beyond me, I thought the above might be useful for someone, although you already may know this so please ignore it then. > Yes, the hashtable code needs work. As a short-term workaround, you > might try disabling hashjoins entirely, which you can do by passing > the debug option "-fh" ("forbid hash"). For example, > setenv PGOPTIONS "-fh" > psql dbase > The optimizer will then usually choose mergejoins, which are reasonable > in performance if you have indexes on the columns being joined by. > (There ought to be a SET variable that controls this, but there isn't.) I have indexes on almost every one of my columns, but in many cases the optimiser always chooses hash joins. Just today, I found a query which wouldn't run with the SET cost hacks and lo and behold it works perfectly with -fh. Do the values for the cost of merge join need changing? The merge join is great, does 6.5 use it more or less than 6.4 for joins with indexes? From what I've read, 6.5 is using more hash joins. I'm going to put the -fh switch into my little toolbox of hacks to try for special occasions when all else fails :) > Again, that doesn't directly prevent the optimizer from using hash > joins, it just skews the cost estimates so that index scans will be used > in preference to sequential scans, whereupon you get silly plans like > the one you quoted: > > > Hash Join (cost=30000000.00 size=43361 width=20) > > -> Index Scan using unix_snum_inv_index on unix > > (cost=20311001006080.00 size=20311 width=16) > > -> Hash (cost=0.00 size=0 width=0) > > -> Index Scan using services_snum_inv_index on services > > (cost=43360999964672.00 size=43361 width=4) > > This is silly because hash join doesn't care whether its inputs are > sorted or not --- the extra cost of scanning the index is just being > wasted here. (Unless you have WHERE conditions that can be combined > with the index to allow not scanning the whole table. From the > size estimates it looks like that might be happening for services, > but not for the unix table, so the index scan on unix is definitely > a waste of time.) I did it as a way of tricking the optimiser into doing something that made it work. When I first tried it, I didn't really understand what I was doing, but when it made my queries run that previously wouldn't work, I just started using it without question because I needed to get out of a jam :) Hmmm, I'll have to go through some more examples to check on this, maybe the above example I gave was wrong .... I did a more complicated query which does a join plus some extra conditions: *** Hash table out of memory Hash Join (cost=4662.25 size=4317 width=320) -> Seq Scan on unix (cost=1212.26 size=20310 width=204) -> Hash (cost=0.00 size=0 width=0) -> Seq Scan on services (cost=2204.91 size=4317 width=116) *** Works with set cost hack Hash Join (cost=30000000.00 size=4317 width=320) -> Seq Scan on services (cost=4336099929358336.00 size=4317 width=116) -> Hash (cost=0.00 size=0 width=0) -> Seq Scan on unix (cost=2031099966390272.00 size=20310 width=204) *** Works with forbid hash "-fh" Nested Loop (cost=11054.74 size=4317 width=320) -> Seq Scan on services (cost=2204.91 size=4317 width=116) -> Index Scan using unix_snum_inv_index on unix (cost=2.05 size=20310 width=204) The extra conditions is that services.snum = services.id and services.snum = unix.snum and services.invalid = 0 and unix.invalid = 0 So the only difference is that it is scanning them in a different order, which is making it work ok. It could be caused by the size/cost not being correct for the data actually there? I guess with the hash table fixes this problem should go away, although I am thinking about disabling hash joins in a few places to make things more reliable though for now .... Hmmm, something to think about :) Others who are having trouble might want to try this as well? > Using an index scan *is* a good idea for merge join, on the other hand, > because merge join requires sorted inputs. (So if you don't have a > suitable index, the plan will have to include an explicit sort step > before the join.) Is merge sort immune from the hash table out of memory thing? If so, then I can just pay the cost of the sort and live with it, knowing that my queries will always work? > > One other problem related to the shared memory buffers is every so often, > > the postmaster will die with shared memory errors, and device full. > > Er, could you quote the exact error reports? This is too vague to allow > any conclusions. I couldn't find any log files when I wrote the last email, but I had a bad day today, (postmaster failed 3 times in quick succession) - I had waiting backends, then I restarted, then had the problem again, then restarted, then another restart :) and the output was: DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600 NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL DEBUG: DumpAllLocks: xidLook->tag.lock = NULL FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting. FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting. DEBUG: DumpAllLocks: xidLook->tag.lock = NULL At this point, the postmaster is dead, and there are lots of postgres children laying around, all still happy to process queries by the way, which is pretty neat. But bad because there is no postmaster to control them. So I kill everything off, and restart, and everything is ok again (The indexes on a table got messed up, but I fixed that by dropping them and recreating them). The DumpAllLocks implies that things are pretty busy and the deadlock code is performing checks .... Any reason why it prints NULL, I would have thought it would print something there like the name of a lock? And another one: IpcSemaphoreCreate: semget failed (No space left on device) key=5432015, num=16, permission=600 NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting. FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting. Then another one after restarting everything: ERROR: cannot open segment 1 of relation sessions_done_id_index FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting. FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting. NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. This is where I dropped the index sessions_done_id_index. I didn't run the ipcclean program, but I've never had to run it before, and everything always starts up ok. After the third restart, everything was ok. I think the dying was caused by too many things trying to happen at once. Note that the above errors are a normal 6.4.2 with no patches - I am not running Tom's patch yet. The whole affair was cause by me running a program which moves 20000 or so tuples from one table to another with 500000 tuples. It takes a few minutes and thrashes the I/O on the machine - postgres died during this process. > > Quick question: when you do a query with a join why does the hash table > > code need to use shared memory? Can't it do the join within its own memory > > space? > > It doesn't use shared memory. It's just that for historical reasons, > the amount of private memory allocated for a hashjoin table is the same > as the amount of shared memory allocated for buffers (ie, the -B > switch). I've been thinking of changing it to be driven by the -S > switch instead, since that seems to make more sense. Ahhh, cool ... Ok, that explains it then :) Thankyou very much for you help again, I hope the above is useful for everyone to look at. I'm not sure how much has been fixed in 6.5 but I want to make sure that if they still exist in 6.5beta we can flush them out. bye, Wayne ------------------------------------------------------------------------------ Wayne Piekarski Tel: (08) 8221 5221 Research & Development Manager Fax: (08) 8221 5220 SE Network Access Pty Ltd Mob: 0407 395 889 222 Grote Street Email: wayne@senet.com.au Adelaide SA 5000 WWW: http://www.senet.com.au -
[HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another example))
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-05-13T10:28:49Z
Hello all, > > Hi everyone! > > > Tom Lane Writes: > > > Wayne Piekarski <wayne@senet.com.au> writes: > > > Currently, I start up postmaster with -B 192, which I guess > puts it below > > > the value of 256 which causes problems. Apart from when I got past 256 > > > buffers, does the patch fix anything else that might be > causing problems? > > [snip] > > Then another one after restarting everything: > > ERROR: cannot open segment 1 of relation sessions_done_id_index > I got the same error in my test cases. I don't understand the cause of this error. But it seems I found another problem instead. spinlock io_in_progress_lock of a buffer page is not released by operations called by elog() such as ProcReleaseSpins(),ResetBufferPool() etc. For example,the error we have encountered probably occured in ReadBufferWithBufferLock(). When elog(ERROR/FATAL) occurs in smgrread/extend() which is called from ReadBufferWithBufferLock(),smgrread/extend() don't release the io_in_progress_lock spinlock of the page. If other transactions get that page as a free Buffer page,those transactions wait the release of io_in_progress_lock spinlock and would abort with message such as > FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting. > > FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting. Comments ? I don't know details about spinlock stuff. Sorry,if my thought is off the point. And I have another question. It seems elog(FATAL) doesn't release allocated buffer pages. It's OK ? AFAIC elog(FATAL) causes proc_exit(0) and proc_exit() doesn't call ResetBufferPool(). Thanks. Hiroshi Inoue Inoue@tpf.co.jp -
DROP TABLE does not drop a table completely
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-05-17T02:30:45Z
Hi all, This mail is about the original cause of [HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another example)). > -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Hiroshi Inoue > Sent: Thursday, May 13, 1999 7:29 PM > To: Tom Lane; Wayne Piekarski > Cc: pgsql-hackers@postgreSQL.org > Subject: [HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another > example)) > > [snip] > > > > Then another one after restarting everything: > > > > ERROR: cannot open segment 1 of relation sessions_done_id_index > > > > I got the same error in my test cases. > I don't understand the cause of this error. > I got this error message by dropping a table while concurrent transactions inserting rows to the same table. I think other transactions should abort with message "Table does not exist". But in most cases the result is not so. It seems that other transactions could proceed before DROP TABLE command is completed. AFAIC heap_destroy_with_catalog() acquires AccessExclusiveLock and releases the lock inside the function. I think that heap_destroy_with_catalog() (or upper level function) should not release the lock. Comments ? Hiroshi Inoue Inoue@tpf.co.jp
-
Re: [HACKERS] DROP TABLE does not drop a table completely
Vadim Mikheev <vadim@krasnet.ru> — 1999-05-17T07:01:19Z
Hiroshi Inoue wrote: > > > > > > > ERROR: cannot open segment 1 of relation sessions_done_id_index > > > > > > > I got the same error in my test cases. > > I don't understand the cause of this error. > > > > I got this error message by dropping a table while concurrent transactions > inserting rows to the same table. > > I think other transactions should abort with message "Table does not > exist". But in most cases the result is not so. > > It seems that other transactions could proceed before DROP TABLE > command is completed. > > AFAIC heap_destroy_with_catalog() acquires AccessExclusiveLock and > releases the lock inside the function. > > I think that heap_destroy_with_catalog() (or upper level function) should > not > release the lock. You're right - this should be done keeping in mind that DROP is allowed inside BEGIN/END (long transactions), but I'm not sure that this will help generally: does it matter when unlock dropped relation - in heap_destroy_with_catalog() or in commit? The real problem is that heap/index_open open file _before_ acquiring any locks and doesn't check t_xmax of relation/index tuple. I believe that this is old problem. There are another ones, sure. Catalog cache code must be re-designed. Vadim
-
RE: [HACKERS] DROP TABLE does not drop a table completely
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-05-17T07:40:56Z
> -----Original Message----- > From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim > Mikheev > Sent: Monday, May 17, 1999 4:01 PM > To: Hiroshi Inoue > Cc: pgsql-hackers > Subject: Re: [HACKERS] DROP TABLE does not drop a table completely > > > Hiroshi Inoue wrote: > > > > > > > > > > ERROR: cannot open segment 1 of relation sessions_done_id_index > > > > > > > > > > I got the same error in my test cases. > > > I don't understand the cause of this error. > > > > > > > I got this error message by dropping a table while concurrent > transactions > > inserting rows to the same table. > > > > I think other transactions should abort with message "Table does not > > exist". But in most cases the result is not so. > > > > It seems that other transactions could proceed before DROP TABLE > > command is completed. > > > > AFAIC heap_destroy_with_catalog() acquires AccessExclusiveLock and > > releases the lock inside the function. > > > > I think that heap_destroy_with_catalog() (or upper level > function) should > > not > > release the lock. > > You're right - this should be done keeping in mind that DROP is allowed > inside BEGIN/END (long transactions), but I'm not sure that this > will help generally: does it matter when unlock dropped relation - > in heap_destroy_with_catalog() or in commit? Unlocking dropped relation before commit enables other transactions proceed and the transactions regard the relation as still alive before the ^^^^^^^^^ commit of DROP TABLE command(It's right,I think). As a result,those transactions behave strangely,though I don't know more about the reason. Thanks. Hiroshi Inoue Inoue@tpf.co.jp
-
spinlock freeze again
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-05-18T10:36:40Z
Hi all, ProcReleaseSpins() does nothing unless MyProc is set. So both elog(ERROR/FATAL) and proc_exit(0) before InitProcess() don't release spinlocks. Comments ? Hiroshi Inoue Inoue@tpf.co.jp
-
Re: [HACKERS] spinlock freeze again
Bruce Momjian <maillist@candle.pha.pa.us> — 1999-07-07T18:20:35Z
[Charset iso-8859-1 unsupported, filtering to ASCII...] > Hi all, > > ProcReleaseSpins() does nothing unless MyProc is set. > So both elog(ERROR/FATAL) and proc_exit(0) before > InitProcess() don't release spinlocks. Are their any locks acquired before InitProcess()? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-
RE: [HACKERS] spinlock freeze again
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-07-07T23:50:53Z
> -----Original Message----- > From: Bruce Momjian [mailto:maillist@candle.pha.pa.us] > Sent: Thursday, July 08, 1999 3:21 AM > To: Hiroshi Inoue > Cc: pgsql-hackers > Subject: Re: [HACKERS] spinlock freeze again > > > [Charset iso-8859-1 unsupported, filtering to ASCII...] > > Hi all, > > > > ProcReleaseSpins() does nothing unless MyProc is set. > > So both elog(ERROR/FATAL) and proc_exit(0) before > > InitProcess() don't release spinlocks. > > Are their any locks acquired before InitProcess()? > OidGenLockId spinlock is acquired in InitTransactionLog(). Regards. Hiroshi Inoue Inoue@tpf.co.jp
-
Re: [HACKERS] spinlock freeze again
Bruce Momjian <maillist@candle.pha.pa.us> — 1999-07-08T01:09:12Z
> > Are their any locks acquired before InitProcess()? > > > > OidGenLockId spinlock is acquired in InitTransactionLog(). > > Regards. > Well, seems we have a Proc queue that holds locks, but for these other cases, we don't. We could use the on_shmexit queue to add an cleanup handler once we get the lock, and remove it from the queue once we release the lock. We don't currently have the ability to remove specific queue entries, but we could easily do that. Is the lock failure a problem that happens a lot? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-
RE: [HACKERS] spinlock freeze again
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-07-08T02:07:10Z
> -----Original Message----- > From: Bruce Momjian [mailto:maillist@candle.pha.pa.us] > Sent: Thursday, July 08, 1999 10:09 AM > To: Hiroshi Inoue > Cc: pgsql-hackers > Subject: Re: [HACKERS] spinlock freeze again > > > > > Are their any locks acquired before InitProcess()? > > > > > > > OidGenLockId spinlock is acquired in InitTransactionLog(). > > > > Regards. > > > > Well, seems we have a Proc queue that holds locks, but for these other > cases, we don't. We could use the on_shmexit queue to add an cleanup > handler once we get the lock, and remove it from the queue once we > release the lock. We don't currently have the ability to remove > specific queue entries, but we could easily do that. > > Is the lock failure a problem that happens a lot? > It doesn't happen oridinarily. I don't remember well how it happend. Probably it was caused by other spinlock(io_in_progress_lock ?) freeze while testing 6.5-beta. Regards. Hiroshi Inoue Inoue@tpf.co.jp
-
Re: [HACKERS] spinlock freeze again
Bruce Momjian <maillist@candle.pha.pa.us> — 1999-07-08T03:01:16Z
> It doesn't happen oridinarily. > I don't remember well how it happend. > Probably it was caused by other spinlock(io_in_progress_lock ?) > freeze while testing 6.5-beta. OK, let's see if it happens again. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026