Thread

  1. Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

    Tom Lane <tgl@sss.pgh.pa.us> — 1999-05-06T14:04:56Z

    Wayne Piekarski <wayne@senet.com.au> writes:
    > We are experiencing exactly the same problem as above - during the day,
    > all of a sudden Postgres will completely jam up, with all processing in
    > one of the following states: (from ps -axwwwwww)
    
    It seems possible that the hashtable bugs I fixed a couple months ago
    are rising up to bite you.  (Basically, the shared hashtables that
    contain things like locks and buffers would go nuts if there got to be
    more than 256 entries ... and it sure sounds like your installation is
    big enough that it could have, eg, more than 256 active locks when
    under load.)  One quick thing you might try to test this is to reduce
    the postmaster's -B setting to less than 256 (if you have it set that
    high) and see if stability improves.
    
    These bugs are fixed in 6.5-beta1, but it has enough other bugs that
    I don't think Wayne would be wise to try moving to 6.5 just yet.
    I have a patch for 6.4.2 that I believe also fixes the problems, but
    it hasn't gotten quite as much testing as I would like so I haven't
    committed it into the REL6_4 tree.  (There's not going to be a 6.4.3
    release, according to current plans, so it's hardly worth doing anyway.)
    
    What I will do is send the patch to Wayne in a separate message, and
    also cc: it to the PATCHES list --- anyone else who needs it can get it
    from there.  Please let us know if this helps, Wayne.
    
    			regards, tom lane
    
    
  2. Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

    Wayne Piekarski <wayne@senet.com.au> — 1999-05-09T08:08:42Z

    Tom Lane wrote:
    > Wayne Piekarski <wayne@senet.com.au> writes:
    > > We are experiencing exactly the same problem as above - during the day,
    > > all of a sudden Postgres will completely jam up, with all processing in
    > > one of the following states: (from ps -axwwwwww)
    > 
    > It seems possible that the hashtable bugs I fixed a couple months ago
    > are rising up to bite you.  (Basically, the shared hashtables that
    > contain things like locks and buffers would go nuts if there got to be
    > more than 256 entries ... and it sure sounds like your installation is
    > big enough that it could have, eg, more than 256 active locks when
    > under load.)  One quick thing you might try to test this is to reduce
    > the postmaster's -B setting to less than 256 (if you have it set that
    > high) and see if stability improves.
    
    Currently, I start up postmaster with -B 192, which I guess puts it below
    the value of 256 which causes problems. Apart from when I got past 256
    buffers, does the patch fix anything else that might be causing problems?
    
    Just for everyones information, the system contains about 80 tables and
    129 indexes. There is about 700 mb of data sprayed over all the tables,
    although some have more rows than others. At any one time during the day,
    we have about 8 to 10 active postgres connections, half of them are
    connected to daemons which continuously sent updates and inserts into the
    system, the rest of them are very quick queries from CGI programs. The
    problems we experience are always during the day, when the CGI programs
    are hammering the database - we don't ever have a problem at night when
    the staff go home. 
    
    The whole thing runs 24 hours a day, 7 days a week. Most of the tables
    rarely get vacuumed (they have tens of thousands of rows and only inserts
    get done to them - the optimiser makes good choices for most of these) -
    however we have 5 tables which get vacuum at midnight each day, we drop
    all the indexes, vacuum, then recreate. If we don't do the index thing,
    the vacuum can take tens of minutes, which is not acceptable - the tables
    contain about 20000 rows, each of which gets updated about 3 times during 
    the day. I sent an email a while back about vacuum performance, and this
    hack is the only way around it.
    
    If any other programs try to query the four tables getting vacuumed then I
    get into real trouble. I wish I could do soemthing like:
    
    BEGIN;
    LOCK TABLE x;
    DROP INDEX x_idx;
    VACUUM ANALYZE x;
    CREATE INDEX x_idx;
    END;
    
    I've seen a #define which looked like it enabled this kind of thing, but
    I'm not sure if it is safe to use.
    
    
    > What I will do is send the patch to Wayne in a separate message, and
    > also cc: it to the PATCHES list --- anyone else who needs it can get it
    > from there.  Please let us know if this helps, Wayne.
    
    During the week when I get a chance I will trial the patch and see if it
    has any affect on the problems we are having. It is very wierd and
    impossible to reproduce on demand as it is related to the number of
    queries and the load of the machine at the time.
    
    Hopefully I will have some results for this by the end of the week.
    
    
    
    While I'm asking some questions here, I should tell you about some of the
    other wierd things I've encountered, many of them are related to shared
    memory and hash tables, which is making me think more and more that all
    the problems I am having are somehow related.
    
    For large tables, when I perform joins, I repeatedly get hash table out of
    memory errors. So I have two tables, one called unix, with 20000 rows, and
    another called services, with 80000 rows - I am producing a result which
    contains about 20000 rows in it as well, so there is lots of data moving
    around.
    
    In most cases, the problem occurs when the optimiser mistakenly choses to
    use seq scan rather than index scan. To get around these problems, we
    initially tried increasing the -B value to larger values (This was a long
    time ago but we had problems, it may have been more than 256 which fits in
    with what Tom Lane said). Every time we kept increasing the number of
    buffers but it got to the point where I was annoyed that the optimiser was
    making bad decisions, and I was at a loss on what to do. So I then
    discovered the COST_INDEX and COST_HEAP variables, which I set to:
    
    set COST_INDEX = '0'; set COST_HEAP = '99999999';
    
    The optimiser then used index scan for almost anything where possible, the
    explain output looked really expensive, but the queries actually executed
    properly even with small -B values. So this is what I do to make these big
    queries work. There are a few cases where the above set statements
    actually cause hash table out of memory as well, so you set them back to
    the defaults and then it usually works ok :)
    
    I know the above is a hack but I needed to get out of a jam and that was
    the only way I could think of doing it. Are there any other join methods
    besides hash join? I thought that lets say I have two tables, A and B,
    both with a column called ID which is indexed, and i do a join on A.id and
    B.id it can use a more efficient means of joining using indexes rather
    than reading both tables into memory and join there?
    
    Here are some explain statements for a big join:
    
    
    reactor=> explain select unix.username from unix where unix.snum =
    services.snum
    NOTICE:  QUERY PLAN:
    
    Hash Join  (cost=6191.62 size=43361 width=20)
      ->  Seq Scan on services  (cost=2204.91 size=43361 width=4)
      ->  Hash  (cost=0.00 size=0 width=0)
            ->  Seq Scan on unix  (cost=1212.26 size=20311 width=16)
    
    
    
    reactor=> set COST_INDEX = '0';
    SET VARIABLE
    reactor=> set COST_HEAP = '999999999';
    SET VARIABLE
    reactor=> explain select unix.username from unix where unix.snum =
    services.snum;
    NOTICE:  QUERY PLAN:
    
    Hash Join  (cost=30000000.00 size=43361 width=20)
      ->  Index Scan using unix_snum_inv_index on unix
    (cost=20311001006080.00 size=20311 width=16)
      ->  Hash  (cost=0.00 size=0 width=0)
            ->  Index Scan using services_snum_inv_index on services
    (cost=43360999964672.00 size=43361 width=4)
                   
    
    I would assume that the above one which uses indexes would be a lot
    better, but why did the optimiser chose the seq scan - do the indexes help
    when doing joins and at the same time all rows are being returned back? I
    understand that the optimiser will choose not to use indexes if it feels
    that it will return most of the rows anyway and so a seq scan is better.
    
    
    ------
    
    
    One other problem related to the shared memory buffers is every so often,
    the postmaster will die with shared memory errors, and device full. This
    happens very rarely (once every one to two weeks) but it happens, and I
    figured that it might be related to the number of buffers I've started up
    with. Note that this problem is not varied by changing the -B value, so I
    don't think its my FreeBSD setup.
    
    
    
    
    So I hope someone finds the above useful, I've been reading the mailing
    lists a lot and I've heard about developers discovering bugs in locking,
    indexes, and vacuum in 6.5, but I wasn't sure if they were applicable to
    6.4.2 as well, so I figured I should tell someone just in case.
    
    
    Sorry about the length of this email, but I had a lot of things to cover.  
    Thanks for your help everyone, I look forward to hearing from you ...
    
    Wayne
    
    ------------------------------------------------------------------------------
    Wayne Piekarski                               Tel:     (08) 8221 5221
    Research & Development Manager                Fax:     (08) 8221 5220
    SE Network Access Pty Ltd                     Mob:     0407 395 889
    222 Grote Street                              Email:   wayne@senet.com.au
    Adelaide SA 5000                              WWW:     http://www.senet.com.au
    
    
  3. Re: [SQL] Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

    Bruce Momjian <maillist@candle.pha.pa.us> — 1999-05-09T11:14:17Z

    > The whole thing runs 24 hours a day, 7 days a week. Most of the tables
    > rarely get vacuumed (they have tens of thousands of rows and only inserts
    > get done to them - the optimiser makes good choices for most of these) -
    > however we have 5 tables which get vacuum at midnight each day, we drop
    > all the indexes, vacuum, then recreate. If we don't do the index thing,
    > the vacuum can take tens of minutes, which is not acceptable - the tables
    > contain about 20000 rows, each of which gets updated about 3 times during 
    > the day. I sent an email a while back about vacuum performance, and this
    > hack is the only way around it.
    
    6.5 beta speeds up vacuuming with existing indexes, thanks to Vadim.
    Also, accessing during vacuuming may be better too.
    
    > While I'm asking some questions here, I should tell you about some of the
    > other wierd things I've encountered, many of them are related to shared
    > memory and hash tables, which is making me think more and more that all
    > the problems I am having are somehow related.
    
    6.5 beta has some _major_ hash fixes.  We always knew there were hash
    problems, but now Tom has fixed many of them.
    
    > I would assume that the above one which uses indexes would be a lot
    > better, but why did the optimiser chose the seq scan - do the indexes help
    > when doing joins and at the same time all rows are being returned back? I
    > understand that the optimiser will choose not to use indexes if it feels
    > that it will return most of the rows anyway and so a seq scan is better.
    
    6.5 beta also has a faster and smarter optimizer.
    
    It may be wise for you to test 6.5beta to see how many problems we fix.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      maillist@candle.pha.pa.us            |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  4. Re: [SQL] Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

    Wayne Piekarski <wayne@senet.com.au> — 1999-05-10T10:37:47Z

    Hi,
    
    > 6.5 beta speeds up vacuuming with existing indexes, thanks to Vadim.
    > Also, accessing during vacuuming may be better too.
    
    That is good news :) When I first heard about MVCC I remember someone
    suggested it would be possible to still do SELECT on tables being
    vacuumed, is this right or not in the current 6.5?
    
    When we were developing the system we spent a lot of time working out
    ways of getting around vacuum, and I've learned a lot from it. I am going
    to try it out on a full dump of our current database and test some
    examples to see what kind of improvement there is. 
    
    > 6.5 beta also has a faster and smarter optimizer.
    > 
    > It may be wise for you to test 6.5beta to see how many problems we fix.
    
    This week I intend to test out the patches I've received, and hopefully
    they will fix up my big problems (the one with the backend locking up)
    then I will grab the latest 6.5 and try that out with some test data to
    see what happens.
    
    Unfortunately, I can't test 6.5 like I would the real thing because many
    of my problems only occur when everyone is busy firing off queries and the
    box is running an unusually high load and things start waiting on locks.
    I'll see what I can do here although the only true way is to go live with
    it - but I'm not ready for that yet :)
    
    I should be able to check the optimiser improvements though, I've got a
    lot of code which does the SET COST_HEAP/COST_INDEX hack to make things
    work :)
    
    thanks,
    Wayne
    
    ------------------------------------------------------------------------------
    Wayne Piekarski                               Tel:     (08) 8221 5221
    Research & Development Manager                Fax:     (08) 8221 5220
    SE Network Access Pty Ltd                     Mob:     0407 395 889
    222 Grote Street                              Email:   wayne@senet.com.au
    Adelaide SA 5000                              WWW:     http://www.senet.com.au