Thread

  1. Re: AW: [HACKERS] Really slow query on 6.4.2

    Tom Lane <tgl@sss.pgh.pa.us> — 1999-03-25T19:34:29Z

    The Hermit Hacker <scrappy@hub.org> writes:
    > I'm not sure what is all contained in the stats, but the easiest one, I
    > think, to have done automagically is table sizes...add a tuple, update the
    > table of number of rows automatically.  If that numbers gets "off", at
    > least it will be more reasonable then not doing anything...no?
    
    The number of tuples is definitely the most important stat; updating it
    automatically would make the optimizer work better.  The stuff in
    pg_statistics is not nearly as important.
    
    The only objection I can think of to auto-updating reltuples is that
    it'd mean additional computation (to access and rewrite the pg_class
    entry) and additional disk I/O (to write back pg_class) for every INSERT
    and DELETE.  There's also a potential problem of multiple backends all
    trying to write pg_class and being delayed or even deadlocked because of
    it.  (Perhaps the MVCC code will help here.)
    
    I'm not convinced that accurate stats are worth that cost, but I don't
    know how big the cost would be anyway.  Anyone have a feel for it?
    
    			regards, tom lane
    
    
  2. Re: AW: [HACKERS] Really slow query on 6.4.2

    Brian E Gallew <geek+@cmu.edu> — 1999-03-25T21:02:40Z

    Then <tgl@sss.pgh.pa.us> spoke up and said:
    > I'm not convinced that accurate stats are worth that cost, but I don't
    > know how big the cost would be anyway.  Anyone have a feel for it?
    
    They are definitely *not* worth the cost.  Especially since no table
    will have the default 0 rows entry after a single vacuum analyze of
    that table.  Let's be honest: if you aren't interested in doing a
    vacuum, then really aren't interested in performance, anyway.
    
    -- 
    =====================================================================
    | JAVA must have been developed in the wilds of West Virginia.      |
    | After all, why else would it support only single inheritance??    |
    =====================================================================
    | Finger geek@cmu.edu for my public key.                            |
    =====================================================================
    
  3. Re: AW: [HACKERS] Really slow query on 6.4.2

    Marc G. Fournier <scrappy@hub.org> — 1999-03-29T18:37:12Z

    On 25 Mar 1999 geek+@cmu.edu wrote:
    
    > Then <tgl@sss.pgh.pa.us> spoke up and said:
    > > I'm not convinced that accurate stats are worth that cost, but I don't
    > > know how big the cost would be anyway.  Anyone have a feel for it?
    > 
    > They are definitely *not* worth the cost.  Especially since no table
    > will have the default 0 rows entry after a single vacuum analyze of
    > that table.  Let's be honest: if you aren't interested in doing a
    > vacuum, then really aren't interested in performance, anyway.
    
    What I personally am not interested in is having to spend 20 minute per
    day with a totally locked up database because I want my queries to be
    faster, when there are other ways of doing it...
    
    Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
    Systems Administrator @ hub.org 
    primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 
    
    
    
  4. Re: AW: [HACKERS] Really slow query on 6.4.2

    Marc G. Fournier <scrappy@hub.org> — 1999-03-29T18:40:00Z

    On Thu, 25 Mar 1999, Tom Lane wrote:
    
    > The Hermit Hacker <scrappy@hub.org> writes:
    > > I'm not sure what is all contained in the stats, but the easiest one, I
    > > think, to have done automagically is table sizes...add a tuple, update the
    > > table of number of rows automatically.  If that numbers gets "off", at
    > > least it will be more reasonable then not doing anything...no?
    > 
    > The number of tuples is definitely the most important stat; updating it
    > automatically would make the optimizer work better.  The stuff in
    > pg_statistics is not nearly as important.
    > 
    > The only objection I can think of to auto-updating reltuples is that
    > it'd mean additional computation (to access and rewrite the pg_class
    > entry) and additional disk I/O (to write back pg_class) for every INSERT
    > and DELETE.  There's also a potential problem of multiple backends all
    > trying to write pg_class and being delayed or even deadlocked because of
    > it.  (Perhaps the MVCC code will help here.)
    > 
    > I'm not convinced that accurate stats are worth that cost, but I don't
    > know how big the cost would be anyway.  Anyone have a feel for it?
    
    We're not looking for perfect numbers here, how about something just
    stored in cache and periodically written out to disk?  We already have the
    shard memory pool to work with...
    
    Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
    Systems Administrator @ hub.org 
    primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 
    
    
    
  5. Re: AW: [HACKERS] Really slow query on 6.4.2

    Brian E Gallew <geek+@cmu.edu> — 1999-03-29T20:07:20Z

    Then <scrappy@hub.org> spoke up and said:
    > On 25 Mar 1999 geek+@cmu.edu wrote:
    > > They are definitely *not* worth the cost.  Especially since no table
    > > will have the default 0 rows entry after a single vacuum analyze of
    > > that table.  Let's be honest: if you aren't interested in doing a
    > > vacuum, then really aren't interested in performance, anyway.
    > 
    > What I personally am not interested in is having to spend 20 minute per
    > day with a totally locked up database because I want my queries to be
    > faster, when there are other ways of doing it...
    
    Uhm, no.  The specific case we are talking about here is creation of a
    table, inserting rows into it, and NEVER running vacuum analyze on
    it.  This would not lock up your database for 20 minutes unless you
    are dropping and re-creating a bunch of tables.  Even that case could
    be scripted creatively[0], though.  Further, you don't have to run it
    on a whole database every night.  Just the tables of interest.
    
    We run a multi-gigabyte Ingres database her for our student systems.
    When we want to make sure that good plans are chosen, we sysmod and
    optimizedb it.  Since we always want good plans, but rarely inload
    massive amounts of data, we do this once a week.
    
    One of the things to be kept in mind with performance tuning is
    tradeoffs.  Does it make sense to penalize every transaction for the
    sake of updating statistics?  (the answer is "maybe") Does it make
    sense to penalize every transaction to provide a recovery mechanism?
    (yes) Does it make sense to penalize every transaction to prevent any
    one transaction from using up more than 1MB/s of bandwidth?  (no)
    Should you extract the data to a binary flat file, read it in C,
    collect the information of interest and then do something interesting
    with it?  (maybe)
    
    [0] Assuming the data are laid out "sequentially" on the index fields:
    create the table, chop off and insert only the first and last
    thousand[1] rows, vacuum, and then insert the rest.
    
    [1] Or perhaps a slightly bigger number.  Or a sampling of the file
    
    -- 
    =====================================================================
    | JAVA must have been developed in the wilds of West Virginia.      |
    | After all, why else would it support only single inheritance??    |
    =====================================================================
    | Finger geek@cmu.edu for my public key.                            |
    =====================================================================