Thread

  1. Interesting behaviour !

    Constantin Teodorescu <teo@flex.ro> — 1999-07-14T07:16:28Z

    Please Cc: to teo@flex.ro 
    
    I have a table called "note" that looks like this :
    
    create table "note" (id serial,perioada int2,schema int2,explicatie
    text,...);
    
    The "note" table has 22.000 records and the record length is about 75
    bytes (is has also a "text" field").
    
    Because I am frequently accesing the table with queries like "... where
    perioada=12" I was tempted to make also indexes on "perioada" and
    "schema" field.
    
    The tables have the following sizes (their file sizes into
    /usr/local/pgsql/data/base....)
    
    note         2.890 Kb
    note_id        385 Kb
    note_perioada  409 Kb
    note_schema    466 Kb      
    
    I ran previusly "vacuum analyze" on that database ensuring that
    statistical tables have been updated.
    
    Trying some selects with explain I got the following results:
    
    contabil=> explain select * from note where id=15;
    NOTICE:  QUERY PLAN:
    Index Scan using note_id on note  (cost=2.05 rows=2 width=87)  
    
    
    contabil=> explain select * from note where perioada=15;
    NOTICE:  QUERY PLAN:
    Seq Scan on note  (cost=1099.99 rows=1600 width=87)
    
    
    contabil=> explain select * from note where schema=15;
    NOTICE:  QUERY PLAN:
    Seq Scan on note  (cost=1099.99 rows=432 width=87)
    
    
    That means that searching on "perioada" field don't use "note_perioada"
    index!!!
    
    I know that the query optimisation take care of record lengths, table
    sizes, index sizes, but I thought that in this case it will use
    "note_perioada" index.
    
    The distribution of "perioada" values within "note" records is like that
    :
    
    contabil=> select perioada,count(*) from note group by perioada;
    perioada|count
    --------+-----
           4|    2
           7|   66
           8|  108
           9|  135
          10|  151
          11|  146
          12| 4468
          13| 3045
          14| 3377
          15| 3207
          16| 3100
          17| 3039
          18| 1789
          19|    1
          22|    2
    (15 rows)  
    
    So, I think that PostgreSQL is doing right when he chooses not to use
    "note_perioada" index for that type of query by comparing different
    costs (althought it still remains strange at the first look).
    
    Is there any chance to speed up that type of simple query (select * from
    note where perioada=N) ?
    
    I dropped the index and try with a "hash" index on the same "perioada"
    field. The same result.
    
    In this case, it seems that the "note_perioada" index will never be
    used. That means it can be safely dropped without affecting the
    application performance, isn't it? It is expected that the database will
    grow in the same manner, with approx. the same nr. of records per
    "perioada" field every month.
    
    Best regards,
    
    Please Cc: to teo@flex.ro
    
    ===============================
    Constantin Teodorescu
    FLEX Consulting Braila, ROMANIA