Interesting behaviour !

Constantin Teodorescu <teo@flex.ro>
From: Constantin Teodorescu <teo@flex.ro>
To: "pgsql-hackers@postgreSQL.org" <pgsql-hackers@postgreSQL.org>
Date: 1999-07-14T07:16:28Z
Lists: pgsql-hackers
Please Cc: to teo@flex.ro 

I have a table called "note" that looks like this :

create table "note" (id serial,perioada int2,schema int2,explicatie
text,...);

The "note" table has 22.000 records and the record length is about 75
bytes (is has also a "text" field").

Because I am frequently accesing the table with queries like "... where
perioada=12" I was tempted to make also indexes on "perioada" and
"schema" field.

The tables have the following sizes (their file sizes into
/usr/local/pgsql/data/base....)

note         2.890 Kb
note_id        385 Kb
note_perioada  409 Kb
note_schema    466 Kb      

I ran previusly "vacuum analyze" on that database ensuring that
statistical tables have been updated.

Trying some selects with explain I got the following results:

contabil=> explain select * from note where id=15;
NOTICE:  QUERY PLAN:
Index Scan using note_id on note  (cost=2.05 rows=2 width=87)  


contabil=> explain select * from note where perioada=15;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=1600 width=87)


contabil=> explain select * from note where schema=15;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=432 width=87)


That means that searching on "perioada" field don't use "note_perioada"
index!!!

I know that the query optimisation take care of record lengths, table
sizes, index sizes, but I thought that in this case it will use
"note_perioada" index.

The distribution of "perioada" values within "note" records is like that
:

contabil=> select perioada,count(*) from note group by perioada;
perioada|count
--------+-----
       4|    2
       7|   66
       8|  108
       9|  135
      10|  151
      11|  146
      12| 4468
      13| 3045
      14| 3377
      15| 3207
      16| 3100
      17| 3039
      18| 1789
      19|    1
      22|    2
(15 rows)  

So, I think that PostgreSQL is doing right when he chooses not to use
"note_perioada" index for that type of query by comparing different
costs (althought it still remains strange at the first look).

Is there any chance to speed up that type of simple query (select * from
note where perioada=N) ?

I dropped the index and try with a "hash" index on the same "perioada"
field. The same result.

In this case, it seems that the "note_perioada" index will never be
used. That means it can be safely dropped without affecting the
application performance, isn't it? It is expected that the database will
grow in the same manner, with approx. the same nr. of records per
"perioada" field every month.

Best regards,

Please Cc: to teo@flex.ro

===============================
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA