Thread

Re: [HACKERS] Fwd: Joins and links

Hannu Krosing <hannu@trust.ee> — 1999-07-09T13:55:02Z
Leon wrote:
> 
> Hannu Krosing wrote:
> 
> > The difference will quickly degrade as more rows are fetched in one
> > query and cache misses and disk head movement start rattling your
> > disks. The analogy being a man who needs 10 different items from a
> > supermarket and takes 10 full round trips from home to buy them.
> 
> Frankly, I didn't even consider fetching database from disk. This
> slows queries immensely and I wonder if there exist someone who
> doesn't keep their entire DB in RAM.

Well, I personally dont even know, how I could keep my entire PostgreSQL 
DB in RAM :)

It would be interesting to know what percentage of people do use 
PostgreSQL for databases that are small enough to fit in RAM - 
surely not the ones who need splitting of tables >2GB.

And I think that setting up PostgreSQL for maximum RAM usage would 
make a nice topic for "Optimizing PostgreSQL". 

When my backends are mostly idle they usually use about 3-4MB of memory, 
(hardly enough for any database :).

It is quite possible that some bigger tables end up in a disk-cache, 
but you can expect to find all your data in that cache only if you do 
many queries on the same tables in a row, and the machine is otherways 
idle.

> I think the gain will raise with the number of tables, because
> the more tables - the more index lookups are saved.

My point is that sometimes even sequential scan is faster than index
lookup,
and not only due to overhead of using the index, but due to better disk 
performance of sequential reads vs. random reads

For in-memory databases this of course does not count.

Still I'm quite sure that the main effort in PostgreSQL development has
so 
far gone to optimising queries where most of the data is fetched from
the 
disk.

> > Fixing the optimizer would get a performance gain on a far wider
> > range of tasks, and is still needed for links.
> 
> But general fixing of optimizer is total rewritement of it, whereas
> link fix is almost a fast hack.

I'm not too sure about it. It certainly can be done without a _total_ 
rewrite, but getting all the new node types and access methods into the 
parser/planner/executor may not be trivial. 

One idea would be a cross-table OID index for anything in memory.
Then, assuming that everything is in-memory, using oids as links would
be 
only trivially, if at all, slower (2-10 memory accesses and comparisons) 
than "straight" link lookup, that could also be chasing linear chains of 
forward-id-links on frequently updated DBs. On infrequently updated DBs 
you could just use triggers and/or cron jobs to keep your reports
updated,
I quess that this is what most commercial OLAP systems do.

Actually I lived my first halfyear of using PostgreSQL under a delusion 
that lookup by OID would be somewhat special (fast). 
Probably due to my misunderstanding of the (ever diminishing) O in
ORDBMS :) 
There have been some attempts to get the object-orientedness better 
supported by PGSQL, (possibly even some contrib funtions), but nobody
seems 
to have needed it bad enough to
1) implement it
 and 
2) shout long enough to get it included in standart distibution. 
Most (all?) of the developers seem to be die-hard RDBMS guys and thanks 
to that we have now a solid and reasonably fast Relational DBMS with 
some OO rudiments

So I quess that unless you do (at least part of) links, no-one else will
;(

> Unfortunately I already have a project to work on. There is too
> little of me for two projects.

Darn! I almost hoped we would get one more PostgreSQL hacker as I'm sure 
that after familiarising oneself with the code enougth to implement
links 
one would be quite capable of helping with most of PostgreSQL
development
<grin>

----------------------
Hannu