Thread

external data sources (e.g. a second RDBMS)

Christof Petig <christof.petig@wtal.de> — 2000-01-17T02:54:07Z
Dear hackers,

I have to port a lot of programs to PostgreSQL and don't want to wait
for completing them all before I put the first into production.

So after some thought I came to an enthrilling vision

If PostgreSQL had an interface for transparently accessing external
data, I could leave the needed tables on the old RDBMS and write a
wrapper which would provide access to the data on the old RDBMS. Both
old and new (pgsql) programs would see the same data. And I could start
using postgres' fine tools (e.g. psql) _now_ for all of my work. 

Putting these short sighted things aside it would provide an opportunity
for
- distributed databases
- load balancing via putting a table on multiple servers, creating a
unioning-view
- you could use SQL to access other data sources (e.g. LDAP)
- one step further to world domination (no matter where your data
resides, you could     use postgres for your queries)
- accessing multiple databases in _one_ SQL statement (e.g. join)

So I ask for your opinion on this strange idea. 

I suggest marking a table as external uses an interface which should
provide the following methods
- query the structure of the table (\d table-name)
- sequentially scan the table (returning selected attributes of each
tuple) [with some conditions]
- update/delete either by cursor or by where-condition
- query for statistics (see below)

Since this functionality would require modifications all over the place
in postgres I would like to start discussion about it. I might overlook
something but the thought of having such a thing around opens up a lot
of opportunities.

Perhaps an ODBC wrapper could be the correct point to start (after
implementing and testing the [additional] virtual table access layer)

    Christof

PS: I'd never dare to depend on this functionality, some kind of
mirroring program might cover the problem for me as well, but it looked
so cool.
PPS: Of course I would start to investigate it further _after_ TOAST is
finished.

------------------- example ------------------------------------

[a is an external table, b an internal table]

select tuple_a1,tuple_b1 from a,b where a.tuple_a2=b.tuple_b2 and
tuple_a3=42

would cause
 -> sequential scan (tuple_a1,tuple_a2) on a where tuple_a3=42
 -> for each entry index scan on b for a.tuple_a2=b.tuple_b2
 -> report result 
or
 -> sequential scan (tuple_b2, tuple_b1) on b
 -> index_scan (tuple_a1) by (tuple_a3=42,tuple_a2=b.tuple_b2)
 -> report result

clearly we need to collect statistics on external tables as well