Thread

  1. external data sources (e.g. a second RDBMS)

    Christof Petig <christof.petig@wtal.de> — 2000-01-17T02:54:07Z

    Dear hackers,
    
    I have to port a lot of programs to PostgreSQL and don't want to wait
    for completing them all before I put the first into production.
    
    So after some thought I came to an enthrilling vision
    
    If PostgreSQL had an interface for transparently accessing external
    data, I could leave the needed tables on the old RDBMS and write a
    wrapper which would provide access to the data on the old RDBMS. Both
    old and new (pgsql) programs would see the same data. And I could start
    using postgres' fine tools (e.g. psql) _now_ for all of my work. 
    
    Putting these short sighted things aside it would provide an opportunity
    for
    - distributed databases
    - load balancing via putting a table on multiple servers, creating a
    unioning-view
    - you could use SQL to access other data sources (e.g. LDAP)
    - one step further to world domination (no matter where your data
    resides, you could     use postgres for your queries)
    - accessing multiple databases in _one_ SQL statement (e.g. join)
    
    So I ask for your opinion on this strange idea. 
    
    I suggest marking a table as external uses an interface which should
    provide the following methods
    - query the structure of the table (\d table-name)
    - sequentially scan the table (returning selected attributes of each
    tuple) [with some conditions]
    - update/delete either by cursor or by where-condition
    - query for statistics (see below)
    
    Since this functionality would require modifications all over the place
    in postgres I would like to start discussion about it. I might overlook
    something but the thought of having such a thing around opens up a lot
    of opportunities.
    
    Perhaps an ODBC wrapper could be the correct point to start (after
    implementing and testing the [additional] virtual table access layer)
    
        Christof
    
    PS: I'd never dare to depend on this functionality, some kind of
    mirroring program might cover the problem for me as well, but it looked
    so cool.
    PPS: Of course I would start to investigate it further _after_ TOAST is
    finished.
    
    ------------------- example ------------------------------------
    
    [a is an external table, b an internal table]
    
    select tuple_a1,tuple_b1 from a,b where a.tuple_a2=b.tuple_b2 and
    tuple_a3=42
    
    would cause
     -> sequential scan (tuple_a1,tuple_a2) on a where tuple_a3=42
     -> for each entry index scan on b for a.tuple_a2=b.tuple_b2
     -> report result 
    or
     -> sequential scan (tuple_b2, tuple_b1) on b
     -> index_scan (tuple_a1) by (tuple_a3=42,tuple_a2=b.tuple_b2)
     -> report result
    
    clearly we need to collect statistics on external tables as well