Thread

  1. I want to change libpq and libpgtcl for better handling of large query results

    Constantin Teodorescu <teo@flex.ro> — 1998-01-05T18:43:45Z

    Hello all,
    
    WARNING : It's a long mail, but please have patience and read it *all*
    
    I have reached a point in developing PgAccess when I discovered that
    some functions in libpgtcl are bad implemented and even libpq does not
    help me a lot!
    
    What's the problem ! Is about working with large queries result, big
    tables with thousand of records that have to be processed in order to
    get a full report for example.
    
    Getting a query result from Tcl/Tk (pg_select function) uses PQexec.
    But PQexec IS GETTING ALL THE RECORDS IN MEMORY and after that user can
    handle query results.
    But what if table has thousand records ? Probably I would need more than
    512 Mb of RAM in order to get a report finished.
    
    Viorel Mitache from RENEL Braila, (mviorel@flex.ro, please cc him and me
    because we aren't on hacker list) proposed another sollution.
    
    With some small changes in libpq-fe.h
    
     (  void (* callback)(PGresult *,void *ptr,int stat);
        void *usr_ptr;)
    
    and also in libpq to allow a newly defined function in libpgtcl
    (pg_loop) to initiate a query and then calling back a user defined
    function after every record fetched from the connection.
    
    In order to do this, the connection is 'cloned' and on this new
    connection the query is issued. For every record fetched, the C callback
    function is called, here the Tcl interpreted is invoked for the source
    inside the loop, then memory used by the record is release and the next
    record is ready to come.
    More than that, after processing some records, user can choose to break
    the loop (using break command in Tcl) that is actually breaking the
    connection.
    
    What we achieve making this patches ?
    
    First of all the ability of sequential processing large tables.
    Then increasing performance due to parallel execution of receiving data
    on the network and local processing. The backend process on the server
    is filling the communication channel with data and the local task is
    processing it as it comes.
    In the old version, the local task has to wait until *all* data has
    comed (buffered in memory if it was room enough) and then processing it.
    
    What I would ask from you?
    1) First of all, if my needs could be satisfied in other way with
    current functions in libpq of libpgtcl. I can assure you that with
    current libpgtcl is rather impossible. I am not sure if there is another
    mechanism using some subtle functions that I didn't know about them.
    2) Then, if you agree with the idea, to whom we must send more accurate
    the changes that we would like to make in order to be analysed and
    checked for further development of Pg.
    3) Is there any other normal mode to tell to the backend not to send any
    more tuples instead of breaking the connection ?
    4) Even working in C, using PQexec , it's impossible to handle large
    query results, am I true ?
    
    Please cc to : mviorel@flex.ro      and also     teo@flex.ro
    
    -- 
    Constantin Teodorescu
    FLEX Consulting Braila, ROMANIA