Thread

  1. Re: [HACKERS] sorting big tables :(

    Michael Richards <miker@scifair.acadiau.ca> — 1998-05-20T00:02:38Z

    On Sun, 17 May 1998, Bruce Momjian wrote:
    
    > > > > I have a big table. 40M rows.
    > > > > On the disk, it's size is:
    > > > >  2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
    > > > > How should one decide based on table size how much room is needed?
    > 
    > Tape sort is a standard Knuth sorting.  It basically sorts in pieces,
    > and merges.  If you don't do this, the accessing around gets very poor
    > as you page fault all over the file, and the cache becomes useless.
    Right. I wasn't reading the right chapter. Internal sorting is quite
    different from external sorting. For internal sorting, it suggests
    using a Quicksort algorithm.
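
    As a rough illustration of the "sort in pieces, then merge" idea quoted
    above, here is a minimal Python sketch. It is not the backend's actual
    tape sort (which follows Knuth's merge scheme); the function name
    external_sort and the run_size parameter are made up for this example,
    and it uses one text file per run for simplicity.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Yield ints in sorted order while holding at most run_size values in memory.

    Hypothetical helper for illustration only: sorted runs go to temporary
    files, then all runs are merged in one sequential pass.
    """
    run_paths = []

    def flush(run):
        # Internal sort of one memory-sized piece, written out as a run file.
        run.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    def read_run(path):
        # Each run is read strictly front to back, so there is no seeking.
        with open(path) as f:
            for line in f:
                yield int(line)

    run = []
    for v in values:
        run.append(v)
        if len(run) >= run_size:
            flush(run)
            run = []
    if run:
        flush(run)

    try:
        # k-way merge: repeatedly take the smallest head element of all runs.
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    print(list(external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)))
```

    In this sketch the run files together hold one full copy of the input,
    so the temporary space needed is at least the size of the data being
    sorted, on top of the source table and the final result.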
    Marc and I discussed this over lunch. If I did a select * into, would it
    not make more sense to sort the results directly into the resulting
    table rather than into pieces that are then copied into a table? From my
    limited knowledge, I think this should save 8/7 N the space.
    On this issue, I also think there must be a lot more overhead than
    necessary. The table consists of only
    int4, int4, int2
    I read that as 10 bytes / row of actual data.
    Instead, 2 gigs / 40M rows is about
    50 bytes / record
    What is there other than the oid (4? bytes)?
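
    The arithmetic behind that estimate, as a small Python sketch. The row
    count is the rounded "40M" figure from above, so the results are
    approximate; the constant names are just for this example.

```python
# Figures taken from this thread; ROWS is the rounded "40M" value.
TABLE_BYTES = 2_090_369_024
ROWS = 40_000_000
DATA_BYTES_PER_ROW = 4 + 4 + 2  # int4 + int4 + int2

bytes_per_row = TABLE_BYTES / ROWS
overhead_per_row = bytes_per_row - DATA_BYTES_PER_ROW

print(f"on disk:  {bytes_per_row:.1f} bytes / row")     # 52.3
print(f"overhead: {overhead_per_row:.1f} bytes / row")  # 42.3
```

    So roughly 42 of the ~52 bytes per row are something other than the
    three column values.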
    
    -Mike