Thread

  1. Re: How to "unique-ify" HUGE table?

    D'Arcy Cain <darcy@druid.net> — 2008-12-23T17:39:17Z

    On Tue, 23 Dec 2008 12:25:48 -0500
    "Kynn Jones" <kynnjo@gmail.com> wrote:
    > Hi everyone!
    > I have a very large 2-column table (about 500M records) from which I want to
    > remove duplicate records.
    > 
    > I have tried many approaches, but they all take forever.
    > 
    > The table's definition consists of two short TEXT columns.  It is a
    > temporary table generated from a query:
    > 
    > CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
    > 
    > Initially I tried
    > 
    >  CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
    > 
    > but after waiting for nearly an hour I aborted the query, and repeated it
    
    Do you have an index on x and y?  Also, does this work better?
    
    CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... GROUP BY x, y;
    
    What does EXPLAIN ANALYZE have to say?
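
    To make that comparison concrete, here is a minimal sketch, assuming
    PostgreSQL.  The table "src" and its generated rows are hypothetical
    stand-ins for the elided source query, not anything from the original
    post.  It runs both forms of the de-duplication under EXPLAIN ANALYZE
    so the plans and actual timings can be compared side by side.

    ```sql
    -- Hypothetical stand-in for the elided source query ("src" is assumed).
    CREATE TEMP TABLE src (x text, y text);
    INSERT INTO src
    SELECT (i % 1000)::text, (i % 7)::text
    FROM generate_series(1, 100000) AS i;

    -- Refresh planner statistics so the row estimates in the plans are meaningful.
    ANALYZE src;

    -- EXPLAIN ANALYZE executes each query and reports the plan with real timings.
    EXPLAIN ANALYZE SELECT DISTINCT x, y FROM src;
    EXPLAIN ANALYZE SELECT x, y FROM src GROUP BY x, y;

    -- Materialize the de-duplicated rows using the GROUP BY form.
    CREATE TEMP TABLE huge_table AS SELECT x, y FROM src GROUP BY x, y;
    ```

    If the GROUP BY plan shows a HashAggregate node where the DISTINCT plan
    shows a Sort followed by Unique, that difference is the likely reason
    one form is faster.  Which plan you actually get depends on the server
    version, the memory settings and the table statistics.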
    
    -- 
    D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
    http://www.druid.net/darcy/                |  and a sheep voting on
    +1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.