Thread

  1. [Keystone Slip # 5] Query consumes all RAM on machine (OR + LIKE problem)

    webmaster@postgresql.org — 1999-07-24T16:59:51Z

    ---------------------------------------------------------------------------
    Slip number -----: 5
    Problem ---------: Query consumes all RAM on machine (OR + LIKE problem)
    Opened by -------: mascarim@yahoo.com on 07/23/99 11:48
    Assigned To -----: scrappy
    ---------------------------------------------------------------------------
    Summary:
    PostgreSQL 6.5.0 on i686-pc-linux-gnu, 
    compiled by gcc 2.7.2.3
    
    The following simple, 5-way join consumes all 
    RAM on the machine as it processess the query, 
    taking minutes to execute until all RAM is 
    consumed:
    
    SELECT DISTINCT supplies.supply,supplies.supplyunit,
    supplies.purchaseunit,supplies.vendor,
    supplies.vendorgroup,supplies.vendoritem,
    supplies.vendorname,supplies.description,
    supplies.conversion,supplies.price,
    supplies.inventory,supplies.commodity,
    supplies.adddate 
    FROM
    supplies,permitbuy,locations,supplychains,reserves
    WHERE 
    permitbuy.webuser = 'mascarj' AND
    (locations.company,locations.costcntr) =
    (permitbuy.company, permitbuy.costcntr) AND
    supplychains.target = locations.target AND
    reserves.target = supplychains.supplysource AND
    supplies.supply = reserves.supply AND
    supplies.inventory = '1' AND
    ((upper(supplies.supply) LIKE '%SEQ%') OR
    (upper(supplies.vendoritem) LIKE '%SEQ%') OR
    (upper(supplies.vendorname) LIKE '%SEQ%') OR
    (upper(supplies.description) LIKE '%SEQ%')) OR ((upper(supplies.supply) LIKE '%SCD%') OR
    (upper(supplies.vendoritem) LIKE '%SCD%') OR (upper(supplies.vendorname) LIKE '%SCD%') OR
    (upper(supplies.description) LIKE '%SCD%')) AND ((upper(supplies.supply) LIKE '%SLEE%') OR
    (upper(supplies.vendoritem) LIKE '%SLEE%') OR (upper(supplies.vendorname) LIKE '%SLEE%') OR
    (upper(supplies.description) LIKE '%SLEE%')) 
    ORDER BY 
    supplies.description
    
    Here's the plan. Obviously something is seriously
    wrong as NO indexes are used:
    
    NOTICE:  QUERY PLAN:
    
    Unique  
    (cost=61491617792.00 rows=1073741849 width=232)
      ->  Sort  
    (cost=61491617792.00 rows=1073741849 width=232)
            ->  Nested Loop  
    (cost=61491617792.00 rows=1073741849 width=232)
                  ->  Nested Loop  
    (cost=50662932480.00 rows=1073741850 width=204)
                        ->  Nested Loop  
    (cost=26479044608.00 rows=1073741850 width=168)
                              ->  Nested Loop  
    (cost=27997336.00 rows=536717461 width=160)
                                    ->  Seq Scan on supplies  
    (cost=1675.03 rows=29832 width=144)
                                    ->  Seq Scan on reserves  
    (cost=938.44 rows=20468 width=16)
                              ->  Seq Scan on supplychains  
    (cost=49.28 rows=1251 width=8)
                        ->  Seq Scan on permitbuy  
    (cost=22.52 rows=531 width=36)
                  ->  Seq Scan on locations  
    (cost=10.09 rows=245 width=28)
    
    EXPLAIN
    
    Table      	Rows
    --------------------------
    supplies	29926
    permitbuy	531
    locations	245
    supplychains	1251
    reserves	20476
    
    Index		Columns
    --------------------------
    k_supplies1	supply
    k_permitbuy1	webuser,company,costcntr
    k_locations1	target
    k_locations2	company, costcntr
    k_supplychains1	target,supplysource,priority
    k_supplychains2	target,supplysource
    k_supplychains3	target,priority
    k_reserves1	target,supply
    k_reserves2	supply
    
    Any help would be appreciated. I would give you 
    a complete description of the tables and 
    indices except that there is the 8K limit on 
    this form! ;-)
    
    
    
    ---------------------------------------------------------------------------
    History:
    07/24/99 12:59 by scrappy:
    Current tech changed to scrappy from momjian
    Status changed to A from U
    
    ---------------------------------------------------------------------------
    07/24/99 12:01 by tgl:
    I believe this gripe is mostly user error, to wit: the second
    group of LIKE clauses is added to the WHERE condition with
    a *top level* OR, which is almost surely not what is wanted.
    
    Poor optimization of OR-of-ANDs and failure to reclaim memory
    from expression evaluation are both known problems, of course.
    Should we start making Keystone slips for all the existing
    TODO-list items?
    ---------------------------------------------------------------------------
    07/23/99 16:16 by ANONYMOUS:
    (Comment from: mascarim@yahoo.com [])
    I just wanted to let you know that if the 
    query in question contains only one clause:
    
    
    
    that the planner/optimizer does, in fact, use
    indexes:
    
    NOTICE:  QUERY PLAN:
    
    Unique  (cost=24076.77 rows=8260854 width=220)
      ->  Sort  (cost=24076.77 rows=8260854 width=220)
            ->  Hash Join  (cost=24076.77 rows=8260854 width=220)
                  ->  Hash Join  (cost=1756.00 rows=597537 width=76)
                        ->  Seq Scan on reserves  (cost=938.44 rows=20468 width=16)
                        ->  Hash  (cost=121.44 rows=475 width=60)
                              ->  Hash Join  (cost=121.44 rows=475 width=60)
                                    ->  Seq Scan on supplychains  (cost=49.28 rows=1251 width=8)
                                    ->  Hash  (cost=26.80 rows=93 width=52)
                                          ->  Hash Join  (cost=26.80 rows=93 width=52)
                                                ->  Seq Scan on locations  (cost=10.09 rows=245 width=28)
                                                ->  Hash  (cost=5.78 rows=56 width=24)
                                                      ->  Index Scan using k_permitbuy1 on permitbuy  (cost=5.78 rows=56 width=24)
                  ->  Hash  (cost=1675.03 rows=17637 width=144)
                        ->  Seq Scan on supplies  (cost=1675.03 rows=17637 width=144)
    
    EXPLAIN
    
    Its only when more than one of those search
    clauses is in the query that ALL indexes are
    ignored:
    
    ...
    ((upper(supplies.supply) LIKE '%SEQ%') OR
    (upper(supplies.vendoritem) LIKE '%SEQ%') OR
    (upper(supplies.vendorname) LIKE '%SEQ%') OR
    (upper(supplies.description) LIKE '%SEQ%'))
    
    OR
    
    ((upper(supplies.supply) LIKE '%SED%') OR
    (upper(supplies.vendoritem) LIKE '%SED%') OR
    (upper(supplies.vendorname) LIKE '%SED%') OR
    (upper(supplies.description) LIKE '%SED%'))...
    
    even though each additional clause still only
    refers to the supplies table and there is 
    a sequential scan on the first plan anyways.
    
    Hope that helps some.
    
    
    ---------------------------------------------------------------------------
    07/23/99 12:50 by scrappy:
    Dependency changed to 0 from 
    Current tech changed to momjian from 
    Contact changed to  from ANONYMOUS
    Scheduled Close Date changed to 0 from 
    Scheduled Open Date changed to 0 from 
    Public setting changed to 1 from 0
    Policy changed to 0
    
    ---------------------------------------------------------------------------
    
    
              Full information on this slip is available at:
    http://www.postgresql.org/bugs/visitor.php3?sid=5&v_func=zoom
    
    ---------------------------------------------------------------------------
    This message was generated automatically by Keystone at http://www.postgresql.org