[Keystone Slip # 5] Query consumes all RAM on machine (OR + LIKE problem)

webmaster@postgresql.org

From: webmaster@postgresql.org
To: pgsql-bugs@postgresql.org
Date: 1999-07-24T16:59:51Z
Lists: pgsql-bugs
---------------------------------------------------------------------------
Slip number -----: 5
Problem ---------: Query consumes all RAM on machine (OR + LIKE problem)
Opened by -------: mascarim@yahoo.com on 07/23/99 11:48
Assigned To -----: scrappy
---------------------------------------------------------------------------
Summary:
PostgreSQL 6.5.0 on i686-pc-linux-gnu, 
compiled by gcc 2.7.2.3

The following simple, 5-way join consumes all 
RAM on the machine as it processess the query, 
taking minutes to execute until all RAM is 
consumed:

SELECT DISTINCT supplies.supply,supplies.supplyunit,
supplies.purchaseunit,supplies.vendor,
supplies.vendorgroup,supplies.vendoritem,
supplies.vendorname,supplies.description,
supplies.conversion,supplies.price,
supplies.inventory,supplies.commodity,
supplies.adddate 
FROM
supplies,permitbuy,locations,supplychains,reserves
WHERE 
permitbuy.webuser = 'mascarj' AND
(locations.company,locations.costcntr) =
(permitbuy.company, permitbuy.costcntr) AND
supplychains.target = locations.target AND
reserves.target = supplychains.supplysource AND
supplies.supply = reserves.supply AND
supplies.inventory = '1' AND
((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%')) OR ((upper(supplies.supply) LIKE '%SCD%') OR
(upper(supplies.vendoritem) LIKE '%SCD%') OR (upper(supplies.vendorname) LIKE '%SCD%') OR
(upper(supplies.description) LIKE '%SCD%')) AND ((upper(supplies.supply) LIKE '%SLEE%') OR
(upper(supplies.vendoritem) LIKE '%SLEE%') OR (upper(supplies.vendorname) LIKE '%SLEE%') OR
(upper(supplies.description) LIKE '%SLEE%')) 
ORDER BY 
supplies.description

Here's the plan. Obviously something is seriously
wrong as NO indexes are used:

NOTICE:  QUERY PLAN:

Unique  
(cost=61491617792.00 rows=1073741849 width=232)
  ->  Sort  
(cost=61491617792.00 rows=1073741849 width=232)
        ->  Nested Loop  
(cost=61491617792.00 rows=1073741849 width=232)
              ->  Nested Loop  
(cost=50662932480.00 rows=1073741850 width=204)
                    ->  Nested Loop  
(cost=26479044608.00 rows=1073741850 width=168)
                          ->  Nested Loop  
(cost=27997336.00 rows=536717461 width=160)
                                ->  Seq Scan on supplies  
(cost=1675.03 rows=29832 width=144)
                                ->  Seq Scan on reserves  
(cost=938.44 rows=20468 width=16)
                          ->  Seq Scan on supplychains  
(cost=49.28 rows=1251 width=8)
                    ->  Seq Scan on permitbuy  
(cost=22.52 rows=531 width=36)
              ->  Seq Scan on locations  
(cost=10.09 rows=245 width=28)

EXPLAIN

Table      	Rows
--------------------------
supplies	29926
permitbuy	531
locations	245
supplychains	1251
reserves	20476

Index		Columns
--------------------------
k_supplies1	supply
k_permitbuy1	webuser,company,costcntr
k_locations1	target
k_locations2	company, costcntr
k_supplychains1	target,supplysource,priority
k_supplychains2	target,supplysource
k_supplychains3	target,priority
k_reserves1	target,supply
k_reserves2	supply

Any help would be appreciated. I would give you 
a complete description of the tables and 
indices except that there is the 8K limit on 
this form! ;-)



---------------------------------------------------------------------------
History:
07/24/99 12:59 by scrappy:
Current tech changed to scrappy from momjian
Status changed to A from U

---------------------------------------------------------------------------
07/24/99 12:01 by tgl:
I believe this gripe is mostly user error, to wit: the second
group of LIKE clauses is added to the WHERE condition with
a *top level* OR, which is almost surely not what is wanted.

Poor optimization of OR-of-ANDs and failure to reclaim memory
from expression evaluation are both known problems, of course.
Should we start making Keystone slips for all the existing
TODO-list items?
---------------------------------------------------------------------------
07/23/99 16:16 by ANONYMOUS:
(Comment from: mascarim@yahoo.com [])
I just wanted to let you know that if the 
query in question contains only one clause:



that the planner/optimizer does, in fact, use
indexes:

NOTICE:  QUERY PLAN:

Unique  (cost=24076.77 rows=8260854 width=220)
  ->  Sort  (cost=24076.77 rows=8260854 width=220)
        ->  Hash Join  (cost=24076.77 rows=8260854 width=220)
              ->  Hash Join  (cost=1756.00 rows=597537 width=76)
                    ->  Seq Scan on reserves  (cost=938.44 rows=20468 width=16)
                    ->  Hash  (cost=121.44 rows=475 width=60)
                          ->  Hash Join  (cost=121.44 rows=475 width=60)
                                ->  Seq Scan on supplychains  (cost=49.28 rows=1251 width=8)
                                ->  Hash  (cost=26.80 rows=93 width=52)
                                      ->  Hash Join  (cost=26.80 rows=93 width=52)
                                            ->  Seq Scan on locations  (cost=10.09 rows=245 width=28)
                                            ->  Hash  (cost=5.78 rows=56 width=24)
                                                  ->  Index Scan using k_permitbuy1 on permitbuy  (cost=5.78 rows=56 width=24)
              ->  Hash  (cost=1675.03 rows=17637 width=144)
                    ->  Seq Scan on supplies  (cost=1675.03 rows=17637 width=144)

EXPLAIN

Its only when more than one of those search
clauses is in the query that ALL indexes are
ignored:

...
((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%'))

OR

((upper(supplies.supply) LIKE '%SED%') OR
(upper(supplies.vendoritem) LIKE '%SED%') OR
(upper(supplies.vendorname) LIKE '%SED%') OR
(upper(supplies.description) LIKE '%SED%'))...

even though each additional clause still only
refers to the supplies table and there is 
a sequential scan on the first plan anyways.

Hope that helps some.


---------------------------------------------------------------------------
07/23/99 12:50 by scrappy:
Dependency changed to 0 from 
Current tech changed to momjian from 
Contact changed to  from ANONYMOUS
Scheduled Close Date changed to 0 from 
Scheduled Open Date changed to 0 from 
Public setting changed to 1 from 0
Policy changed to 0

---------------------------------------------------------------------------


          Full information on this slip is available at:
http://www.postgresql.org/bugs/visitor.php3?sid=5&v_func=zoom

---------------------------------------------------------------------------
This message was generated automatically by Keystone at http://www.postgresql.org