Thread

  1. Re: workaround for expensive KNN?

    Oleg Bartunov <oleg@sai.msu.su> — 2011-04-08T15:25:12Z

    Hans,
    
    what if you create index (price,title) ?
    
    
    On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
    
    > hello ...
    >
    > i got that one ...
    >
    >    "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price)
    >
    > so, i have a combined index on text + number.
    > to me the plan seems fine ... it looks like a prober KNN traversal.
    > the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or so in it (1 mio out of 11 mio or so). you are moving out from one specific place.
    >
    > my maths is like that:
    > 	11 mio in total
    > 	1 mio matching "iphone"
    > 	cheapest / most expensive 10 out of this mio needed.
    >
    > operator classes are all nice and in place:
    >
    > SELECT 10 <-> 4 as distance;
    > distance
    > ----------
    >        6
    > (1 row)
    >
    > what does "buffers true" in your case say?
    >
    > many thanks,
    >
    > 	hans
    >
    >
    > On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
    >
    >> Probably, you miss two-columnt index. From my early post:
    >> http://www.sai.msu.su/~megera/wiki/knngist
    >>
    >> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
    >> =# SELECT id, address,  (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
    >>   id    |                           address                           |         dist ---------+-------------------------------------------------------------+---------------------
    >>  366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05
    >> 4356328 | r Champ de Mars 75007 PARIS                                 |  0.00421854756964406
    >> 5200167 | Champ De Mars 75007 Paris                                   |  0.00453564562587288
    >> 9301676 | Champ de Mars, 75007 Paris,                                 |  0.00453564562587288
    >> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France     |  0.00624152097590896
    >> 1923818 | Champ de Mars Paris, France                                 |  0.00838214733539654
    >> 5165953 | 39 Rue Champ De Mars Paris, France                          |  0.00874410234569529
    >> 7395870 | 39 Rue Champ De Mars Paris, France                          |  0.00874410234569529
    >> 4358671 | 32 Rue Champ De Mars Paris, France                          |  0.00876089659276339
    >> 1923742 | 12 rue du Champ de Mars Paris, France                       |  0.00876764731845995
    >> (10 rows)
    >>
    >> Time: 7.859 ms
    >>
    >> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point
    >> AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
    >>
    >>                            QUERY PLAN
    >> --------------------------------------------------------------------------------------------------------------------------------------
    >> Limit
    >>   ->  Index Scan using spots_idx on spots
    >>         Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
    >> (3 rows)
    >>
    >>
    >> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
    >>
    >>> hello all ...
    >>>
    >>> given oleg's posting before i also wanted to fire up some KNN related question.
    >>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
    >>> i did some tests:
    >>>
    >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
    >>>                                                                           QUERY PLAN
    >>>
    >>> -----------------------------------------------------------------------------------------------------------------------------
    >>> --------------------------------------
    >>> Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
    >>>  Buffers: shared hit=9 read=5004
    >>>  ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..13251.91 rows=3224 width=16) (actual time=
    >>> 36391.715..45542.573 rows=10 loops=1)
    >>>        Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
    >>>        Order By: (int_price <-> 0::bigint)
    >>>        Buffers: shared hit=9 read=5004
    >>> Total runtime: 45542.676 ms
    >>> (7 rows)
    >>>
    >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
    >>>                                                                           QUERY PLAN
    >>>
    >>> -----------------------------------------------------------------------------------------------------------------------------
    >>> -------------------------------------
    >>> Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
    >>>  Buffers: shared hit=3 read=2316
    >>>  ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..29762.61 rows=7255 width=16) (actual time=
    >>> 7243.524..10935.217 rows=10 loops=1)
    >>>        Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
    >>>        Order By: (int_price <-> 0::bigint)
    >>>        Buffers: shared hit=3 read=2316
    >>> Total runtime: 10935.265 ms
    >>> (7 rows)
    >>>
    >>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
    >>>                                                                        QUERY PLAN
    >>>
    >>> -----------------------------------------------------------------------------------------------------------------------------
    >>> -------------------------------
    >>> Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
    >>>  Buffers: shared hit=1 read=1577
    >>>  ->  Index Scan using idx_product_t_product_titleprice on t_product  (cost=0.00..29762.61 rows=7255 width=16) (actual time=
    >>> 28.525..28.525 rows=1 loops=1)
    >>>        Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
    >>>        Order By: (int_price <-> 0::bigint)
    >>>        Buffers: shared hit=1 read=1577
    >>> Total runtime: 28.558 ms
    >>> (7 rows)
    >>>
    >>>
    >>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
    >>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painful random I/O death.
    >>> is there any alternative which does not simply die when i try to achieve what i want?
    >>>
    >>> the use case is quite simple: all products with a certain word (10 cheapest or so).
    >>>
    >>> is there any alternative approach to this?
    >>> i was putting some hope into KNN but it seems it needs too much random I/O :(.
    >>>
    >>> 	many thanks,
    >>>
    >>> 		hans
    >>>
    >>> --
    >>> Cybertec Sch?nig & Sch?nig GmbH
    >>> Gr?hrm?hlgasse 26
    >>> A-2700 Wiener Neustadt, Austria
    >>> Web: http://www.postgresql-support.de
    >>>
    >>>
    >>>
    >>
    >> 	Regards,
    >> 		Oleg
    >> _____________________________________________________________
    >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
    >> Sternberg Astronomical Institute, Moscow University, Russia
    >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
    >> phone: +007(495)939-16-83, +007(495)939-23-83
    >>
    >> --
    >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    >> To make changes to your subscription:
    >> http://www.postgresql.org/mailpref/pgsql-hackers
    >>
    >
    > --
    > Cybertec Sch?nig & Sch?nig GmbH
    > Gr?hrm?hlgasse 26
    > A-2700 Wiener Neustadt, Austria
    > Web: http://www.postgresql-support.de
    >
    
     	Regards,
     		Oleg
    _____________________________________________________________
    Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
    Sternberg Astronomical Institute, Moscow University, Russia
    Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
    phone: +007(495)939-16-83, +007(495)939-23-83