Thread

  1. Re: application of KNN code to US zipcode searches?

    Oleg Bartunov <oleg@sai.msu.su> — 2011-02-17T20:17:52Z

    Mark,
    
    we investigating pgsphere http://pgsphere.projects.postgresql.org/, if we could add KNN support.
    
    
    Oleg
    On Thu, 17 Feb 2011, Mark Stosberg wrote:
    
    >
    >> I thought the benefit of KNN was that you could retrieve the rows in
    >> distance order, so that a query for the closest 20 locations (for
    >> example) would be very fast.  I wouldn't have expected it to be
    >> helpful when you're selecting all the rows regardless of distance.
    >
    > Kevin,
    >
    > Thanks for the feedback. You are right that my "reduced test case"
    > wasn't a good approximation. I added a limit, to simulate finding the
    > 100 zipcodes closest to 90210.
    >
    > Below I compare 4 approaches to the same query:
    >
    > 1. Cube search
    > 2. Earth Distance Search
    > 3. Simple point distance (no index)
    > 4. Simple point distance (KNN)
    >
    > Now KNN benchmarks to be almost 100x faster! That's very promising.
    > Then there's only the issue that simple point distance is not expected
    > to be a good enough approximation of earth-distances. Perhaps that can
    > be solved by pre-computing coordinates based on the lat/long pairs....
    > much like the map projections used to present a curved surface on a flat
    > map? Given that's OK to be be a few miles off, it seems we have some
    > leeway here.
    >
    > Recommendations?
    >
    >    Mark
    >
    > EXPLAIN ANALYZE
    > SELECT zipcode,
    >    cube_distance( '(-2513120.64361786, -4645511.0460328,
    > 3575538.9507084)', zipcodes.earth_coords)/1609.344 AS radius
    >    FROM zipcodes ORDER BY radius LIMIT 100;
    >
    > ---------------------------------------------------------------
    > Limit  (cost=2946.70..2946.95 rows=100 width=62) (actual
    > time=167.650..168.064 rows=100 loops=1)
    >   ->  Sort  (cost=2946.70..3050.40 rows=41483 width=62) (actual
    > time=167.644..167.829 rows=100 loops=1)
    >         Sort Key: ((cube_distance('(-2513120.64361786,
    > -4645511.0460328, 3575538.9507084)'::cube, earth_coords) /
    > 1609.344::double precision))
    >         Sort Method: top-N heapsort  Memory: 20kB
    >         ->  Seq Scan on zipcodes  (cost=0.00..1361.24 rows=41483
    > width=62) (actual time=0.030..90.807 rows=41483 loops=1)
    > Total runtime: 168.300 ms
    >
    > ############################################################3
    >
    > -- Using Earthdistance
    > EXPLAIN ANALYZE SELECT zipcode,
    >    lon_lat <@> '(-118.412426,34.096629)' As radius
    >    FROM zipcodes
    >    ORDER BY lon_lat <@> '(-118.412426,34.096629)'
    >    LIMIT 100;
    >
    > ------------------------------------------------------------
    > Limit  (cost=2842.99..2843.24 rows=100 width=22) (actual
    > time=187.995..188.451 rows=100 loops=1)
    >   ->  Sort  (cost=2842.99..2946.70 rows=41483 width=22) (actual
    > time=187.989..188.149 rows=100 loops=1)
    >         Sort Key: ((lon_lat <@> '(-118.412426,34.096629)'::point))
    >         Sort Method: top-N heapsort  Memory: 20kB
    >         ->  Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483
    > width=22) (actual time=0.033..108.203 rows=41483 loops=1)
    > Total runtime: 188.660 ms
    >
    > ##########################################
    >
    > Using simple point distance, but with no Gist Index:
    >
    > EXPLAIN ANALYZE SELECT zipcode,
    >    lon_lat <-> '(-118.412426,34.096629)' As radius
    >    FROM zipcodes
    >    ORDER BY lon_lat <-> '(-118.412426,34.096629)'
    >    LIMIT 100;
    >
    > --------------------------------------------------------
    > Limit  (cost=2842.99..2843.24 rows=100 width=22) (actual
    > time=160.574..161.057 rows=100 loops=1)
    >   ->  Sort  (cost=2842.99..2946.70 rows=41483 width=22) (actual
    > time=160.568..160.691 rows=100 loops=1)
    >         Sort Key: ((lon_lat <-> '(-118.412426,34.096629)'::point))
    >         Sort Method: top-N heapsort  Memory: 20kB
    >         ->  Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483
    > width=22) (actual time=0.027..84.610 rows=41483 loops=1)
    > Total runtime: 161.226 ms
    >
    > #########################################
    >
    > -- Using KNN-GIST index
    > EXPLAIN ANALYZE SELECT zipcode,
    >    lon_lat <-> '(-118.412426,34.096629)' As radius
    >    FROM zipcodes
    >    ORDER BY lon_lat <-> '(-118.412426,34.096629)'
    >    LIMIT 100;
    > ------------------------------------------------------------------
    > Limit  (cost=0.00..12.94 rows=100 width=22) (actual time=0.447..1.892
    > rows=100 loops=1)
    >   ->  Index Scan using zipcodes_knn on zipcodes  (cost=0.00..5365.93
    > rows=41483 width=22) (actual time=0.440..1.407 rows=100 loops=1)
    >         Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
    > Total runtime: 2.198 ms
    >
    >
    >
    
     	Regards,
     		Oleg
    _____________________________________________________________
    Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
    Sternberg Astronomical Institute, Moscow University, Russia
    Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
    phone: +007(495)939-16-83, +007(495)939-23-83