Re: application of KNN code to US zipcode searches?
От | Oleg Bartunov |
---|---|
Тема | Re: application of KNN code to US zipcode searches? |
Дата | |
Msg-id | Pine.LNX.4.64.1102172316360.278@sn.sai.msu.ru обсуждение исходный текст |
Ответ на | Re: application of KNN code to US zipcode searches? (Mark Stosberg <mark@summersault.com>) |
Ответы |
Re: application of KNN code to US zipcode searches?
(Mark Stosberg <mark@summersault.com>)
|
Список | pgsql-performance |
Mark, we investigating pgsphere http://pgsphere.projects.postgresql.org/, if we could add KNN support. Oleg On Thu, 17 Feb 2011, Mark Stosberg wrote: > >> I thought the benefit of KNN was that you could retrieve the rows in >> distance order, so that a query for the closest 20 locations (for >> example) would be very fast. I wouldn't have expected it to be >> helpful when you're selecting all the rows regardless of distance. > > Kevin, > > Thanks for the feedback. You are right that my "reduced test case" > wasn't a good approximation. I added a limit, to simulate finding the > 100 zipcodes closest to 90210. > > Below I compare 4 approaches to the same query: > > 1. Cube search > 2. Earth Distance Search > 3. Simple point distance (no index) > 4. Simple point distance (KNN) > > Now KNN benchmarks to be almost 100x faster! That's very promising. > Then there's only the issue that simple point distance is not expected > to be a good enough approximation of earth-distances. Perhaps that can > be solved by pre-computing coordinates based on the lat/long pairs.... > much like the map projections used to present a curved surface on a flat > map? Given that's OK to be be a few miles off, it seems we have some > leeway here. > > Recommendations? > > Mark > > EXPLAIN ANALYZE > SELECT zipcode, > cube_distance( '(-2513120.64361786, -4645511.0460328, > 3575538.9507084)', zipcodes.earth_coords)/1609.344 AS radius > FROM zipcodes ORDER BY radius LIMIT 100; > > --------------------------------------------------------------- > Limit (cost=2946.70..2946.95 rows=100 width=62) (actual > time=167.650..168.064 rows=100 loops=1) > -> Sort (cost=2946.70..3050.40 rows=41483 width=62) (actual > time=167.644..167.829 rows=100 loops=1) > Sort Key: ((cube_distance('(-2513120.64361786, > -4645511.0460328, 3575538.9507084)'::cube, earth_coords) / > 1609.344::double precision)) > Sort Method: top-N heapsort Memory: 20kB > -> Seq Scan on zipcodes (cost=0.00..1361.24 rows=41483 > width=62) (actual time=0.030..90.807 rows=41483 loops=1) > Total runtime: 168.300 ms > > ############################################################3 > > -- Using Earthdistance > EXPLAIN ANALYZE SELECT zipcode, > lon_lat <@> '(-118.412426,34.096629)' As radius > FROM zipcodes > ORDER BY lon_lat <@> '(-118.412426,34.096629)' > LIMIT 100; > > ------------------------------------------------------------ > Limit (cost=2842.99..2843.24 rows=100 width=22) (actual > time=187.995..188.451 rows=100 loops=1) > -> Sort (cost=2842.99..2946.70 rows=41483 width=22) (actual > time=187.989..188.149 rows=100 loops=1) > Sort Key: ((lon_lat <@> '(-118.412426,34.096629)'::point)) > Sort Method: top-N heapsort Memory: 20kB > -> Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483 > width=22) (actual time=0.033..108.203 rows=41483 loops=1) > Total runtime: 188.660 ms > > ########################################## > > Using simple point distance, but with no Gist Index: > > EXPLAIN ANALYZE SELECT zipcode, > lon_lat <-> '(-118.412426,34.096629)' As radius > FROM zipcodes > ORDER BY lon_lat <-> '(-118.412426,34.096629)' > LIMIT 100; > > -------------------------------------------------------- > Limit (cost=2842.99..2843.24 rows=100 width=22) (actual > time=160.574..161.057 rows=100 loops=1) > -> Sort (cost=2842.99..2946.70 rows=41483 width=22) (actual > time=160.568..160.691 rows=100 loops=1) > Sort Key: ((lon_lat <-> '(-118.412426,34.096629)'::point)) > Sort Method: top-N heapsort Memory: 20kB > -> Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483 > width=22) (actual time=0.027..84.610 rows=41483 loops=1) > Total runtime: 161.226 ms > > ######################################### > > -- Using KNN-GIST index > EXPLAIN ANALYZE SELECT zipcode, > lon_lat <-> '(-118.412426,34.096629)' As radius > FROM zipcodes > ORDER BY lon_lat <-> '(-118.412426,34.096629)' > LIMIT 100; > ------------------------------------------------------------------ > Limit (cost=0.00..12.94 rows=100 width=22) (actual time=0.447..1.892 > rows=100 loops=1) > -> Index Scan using zipcodes_knn on zipcodes (cost=0.00..5365.93 > rows=41483 width=22) (actual time=0.440..1.407 rows=100 loops=1) > Order By: (lon_lat <-> '(-118.412426,34.096629)'::point) > Total runtime: 2.198 ms > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
В списке pgsql-performance по дате отправления: