Proximity query with GIST and row estimation
От | Guillaume Smet |
---|---|
Тема | Proximity query with GIST and row estimation |
Дата | |
Msg-id | 1d4e0c10702130909g3be0a10fpf2594574c84866c9@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Proximity query with GIST and row estimation
|
Список | pgsql-performance |
Hi all, Following the work on Mark Stosberg on this list (thanks Mark!), I optimized our slow proximity queries by using cube, earthdistance (shipped with contrib) and a gist index. The result is globally very interesting apart for a specific query and we'd like to be able to fix it too to be more consistent (it's currently faster with a basic distance calculation based on acos, cos and so on but it's slow anyway). The problem is that we have sometimes very few places near a given location (small city) and sometimes a lot of them (in Paris, Bruxelles and so on - it's the case we have here). The gist index I created doesn't estimate the number of rows in the area very well. Table: lieu (100k rows) with wgslat and wgslon as numeric Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG') Index: "idx_lieu_earth" gist (ll_to_earth(wgslat::double precision, wgslon::double precision)) The simplified query is: SELECT DISTINCT l.numlieu, l.nomlieu, ROUND (earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist FROM lieu l, lieugelieu lgl WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2; It's used to find the nearest car parks from a given location. The plan is attached plan_earthdistance_nestedloop.txt. It uses a nested loop because the row estimate is pretty bad: (cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1). If I disable the nested loop, the plan is different and faster (see plan_earthdistance_hash.txt attached). Is there any way to improve this estimation? I tried to set the statistics of wgslat and wgslon higher but it doesn't change anything (I don't know if the operator is designed to use the statistics). Any other idea to optimize this query is very welcome too. -- Guillaume
Вложения
В списке pgsql-performance по дате отправления: