Re: selectivity function
From: Greg Hennessy
Subject: Re: selectivity function
Date:
Msg-id: e23e9771-463f-c189-8502-f2c3c748f8fb@gmail.com
In reply to: Re: selectivity function (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you do anything useful with attaching selectivity estimates
> to the functions it references, instead?

I may have been going down a bad path before. The function I'm working
to improve has five arguments, the last being "degrees", which is the
match radius. Obviously a larger match radius should produce more
matches. For a small match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);
                                 QUERY PLAN
 Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual time=7.799..10758.566 rows=31 loops=1)
   Buffers: shared hit=8005684
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.008..215.570 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 width=16) (actual time=0.009..0.009 rows=0 loops=1000000)

(note: I deleted some of the output, since I think I'm keeping the
important bits)

So the total cost of the query is calculated as about 2e10; it expects
five rows, finds 31, and with a hot cache reads 8 million buffers' worth
of disk (I'd have to check the fine manual to remind myself of the units
of that).

When I do the same sort of query with a much larger match radius
(5 degrees) I get:

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);
                                 QUERY PLAN
 Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual time=0.086..254995.691 rows=38051626 loops=1)
   Buffers: shared hit=104977026
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.008..261.425 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 width=16) (actual time=0.053..0.247 rows=38 loops=1000000)

The total cost is the identical 2e10. This time the number of rows
expected is 4.7 million and the number of rows delivered is 38 million
(so the estimate is off by a factor of 8; I'm not sure that is
important), but the I/O is now 104 million buffers. So while we are
doing a lot more I/O and dealing with a lot more rows, the calculated
cost is identical. That seems strange to me. (The arithmetic hints at
why: 15406 + 1000000 * 22785.45 is almost exactly the 2.28e10 total,
i.e. the cost looks like the outer scan plus one inner scan per outer
row, with nothing depending on the radius.) Is that a normal thing? Is
it possible that the cost calculation isn't including the selectivity
calculation?

Greg
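P.S. To make sure I understand the suggestion about attaching
selectivity estimates to the function itself: below is a rough,
untested sketch of a planner support function (the PG 12+ SUPPORT
mechanism) for q3c_join. The name q3c_join_support is just a
placeholder of mine, and it only handles the case where the radius
argument arrives as a plan-time constant; everything else falls
through to the planner's default.

#include "postgres.h"

#include <math.h>

#include "fmgr.h"
#include "nodes/supportnodes.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(q3c_join_support);

/*
 * Sketch of a planner support function for
 * q3c_join(ra1, dec1, ra2, dec2, radius_deg).  On a
 * SupportRequestSelectivity, estimate selectivity as the fraction of
 * the sphere covered by a cap of the given angular radius, so a bigger
 * radius means a bigger row estimate.
 */
Datum
q3c_join_support(PG_FUNCTION_ARGS)
{
    Node       *rawreq = (Node *) PG_GETARG_POINTER(0);

    if (IsA(rawreq, SupportRequestSelectivity))
    {
        SupportRequestSelectivity *req =
            (SupportRequestSelectivity *) rawreq;

        if (list_length(req->args) == 5)
        {
            Node       *arg = (Node *) list_nth(req->args, 4);

            /* Handle only a non-null constant radius (in degrees) */
            if (IsA(arg, Const) && !((Const *) arg)->constisnull)
            {
                double      r = DatumGetFloat8(((Const *) arg)->constvalue)
                    * M_PI / 180.0;     /* degrees -> radians */

                /* cap area / sphere area = 2*pi*(1 - cos r) / 4*pi */
                double      sel = (1.0 - cos(r)) / 2.0;

                /* clamp to a valid selectivity range */
                if (sel < 0.0)
                    sel = 0.0;
                if (sel > 1.0)
                    sel = 1.0;

                req->selectivity = sel;
                PG_RETURN_POINTER(req);     /* request handled */
            }
        }
    }

    PG_RETURN_POINTER(NULL);    /* not handled: planner uses its default */
}

If I read the docs right, that would then be attached with something
like ALTER FUNCTION q3c_join(double precision, double precision,
double precision, double precision, double precision) SUPPORT
q3c_join_support after building it into the extension. Even with a
better row estimate, though, I'd still like to understand why the
total cost above doesn't change with the radius.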