Re: Improving N-Distinct estimation by ANALYZE
From        | Greg Stark
Subject     | Re: Improving N-Distinct estimation by ANALYZE
Date        |
Msg-id      | 87vewx2h7j.fsf@stark.xeocode.com
In reply to | Re: Improving N-Distinct estimation by ANALYZE ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses   | Re: Improving N-Distinct estimation by ANALYZE
            | Re: Improving N-Distinct estimation by ANALYZE
List        | pgsql-hackers
"Jim C. Nasby" <jnasby@pervasive.com> writes: > Before we start debating merits of proposals based on random reads, can > someone confirm that the sampling code actually does read randomly? I > looked at it yesterday; there is a comment that states that blocks to be > scanned are passed to the analyze function in physical order, and AFAICT > the function that chooses blocks does so based strictly on applying a > probability function to block numbers as it increments a counter. It > seems that any reading is actually sequential and not random, which > makes all the random_page_cost hand-waving null and void. Hm. I'm curious just how much that behaves like a sequential scan actually. I think I'll do some experiments. Reading 1% (1267 read, 126733 skipped): 7748264us Reading 2% (2609 read, 125391 skipped): 12672025us Reading 5% (6502 read, 121498 skipped): 19005678us Reading 5% (6246 read, 121754 skipped): 18509770us Reading 10% (12975 read, 115025 skipped): 19305446us Reading 20% (25716 read, 102284 skipped): 18147151us Reading 50% (63656 read, 64344 skipped): 18089229us Reading 100% (128000 read, 0 skipped): 18173003us These numbers don't make much sense to me. It seems like 5% is about as slow as reading the whole file which is even worse than I expected. I thought I was being a bit pessimistic to think reading 5% would be as slow as reading 20% of the table. Anyone see anything wrong my my methodology? -- greg