Re: Should heapam_estimate_rel_size consider fillfactor?
От | Andres Freund |
---|---|
Тема | Re: Should heapam_estimate_rel_size consider fillfactor? |
Дата | |
Msg-id | 20230614195353.kke26ol7vyrykdad@awork3.anarazel.de обсуждение исходный текст |
Ответ на | Should heapam_estimate_rel_size consider fillfactor? (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Список | pgsql-hackers |
Hi, On 2023-06-11 14:41:27 +0200, Tomas Vondra wrote: > While testing some stuff, I noticed heapam_estimate_rel_size (or rather > table_block_relation_estimate_size it calls) ignores fillfactor, so that > for a table without statistics it ends up with reltuple estimate much > higher than reality. For example, with fillfactor=10 the estimate is > about 10x higher. > > I ran into this while doing some tests with hash indexes, where I use > fillfactor to make the table look bigger (as if the tuples were wider), > and I ran into this: > > drop table hash_test; > create table hash_test (a int, b text) with (fillfactor=10); > insert into hash_test select 1 + ((i - 1) / 10000), md5(i::text) > from generate_series(1, 1000000) s(i); > -- analyze hash_test; > create index hash_test_idx on hash_test using hash (a); > > select pg_size_pretty(pg_relation_size('hash_test_idx')); > > If you run it like this (without the analyze), the index will be 339MB. > With the analyze, it's 47MB. > > This only happens if there are no relpages/reltuples statistics yet, in > which case table_block_relation_estimate_size estimates density from > tuple width etc. > > So it seems the easiest "fix" is to do ANALYZE before creating the index > (and not after it, as I had in my scripts). But I wonder how many people > fail to realize this - it sure took me a while to realize the indexes > are too large and even longer what is causing it. I wouldn't be very > surprised if many people had bloated hash indexes after bulk loads. > > So maybe we should make table_block_relation_estimate_size smarter to > also consider the fillfactor in the "no statistics" branch, per the > attached patch. Seems like a good idea - I can't think of a reason why we shouldn't do so. Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: