Обсуждение: BUG #17594: conditional hash indexes size (hash index ignore WHERE condition during CREATE INDEX?)

Поиск
Список
Период
Сортировка

BUG #17594: conditional hash indexes size (hash index ignore WHERE condition during CREATE INDEX?)

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17594
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 14.4
Operating system:   Linux
Description:

Hi,

I found very weird situation with size of highly selective partial hash
indexes (they has almost same size as full table hash index).
Test case:

create table test_1000000 as select id from generate_series(1, 1000000) as
g(id);
create index test_1000000_hash_full_idx on test_1000000 using hash(id);
create index test_1000000_hash_partial_idx on test_1000000 using hash(id)
where id<11;
create table test_10 as select id from generate_series(1, 10) as g(id);
create index test_10_hash_full  on test_10  using hash(id);

\di+ test_1*
                                                     List of relations
 Schema |             Name              | Type  | Owner  |    Table     |
Persistence | Access method | Size  | Description 

--------+-------------------------------+-------+--------+--------------+-------------+---------------+-------+-------------
 public | test_1000000_hash_full_idx    | index | mboguk | test_1000000 |
permanent   | hash          | 32 MB | 
 public | test_1000000_hash_partial_idx | index | mboguk | test_1000000 |
permanent   | hash          | 28 MB | 
 public | test_10_hash_full             | index | mboguk | test_10      |
permanent   | hash          | 80 kB | 

Expected: the test_1000000_hash_partial_idx  and test_10_hash_full indexes
should have same or at least close size because they index same amount of
rows (10):

select count(*) from test_10;
 count 
-------
    10

select count(*) from test_1000000 where id<11;
 count 
-------
    10


PG Bug reporting form <noreply@postgresql.org> writes:
> I found very weird situation with size of highly selective partial hash
> indexes (they has almost same size as full table hash index).

hashbuild() sets up the initial hash index size based on estimating
the current number of rows in the table, without any correction for
partial-index selectivity.  I don't find this to be a bug particularly.
The odds of making things worse via a bad estimate seem at least as
high as the odds of making things better.

            regards, tom lane



Re: BUG #17594: conditional hash indexes size (hash index ignore WHERE condition during CREATE INDEX?)

От
Maxim Boguk
Дата:


On Thu, Aug 25, 2022 at 11:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> I found very weird situation with size of highly selective partial hash
> indexes (they has almost same size as full table hash index).

hashbuild() sets up the initial hash index size based on estimating
the current number of rows in the table, without any correction for
partial-index selectivity.  I don't find this to be a bug particularly.
The odds of making things worse via a bad estimate seem at least as
high as the odds of making things better.

                        regards, tom lane

Sometimes it lead to unexpected results, for hash indexes most common (imho) use case is indexing long text/varchar columns on big tables where they provide substantial size reduction over usual btree indexes (or in extreme case of very long test columns - btree cannot be used at all). Now in the case of a partial hash index over a huge table (with highly selective conditions thus covering only a tiny part of rows) - size of resulting hash index could be a huge surprise (in the practical case where I found this issue it was 200MB hash index with only 5 rows in it).
Are most pages of hash index in that case will be completely empty and almost never touched by the database (thus wasting only disk space) or the resulting hash index will have very low rows per index page ratio (thus inefficiently use shared buffers as well)?



--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

Re: BUG #17594: conditional hash indexes size (hash index ignore WHERE condition during CREATE INDEX?)

От
Amit Kapila
Дата:
On Fri, Aug 26, 2022 at 1:34 PM Maxim Boguk <maxim.boguk@gmail.com> wrote:
>
>
> On Thu, Aug 25, 2022 at 11:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> PG Bug reporting form <noreply@postgresql.org> writes:
>> > I found very weird situation with size of highly selective partial hash
>> > indexes (they has almost same size as full table hash index).
>>
>> hashbuild() sets up the initial hash index size based on estimating
>> the current number of rows in the table, without any correction for
>> partial-index selectivity.  I don't find this to be a bug particularly.
>> The odds of making things worse via a bad estimate seem at least as
>> high as the odds of making things better.
>>
>>                         regards, tom lane
>
>
> Sometimes it lead to unexpected results, for hash indexes most common (imho) use case is indexing long text/varchar
columnson big tables where they provide substantial size reduction over usual btree indexes (or in extreme case of very
longtest columns - btree cannot be used at all). Now in the case of a partial hash index over a huge table (with highly
selectiveconditions thus covering only a tiny part of rows) - size of resulting hash index could be a huge surprise (in
thepractical case where I found this issue it was 200MB hash index with only 5 rows in it). 
> Are most pages of hash index in that case will be completely empty and almost never touched by the database (thus
wastingonly disk space) or the resulting hash index will have very low rows per index page ratio (thus inefficiently
useshared buffers as well)? 
>

I think it would be later, less number of rows per index page but you
can check/verify with hash functions provided by pageinspect module
[1]. In the case, you mentioned in your email, I have checked it is
distributed over 10 different bucket pages. I think if the number of
rows are so less, if you create such an index beforehand then you may
not see large number of empty pages.

[1] - https://www.postgresql.org/docs/devel/pageinspect.html#id-1.11.7.34.10

--
With Regards,
Amit Kapila.