Selectivity and row count estimates for JSONB columns

Поиск

Список

Период

Сортировка

От	Joel Perren
Тема	Selectivity and row count estimates for JSONB columns
Дата	20 октября 2021 г. 14:56:43
Msg-id	CACFz3n13sazt00u+_AbGL5Ur9danB56sHPhP61ymnqUHQRZ1uQ@mail.gmail.com обсуждение исходный текст
Ответы	Re: Selectivity and row count estimates for JSONB columns
Список	pgsql-general

Дерево обсуждения

Hi all

I have a Postgres/PostGIS database with two separate database tables each of which has the following column structure:

- identifier (text)

- geometry (geometryz,27700)

- properties (jsonb)

I have created a GIN index (jsonb_path_ops) over the properties column on both tables.

As I understand it, Postgres is unable to calculate statistics on the contents of JSONB columns and so should use hard-coded estimates of selectivity when planning queries. However, despite both tables having identical structures, similar row counts, identical indexes, and both having been recently ANALYZEd, I am finding that the query planner is producing different row count estimates between the tables.

To expand: both tables have a 'description' field within their JSONB columns. I am executing the following query:

EXPLAIN

SELECT identifier, geometry, properties FROM table

WHERE properties @@ '$.description == "test"'

ORDER BY identifier;

I am expecting that the selectivity value used for such queries should be 0.010 which is the default result of the matchingsel selectivity function which the @@ operator uses when operating on jsonb.

For both tables, the planner opts for a Bitmap Index Scan -> Sort -> Gather Merge. However, the estimated number of rows returned from the Index Scan node differs substantially.

- Table A (1,611,752 rows): 159 estimated rows (this is roughly 0.01% and makes sense);

- Table B (1,656,110 rows): 16566 estimated rows (roughly 1% - why??)

This difference is causing the planner to come up with some strange plans for queries on Table B which it does not do for Table A. I know that both estimates are just that and that neither is 'better', but I would really prefer it to be consistent and use a selectivity of 0.01 rather than 0.1 as it should be doing.

I know this is getting long now, so apologies. But one thing I did notice and wondered if it could be related is that the pg_stats table for Table A seems to have values for histogram_bounds and correlation, but nulls for these values for Table B.

Any ideas?

Thanks

Joel

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Selectivity and row count estimates for JSONB columns