BRIN minmax multi - incorrect distance for infinite timestamp/date
От | Tomas Vondra |
---|---|
Тема | BRIN minmax multi - incorrect distance for infinite timestamp/date |
Дата | |
Msg-id | eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com обсуждение исходный текст |
Ответы |
Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
|
Список | pgsql-hackers |
Hi, Ashutosh Bapat reported me off-list a possible issue in how BRIN minmax-multi calculate distance for infinite timestamp/date values. The current code does this: if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2)) PG_RETURN_FLOAT8(0); so means infinite values are "very close" to any other value, and thus likely to be merged into a summary range. That's exactly the opposite of what we want to do, possibly resulting in inefficient indexes. Consider this example create table test (a timestamptz) with (fillfactor=50); insert into test select (now() + ((10000 * random())::int || ' seconds')::interval) from generate_series(1,1000000) s(i); update test set a = '-infinity'::timestamptz where random() < 0.01; update test set a = 'infinity'::timestamptz where random() < 0.01; explain (analyze, timing off, costs off) select * from test where a = '2024-01-01'::timestamptz; QUERY PLAN ------------------------------------------------------------------------------ Bitmap Heap Scan on test (actual rows=0 loops=1) Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) Rows Removed by Index Recheck: 680662 Heap Blocks: lossy=6024 -> Bitmap Index Scan on test_a_idx (actual rows=60240 loops=1) Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) Planning Time: 0.075 ms Execution Time: 106.871 ms (8 rows) Clearly, large part of the table gets scanned - this happens because when building the index, we end up with ranges like this: [-infinity,a,b,c,...,x,y,z,infinity] and we conclude that distance for [-infinity,a] is 0, and we combine these values into a range. And the same for [z,infinity]. But we should do exactly the opposite thing - never merge those. Attached is a patch fixing this, with which the plan looks like this: QUERY PLAN ------------------------------------------------------------------------------ Bitmap Heap Scan on test (actual rows=0 loops=1) Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) -> Bitmap Index Scan on test_a_idx (actual rows=0 loops=1) Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) Planning Time: 0.289 ms Execution Time: 9.432 ms (6 rows) Which seems much better. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: