Re: Use of additional index columns in rows filtering

From: Tomas Vondra
Subject: Re: Use of additional index columns in rows filtering
Date:
Msg-id: 6e71a11f-fbbe-3d50-cc72-49832692833c@enterprisedb.com
In reply to: Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Use of additional index columns in rows filtering  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On 8/8/23 21:15, Peter Geoghegan wrote:
> On Tue, Aug 8, 2023 at 11:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
>> Assuming that that happens, then it immediately gives index scans a
>> huge advantage over bitmap index scans. At that point it seems
>> important to describe (in high level terms) where it is that the
>> advantage is innate, and where it's just because we haven't done the
>> required work for bitmap index scans. I became confused on this point
>> myself yesterday. Admittedly I should have been able to figure it out
>> on my own -- but it is confusing.
> 
> I also have some doubts about the costing. That contributed to my confusion.
> 
> Take my " four = 1 and two != 1" example query, from earlier today. As
> I said, that gets a bitmap index scan, which does a hugely excessive
> amount of heap access. But once I force the planner to use an index
> scan, then (as predicted) there are useful index filters -- filters
> that can eliminate 100% of all heap accesses. Yet the planner still
> thinks that the total cost of the bitmap scan plan is only 415.28,
> versus 714.89 for the index scan plan. Perhaps that's just because
> this is a tricky case, for whatever reason...but it's not obvious what
> that reason really is.
> 

Right. I haven't checked how the costs are calculated in these cases,
but I'd bet it's a combination of having correlated conditions, and the
bitmap costing being fairly rough (with plenty of constants etc).

The correlation seems like an obvious culprit, considering the EXPLAIN output says:

 Bitmap Heap Scan on public.tenk1  (cost=31.35..413.85 rows=1250 width=250) (actual time=2.698..2.703 rows=0 loops=1)

So we expect 1250 rows. If that estimate were accurate, the index scan
would have to do 1250 heap fetches too. It's just luck that the index
scan doesn't need to do that. I don't think there's a chance to improve
this costing - if the inputs are this far off, it can't do anything.
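
FWIW the 1250 estimate is just the independence assumption at work: in
the regression-test tenk1 table, four = 1 has selectivity 1/4 and
two <> 1 has selectivity 1/2, so 10000 * 0.25 * 0.5 = 1250 - even though
four = 1 actually implies two = 1 (both columns derive from unique1), so
the real result is empty. A rough way to reproduce the comparison - just
a sketch, assuming the same tenk1 data and an index covering both
columns as in the earlier example (the exact index definition isn't
shown here) - might be:

    -- estimated 1250 rows vs. 0 actual rows, due to the correlation
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM tenk1 WHERE four = 1 AND two <> 1;

    -- force the index-scan plan, to compare costs and heap fetches
    SET enable_bitmapscan = off;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM tenk1 WHERE four = 1 AND two <> 1;
    RESET enable_bitmapscan;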

Also, I think this is related to the earlier discussion about maybe
costing it according to the worst case - i.e. as if we still needed to
fetch the same number of heap tuples as before. That will inevitably
lead to similar issues, with worse plans looking cheaper.

> You keep pointing out that your patch only makes isolated, local
> changes to certain specific plans. While that is true, it's also true
> that there will be fairly far removed consequences. Why shouldn't I
> treat those things as in scope?
> 

That is certainly true - I'm trying to keep the scope somewhat close to
the original goal. Obviously, there may be additional things the patch
really needs to consider, but I'm not sure this is one of those cases
(perhaps I just don't understand what the issue is - the example seems
like a run-of-the-mill case of poor estimate / costing).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


