Re: postgres_fdw: using TABLESAMPLE to collect remote sample
From | Tomas Vondra
---|---
Subject | Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date |
Msg-id | 4ee816f5-6146-7745-a6b4-f38a1b24105a@enterprisedb.com
In reply to | Re: postgres_fdw: using TABLESAMPLE to collect remote sample (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: postgres_fdw: using TABLESAMPLE to collect remote sample
List | pgsql-hackers
I've pushed the comment/assert cleanup.

Here's a cleaned up version of the relkind check. This is almost identical to the patch from yesterday (plus the renames and updates of comments for modified functions).

The one difference is that I realized the relkind check does not actually say we can't do sampling - it just means we can't use TABLESAMPLE to do it. We could still use "random()" ...

Furthermore, I don't think we should silently disable sampling when the user explicitly requests TABLESAMPLE by specifying bernoulli/system for the table - IMHO it's less surprising to just fail in that case.

So we now do this:

    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        method = ANALYZE_SAMPLE_RANDOM;

Yes, we may still disable sampling when reltuples is -1, 0 or something like that. But that's a condition that is expected for new relations and likely to fix itself, which is not the case for relkind.

Of course, all relkinds that don't support TABLESAMPLE currently have a reltuples value that will disable sampling anyway (e.g. views have -1). So we won't actually fall back to random() anyway, because we can't calculate the sample fraction.

That's a bit annoying for foreign tables pointing at a view, which is a more likely use case than a table pointing at a sequence. And likely more of an issue, because views may return many rows (while sequences have only a single row).

But I realized we could actually still do "random()" sampling:

    SELECT * FROM t ORDER BY random() LIMIT $X;

where $X is the target number of sample rows for the table. That would result in plans like this (given sufficient work_mem values):

                              QUERY PLAN
    -------------------------------------------------------------------
     Limit (actual rows=30000 loops=1)
       ->  Sort (actual rows=30000 loops=1)
             Sort Key: (random())
             Sort Method: top-N heapsort  Memory: 3916kB
             ->  Append (actual rows=1000000 loops=1)

Even with lower work_mem values this would likely be a win, due to saving on network transfers.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
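For context, a rough sketch of the remote queries the different sampling approaches discussed here could boil down to, using a hypothetical foreign table "t" with reltuples around 1,000,000 and a target sample of 30,000 rows (the table name, numbers, and the exact shape of the random()-based query are illustrative assumptions, not taken from the patch):

    -- TABLESAMPLE-based sampling (bernoulli/system, or AUTO where the remote
    -- relkind supports it); the sample fraction comes from reltuples:
    --   sample_frac = targrows / reltuples = 30000 / 1000000 = 3%
    SELECT * FROM t TABLESAMPLE SYSTEM (3);

    -- random()-based sampling (ANALYZE_SAMPLE_RANDOM); works for any relkind,
    -- but presumably still needs reltuples to compute the fraction:
    SELECT * FROM t WHERE random() < 0.03;

    -- The ORDER BY variant proposed above; needs no reltuples estimate at all,
    -- only the target row count:
    SELECT * FROM t ORDER BY random() LIMIT 30000;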