Re: postgres_fdw: using TABLESAMPLE to collect remote sample
From | Tomas Vondra
---|---
Subject | Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date |
Msg-id | 4ee816f5-6146-7745-a6b4-f38a1b24105a@enterprisedb.com
In reply to | Re: postgres_fdw: using TABLESAMPLE to collect remote sample (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: postgres_fdw: using TABLESAMPLE to collect remote sample
List | pgsql-hackers
I've pushed the comment/assert cleanup.

Here's a cleaned up version of the relkind check. This is almost identical to the patch from yesterday (plus the renames and updates of comments for modified functions).

The one difference is that I realized the relkind check does not actually say we can't do sampling - it just means we can't use TABLESAMPLE to do it. We could still use "random()" ...

Furthermore, I don't think we should silently disable sampling when the user explicitly requests TABLESAMPLE by specifying bernoulli/system for the table - IMHO it's less surprising to just fail in that case.

So we now do this:

    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        method = ANALYZE_SAMPLE_RANDOM;

Yes, we may still disable sampling when reltuples is -1, 0 or something like that. But that's a condition that is expected for new relations and likely to fix itself, which is not the case for relkind.

Of course, all relkinds that don't support TABLESAMPLE currently have a reltuples value that will disable sampling anyway (e.g. views have -1). So we won't actually fall back to random() anyway, because we can't calculate the sample fraction.

That's a bit annoying for foreign tables pointing at a view, which is a more likely use case than a table pointing at a sequence. And likely more of an issue, because views may return many rows (while sequences have only a single row).

But I realized we could actually still do "random()" sampling:

    SELECT * FROM t ORDER BY random() LIMIT $X;

where $X is the target number of sample rows for the table. That would result in plans like this (given sufficient work_mem values):

                              QUERY PLAN
    -------------------------------------------------------------------
     Limit (actual rows=30000 loops=1)
       ->  Sort (actual rows=30000 loops=1)
             Sort Key: (random())
             Sort Method: top-N heapsort  Memory: 3916kB
             ->  Append (actual rows=1000000 loops=1)

Even with lower work_mem values this would likely be a win, due to saving on network transfers.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
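For context, a rough sketch of the remote queries the different sampling approaches discussed here could boil down to, using a hypothetical foreign table "t" with reltuples around 1,000,000 and a target sample of 30,000 rows (the table name, numbers, and the exact shape of the random()-based query are illustrative assumptions, not taken from the patch):

    -- TABLESAMPLE-based sampling (bernoulli/system, or AUTO where the remote
    -- relkind supports it); the sample fraction comes from reltuples:
    --   sample_frac = targrows / reltuples = 30000 / 1000000 = 3%
    SELECT * FROM t TABLESAMPLE SYSTEM (3);

    -- random()-based sampling (ANALYZE_SAMPLE_RANDOM); works for any relkind,
    -- but presumably still needs reltuples to compute the fraction:
    SELECT * FROM t WHERE random() < 0.03;

    -- The ORDER BY variant proposed above; needs no reltuples estimate at all,
    -- only the target row count:
    SELECT * FROM t ORDER BY random() LIMIT 30000;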