Re: PoC: using sampling to estimate joins / complex conditions
| From | Tomas Vondra |
|---|---|
| Subject | Re: PoC: using sampling to estimate joins / complex conditions |
| Date | |
| Msg-id | 025bbe78-0492-1a5a-76d8-e5d06581ac16@enterprisedb.com |
| In reply to | Re: PoC: using sampling to estimate joins / complex conditions (Andres Freund <andres@anarazel.de>) |
| List | pgsql-hackers |
On 3/22/22 00:35, Andres Freund wrote:
> Hi,
>
> On 2022-01-21 01:06:37 +0100, Tomas Vondra wrote:
>> Yeah, I haven't updated some of the test output because some of those
>> changes are a bit wrong (and I think that's fine for a PoC patch). I
>> should have mentioned that in the message, though. Sorry about that.
>
> Given that the patch hasn't been updated since January and that it's a PoC in
> the final CF, it seems like it should at least be moved to the next CF? Or
> perhaps returned?
>
> I've just marked it as waiting-on-author for now - iirc that leads to fewer
> reruns by cfbot once it's failing...
>

Either option works for me.

>
>> 2) The correlated samples are currently built using a query, executed
>> through SPI in a loop. So given a "driving" sample of 30k rows, we do
>> 30k lookups - that'll take time, even if we do that just once and cache
>> the results.
>
> Ugh, yea, that's going to increase overhead by at least a few factors.
>
>
>> I'm sure there's room for some improvement, though - for example
>> we don't need to fetch all columns included in the statistics object,
>> but just stuff referenced by the clauses we're estimating. That could
>> improve chance of using IOS etc.
>
> Yea. Even just avoiding SPI / planner + executor seems likely to be a
> big win.
>
>
> It seems one more of the cases where we really need logic to recognize "cheap"
> vs "expensive" plans, so that we only do sampling when useful. I don't think
> that's solved just by having a declarative syntax.
>

Right. I was thinking about walking the first table, collecting all the values, and then doing a single IN () query for the second table - a bit like a custom join (which seems a bit terrifying, TBH). But even if we manage to make this much cheaper, there will still be simple queries where it's going to be prohibitively expensive.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
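A minimal Python sketch of the two strategies discussed in this thread for building a correlated sample: the PoC's per-row lookup (one query through SPI for each row of the "driving" sample) versus walking the driving sample once, collecting its join keys, issuing a single `IN ()` query against the second table, and joining the results in memory. Everything here is hypothetical: `InnerTable`, `lookup_eq`, `lookup_in` and the two sample-building functions are illustrative stand-ins that only count how many "queries" are issued. They are not PostgreSQL or patch APIs, and the in-memory list scans do not model real planner or executor costs.

```python
from collections import defaultdict


class InnerTable:
    """Hypothetical stand-in for the second table; counts lookup queries issued."""

    def __init__(self, rows):
        self.rows = rows  # list of (key, payload) tuples
        self.queries = 0

    def lookup_eq(self, key):
        # Models one "SELECT ... WHERE key = $1" executed per driving row.
        self.queries += 1
        return [r for r in self.rows if r[0] == key]

    def lookup_in(self, keys):
        # Models a single "SELECT ... WHERE key IN (...)" for all keys at once.
        self.queries += 1
        wanted = set(keys)
        return [r for r in self.rows if r[0] in wanted]


def correlated_sample_per_row(driving, inner):
    """One lookup per driving row: N driving rows means N queries."""
    out = []
    for row in driving:
        for match in inner.lookup_eq(row[0]):
            out.append((row, match))
    return out


def correlated_sample_batched(driving, inner):
    """Collect distinct keys, run one IN () lookup, then hash-join in memory."""
    keys = {row[0] for row in driving}
    by_key = defaultdict(list)
    for match in inner.lookup_in(keys):
        by_key[match[0]].append(match)
    out = []
    for row in driving:
        # .get() avoids inserting empty lists for keys with no matches
        for match in by_key.get(row[0], []):
            out.append((row, match))
    return out


if __name__ == "__main__":
    inner_rows = [(k % 100, f"b{k}") for k in range(1000)]
    driving = [(k % 150, f"a{k}") for k in range(300)]

    per_row_table = InnerTable(inner_rows)
    batched_table = InnerTable(inner_rows)
    a = correlated_sample_per_row(driving, per_row_table)
    b = correlated_sample_batched(driving, batched_table)

    assert a == b  # both strategies yield the same joined sample
    print(per_row_table.queries, batched_table.queries, len(a))  # prints "300 1 2000"
```

Both functions produce the same pairs in the same order; the batched version trades one query per driving row for a single query plus an in-memory hash table over the matches, which is essentially the "custom join" the email describes. This sketch does not address whether a single `IN ()` list with tens of thousands of values would itself be cheap in practice.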