Re: Use merge-based matching for MCVs in eqjoinsel
| От | Tom Lane | 
|---|---|
| Тема | Re: Use merge-based matching for MCVs in eqjoinsel | 
| Дата | |
| Msg-id | 383402.1762206945@sss.pgh.pa.us обсуждение исходный текст  | 
		
| Ответ на | Re: Use merge-based matching for MCVs in eqjoinsel (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>) | 
| Список | pgsql-hackers | 
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes: > On 27.10.2025 18:50, David Geier wrote: >> On 10.09.2025 15:56, Ilia Evdokimov wrote: >>> I'll need some time to check both join and semi join cases with small >>> and large default_statistics_target. I'll share the results later. >> Have you had a chance to test above things? > Yes, I wrote about this here: > https://www.postgresql.org/message-id/c3dbf2ab-d72d-4033-822a-60ad8023f499%40tantorlabs.com Hmm. Those results sure look like there is a performance regression up to at least 100 MCVs ... not a large one, but consistently a few percent. That's a bit sad for a patch purporting to improve performance. It looks to me like perhaps we should stick to the old algorithm up to 100 or possibly even more MCVs. Certainly the threshold needs to be higher than 1, as you have it now. I eyeballed the patch itself very briefly, and have a couple quick comments: * Is hash_msv a typo for hash_mcv? If not, maybe spell out what it's supposed to mean. * The patch would be easier to read if it didn't reindent a couple large chunks of existing code. Can we change the factorization to avoid that? If not, I'd recommend submitting without that reindentation, and reminding the committer to reindent at the last moment. * The calculation loops in eqjoinsel_inner and eqjoinsel_semi are not identical, which makes it look quite weird to be writing just one function that conditionally replaces both. I wonder if we should refactor to have just one copy (and just eat the extra cycles of calculating matchprodfreq). * In fact ... I wonder if we should try harder to not do essentially identical calculations twice, remembering that eqjoinsel_semi is always used alongside eqjoinsel_inner. (Of course, we could only do that if clamped_nvalues2 is the same as sslot2->nvalues, but that's frequently true.) I think the reason it's duplicative right now is that we regarded this semijoin calculation as an experiment and so didn't want to invest a lot of effort in it ... but this patch is exactly a lot of effort, so maybe it's time to deal with that unfinished business. regards, tom lane
В списке pgsql-hackers по дате отправления: