Re: WIP: parallel GiST index builds
От | Andrey M. Borodin |
---|---|
Тема | Re: WIP: parallel GiST index builds |
Дата | |
Msg-id | CACAE478-9276-461D-970C-6A04D414BC8B@yandex-team.ru обсуждение исходный текст |
Ответ на | Re: WIP: parallel GiST index builds (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: WIP: parallel GiST index builds
|
Список | pgsql-hackers |
> On 21 Jul 2024, at 23:42, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > >> >> 1. Do I get it right that is_parallel argument for gistGetFakeLSN() >> is only needed for assertion? And this assertion can be ensured just >> by inspecting code. Is it really necessary? > > Yes, in the patch it's only used for an assert. But it's actually > incorrect - just as I speculated in my initial message (in the section > about gistGetFakeLSN), it sometimes fails to detect a page split. I > noticed that while testing the patch adding GiST to amcheck, and I > reported that in that thread [1]. But I apparently forgot to post an > updated version of this patch :-( Oops, I just though that it's a version with solved FakeLSN problem. > > I'll post a new version tomorrow, but in short it needs to update the > fake LSN even if (lastlsn != currlsn) if is_parallel=true. It's a bit > annoying this means we generate a new fake LSN on every call, and I'm > not sure that's actually necessary. But I've been unable to come up with > a better condition when to generate a new LSN. Why don't we just use an atomic counter withtin shared build state? > > [1] > https://www.postgresql.org/message-id/79622955-6d1a-4439-b358-ec2b6a9e7cbf%40enterprisedb.com Yes, I'll be back in that thread soon. I'm still on vacation and it's hard to get continuous uninterrupted time here. Youdid a great review, and I want to address all issues there wholistically. Thank you! >> 2. gistBuildParallelCallback() updates indtuplesSize, but it seems to be >> not used anywhere. AFAIK it's only needed to buffered build. 3. I >> think we need a test that reliably triggers parallel and serial >> builds. >> > > Yeah, it's possible the variable is unused. Agreed on the testing. > >> As far as I know, there's a well known trick to build better GiST >> over PostGIS data: randomize input. I think parallel scan is just >> what is needed, it will shuffle tuples enough... >> > > I'm not sure I understand this comment. What do you mean by "better > GiST" or what does that mean for this patch? I think parallel build indexes will have faster IndexScans. Best regards, Andrey Borodin.
В списке pgsql-hackers по дате отправления: