Re: WIP: parallel GiST index builds
От | Tomas Vondra |
---|---|
Тема | Re: WIP: parallel GiST index builds |
Дата | |
Msg-id | 67bbf244-63f8-4de0-bad7-562b5c979fbd@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: WIP: parallel GiST index builds ("Andrey M. Borodin" <x4mmm@yandex-team.ru>) |
Ответы |
Re: WIP: parallel GiST index builds
|
Список | pgsql-hackers |
On 7/22/24 11:00, Andrey M. Borodin wrote: > > >> On 21 Jul 2024, at 23:42, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> >>> >>> 1. Do I get it right that is_parallel argument for gistGetFakeLSN() >>> is only needed for assertion? And this assertion can be ensured just >>> by inspecting code. Is it really necessary? >> >> Yes, in the patch it's only used for an assert. But it's actually >> incorrect - just as I speculated in my initial message (in the section >> about gistGetFakeLSN), it sometimes fails to detect a page split. I >> noticed that while testing the patch adding GiST to amcheck, and I >> reported that in that thread [1]. But I apparently forgot to post an >> updated version of this patch :-( > > Oops, I just though that it's a version with solved FakeLSN problem. > >> >> I'll post a new version tomorrow, but in short it needs to update the >> fake LSN even if (lastlsn != currlsn) if is_parallel=true. It's a bit >> annoying this means we generate a new fake LSN on every call, and I'm >> not sure that's actually necessary. But I've been unable to come up with >> a better condition when to generate a new LSN. > > Why don't we just use an atomic counter withtin shared build state? > I don't understand how would that solve the problem, can you elaborate? Which of the values are you suggesting should be replaced with the shared counter? lastlsn? >> >> [1] >> https://www.postgresql.org/message-id/79622955-6d1a-4439-b358-ec2b6a9e7cbf%40enterprisedb.com > Yes, I'll be back in that thread soon. I'm still on vacation and it's hard to get continuous uninterrupted time here. Youdid a great review, and I want to address all issues there wholistically. Thank you! > >>> 2. gistBuildParallelCallback() updates indtuplesSize, but it seems to be >>> not used anywhere. AFAIK it's only needed to buffered build. 3. I >>> think we need a test that reliably triggers parallel and serial >>> builds. >>> >> >> Yeah, it's possible the variable is unused. Agreed on the testing. >> >>> As far as I know, there's a well known trick to build better GiST >>> over PostGIS data: randomize input. I think parallel scan is just >>> what is needed, it will shuffle tuples enough... >>> >> >> I'm not sure I understand this comment. What do you mean by "better >> GiST" or what does that mean for this patch? > > I think parallel build indexes will have faster IndexScans. > That's interesting. I haven't thought about measuring stuff like that (and it hasn't occurred to me it might have this benefit, or why would that be the case). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: