Re: WIP: Fast GiST index build
От | Heikki Linnakangas |
---|---|
Тема | Re: WIP: Fast GiST index build |
Дата | |
Msg-id | 4E64AE3E.9000901@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>) |
Ответы |
Re: WIP: Fast GiST index build
Re: WIP: Fast GiST index build |
Список | pgsql-hackers |
On 01.09.2011 12:23, Alexander Korotkov wrote: > On Thu, Sep 1, 2011 at 12:59 PM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> So I changed the test script to generate the table as: >> >> CREATE TABLE points AS SELECT random() as x, random() as y FROM >> generate_series(1, $NROWS); >> >> The unordered results are in: >> >> testname | nrows | duration | accesses >> -----------------------------+**-----------+-----------------+**---------- >> points unordered buffered | 250000000 | 05:56:58.575789 | 2241050 >> points unordered auto | 250000000 | 05:34:12.187479 | 2246420 >> points unordered unbuffered | 250000000 | 04:38:48.663952 | 2244228 >> >> Although the buffered build doesn't lose as badly as it did with more >> overlap, it still doesn't look good :-(. Any ideas? > > But it's still a lot of overlap. It's about 220 accesses per small area > request. It's about 10 - 20 times greater than should be without overlaps. > If we roughly assume that 10 times more overlap makes 1/10 of tree to be > used for actual inserts, then that part of tree can easily fit to the cache. > You can try my splitting algorithm on your test setup (it this case I advice > to start from smaller number of rows, 100 M for example). > I'm requesting real-life datasets which makes troubles in real life from > Oleg. Probably those datasets is even larger or new linear split produce > less overlaps on them. I made a small tweak to the patch, and got much better results (this is with my original method of generating the data): testname | nrows | duration | accesses -----------------------------+-----------+-----------------+---------- points unordered buffered | 250000000 | 03:34:23.488275 | 3945486 points unordered auto | 250000000 | 02:55:10.248722 | 3767548 points unordered unbuffered | 250000000 | 04:02:04.168138 | 4564986 The tweak I made was to the way buffers are emptied in the final emptying phase. Previously, it repeatedly looped through all the buffers at a level, until there were no more non-empty buffers at the level. When a buffer was split while it was being emptied, processing that buffer stopped, and the emptying process moved on to the next buffer. I changed it so that when a buffer splits, we continue emptying that buffer until it's completely empty. That behavior is much more cache-friendly, which shows as much better overall performance. I probably changed that behavior for the worse in previous my rounds of cleanup. Anyway, attached is the patch I used to get the above numbers. Now that the performance problem is fixed, I'll continue reviewing and cleaning up the patch. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: