Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
От | Heikki Linnakangas |
---|---|
Тема | Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build) |
Дата | |
Msg-id | 4E0F6568.6010809@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build) (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
|
Список | pgsql-hackers |
On 02.07.2011 21:07, Tom Lane wrote: > I wrote: >> I tweaked the patch a bit further (don't pessimize the boundary case >> where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory, >> improve the comment) and attach that version below. This is a little >> bit faster but I still wonder if it's worth the price of adding an >> obscure dependency on the header size. > > It occurred to me that we could very easily remove that objection by > making the code dynamically detect when it's reached a suitably aligned > trigram. The attached version of the patch does it that way. It seems > to be a percent or so slower than my previous version, but I think that > making the function robust against header changes is probably well worth > that price. Ah, that opens the door to do something I considered earlier but rejected because of alignment: instead of three 32-bit word fetches, we could fetch one 64-bit word and 32-bit word. Might shave a few more cycles... Meanwhile, I experimented with optimizing the other part of the loop: the HASH() macros for setting the right bits in the signature. With the default compile-time settings, the signature array is 95 bits. Currently, it's stored in a BITVEC, which is a byte array, but we could store it in two 64-bit integers, which makes it possible to write SETBIT differently. I experimented with a few approaches, first was essentially this: + /* Set the nth bit in the signature, in s1 or s2 */ + #define HASH_S(h) \ + do { \ + unsigned int n = HASHVAL(h); \ + if (((uint64) (n)) < 64) \ + s1 |= (uint64) 1<<(n); \ + else \ + s2 |= (uint64) 1<<((n) - 64); \ + } while(0) That was a bit faster on my x64 laptop, but slightly slower on my ia64 HP-UX box. My second try was to use lookup tables, patch attached. That was yet faster on x64, and a small win on the ia64 box too. I'm not sure it's worth the added code complexity, though. Here's a summary of the timings I got with different versions: ia64 HP-UX (anole): unpatched: 11194.038 ms fast_makesign-tom: 10064.980 ms fast_makesign-2int: 10649.726 ms fast_makesign-tbl: 9951.547 ms x64 laptop: unpatched: 4595,209 ms fast_makesign-tom: 3346,548 ms fast_makesign-2int: 3102,874 ms fast_makesign-tbl: 2997,854 ms I used the same "REINDEX INDEX i_words" test I used earlier, repeated each run a couple of times, and took the lowest number. fast_makesign-tom is the first patch you posted, I haven't tested your latest one. fast_makesign-2int is with the HASH_S() macro above, and has_makesign-tbl is with the attached patch. PS. in case you wonder why the HP-UX box is so much slower than my laptop; this box isn't really meant for performance testing. It's just something I happen to have access to, I think it's a virtual machine of some sort. The numbers are very repeatable, however. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: