Re: speed up unicode normalization quick check
From | Mark Dilger |
---|---|
Subject | Re: speed up unicode normalization quick check |
Date | |
Msg-id | 60213221-2D0A-4BBA-9242-68A784617541@enterprisedb.com |
In reply to | Re: speed up unicode normalization quick check (John Naylor <john.naylor@2ndquadrant.com>) |
Responses | Re: speed up unicode normalization quick check |
List | pgsql-hackers |
> On May 28, 2020, at 8:54 PM, John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> On Fri, May 29, 2020 at 5:59 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>>
>>> On May 21, 2020, at 12:12 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
>>>
>>> very picky in general. As a test, it also successfully finds a
>>> function for the OS "words" file, the "D" sets of codepoints, and for
>>> sets of the first n built-in OIDs, where n > 5.
>>
>> Prior to this patch, src/tools/gen_keywordlist.pl is the only script that uses PerfectHash. Your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go. I forgot in my first round of code review to mention, "thanks for the patch". I generally like what you are doing here, and am trying to review it so it gets committed.
>
> Calling it "each time" with a sample size of two is a bit of a
> stretch. The first implementation made a reasonable attempt to suit
> future uses and I simply made it a bit more robust. In the text quoted
> above you can see I tested some scenarios beyond the current use
> cases, with key set sizes as low as 6 and as high as 250k.

I don't really have an objection to what you did in the patch. I'm not going to lose any sleep if it gets committed this way.

The reason I gave this feedback is that I saved the *kwlist_d.h files generated before applying the patch, and compared them with the same files generated after applying the patch, and noticed a very slight degradation. Most of the files changed without any expansion, but the largest of them, src/common/kwlist_d.h, changed from

    static const int16 h[901]

to

    static const int16 h[902]

suggesting that even with your reworking of the parameters for PerfectHash, you weren't able to find a single set of numbers that worked for the two datasets quite as well as different sets of numbers each tailored for a particular data set.
I started to imagine that if we wanted to use PerfectHash for yet more stuff, the problem of finding numbers that worked across all N data sets (even if N is only 3 or 4) might be harder still. That's all I was referring to. 901 -> 902 is such a small expansion that it might not be worth worrying about.

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
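[For readers outside the thread: the "multiplier search" being debated can be illustrated with a small CHM-style sketch. This is NOT PostgreSQL's src/tools/PerfectHash.pm (which is Perl and differs in its constants and sizing); the seeds (17, 42), the vertex-count factor, and the candidate multipliers below are all made up for illustration. The idea is the same, though: a candidate multiplier pair turns each key into an edge between two hash values, and the pair "works" only if the resulting graph is acyclic, in which case a displacement table g[] can be assigned.]

```python
from collections import defaultdict

MASK = 0xFFFFFFFF  # keep the running hash value in 32 bits

def _hash(key, mult, seed, nverts):
    # multiply-and-add over the key's bytes; seeds are illustrative
    v = seed
    for b in key.encode():
        v = (v * mult + b) & MASK
    return v % nverts

def build_perfect_hash(keys, m1, m2):
    """Try one candidate multiplier pair; return the g[] table,
    or None if this pair produces a cyclic graph (candidate fails)."""
    n = len(keys)
    nverts = 4 * n + 3  # oversized vertex set makes acyclicity likely

    # each key becomes an edge between its two hash values
    edges = [(_hash(k, m1, 17, nverts), _hash(k, m2, 42, nverts))
             for k in keys]
    adj = defaultdict(list)
    for i, (a, b) in enumerate(edges):
        if a == b:
            return None  # self-loop: graph is cyclic
        adj[a].append((b, i))
        adj[b].append((a, i))

    # walk each component, choosing g[] so that for key i with edge
    # (a, b): (g[a] + g[b]) % n == i; reaching an already-visited
    # vertex over a fresh edge means there is a cycle
    g = [0] * nverts
    visited = [False] * nverts
    edge_done = [False] * n
    for start in range(nverts):
        if visited[start] or not adj[start]:
            continue
        visited[start] = True
        stack = [start]
        while stack:
            v = stack.pop()
            for (w, i) in adj[v]:
                if edge_done[i]:
                    continue
                edge_done[i] = True
                if visited[w]:
                    return None  # cycle: these multipliers fail
                visited[w] = True
                g[w] = (i - g[v]) % n
                stack.append(w)
    return g

def lookup(key, g, m1, m2, n):
    nverts = len(g)
    return (g[_hash(key, m1, 17, nverts)]
            + g[_hash(key, m2, 42, nverts)]) % n
```

The point of the thread is visible in `nverts` and the candidate loop a caller would wrap around `build_perfect_hash`: a multiplier pair that yields an acyclic graph for one key set (keywords) may fail or need a larger table for another (Unicode codepoints), which is why a single hard-wired set of parameters can cost a slot or two, as in the h[901] -> h[902] observation above.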