Re: Unicode normalization SQL functions
From | Andreas Karlsson |
---|---|
Subject | Re: Unicode normalization SQL functions |
Date | |
Msg-id | 26150b35-240f-941c-e5a7-24f2d489b316@proxel.se |
In response to | Re: Unicode normalization SQL functions (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Responses |
Re: Unicode normalization SQL functions
Re: Unicode normalization SQL functions |
List | pgsql-hackers |
On 1/28/20 9:21 PM, Peter Eisentraut wrote:
> You're right, this didn't make any sense.  Here is a new patch set with
> that fixed.

Thanks for this patch. This is a feature which has been on my personal
todo list for a while and something which I have wished to have a couple
of times.

I took a quick look at the patch and here is some feedback:

A possible concern is increased binary size from the new tables for the
quickcheck but personally I think they are worth it.

A potential optimization would be to merge utf8_to_unicode() and
pg_utf_mblen() into one function in unicode_normalize_func() since
utf8_to_unicode() already knows length of the character. Probably not
worth it though.

It feels a bit wasteful to measure output_size in
unicode_is_normalized() since unicode_normalize() actually already knows
the length of the buffer, it just does not return it.

A potential optimization for the normalized case would be to abort the
quick check on the first maybe and normalize from that point on only. If
I can find the time I might try this out and benchmark it.

Nitpick: "split/\s*;\s*/, $line" in generate-unicode_normprops_table.pl
should be "split /\s*;\s*/, $line".

What about using else if in the code below for clarity?

+		if (check == UNICODE_NORM_QC_NO)
+			return UNICODE_NORM_QC_NO;
+		if (check == UNICODE_NORM_QC_MAYBE)
+			result = UNICODE_NORM_QC_MAYBE;

Remove extra space in the line below.

+	else if (quickcheck == UNICODE_NORM_QC_NO )

Andreas
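For illustration, the "else if" suggestion above could look roughly like the following. This is only a minimal sketch and not the code from the patch: the enum values, toy_qc_lookup() and quick_check_sketch() are hypothetical stand-ins, the lookup uses made-up toy rules rather than the generated Unicode tables, and the combining class ordering check that a real quick check needs is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the quick check result type; values are assumed. */
typedef enum
{
	UNICODE_NORM_QC_NO = 0,
	UNICODE_NORM_QC_YES = 1,
	UNICODE_NORM_QC_MAYBE = -1
} UnicodeNormalizationQC;

/*
 * Toy lookup, for illustration only.  The patch would consult the
 * generated quick check tables here; these rules are assumptions.
 */
static UnicodeNormalizationQC
toy_qc_lookup(uint32_t ch)
{
	if (ch == 0x0340)
		return UNICODE_NORM_QC_NO;
	if (ch >= 0x0300 && ch <= 0x0304)
		return UNICODE_NORM_QC_MAYBE;
	return UNICODE_NORM_QC_YES;
}

/*
 * Scan the code points: return NO as soon as one is found, otherwise
 * remember whether any MAYBE was seen.  The two tests are mutually
 * exclusive, which the "else if" makes explicit.
 */
static UnicodeNormalizationQC
quick_check_sketch(const uint32_t *input, size_t len)
{
	UnicodeNormalizationQC result = UNICODE_NORM_QC_YES;

	for (size_t i = 0; i < len; i++)
	{
		UnicodeNormalizationQC check = toy_qc_lookup(input[i]);

		if (check == UNICODE_NORM_QC_NO)
			return UNICODE_NORM_QC_NO;
		else if (check == UNICODE_NORM_QC_MAYBE)
			result = UNICODE_NORM_QC_MAYBE;
	}

	return result;
}
```

Behavior is the same as with two separate if statements, since the first branch returns; the else if only documents that the cases cannot both apply.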