Re: Hostnames, IDNs, Punycode and Unicode Case Folding

From: David G Johnston
Subject: Re: Hostnames, IDNs, Punycode and Unicode Case Folding
Date:
Msg-id CAKFQuwY7APvDZD5Yr5cAYS+ix42Aruhf4DJ5XscagMeBUffndQ@mail.gmail.com
In reply to: Re: Hostnames, IDNs, Punycode and Unicode Case Folding  (Mike Cardwell <pgsql@lists.grepular.com>)
Responses: Re: Hostnames, IDNs, Punycode and Unicode Case Folding  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Hostnames, IDNs, Punycode and Unicode Case Folding  (Mike Cardwell <pgsql@lists.grepular.com>)
List: pgsql-general
On Mon, Dec 29, 2014 at 4:51 PM, Mike Cardwell [via PostgreSQL] <[hidden email]> wrote:
* on the Mon, Dec 29, 2014 at 04:23:22PM -0700, David G Johnston wrote:

>>>    WHERE hostname='nißan.com'
>>>
>>
>> _IF_ Postgres had a punycode function, then you could use:
>> WHERE punycode(hostname) = punycode('nißan.com')
>
> If the OP wraps what he is doing up into a function that is what you end up
> getting: a memoized punycode function.
>
> http://en.wikipedia.org/wiki/Memoization
>
> It has to be defined as volatile but basically write the function to check
> for the provided input on the indexed table and if it doesn't exist the
> function will calculate the punycode value and store it onto the table
> before returning the punycode value to the caller.
I'm not sure all that is necessary. It could be quite a simple function,
like the lower() function. So what I would do is this:

CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames (lower(punycode_encode(hostname)));

That would prevent adding more than one representation for the same hostname
to the column.
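Here is a Python sketch of the key such an index would store and of when a UNIQUE index on it would reject a row. It assumes the hypothetical `punycode_encode()` behaves like Python's built-in `idna` codec, which implements IDNA 2003 (nameprep case folding, then punycode per label). IDNA 2008 treats `ß` as a valid character in its own right, so the results depend on which standard a real encoder follows. `index_key` and `first_duplicate` are illustrative names only.

```python
"""What lower(punycode_encode(hostname)) would key on, sketched in Python.

Assumption: the hypothetical punycode_encode() behaves like Python's
built-in "idna" codec (IDNA 2003: nameprep, then punycode per label).
"""
from typing import List, Optional


def punycode_encode(hostname: str) -> str:
    # Hypothetical stand-in for the SQL punycode_encode() function.
    return hostname.encode("idna").decode("ascii")


def index_key(hostname: str) -> str:
    # Mirrors the index expression lower(punycode_encode(hostname)).
    return punycode_encode(hostname).lower()


def first_duplicate(hostnames: List[str]) -> Optional[str]:
    """Return the first hostname a UNIQUE index on index_key() would reject."""
    seen = set()
    for name in hostnames:
        key = index_key(name)
        if key in seen:
            return name
        seen.add(key)
    return None


if __name__ == "__main__":
    for name in ["nißan.com", "NIßAN.com", "nissan.com", "münchen.de"]:
        print(name, "->", index_key(name))
    # nameprep folds "ß" to "ss", so these two share one key:
    print(first_duplicate(["nißan.com", "nissan.com"]))  # nissan.com
```

Under this codec, `nißan.com`, `NIßAN.com` and `nissan.com` all produce the key `nissan.com`, so only the first of them could be inserted.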
 
Except two different hostnames can map to the same punycode_encode(hostname) value, so the unique index won't work.
It was also mentioned that the Perl encoding function performs poorly, which is why caching its results in a memoization table has value.

WHERE lower(punycode_encode(hostname)) = lower(punycode_encode('any-representation'))

I don't know the rules of punycode, but I'm not seeing what value lower() provides here...
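One possible answer, at least when Python's built-in `idna` codec stands in for the hypothetical `punycode_encode()`: non-ASCII names go through nameprep, which case-folds them, but names that are already pure ASCII skip nameprep and keep their case. Whether a Perl or SQL implementation behaves the same way is an assumption worth checking.

```python
"""Does lower() add anything on top of the encoder? A quick check.

Assumption: Python's built-in "idna" codec (IDNA 2003) stands in for
the hypothetical punycode_encode().
"""


def punycode_encode(hostname: str) -> str:
    # Hypothetical stand-in for the SQL punycode_encode() function.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    # Non-ASCII input goes through nameprep, which already case-folds.
    print(punycode_encode("MÜNCHEN.de"))
    # Pure-ASCII input takes a fast path and keeps its case.
    print(punycode_encode("NISSAN.com"))
    # So lower() is what makes the ASCII spellings compare equal.
    print(punycode_encode("NISSAN.com").lower() == punycode_encode("nissan.com"))
```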
 

There doesn't need to be any extra table storage for the punycode-encoded
version.


David J.
 

