Re: Unicode grapheme clusters
From | Bruce Momjian |
---|---|
Subject | Re: Unicode grapheme clusters |
Date | |
Msg-id | Y8wrKdVl/HpKDYrP@momjian.us |
In response to | Re: Unicode grapheme clusters (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Unicode grapheme clusters |
List | pgsql-hackers |
On Sat, Jan 21, 2023 at 12:37:30PM -0500, Bruce Momjian wrote:
> Well, as one of the URLs I quoted said:
>
>     This is by design. wcwidth() is utterly broken. Any terminal or
>     terminal application that uses it is also utterly broken. Forget
>     about emoji wcwidth() doesn't even work with combining characters,
>     zero width joiners, flags, and a whole bunch of other things.
>
> So, either we have to find a function in the library that will do the
> looping over the string for us, or we need to identify the special
> Unicode characters that create grapheme clusters and handle them in our
> code.

I just checked whether wcswidth() would honor grapheme clusters, even
though wcwidth() does not, but it seems wcswidth() treats characters just
like wcwidth(). Its result is simply the sum of the per-character widths:

    $ LANG=en_US.UTF-8 grapheme_test
    wcswidth len=7

    bytes_consumed=4, wcwidth len=2
    bytes_consumed=4, wcwidth len=2
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=3, wcwidth len=1
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=4, wcwidth len=2

C test program attached. This is on Debian 11.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.
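
Since the attachment is not reproduced here, the following is a minimal sketch of how such a comparison could be written. It is not the attached program: the function names `sum_of_wcwidth` and `whole_string_wcswidth` and the 256-character buffer are assumptions made for illustration. It walks a multibyte string with `mbrtowc()`, prints the bytes consumed and the `wcwidth()` of each code point, and separately asks `wcswidth()` for the width of the whole string so the two results can be compared.

```c
#define _XOPEN_SOURCE 700       /* exposes wcwidth() and wcswidth() on glibc */
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: sum wcwidth() over every code point of the
 * multibyte string s, printing one line per code point.  The number of
 * code points decoded is stored in *ncodepoints.  Returns -1 on a
 * decoding error or if wcwidth() reports a non-printable character.
 */
int
sum_of_wcwidth(const char *s, size_t *ncodepoints)
{
    mbstate_t   state;
    size_t      remaining = strlen(s);
    int         total = 0;

    memset(&state, 0, sizeof(state));
    *ncodepoints = 0;

    while (remaining > 0)
    {
        wchar_t     wc;
        size_t      consumed = mbrtowc(&wc, s, remaining, &state);
        int         w;

        /* (size_t) -1: invalid sequence; (size_t) -2: truncated sequence */
        if (consumed == (size_t) -1 || consumed == (size_t) -2)
            return -1;

        w = wcwidth(wc);
        if (w < 0)
            return -1;

        printf("bytes_consumed=%zu, wcwidth len=%d\n", consumed, w);

        total += w;
        s += consumed;
        remaining -= consumed;
        (*ncodepoints)++;
    }
    return total;
}

/*
 * Hypothetical helper: width of the whole string from one wcswidth()
 * call.  Returns -1 if the string cannot be converted or contains a
 * non-printable character.  Strings longer than 256 wide characters
 * are truncated by mbstowcs(); that limit is arbitrary.
 */
int
whole_string_wcswidth(const char *s)
{
    wchar_t     buf[256];
    size_t      n = mbstowcs(buf, s, 256);

    if (n == (size_t) -1)
        return -1;
    return wcswidth(buf, n);
}
```

To use it, call `setlocale(LC_ALL, "")` first so that a UTF-8 locale such as `en_US.UTF-8` takes effect, then pass the same string to both functions. If the library clustered graphemes, `whole_string_wcswidth()` would return less than `sum_of_wcwidth()` for a string containing a zero width joiner sequence; the output above shows the two agreeing instead.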
Attachments