Re: [18] Unintentional behavior change in commit e9931bfb75

From Jeff Davis
Subject Re: [18] Unintentional behavior change in commit e9931bfb75
Date
Msg-id 667d3b3f730a97f71ebecb74f917167d8ffba427.camel@j-davis.com
In reply to Re: [18] Unintentional behavior change in commit e9931bfb75  (Noah Misch <noah@leadboat.com>)
Responses Re: [18] Unintentional behavior change in commit e9931bfb75
List pgsql-hackers
On Sat, 2025-04-12 at 05:34 -0700, Noah Misch wrote:
> I think the code for (2) and for "I/i in Turkish" haven't returned.  Given
> commit e3fa2b0 restored the v17 "I/i in Turkish" treatment for plain lower(),
> the regex code likely needs a similar restoration.  If not, the regex comments
> would need to change to match the code.

Great find, thank you! I'm curious how you came across this difference:
was it through testing or code inspection?

Patch attached. I also updated the top of the comment so that it's
clear that it's referring to the libc provider specifically, and that
ICU still has an issue with non-UTF8 encodings.

Also, the force-to-ASCII special case is different for
pg_wc_tolower()/pg_wc_toupper() vs LOWER()/UPPER(): the former depends
only on whether it's the default locale, whereas the latter depends on
whether it's the default locale and the encoding is single-byte.
Therefore the results in the tr_TR.UTF-8 locale for the libc provider
are inconsistent:

  => select 'i' ~* 'I', 'I' ~* 'i', lower('I') = 'i', upper('i') = 'I';
   ?column? | ?column? | ?column? | ?column?
  ----------+----------+----------+----------
   t        | t        | f        | f

That behavior goes back a long way, so I'm not suggesting that we
change it.
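
To make the two conditions concrete, here is a rough C sketch of the
difference described above. It is a simplified model and not the actual
PostgreSQL source: LocaleModel, regex_forces_ascii(),
lower_upper_forces_ascii() and check_tr_utf8() are hypothetical names
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the two special cases; these types and
 * function names are illustrative, not PostgreSQL source.
 */
typedef struct LocaleModel
{
    bool is_default_locale;     /* is this the database default locale? */
    bool encoding_single_byte;  /* false for UTF-8 */
} LocaleModel;

/* Regex case folding (pg_wc_tolower/pg_wc_toupper): default locale only. */
static bool
regex_forces_ascii(const LocaleModel *loc)
{
    return loc->is_default_locale;
}

/* LOWER()/UPPER(): default locale and a single-byte encoding. */
static bool
lower_upper_forces_ascii(const LocaleModel *loc)
{
    return loc->is_default_locale && loc->encoding_single_byte;
}

/* tr_TR.UTF-8 as the default libc locale: the two predicates disagree. */
static void
check_tr_utf8(void)
{
    LocaleModel tr_utf8 = {
        .is_default_locale = true,
        .encoding_single_byte = false
    };

    /* ASCII rules apply in the regex path, so 'i' ~* 'I' matches. */
    assert(regex_forces_ascii(&tr_utf8));
    /* Turkish rules apply here, so lower('I') is dotless i, not 'i'. */
    assert(!lower_upper_forces_ascii(&tr_utf8));
}
```

In this model the only input that separates the two paths is a default
locale with a multibyte encoding, which is the tr_TR.UTF-8 case shown
in the query above.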

Regards,
    Jeff Davis


