Re: C11: should we use char32_t for unicode code points?
| From | Jeff Davis |
|---|---|
| Subject | Re: C11: should we use char32_t for unicode code points? |
| Date | |
| Msg-id | ee88559cc739098a67f8d90d867f8dae9b00f82a.camel@j-davis.com |
| In reply to | Re: C11: should we use char32_t for unicode code points? (Thomas Munro <thomas.munro@gmail.com>) |
| List | pgsql-hackers |
On Thu, 2025-10-30 at 04:25 +1300, Thomas Munro wrote:
> Here are some sketch-quality patches to try out some of these ideas,
> for discussion. I gave them .txt endings so as not to hijack your
> thread's CI.

I like the direction this is going. I will commit the char32_t work anyway, so afterward feel free to hijack the thread (there's a lot of good information here, so continuing here might be more productive than starting a new thread).

Regarding 0002, IIUC, for PG_WCHAR_UTF32, surrogates are forbidden, but the comment about UTF-16 is a bit vague. I think we should add some asserts to make it clear.

The basic communication mechanism between the modules is the database encoding: it determines PgWcharEncodingScheme in both wchar.c and pg_locale_libc.c. That seems reasonable to me, and doesn't interfere with the other providers.

I'm still not quite sure how this fits with ICU in a single-byte encoding, but it doesn't seem worse than what we do currently.

Also, tangentially, I'm a bit anxious to do a permanent setlocale(LC_CTYPE, "C"), and we are very close. If these two threads are successful, I believe we can do it:

https://www.postgresql.org/message-id/90f176c5b85b9da26a3265b2630ece3552068566.camel%40j-davis.com
https://www.postgresql.org/message-id/d9657a6e51aa20702447bb2386b32fea6218670f.camel@j-davis.com

That would be a big simplification because it would isolate libc ctype behavior to pg_locale_libc.c. That would make me feel generally more comfortable with additional work in this area.

Regards,
	Jeff Davis
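The message above suggests adding asserts so it is explicit that a PG_WCHAR_UTF32 value may never be a surrogate. Below is a minimal sketch of what such checks could look like. It assumes that a PG_WCHAR_UTF32 value is meant to hold a Unicode scalar value, meaning a value no greater than U+10FFFF that falls outside the surrogate range U+D800..U+DFFF. The helper names `is_utf16_surrogate`, `is_valid_unicode_scalar`, and `checked_code_point` are hypothetical and do not come from the patches. The sketch uses standard `assert()`, whereas in-tree PostgreSQL code would normally use `Assert()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <uchar.h>   /* C11: char32_t */

/* Hypothetical helper: true if c lies in the UTF-16 surrogate range. */
static inline bool
is_utf16_surrogate(char32_t c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

/*
 * Hypothetical helper: true if c is a Unicode scalar value, i.e. within
 * the code space (<= U+10FFFF) and not a surrogate.  Under the assumption
 * stated above, this is the invariant a PG_WCHAR_UTF32 value should satisfy.
 */
static inline bool
is_valid_unicode_scalar(char32_t c)
{
    return c <= 0x10FFFF && !is_utf16_surrogate(c);
}

/*
 * Hypothetical conversion point showing where such an assert might sit:
 * the value passes through unchanged, and the check documents the invariant.
 */
static inline char32_t
checked_code_point(char32_t c)
{
    assert(is_valid_unicode_scalar(c));
    return c;
}
```

Keeping the range test in a small named helper means the same invariant can be asserted at every boundary where a pg_wchar is produced or consumed. That seems more self-documenting than repeating the numeric bounds inline.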