Re: BUG #19354: JOHAB rejects valid byte sequences
| From | Tom Lane |
|---|---|
| Subject | Re: BUG #19354: JOHAB rejects valid byte sequences |
| Date | |
| Msg-id | 2393116.1765899706@sss.pgh.pa.us |
| In reply to | Re: BUG #19354: JOHAB rejects valid byte sequences (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: BUG #19354: JOHAB rejects valid byte sequences; Re: BUG #19354: JOHAB rejects valid byte sequences |
| List | pgsql-bugs |
Robert Haas <robertmhaas@gmail.com> writes:
> ... So I went looking for
> where we got the mapping tables from. UCS_to_JOHAB.pl expects to read
> from a file JOHAB.TXT, of which the latest version seems to be found
> here:
> https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT
> And indeed, if I run UCS_to_JOHAB.pl on that JOHAB.txt file, it
> regenerates the current mapping files.

Thanks for doing that research!

> So apparently we've
> got the "right" mappings, but you can only actually use the ones that
> match the code's rules for something to be a valid multi-byte
> character, which aren't actually in sync with the mapping table.

Yeah. Looking at the code in wchar.c, it's clear that it thinks that
JOHAB has the same character-length rules as EUC_KR, which is
something that one might guess based on available documentation that
says it's related to that encoding. So I can see how we got here.
However, that doesn't mean we can fix pg_johab_mblen() and we're done.
I'm still quite afraid that we'd be introducing security-grade
inconsistencies of interpretation between different PG versions.

> I'm
> left with the conclusions that (1) nobody ever actually tried using
> this encoding for anything real until 3 days ago and (2) we don't have
> any testing infrastructure that verifies that the characters in the
> mapping tables are actually accepted by pg_verifymbstr(). I wonder how
> many other encodings we have that don't actually work?

Indeed. Anyone want to do some testing?

regards, tom lane
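
For anyone picking up the testing question, here is a minimal Python sketch of the kind of check being asked for: walk a mapping file and report every byte sequence that a verifier refuses. Everything in it is an assumption made for illustration. `euc_kr_style_accepts` is a hypothetical stand-in that models "high-bit lead byte plus one trail byte in 0xA1..0xFE"; it is not the code in wchar.c and it does not call pg_verifymbstr(). The sample lines only imitate the `0xCODE 0xUNICODE # comment` layout of the Unicode mapping files and should be replaced by the real JOHAB.TXT.

```python
import re

# Hypothetical stand-in for an EUC_KR-style verifier (an assumption, not
# PostgreSQL's actual code): one byte below 0x80, or a two-byte character
# whose lead byte has the high bit set and whose trail byte is 0xA1..0xFE.
def euc_kr_style_accepts(seq: bytes) -> bool:
    if len(seq) == 1:
        return seq[0] < 0x80
    if len(seq) == 2:
        return seq[0] >= 0x80 and 0xA1 <= seq[1] <= 0xFE
    return False

# Matches data lines of the form "0x8861<whitespace>0xAC00 ...".
_LINE = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")

def parse_mapping(lines):
    """Yield (encoded_bytes, unicode_code_point) for each data line."""
    for line in lines:
        m = _LINE.match(line)
        if not m:
            continue  # skip comments and blank lines
        hex_code = m.group(1)
        if len(hex_code) % 2:
            hex_code = "0" + hex_code  # pad odd-length codes to whole bytes
        yield bytes.fromhex(hex_code), int(m.group(2), 16)

def rejected_entries(lines, accepts):
    """Return the encoded sequences from the mapping that `accepts` refuses."""
    return [seq for seq, _ in parse_mapping(lines) if not accepts(seq)]

# Illustrative lines in the mapping-file layout; verify against the real file.
SAMPLE = [
    "# sample data, not the real JOHAB.TXT",
    "0x41\t0x0041\t# sample",
    "0x8861\t0xAC00\t# sample",
    "0x88A1\t0xAC38\t# sample",
]

if __name__ == "__main__":
    for seq in rejected_entries(SAMPLE, euc_kr_style_accepts):
        print("rejected:", seq.hex())
```

To test a real server rather than the stand-in, `accepts` could be swapped for a function that sends each sequence to a database using the encoding under test and reports whether it was rejected; the same loop would then apply to the other encodings' mapping files.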