Re: GB18030-2022 Support in PostgreSQL

From: Chao Li
Subject: Re: GB18030-2022 Support in PostgreSQL
Msg-id: CAEoWx2=BWDFXpB9OhfoKJGsU-Lk+7oQ8SW7a5GyoufLiFTWO8g@mail.gmail.com
In reply to: Re: GB18030-2022 Support in PostgreSQL (John Naylor <johncnaylorls@gmail.com>)
List: pgsql-hackers

On Mon, Sep 29, 2025 at 12:03 PM John Naylor <johncnaylorls@gmail.com> wrote:
> On Wed, Sep 24, 2025 at 4:18 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > I am not sure if you should also upgrade the UCM file to 2022 version, but if we need, let’s do it with a separate commit.
>
> If they can all use the same file, we should just do that for the sake
> of simplicity, in which case a separate commit is just extra noise.


In v3, I have updated EUC_CN to use gb18030-2022.ucm. Fortunately, the map files are unchanged, so we don't have to do much testing for EUC_CN.

For UHC, the ICU main branch (https://github.com/unicode-org/icu/tree/main/icu4c/source/data/mappings) still ships windows-949-2000.ucm, so only the download URL changes; the file content is unchanged.

```
% make utf8_to_uhc.map utf8_to_euc_cn.map
wget -O windows-949-2000.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/windows-949-2000.ucm
--2025-09-29 16:00:40--  https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/windows-949-2000.ucm
HTTP request sent, awaiting response... 200 OK
Length: 356253 (348K) [text/plain]
Saving to: ‘windows-949-2000.ucm’

windows-949-2000.ucm                             100%[=========================================================================================================>] 347.90K   222KB/s    in 1.6s

2025-09-29 16:00:43 (222 KB/s) - ‘windows-949-2000.ucm’ saved [356253/356253]

'/usr/bin/perl' -I . UCS_to_UHC.pl
- Writing UTF8=>UHC conversion table: utf8_to_uhc.map
- Writing UHC=>UTF8 conversion table: uhc_to_utf8.map
wget -O gb18030-2022.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/gb18030-2022.ucm
--2025-09-29 16:00:43--  https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/mappings/gb18030-2022.ucm
HTTP request sent, awaiting response... 200 OK
Length: 675312 (659K) [text/plain]
Saving to: ‘gb18030-2022.ucm’

gb18030-2022.ucm                                 100%[=========================================================================================================>] 659.48K  1.33MB/s    in 0.5s

2025-09-29 16:00:44 (1.33 MB/s) - ‘gb18030-2022.ucm’ saved [675312/675312]

'/usr/bin/perl' -I . UCS_to_EUC_CN.pl
- Writing UTF8=>EUC_CN conversion table: utf8_to_euc_cn.map
- Writing EUC_CN=>UTF8 conversion table: euc_cn_to_utf8.map
% git diff
%
```
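For anyone who wants to double-check two .ucm downloads independently of the map generation, here is a minimal Python sketch. It is not part of the patch. The helper names `parse_ucm` and `same_mappings` are made up for illustration, and the parser assumes the simple `<UXXXX> \xNN... |N` line form inside the CHARMAP section, ignoring everything else (headers, comments, state tables).

```python
import re

# One mapping line inside CHARMAP, e.g. "<U4E00> \xD2\xBB |0".
# Assumption: single code point per line, bytes written as \xNN.
_MAPPING = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm(text):
    """Return the set of (code point, byte sequence, precision flag) tuples."""
    mappings = set()
    in_charmap = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_charmap = True
        elif line == "END CHARMAP":
            in_charmap = False
        elif in_charmap:
            m = _MAPPING.match(line)
            if m:
                code_point = int(m.group(1), 16)
                raw = bytes.fromhex(m.group(2).replace("\\x", ""))
                mappings.add((code_point, raw, int(m.group(3))))
    return mappings


def same_mappings(text_a, text_b):
    """True if both UCM texts define exactly the same set of mappings."""
    return parse_ucm(text_a) == parse_ucm(text_b)


if __name__ == "__main__":
    # Hypothetical local file names; adjust to wherever the files were saved.
    with open("old.ucm", encoding="utf-8") as a, open("new.ucm", encoding="utf-8") as b:
        print("identical mappings:", same_mappings(a.read(), b.read()))
```

Comparing parsed mappings rather than raw bytes means a changed header comment or reordered lines would not be reported as a difference.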

Please note that I didn't include the deletion of gb-18030-2000.xml in v3, because that would make the patch file too big, and the email would then have to go through an approval process before landing in the mail archive. Please delete the XML file when you push the commit.

Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.