Re: Mac OS: invalid byte sequence for encoding "UTF8"
From | Artur Zakirov |
---|---|
Subject | Re: Mac OS: invalid byte sequence for encoding "UTF8" |
Date | |
Msg-id | 56BB3D95.7030502@postgrespro.ru |
In response to | Re: Mac OS: invalid byte sequence for encoding "UTF8" (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Mac OS: invalid byte sequence for encoding "UTF8"
Re: Mac OS: invalid byte sequence for encoding "UTF8" |
List | pgsql-hackers |
On 09.02.2016 20:13, Tom Lane wrote:
> I do not like this patch much. It is basically "let's stop using sscanf()
> because it seems to have a bug on one platform". There are at least two
> things wrong with that approach:
>
> 1. By my count there are about 80 uses of *scanf() in our code. Are we
> going to replace every one of them with hand-rolled code? If not, why
> is only this instance vulnerable? How can we know whether future uses
> will have a problem?

It seems that *scanf() with the %s format occurs only here:

- check.c - get_bin_version()
- server.c - get_major_server_version()
- filemap.c - isRelDataFile()
- pg_backup_directory.c - _LoadBlobs()
- xlog.c - do_pg_stop_backup()
- mac.c - macaddr_in()

I think sscanf() does not work with UTF-8 characters here, and this is
probably an issue only in spell.c.

I agree that the previous patch is wrong. Instead of adding the new
parse_ooaffentry() function, it may be better to use sscanf() with the
%ls format, which reads a wide-character string.

>
> 2. We're not being very good citizens of the software universe if we
> just install a hack in Postgres rather than nagging Apple to fix the
> bug at its true source.
>
> I think the appropriate next step to take is to dig into the OS X
> sources (see http://www.opensource.apple.com, I think probably the
> relevant code is in the Libc package) and identify exactly what is
> causing the misbehavior. That would both allow an informed answer
> to point #1 and greatly increase the odds of getting action on a
> bug report to Apple. Even if we end up applying this patch verbatim,
> I think we need that information first.
>
> regards, tom lane
>

I think this is not a bug; it is normal behavior. On Mac OS, sscanf()
with the %s format reads the string one byte (char) at a time. The
letter 'х' occupies 2 bytes in UTF-8, so sscanf() splits it into two
invalid characters.

In conclusion, I think spell.c should use sscanf() with the %ls format.
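To illustrate the point about the size of 'х', here is a minimal C sketch. It is not code from the patch or from spell.c, and the helper names utf8_seq_len(), utf8_char_count() and read_token_wide() are made up for this example. The first two show that one UTF-8 character can span several bytes; the third shows the general shape of an sscanf() call with %ls. For non-ASCII input, %ls converts according to the current LC_CTYPE locale, so the caller is assumed to have set a UTF-8 locale with setlocale() beforehand; which locale names are available depends on the system.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte c,
 * or 0 if c cannot start a sequence (e.g. it is a continuation byte). */
static int utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;               /* e.g. Cyrillic letters */
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/* Count characters (code points) in a UTF-8 string by skipping
 * continuation bytes, which all look like 10xxxxxx. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Read the first whitespace-delimited token of line as a wide string.
 * Returns 1 on success, 0 otherwise. The field width 63 leaves room
 * for the terminating L'\0' in the 64-element buffer. */
static int read_token_wide(const char *line, wchar_t out[64])
{
    return sscanf(line, "%63ls", out) == 1;
}
```

For the letter 'х', stored as the bytes 0xD1 0x85, utf8_seq_len(0xD1) gives 2 while utf8_char_count("\xD1\x85") gives 1, which is the mismatch between bytes and characters described above.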
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company