Re: [GENERAL] psql weird behaviour with charset encodings
From | Noah Misch |
---|---|
Subject | Re: [GENERAL] psql weird behaviour with charset encodings |
Date | |
Msg-id | 20150523174306.GA3974893@tornado.leadboat.com |
In response to | Re: [GENERAL] psql weird behaviour with charset encodings (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [GENERAL] psql weird behaviour with charset encodings |
List | pgsql-hackers |
On Sat, May 08, 2010 at 09:24:45PM -0400, Tom Lane wrote:
> hgonzalez@gmail.com writes:
> > http://sources.redhat.com/bugzilla/show_bug.cgi?id=649
>
> > The last explains why they do not consider it a bug:
>
> > ISO C99 requires for %.*s to only write complete characters that fit below
> > the precision number of bytes. If you are using say UTF-8 locale, but
> > ISO-8859-1 characters as shown in the input file you provided, some of the
> > strings are not valid UTF-8 strings, therefore sprintf fails with -1
> > because of the encoding error. That's not a bug in glibc.
>
> Yeah, that was about the position I thought they'd take.

GNU libc eventually revisited that conclusion and fixed the bug through commit 715a900c9085907fa749589bf738b192b1a2bda5. RHEL 7.1 is fixed, but RHEL 6.6 and RHEL 5.11 are still affected; the bug will be relevant for another 8+ years.

> So the bottom line here is that we're best off to avoid %.*s because
> it may fail if the string contains data that isn't validly encoded
> according to libc's idea of the prevailing encoding.

Yep. Immediate precisions like %.10s trigger the bug as effectively as %.*s, so tarCreateHeader() [_tarWriteHeader() in 9.2 and earlier] is also affected. Switching to strlcpy(), as attached, fixes the bug while simplifying the code. The bug symptom is error 'pg_basebackup: unrecognized link indicator "0"' when the name of a file in the data directory is not a valid multibyte string.

Commit 6dd9584 introduced a new use of .*s, to pg_upgrade. It works reliably for now, because it always runs in the C locale. pg_upgrade never calls set_pglocale_pgservice() or otherwise sets its permanent locale. It would be natural for us to fix that someday, at which point non-ASCII database names would perturb this status output.

It would be good to purge the code of precisions on "s" conversion specifiers, then Assert(!pointflag) in fmtstr() to catch new introductions. I won't plan to do it myself, but it would be a nice little defensive change.
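
For readers without the attachment, the idea of replacing a precision-limited "%s" conversion with a plain bounded byte copy can be sketched roughly as follows. This is only an illustration, not the attached patch: bounded_copy(), fill_name_field() and NAME_FIELD_LEN are hypothetical names, and bounded_copy() is a local stand-in with strlcpy()-like semantics, written out because not every libc provides strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * bounded_copy: hypothetical local stand-in with strlcpy()-like semantics.
 * Copies at most dstsize - 1 bytes of src into dst and always
 * NUL-terminates.  Bytes are copied verbatim; no locale-dependent decoding
 * is involved, so invalidly encoded input cannot make the copy fail.
 * Returns strlen(src) so the caller can detect truncation.
 */
static size_t
bounded_copy(char *dst, const char *src, size_t dstsize)
{
    size_t      srclen = strlen(src);
    size_t      n;

    assert(dst != NULL && dstsize > 0);

    n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return srclen;
}

/*
 * Illustrative fixed-width name field.  The width of 100 is an assumption
 * chosen to resemble a tar header name field; this is not the actual
 * tarCreateHeader() code.
 */
#define NAME_FIELD_LEN 100

static void
fill_name_field(char *field, const char *filename)
{
    /* Zero the whole field so unused trailing bytes are NUL. */
    memset(field, 0, NAME_FIELD_LEN);

    /*
     * Instead of: snprintf(field, NAME_FIELD_LEN, "%.99s", filename);
     * which goes through printf's "s" conversion with a precision.
     */
    bounded_copy(field, filename, NAME_FIELD_LEN);
}
```

A caller would pass a NAME_FIELD_LEN-byte buffer to fill_name_field(); a file name containing a byte such as 0xE9 that is not valid in the current multibyte encoding is copied through unchanged, since no printf-family conversion is involved.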