On Thu, 2009-10-15 at 00:43 +0300, Peter Eisentraut wrote:
> On Sun, 2009-10-04 at 10:48 -0400, Tom Lane wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > I understand the annoyance, but I think we do need to have an organized
> > > way to do testing of non-ASCII data and in particular UTF8 data, because
> > > there are an increasing number of special code paths for those.
> >
> > Well, if you want to keep the test, we should put in the variant with
> > \200, because it is now clear that that is in fact the right answer
> > in a nontrivial number of environments (arguably *more* cases than
> > in which "\u0080" is correct).
>
> I put in a new variant file. Let's see if it works.
[http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpython/expected/plpython_unicode_0.out]
Actually, what I committed was really the output I got. Now with your
commit my tests started failing again.
The difference turns out to be caused by glibc. When you print an
invalid UTF-8 byte sequence using "%.*s" while LC_CTYPE is a UTF-8
locale (e.g., en_US.utf8), glibc prints nothing at all. Presumably it
gets confused converting the bytes to characters in order to apply the
precision.
Test program:
#include <locale.h>
#include <stdio.h>

int
main()
{
	setlocale(LC_ALL, "");
	printf("%.*s", 1, "\200");
	return 0;
}
This prints nothing (check with od) when LC_CTYPE is en_US.utf8.
I think this can be filed under trouble caused by a mismatched
LC_CTYPE and client encoding and doesn't need further fixing, but it's
good to keep in mind.
Let's see what the Solaris builds say now.