Обсуждение: Clear up strxfrm() in UTF-8 with locale on Windows

Поиск
Список
Период
Сортировка

Clear up strxfrm() in UTF-8 with locale on Windows

От
ITAGAKI Takahiro
Дата:
The attached patch clears up the usage of strxfrm() on Windows. If the
server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
instead of strxfrm() because UTF-8 locale are not supported on Windows.
We've already have a special version of strcoll() for Windows, but the
usage of strxfrm() was still broken.

When we are caught up in the bug, we see the next error message.
| ERROR:  invalid memory alloc request size 2147483648
If the server is wrong configured between the server encoding and the
locale, strxfrm() could be failed and return values like INT_MAX or
(size_t)-1. We've passed the result+1 straight to palloc(), so the server
tried to allocale more than 1GB of memory and gave up.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Вложения

Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
"Hiroshi Saito"
Дата:
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>

> The attached patch clears up the usage of strxfrm() on Windows. If the
> server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
> instead of strxfrm() because UTF-8 locale are not supported on Windows.
> We've already have a special version of strcoll() for Windows, but the
> usage of strxfrm() was still broken.
>
> When we are caught up in the bug, we see the next error message.
> | ERROR:  invalid memory alloc request size 2147483648
> If the server is wrong configured between the server encoding and the
> locale, strxfrm() could be failed and return values like INT_MAX or
> (size_t)-1. We've passed the result+1 straight to palloc(), so the server
> tried to allocale more than 1GB of memory and gave up.

Ahh..., Certainly, the bug lurked there. probably, your patch will help it.
It was not pursued in Japan for the reasons that the locale had been
recommended to be used by C up to now. however, It seems to have
caused the user's confusion. In that sense, I vote.+1

But, I am skeptic the locale setting still functions correctly.

However, I think it is great work.:-)

Thanks.

Regards,
Hiroshi Saito



Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
Bruce Momjian
Дата:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


ITAGAKI Takahiro wrote:
> The attached patch clears up the usage of strxfrm() on Windows. If the
> server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
> instead of strxfrm() because UTF-8 locale are not supported on Windows.
> We've already have a special version of strcoll() for Windows, but the
> usage of strxfrm() was still broken.
>
> When we are caught up in the bug, we see the next error message.
> | ERROR:  invalid memory alloc request size 2147483648
> If the server is wrong configured between the server encoding and the
> locale, strxfrm() could be failed and return values like INT_MAX or
> (size_t)-1. We've passed the result+1 straight to palloc(), so the server
> tried to allocale more than 1GB of memory and gave up.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
Magnus Hagander
Дата:
ITAGAKI Takahiro wrote:
> The attached patch clears up the usage of strxfrm() on Windows. If the
> server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
> instead of strxfrm() because UTF-8 locale are not supported on Windows.
> We've already have a special version of strcoll() for Windows, but the
> usage of strxfrm() was still broken.
>
> When we are caught up in the bug, we see the next error message.
> | ERROR:  invalid memory alloc request size 2147483648
> If the server is wrong configured between the server encoding and the
> locale, strxfrm() could be failed and return values like INT_MAX or
> (size_t)-1. We've passed the result+1 straight to palloc(), so the server
> tried to allocale more than 1GB of memory and gave up.

I was just about to commit this with the following two changes:
* wcsxfrm() sets errno, so you can't use GetLastError() to report problems
* The code added a check for return value >= INT_MAX on Unix as well,
but the spec for strxfrm() says that there is no specific return value
for failure.

Put those in there for reference. But I also recalled a previous
discussion, and found this:
http://archives.postgresql.org/pgsql-hackers/2005-08/msg00760.php

Given this, perhaps the proper approach should instead be to just check
the return value, and go from there? Should be a simple enough patch,
something like the attached.

Tom, can you comment?

Takahiro, can you test if this patch fixes your problem?

//Magnus
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.233
diff -c -r1.233 selfuncs.c
*** src/backend/utils/adt/selfuncs.c    21 Apr 2007 21:01:45 -0000    1.233
--- src/backend/utils/adt/selfuncs.c    2 May 2007 20:38:58 -0000
***************
*** 3152,3157 ****
--- 3152,3165 ----
  #else
          xfrmlen = strxfrm(NULL, val, 0);
  #endif
+ #ifdef WIN32
+         /* On win32, if strxfrm fails (for example in UTF8 encoding, since
+          * it's not properly supported), return the original string instead
+          * of trying to allocate 2Gb memory.
+          */
+         if (xfrmlen >= INT_MAX)
+             return val;
+ #endif
          xfrmstr = (char *) palloc(xfrmlen + 1);
          xfrmlen2 = strxfrm(xfrmstr, val, xfrmlen + 1);
          Assert(xfrmlen2 <= xfrmlen);

Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Given this, perhaps the proper approach should instead be to just check
> the return value, and go from there? Should be a simple enough patch,
> something like the attached.

> Tom, can you comment?

Testing against INT_MAX seems like a type pun, or something.  Maybe use
MaxAllocSize instead?

    if (xfrmlen >= MaxAllocSize)
        return val;

Also, since as you note returning (size_t) -1 is not at all standard,
it would be helpful to readers to note that that's what Windows does
on failure and that's what you're testing for.  In fact you could
make a good case that the test should be just

    if (xfrmlen == (size_t) -1)
        return val;

            regards, tom lane

Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
Magnus Hagander
Дата:
On Wed, May 02, 2007 at 05:25:39PM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Given this, perhaps the proper approach should instead be to just check
> > the return value, and go from there? Should be a simple enough patch,
> > something like the attached.
>
> > Tom, can you comment?
>
> Testing against INT_MAX seems like a type pun, or something.  Maybe use
> MaxAllocSize instead?

The windows API documentation specifically says:
On an error, each function sets errno and returns INT_MAX.

So actually an equality test against INT_MAX would be correct. But making
that clear in the comment would probably not be a bad idea :-)

//Magnus


Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
Magnus Hagander
Дата:
Magnus Hagander wrote:
> On Wed, May 02, 2007 at 05:25:39PM -0400, Tom Lane wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> Given this, perhaps the proper approach should instead be to just check
>>> the return value, and go from there? Should be a simple enough patch,
>>> something like the attached.
>>> Tom, can you comment?
>> Testing against INT_MAX seems like a type pun, or something.  Maybe use
>> MaxAllocSize instead?
>
> The windows API documentation specifically says:
> On an error, each function sets errno and returns INT_MAX.
>
> So actually an equality test against INT_MAX would be correct. But making
> that clear in the comment would probably not be a bad idea :-)

I have applied a fix for this, because it obviously needed fixing
regardless of if it fixes the original issue all the way. Still looking
for confirmation if it does, though.

//Magnus

Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
ITAGAKI Takahiro
Дата:
Magnus Hagander <magnus@hagander.net> wrote:

> > So actually an equality test against INT_MAX would be correct. But making
> > that clear in the comment would probably not be a bad idea :-)
>
> I have applied a fix for this, because it obviously needed fixing
> regardless of if it fixes the original issue all the way. Still looking
> for confirmation if it does, though.

The fix works well. I tested it with UTF-8 encoding and Japanese_Japan.932
(SJIS) locale and I didn't see any errors. I think it it is enough to cover
the buggy UTF-8 treatment on Windows.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: Clear up strxfrm() in UTF-8 with locale on Windows

От
"Magnus Hagander"
Дата:
> > > So actually an equality test against INT_MAX would be correct. But making
> > > that clear in the comment would probably not be a bad idea :-)
> >
> > I have applied a fix for this, because it obviously needed fixing
> > regardless of if it fixes the original issue all the way. Still looking
> > for confirmation if it does, though.
>
> The fix works well. I tested it with UTF-8 encoding and Japanese_Japan.932
> (SJIS) locale and I didn't see any errors. I think it it is enough to cover
> the buggy UTF-8 treatment on Windows.
>

Thanks for testing.

Bruce, that means the original patch is not needed, so you can take it off the patch queue. Thanks.

/Magnus