Обсуждение: Clear up strxfrm() in UTF-8 with locale on Windows
The attached patch clears up the usage of strxfrm() on Windows. If the server encoding is UTF-8 and the locale is not C, we should use wcsxfrm() instead of strxfrm() because UTF-8 locale are not supported on Windows. We've already have a special version of strcoll() for Windows, but the usage of strxfrm() was still broken. When we are caught up in the bug, we see the next error message. | ERROR: invalid memory alloc request size 2147483648 If the server is wrong configured between the server encoding and the locale, strxfrm() could be failed and return values like INT_MAX or (size_t)-1. We've passed the result+1 straight to palloc(), so the server tried to allocale more than 1GB of memory and gave up. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Вложения
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp> > The attached patch clears up the usage of strxfrm() on Windows. If the > server encoding is UTF-8 and the locale is not C, we should use wcsxfrm() > instead of strxfrm() because UTF-8 locale are not supported on Windows. > We've already have a special version of strcoll() for Windows, but the > usage of strxfrm() was still broken. > > When we are caught up in the bug, we see the next error message. > | ERROR: invalid memory alloc request size 2147483648 > If the server is wrong configured between the server encoding and the > locale, strxfrm() could be failed and return values like INT_MAX or > (size_t)-1. We've passed the result+1 straight to palloc(), so the server > tried to allocale more than 1GB of memory and gave up. Ahh..., Certainly, the bug lurked there. probably, your patch will help it. It was not pursued in Japan for the reasons that the locale had been recommended to be used by C up to now. however, It seems to have caused the user's confusion. In that sense, I vote.+1 But, I am skeptic the locale setting still functions correctly. However, I think it is great work.:-) Thanks. Regards, Hiroshi Saito
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- ITAGAKI Takahiro wrote: > The attached patch clears up the usage of strxfrm() on Windows. If the > server encoding is UTF-8 and the locale is not C, we should use wcsxfrm() > instead of strxfrm() because UTF-8 locale are not supported on Windows. > We've already have a special version of strcoll() for Windows, but the > usage of strxfrm() was still broken. > > When we are caught up in the bug, we see the next error message. > | ERROR: invalid memory alloc request size 2147483648 > If the server is wrong configured between the server encoding and the > locale, strxfrm() could be failed and return values like INT_MAX or > (size_t)-1. We've passed the result+1 straight to palloc(), so the server > tried to allocale more than 1GB of memory and gave up. > > Regards, > --- > ITAGAKI Takahiro > NTT Open Source Software Center [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
ITAGAKI Takahiro wrote: > The attached patch clears up the usage of strxfrm() on Windows. If the > server encoding is UTF-8 and the locale is not C, we should use wcsxfrm() > instead of strxfrm() because UTF-8 locale are not supported on Windows. > We've already have a special version of strcoll() for Windows, but the > usage of strxfrm() was still broken. > > When we are caught up in the bug, we see the next error message. > | ERROR: invalid memory alloc request size 2147483648 > If the server is wrong configured between the server encoding and the > locale, strxfrm() could be failed and return values like INT_MAX or > (size_t)-1. We've passed the result+1 straight to palloc(), so the server > tried to allocale more than 1GB of memory and gave up. I was just about to commit this with the following two changes: * wcsxfrm() sets errno, so you can't use GetLastError() to report problems * The code added a check for return value >= INT_MAX on Unix as well, but the spec for strxfrm() says that there is no specific return value for failure. Put those in there for reference. But I also recalled a previous discussion, and found this: http://archives.postgresql.org/pgsql-hackers/2005-08/msg00760.php Given this, perhaps the proper approach should instead be to just check the return value, and go from there? Should be a simple enough patch, something like the attached. Tom, can you comment? Takahiro, can you test if this patch fixes your problem? //Magnus Index: src/backend/utils/adt/selfuncs.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.233 diff -c -r1.233 selfuncs.c *** src/backend/utils/adt/selfuncs.c 21 Apr 2007 21:01:45 -0000 1.233 --- src/backend/utils/adt/selfuncs.c 2 May 2007 20:38:58 -0000 *************** *** 3152,3157 **** --- 3152,3165 ---- #else xfrmlen = strxfrm(NULL, val, 0); #endif + #ifdef WIN32 + /* On win32, if strxfrm fails (for example in UTF8 encoding, since + * it's not properly supported), return the original string instead + * of trying to allocate 2Gb memory. + */ + if (xfrmlen >= INT_MAX) + return val; + #endif xfrmstr = (char *) palloc(xfrmlen + 1); xfrmlen2 = strxfrm(xfrmstr, val, xfrmlen + 1); Assert(xfrmlen2 <= xfrmlen);
Magnus Hagander <magnus@hagander.net> writes: > Given this, perhaps the proper approach should instead be to just check > the return value, and go from there? Should be a simple enough patch, > something like the attached. > Tom, can you comment? Testing against INT_MAX seems like a type pun, or something. Maybe use MaxAllocSize instead? if (xfrmlen >= MaxAllocSize) return val; Also, since as you note returning (size_t) -1 is not at all standard, it would be helpful to readers to note that that's what Windows does on failure and that's what you're testing for. In fact you could make a good case that the test should be just if (xfrmlen == (size_t) -1) return val; regards, tom lane
On Wed, May 02, 2007 at 05:25:39PM -0400, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > Given this, perhaps the proper approach should instead be to just check > > the return value, and go from there? Should be a simple enough patch, > > something like the attached. > > > Tom, can you comment? > > Testing against INT_MAX seems like a type pun, or something. Maybe use > MaxAllocSize instead? The windows API documentation specifically says: On an error, each function sets errno and returns INT_MAX. So actually an equality test against INT_MAX would be correct. But making that clear in the comment would probably not be a bad idea :-) //Magnus
Magnus Hagander wrote: > On Wed, May 02, 2007 at 05:25:39PM -0400, Tom Lane wrote: >> Magnus Hagander <magnus@hagander.net> writes: >>> Given this, perhaps the proper approach should instead be to just check >>> the return value, and go from there? Should be a simple enough patch, >>> something like the attached. >>> Tom, can you comment? >> Testing against INT_MAX seems like a type pun, or something. Maybe use >> MaxAllocSize instead? > > The windows API documentation specifically says: > On an error, each function sets errno and returns INT_MAX. > > So actually an equality test against INT_MAX would be correct. But making > that clear in the comment would probably not be a bad idea :-) I have applied a fix for this, because it obviously needed fixing regardless of if it fixes the original issue all the way. Still looking for confirmation if it does, though. //Magnus
Magnus Hagander <magnus@hagander.net> wrote: > > So actually an equality test against INT_MAX would be correct. But making > > that clear in the comment would probably not be a bad idea :-) > > I have applied a fix for this, because it obviously needed fixing > regardless of if it fixes the original issue all the way. Still looking > for confirmation if it does, though. The fix works well. I tested it with UTF-8 encoding and Japanese_Japan.932 (SJIS) locale and I didn't see any errors. I think it it is enough to cover the buggy UTF-8 treatment on Windows. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
> > > So actually an equality test against INT_MAX would be correct. But making > > > that clear in the comment would probably not be a bad idea :-) > > > > I have applied a fix for this, because it obviously needed fixing > > regardless of if it fixes the original issue all the way. Still looking > > for confirmation if it does, though. > > The fix works well. I tested it with UTF-8 encoding and Japanese_Japan.932 > (SJIS) locale and I didn't see any errors. I think it it is enough to cover > the buggy UTF-8 treatment on Windows. > Thanks for testing. Bruce, that means the original patch is not needed, so you can take it off the patch queue. Thanks. /Magnus