Discussion: Path case sensitivity on Windows
Bug #4694
(http://archives.postgresql.org/message-id/200903050848.n258mVgm046178@wwwmaster.postgresql.org)
shows a very strange behaviour on Windows when the PATH uses a different case.

From what I can tell, this is because dir_strcmp() is case sensitive,
but paths on Windows are really case-insensitive.

Attached patch fixes this in my testcase. Can anybody spot something
wrong with it? If not, I'll apply once I've finished my test runs :-)

//Magnus

diff --git a/src/port/path.c b/src/port/path.c
index 708306d..d7bd353 100644
--- a/src/port/path.c
+++ b/src/port/path.c
@@ -427,7 +427,12 @@ dir_strcmp(const char *s1, const char *s2)
 {
 	while (*s1 && *s2)
 	{
+#ifndef WIN32
 		if (*s1 != *s2 &&
+#else
+		/* On windows, paths are case-insensitive */
+		if (tolower(*s1) != tolower(*s2) &&
+#endif
 			!(IS_DIR_SEP(*s1) && IS_DIR_SEP(*s2)))
 			return (int) *s1 - (int) *s2;
 		s1++, s2++;
Magnus Hagander <magnus@hagander.net> writes:
> Attached patch fixes this in my testcase. Can anybody spot something
> wrong with it?

It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?

regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Attached patch fixes this in my testcase. Can anybody spot something
>> wrong with it?
>
> It depends on tolower(), which is going to have LC_CTYPE-dependent
> behavior, which is surely wrong?

Not sure, really :) That's the encoding we'd get the paths in in the
first place, is it not?

Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> It depends on tolower(), which is going to have LC_CTYPE-dependent
>> behavior, which is surely wrong?

> Or are you just saying we should be using pg_tolower()? (which I forgot
> about yet again)

Well, I'd be happier with pg_tolower, because I know what it does.
But the real question here is what does "case insensitivity" on
file names actually mean in Windows --- ie, what happens to non-ASCII
letters?

regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> It depends on tolower(), which is going to have LC_CTYPE-dependent
>>> behavior, which is surely wrong?
>
>> Or are you just saying we should be using pg_tolower()? (which I forgot
>> about yet again)
>
> Well, I'd be happier with pg_tolower, because I know what it does.
> But the real question here is what does "case insensitivity" on
> file names actually mean in Windows --- ie, what happens to non-ASCII
> letters?

The filesystem itself is UTF-16. I would assume the "system default"
locale controls the case insensitivity, but I'm not sure about that.

Reading up some, it seems the collation is actually stored in a hidden
file on the NTFS volume. It seems to differ between different versions
of Windows from what I can tell, but since this is written to the fs,
it's ok.

I have not found a way to actually *get* the locale, or even to compare
two filenames. There is a function called GetFullPathName(), but I'm not
sure how to use it for this.

However, I don't think it's really critical that we deal with all corner
cases for this. It's not likely that the user would be using any really
weird locale-specific combinations *differently* in the PATH variable vs
the commandline, or something like that.

And this only shows up when the binary is found in the PATH and not
through a fully specified directory. This is, AFAICT, the only case
where they can differ. This is the reason why we haven't had any reports
of this before - nobody using the installer, or doing even a "normal
style" install would ever end up in this situation.

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> And this only shows up when the binary is found in the PATH and not
> through a fully specified directory. This is, AFAICT, the only case
> where they can differ. This is the reason why we haven't had any reports
> of this before - nobody using the installer, or doing even a "normal
> style" install would ever end up in this situation.

Hmm. Well, if we use pg_tolower then it will only do the right thing
for ASCII letters, but it seems like non-ASCII in the path leading to
the postgres binaries would be pretty dang unusual. (And I am not
convinced tolower() would get it right either --- it certainly won't
if the encoding is multibyte.) On balance I'd suggest just using
pg_tolower and figuring it's close enough.

regards, tom lane
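[Editorial illustration, not part of the thread.] The suggestion above can be sketched roughly as follows: a path comparison that folds only ASCII letters and therefore does not depend on LC_CTYPE. This is a minimal sketch, not the committed patch. The names sketch_ascii_tolower, sketch_dir_strcmp and SKETCH_IS_DIR_SEP are hypothetical stand-ins for pg_tolower, dir_strcmp and IS_DIR_SEP, and the length handling after the loop is an assumption, since the diff in the thread only shows the loop body.

```c
/*
 * Hypothetical stand-in for IS_DIR_SEP on Windows: both '/' and '\\'
 * are treated as directory separators (assumption for this sketch).
 */
#define SKETCH_IS_DIR_SEP(ch) ((ch) == '/' || (ch) == '\\')

/*
 * Fold ASCII A-Z to a-z. Every other byte, including all bytes with the
 * high bit set, is returned unchanged, so the result never depends on
 * the current locale.
 */
static unsigned char
sketch_ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Compare two paths, ignoring ASCII case and treating any two directory
 * separators as equal. Returns 0 if the paths match, nonzero otherwise.
 */
static int
sketch_dir_strcmp(const char *s1, const char *s2)
{
	while (*s1 && *s2)
	{
		unsigned char c1 = (unsigned char) *s1;
		unsigned char c2 = (unsigned char) *s2;

		if (sketch_ascii_tolower(c1) != sketch_ascii_tolower(c2) &&
			!(SKETCH_IS_DIR_SEP(c1) && SKETCH_IS_DIR_SEP(c2)))
			return (int) c1 - (int) c2;
		s1++, s2++;
	}
	if (*s1)
		return 1;				/* s1 is longer (assumed tail handling) */
	if (*s2)
		return -1;				/* s2 is longer (assumed tail handling) */
	return 0;
}
```

With this sketch, sketch_dir_strcmp("C:\\PGSQL\\bin", "c:/pgsql/BIN") treats the two spellings as the same directory, which is the situation described in the bug report when PATH and the command line differ only in case.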
On Thursday 02 April 2009 18:29:45 Tom Lane wrote:
> Hmm. Well, if we use pg_tolower then it will only do the right thing
> for ASCII letters, but it seems like non-ASCII in the path leading to
> the postgres binaries would be pretty dang unusual.

Well, Windows localizes the directory names like C:\Program Files, so it
is entirely plausible to have non-ASCII path names across the board in
certain locales.
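[Editorial illustration, not part of the thread.] To make the concern concrete, here is a small sketch of where ASCII-only folding stops working. It assumes, purely for illustration, that the path bytes are UTF-8; the function name sketch_ascii_fold_equal and the directory name used below are hypothetical and do not come from the thread or from any real Windows installation.

```c
/*
 * Returns 1 if a and b are equal after folding ASCII A-Z only, else 0.
 * Bytes outside A-Z (including multibyte UTF-8 sequences) are compared
 * exactly as they are.
 */
static int
sketch_ascii_fold_equal(const char *a, const char *b)
{
	while (*a && *b)
	{
		unsigned char ca = (unsigned char) *a;
		unsigned char cb = (unsigned char) *b;

		if (ca >= 'A' && ca <= 'Z')
			ca += 'a' - 'A';
		if (cb >= 'A' && cb <= 'Z')
			cb += 'a' - 'A';
		if (ca != cb)
			return 0;
		a++, b++;
	}
	return *a == *b;			/* equal only if both strings ended together */
}
```

Under the UTF-8 assumption, an uppercase E with acute accent is the byte pair 0xC3 0x89 and the lowercase form is 0xC3 0xA9. A call such as sketch_ascii_fold_equal("DONN\xC3\x89" "ES", "donn\xC3\xA9" "es") therefore reports a mismatch, while sketch_ascii_fold_equal("Program Files", "PROGRAM FILES") reports a match. Whether that gap matters in practice is exactly the trade-off being weighed in the thread.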