Re: OK, that's one LOCALE bug report too many...

From Tom Lane
Subject Re: OK, that's one LOCALE bug report too many...
Date
Msg-id 17693.975105090@sss.pgh.pa.us
In reply to Re: OK, that's one LOCALE bug report too many...  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: OK, that's one LOCALE bug report too many...  (Peter Eisentraut <peter_e@gmx.net>)
Re: OK, that's one LOCALE bug report too many...  (Karel Zak <zakkr@zf.jcu.cz>)
List pgsql-hackers
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> I propose, therefore, that in an --enable-locale installation, initdb
>> should save its values for LC_COLLATE and LC_CTYPE in pg_control, and
>> backend startup should restore these settings from pg_control.

> Note that when these are unset there might still be a "catch-all" locale
> value coming from the LANG env. var. (or LC_ALL on some systems).

Actually, what I intend to do while writing pg_control is read the
current effective values via "setlocale(category, NULL)" --- then it
shouldn't matter where they came from, no?
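A minimal sketch of that query, assuming a hypothetical helper named effective_locale() (it is not a PostgreSQL function). Calling setlocale() with a NULL locale argument only reports the current setting. The result is copied because the returned string may be overwritten by a later setlocale() call.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of the locale name currently
 * in effect for `category`, or NULL on failure.  Passing NULL as the locale
 * argument only queries the setting; it does not change it.
 */
char *
effective_locale(int category)
{
    const char *name = setlocale(category, NULL);
    char       *copy;

    if (name == NULL)
        return NULL;
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}
```

The caller would first adopt the environment (for example with setlocale(LC_COLLATE, "")) and then store the string returned by effective_locale(LC_COLLATE), whichever of LANG, LC_ALL or LC_COLLATE it came from.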

This brings up a question I had just come across while doing further
research: backend/main/main.c does 

#ifdef USE_LOCALE
    setlocale(LC_CTYPE, "");    /* take locale information from an environment */
    setlocale(LC_COLLATE, "");
    setlocale(LC_MONETARY, "");
#endif

which seems a little odd --- why not setlocale(LC_ALL, "")?  Karel
Zak said in a thread around 8/15/00 that this is deliberate, but
I don't quite see why.
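This does not settle why the code was written that way, but the observable difference can be sketched. A C program starts in the "C" locale, so setting only three categories leaves the rest (LC_NUMERIC, LC_TIME) at "C", whereas setlocale(LC_ALL, "") would change them too. The function names below are hypothetical.

```c
#include <locale.h>

/*
 * Mirror the three calls quoted above.  A call that fails (for example
 * because the environment names an unknown locale) returns NULL and leaves
 * that category unchanged.
 */
void
set_selected_categories(void)
{
    setlocale(LC_CTYPE, "");
    setlocale(LC_COLLATE, "");
    setlocale(LC_MONETARY, "");
}

/* Query only: name of the locale in effect for LC_NUMERIC. */
const char *
numeric_locale(void)
{
    return setlocale(LC_NUMERIC, NULL);
}
```

After set_selected_categories(), numeric_locale() still reports "C", so number formatting and parsing in the C library keep their default behaviour regardless of the environment.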

>> Also, since "LC_COLLATE=en_US" seems to misbehave rather spectacularly
>> on recent RedHat releases, I propose that initdb change "en_US" to "C"
>> if it finds that setting.  (Are there any platforms where there are
>> non-bogus differences between the two?)

> There *should* be differences and it is definitely not okay to mix them
> up.

I have now received positive proof that en_US sort order on RedHat is
broken.  For example, it asserts
    '/root/' < '/root0'
but
    '/root/t' > '/root0'
I defy you to find anyone in the US who will say that that is a
reasonable definition of string collation.  

Of course, if you prefer the notion of disabling LIKE optimization
on a default RedHat installation, we can go ahead and accept en_US.
But I say it's broken and we shouldn't use it.

>> Finally, until we have a really bulletproof solution for LIKE indexing
>> optimization, I will disable that optimization if --enable-locale is
>> compiled *and* LC_COLLATE is not C.  Better to get "LIKE is slow" bug
>> reports than "LIKE gives wrong answers" bug reports.

> (C or POSIX)

Do you think there are cases where setlocale(category, NULL) will give back
"POSIX" rather than "C"?  We can certainly test for either.
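Testing for either name is cheap. A minimal sketch, assuming a hypothetical function collate_is_c() that would gate the LIKE optimization (the real backend check may look different):

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if the LC_COLLATE locale in effect is
 * reported as "C" or "POSIX", else 0.
 */
int
collate_is_c(void)
{
    const char *name = setlocale(LC_COLLATE, NULL);

    if (name == NULL)
        return 0;
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}
```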

> I have a question about that optimization:  If you have X LIKE 'foo%',
> wouldn't it be enough to use X >= 'foo' (which certainly works for any
> locale I've ever heard of)?  Why do you need the X <= 'foo???' at all?

Because you need a two-sided index constraint, not a one-sided one.
Otherwise you're probably better off doing a sequential scan ---
scanning 50% of the table (on average) via an index will be slower
than sequential.
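To make the two-sided constraint concrete, here is a sketch for plain byte order (strcmp, i.e. the C locale). It uses a strict upper bound formed by incrementing the last byte of the prefix, so 'foo%' becomes X >= 'foo' AND X < 'fop'. That is a simplification chosen for illustration, not necessarily the bound the backend computes, and both function names are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: write into `upper` the smallest string that sorts
 * after every string starting with `prefix`, under plain byte order.
 * Returns 0 on success, -1 if no such bound exists (empty prefix or all
 * bytes 0xFF) or if the buffer is too small.
 */
int
prefix_upper_bound(const char *prefix, char *upper, size_t size)
{
    size_t len = strlen(prefix);

    if (len + 1 > size)
        return -1;
    memcpy(upper, prefix, len + 1);
    /* Drop trailing 0xFF bytes, which cannot be incremented. */
    while (len > 0 && (unsigned char) upper[len - 1] == 0xFF)
        upper[--len] = '\0';
    if (len == 0)
        return -1;
    upper[len - 1] = (char) ((unsigned char) upper[len - 1] + 1);
    return 0;
}

/* Return 1 if prefix <= x < upper in byte order, else 0. */
int
in_prefix_range(const char *x, const char *prefix, const char *upper)
{
    return strcmp(x, prefix) >= 0 && strcmp(x, upper) < 0;
}
```

With both bounds an index scan touches only the keys between 'foo' and 'fop'. With the lower bound alone it would run from 'foo' to the end of the index.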

>> Comments?  Anyone think that initdb should lock down more categories
>> than just these two?

> Not sure whether LC_CTYPE is necessary.

I'm not either, but I'm afraid to let it float...
        regards, tom lane

