Re: Rework of collation code, extensibility
От | Jeff Davis |
---|---|
Тема | Re: Rework of collation code, extensibility |
Дата | |
Msg-id | 052a5ed874d110be2f3ae28752e363306b10966d.camel@j-davis.com обсуждение исходный текст |
Ответ на | Re: Rework of collation code, extensibility (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Rework of collation code, extensibility
|
Список | pgsql-hackers |
On Fri, 2023-01-13 at 11:57 -0800, Peter Geoghegan wrote: > You're adding a layer of indirection that's going to set things up > for > later patches that add a layer of indirection for version ICU > libraries (and even libc itself), and some of the details only make > sense in that context. This isn't just refactoring work that could > just as easily have happened in some quite different context. Right, well put. I have two goals and felt that they merged into one patchset, but I think that caused more confusion. The first goal I had was simply that the code was really hard to understand and work on, and refactoring was justified to improve the situation. The second goal, which is somewhat dependent on the first goal, is that we really need an ability to support multiple ICU libraries, and I wanted to do some common groundwork that would be needed for any approach we choose there, and provide some hooks to get us there. You are right that this goal influenced the first goal. I attached new patches: v7-0001: pg_strcoll and pg_strxfrm patches combined, your comments addressed v7-0002: add pg_locale_internal.h (and other refactoring) I will post the other patches in the other thread. > That means that pg_strcoll()'s relationship to pg_strxfrm()'s isn't > the same as the well known strcoll()/strxfrm() relationship. That's a really good point. I changed tiebreaking to be the caller's responsibility. > * varstrfastcmp_locale() is no longer capable of calling > ucol_strcollUTF8() through the shim interface, meaning that it has to > determine string length based on NUL-termination, when in principle > it > could just use the known length of the string. I think you misread, it still calls ucol_strcollUTF8() when applicable, which is impoartant because otherwise it would require a conversion to a UChar string. ucol_strcollUTF8() accepts -1 to mean "nul-terminated". I did some basic testing and it doesn't seem like it's slower than using the length. If passing the length is faster for some reason, it would complicate the API because we'd need an entry point that's expecting nul-termination and lengths, which is awkward (as Peter E. pointed out). > * I don't see much point in this new varstr_abbrev_convert() > variable: > > + const size_t max_prefix_bytes = sizeof(Datum); > > varstr_abbrev_convert() is concerned with packing abbreviated key > bytes into Datums, so it's perfectly reasonable to deal with > Datums/sizeof(Datum) directly. I felt it was a little clearer amongst the other code, to a casual reader, but I suppose it's a style thing. I will change it if you insist. > * Having a separate pg_strxfrm_prefix_libc() function just to throw > an > error doesn't really add much IMV. Removed. > Comments on 0003-*: > > I suggest that some of the very trivial functions you have here (such > as pg_locale_deterministic()) be made inline functions. I'd have to expose the pg_locale_t struct, which didn't seem desirable to me. Do you think it's enough of a performance concern to be worth some ugliness there? -- Jeff Davis PostgreSQL Contributor Team - AWS
Вложения
В списке pgsql-hackers по дате отправления: