Re: sortsupport for text

From Peter Geoghegan
Subject Re: sortsupport for text
Date
Msg-id CAEYLb_VjJijP4B-ZvZjKvfF=weJSvn76M6BiWOdoRDfH2p8FHg@mail.gmail.com
In reply to Re: sortsupport for text  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: sortsupport for text  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 14 June 2012 20:32, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, but *it doesn't matter*.  If you test this on strings that are
> long enough that they get pushed out to TOAST, you'll find that it
> doesn't measurably improve performance, because the overhead of
> detoasting so completely dominates any savings on the palloc side that
> you can't pick them out of the inter-run noise.

That's probably true, but it's also beside the point. As recently as a
few hours ago, you yourself said "my guess is that most values people
sort by are pretty short, making this concern mostly academic". Why
are you getting hung up on toasting now?

> Here we know that it doesn't matter, so the application of Knuth's first law
> of optimization is appropriate.

I'm not advocating some Byzantine optimisation, or even something that
could reasonably be described as an optimisation at all here. I'm
questioning why you've unnecessarily complicated the code by having
the buffer size just big enough to fit the biggest value seen so far,
but arbitrarily aligned to a value that is completely irrelevant to
bttextfastcmp_locale(), rather than using simple geometric expansion,
which is more or less the standard way of managing the growth of a
dynamic array.

You have to grow the array in some way. The basic approach I've
outlined has something to recommend it - why does it make sense to
align the size of the buffer to TEXTBUFLEN in particular though? It's
quite easy to imagine what you've done here resulting in an excessive
number of allocations (and pfree()s), which *could* be expensive. If
you're that concerned about conserving memory, you could grow the
array at a less aggressive rate than doubling it each time.
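
To make the two growth policies discussed above concrete, here is a minimal
sketch in plain C. It is not the code from the patch: ScratchBuf, buf_ensure,
grow_aligned, grow_doubling and count_allocs are hypothetical names, malloc()
and free() stand in for palloc() and pfree(), and the TEXTBUFLEN value of 1024
is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TEXTBUFLEN 1024		/* assumed value, for illustration only */

/* Hypothetical scratch buffer that is reused across comparisons. */
typedef struct ScratchBuf
{
	char	   *data;
	size_t		cap;		/* current capacity in bytes */
	int			nallocs;	/* how many times we had to reallocate */
} ScratchBuf;

typedef size_t (*GrowPolicy) (size_t cap, size_t needed);

/* Policy A: just big enough, rounded up to a multiple of TEXTBUFLEN. */
static size_t
grow_aligned(size_t cap, size_t needed)
{
	(void) cap;				/* current capacity is irrelevant here */
	return ((needed + TEXTBUFLEN - 1) / TEXTBUFLEN) * TEXTBUFLEN;
}

/* Policy B: geometric expansion, doubling until the request fits. */
static size_t
grow_doubling(size_t cap, size_t needed)
{
	size_t		newcap = (cap > 0) ? cap : TEXTBUFLEN;

	while (newcap < needed)
		newcap *= 2;
	return newcap;
}

/* Make sure the buffer holds at least "needed" bytes; 0 on success. */
static int
buf_ensure(ScratchBuf *buf, size_t needed, GrowPolicy policy)
{
	size_t		newcap;

	assert(needed > 0);
	if (needed <= buf->cap)
		return 0;			/* already large enough, no allocation */

	newcap = policy(buf->cap, needed);
	free(buf->data);		/* scratch contents need not be preserved */
	buf->data = malloc(newcap);
	if (buf->data == NULL)
	{
		buf->cap = 0;
		return -1;
	}
	buf->cap = newcap;
	buf->nallocs++;
	return 0;
}

/* Count reallocations when requested lengths rise from step to max_len. */
static int
count_allocs(GrowPolicy policy, size_t max_len, size_t step)
{
	ScratchBuf	buf = {NULL, 0, 0};
	size_t		len;

	for (len = step; len <= max_len; len += step)
	{
		if (buf_ensure(&buf, len, policy) != 0)
			break;
	}
	free(buf.data);
	return buf.nallocs;
}
```

Under these assumptions, calling count_allocs() with lengths that rise from
1 kB to 64 kB in 1 kB steps reallocates 64 times with grow_aligned and 7 times
with grow_doubling. The price of doubling is that the buffer may end up
almost twice as large as the biggest value seen. A growth factor smaller than
2 would sit somewhere between the two.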

There is a trade-off between space and time to be made here, but I
don't know why you think that the right choice is to use almost the
smallest possible amount of memory in all cases.

>> Another concern is that it seems fairly pointless to have two buffers.
>> Wouldn't it be more sensible to have a single buffer that was
>> partitioned to make two logical, equally-sized buffers, given that in
>> general each buffer is expected to grow at exactly the same rate?
>
> Sure, but it would be making the code more complicated in return for
> no measurable performance benefit.  We generally avoid that.

Fair enough.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

