Re: Abbreviated keys for Numeric

From	Peter Geoghegan
Subject	Re: Abbreviated keys for Numeric
Date	February 21, 2015, 18:57:35
Msg-id	CAM3SWZR0eWv9zix4pAnTFP3=4ajyoXio5txcGUsZV9W+10X0Cg@mail.gmail.com
In reply to	Re: Abbreviated keys for Numeric (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Abbreviated keys for Numeric Re: Abbreviated keys for Numeric
List	pgsql-hackers

On Fri, Feb 20, 2015 at 9:18 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> The gains for text are also very nice, although in this case that only
> happens for the smallest scale (1M rows), and for larger scales it's
> actually slower than current master :-(

That's odd. I have a hard time thinking of why the datum sort patch
could be at fault, though. I bet the cost model of the text
sortsupport routine is somehow hitting a snag on those larger sized
sets. They should be just as accelerated, and probably more so, than
your 1M sized set that was sped up 4x here.

Can you see what is output with debugging of text abbreviation turned
on? Put "#define DEBUG_ABBREV_KEYS" at the top of varlena.c and
rebuild. Report on the debug1 output, and see if and when abbreviation
is aborted.

I suspected that the cost model was too conservative (or, more
likely, just too simplistic). I ought to revisit my patch to give the
ad-hoc cost model a sense of proportion about how far along we are,
which was previously deferred [1]. When there is a strong
physical/logical correlation, that can be essential.

Did you first index the text field, and then run CLUSTER for the
larger sized sets on that index (to test abbreviation)? That would
cause there to be a lot of abbreviated keys that seemed to poorly
capture the entropy of their underlying values, when in fact that was
entirely down to our only considering the first 10 tuples in a 100
million tuple set. Having some patience is important there, and a hint
at how far in we are gives the ad-hoc cost model a much better sense
of proportion...it then has a sense of how patient it should be.
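
To make the "sense of proportion" idea concrete, here is a rough sketch
of an abort check that refuses to give up on abbreviation until a
minimum fraction of the expected input has been seen. This is not the
code in varlena.c or in the patch: the function name, its parameters,
and all three constants are hypothetical, and the distinct-value counts
are assumed to come from some cardinality estimator.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch only, not PostgreSQL's actual cost model.
 *
 * rows_seen       - tuples processed so far
 * total_rows      - estimated total tuples in the sort (the "hint")
 * abbrev_distinct - estimated distinct abbreviated keys so far
 * full_distinct   - estimated distinct full keys so far
 *
 * Returns true when abbreviation looks like a poor bet and should be
 * aborted.
 */
bool
abbrev_should_abort(double rows_seen, double total_rows,
					double abbrev_distinct, double full_distinct)
{
	/* All three constants are made up for illustration. */
	const double min_rows = 1000.0;		/* never decide before this */
	const double min_fraction = 0.01;	/* ...or before 1% of the input */
	const double min_ratio = 0.05;		/* abbreviated/full cardinality */
	double		patience;

	assert(total_rows > 0 && rows_seen >= 0);

	/* Scale patience with the size of the whole set. */
	patience = total_rows * min_fraction;
	if (patience < min_rows)
		patience = min_rows;

	/* Too early to judge: a correlated prefix may be unrepresentative. */
	if (rows_seen < patience || full_distinct < 1.0)
		return false;

	/* Abort only if abbreviated keys resolve very few distinct values. */
	return (abbrev_distinct / full_distinct) < min_ratio;
}
```

With a 100 million tuple set, a check like this would not even consider
aborting until 1 million tuples had been seen, so a misleading handful
of leading tuples could not trigger an abort by itself.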

[1] http://www.postgresql.org/message-id/CA+TgmoaSXpD73cOj-vSFRfk0nmxjAN6WOQ_Hd9SkmZbOTi+6CQ@mail.gmail.com
--
Peter Geoghegan
