Re: Abbreviated keys for Numeric
От | Peter Geoghegan |
---|---|
Тема | Re: Abbreviated keys for Numeric |
Дата | |
Msg-id | CAM3SWZR0eWv9zix4pAnTFP3=4ajyoXio5txcGUsZV9W+10X0Cg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Abbreviated keys for Numeric (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: Abbreviated keys for Numeric
Re: Abbreviated keys for Numeric |
Список | pgsql-hackers |
On Fri, Feb 20, 2015 at 9:18 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > The gains for text are also very nice, although in this case that only > happens for the smallest scale (1M rows), and for larger scales it's > actually slower than current master :-( That's odd. I have a hard time thinking of why the datum sort patch could be at fault, though. I bet the cost model of the text sortsupport routine is somehow hitting a snag on those larger sized sets. They should be just as accelerated, and probably more so, than your 1M sized set that was sped up 4x here. Can you see what is output with debugging of text abbreviation turned on? Put "#define DEBUG_ABBREV_KEYS" at the top of varlena.c and rebuild. Report on the debug1 output, and see if and when abbreviation is aborted. I suspected that the cost model was too conservative (or, more lightly, just too simplistic). I ought to revisit my patch to give the ad-hoc cost model a sense of proportion about how far along we are, which was previously deferred [1]. When there is a strong physical/logical correlation, that can be essential. Did you first index the text field, and then run CLUSTER for the larger sized sets on that index (to test abbreviation)? That would cause there to be a lot of abbreviated keys that seemed to poorly capture the entropy of their underlying values, when in fact that was entirely down to our only considering the first 10 tuples in a 100 million tuple set. Having some patience is important there, and a hint at how far in we are gives the ad-hoc cost model a much better sense of proportion...it then has a sense of how patient it should be. [1] http://www.postgresql.org/message-id/CA+TgmoaSXpD73cOj-vSFRfk0nmxjAN6WOQ_Hd9SkmZbOTi+6CQ@mail.gmail.com -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: