Re: TABLESAMPLE patch

From	Petr Jelinek
Subject	Re: TABLESAMPLE patch
Date	March 10, 2015, 10:05:40
Msg-id	54FEC1EB.9070503@2ndquadrant.com
In response to	Re: TABLESAMPLE patch (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: TABLESAMPLE patch
List	pgsql-hackers


On 10/03/15 10:54, Amit Kapila wrote:
> On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>  >
>  > Ok now I think I finally understand what you are suggesting - you are
>  > saying let's go over whole page while tsmnexttuple returns something,
>  > and do the visibility check and other stuff in that code block under the
>  > buffer lock and cache the resulting valid tuples in some array and then
>  > return those tuples one by one from that cache?
>  >
>
> Yes, this is what I am suggesting.
>
>  >>  > And if the caller will try to do it in one step and cache the
>  >>  > visibility info then we'll end up with pretty much same structure as
>  >>  > rs_vistuples - there isn't saner way to cache this info other than
>  >>  > ordered vector of tuple offsets, unless we assume that most pages have
>  >>  > close to MaxOffsetNumber of tuples which they don't, so why not just use
>  >>  > the heapgetpage directly and do the binary search over rs_vistuples.
>  >>  >
>  >>
>  >> The downside of doing it via heapgetpage is that it will do
>  >> visibility test for tuples which we might not even need (I think
>  >> we should do visibility test for tuples returned by tsmnexttuple).
>  >>
>  >
>  > Well, heapgetpage can either read visibility data for whole page or
>  > not, depending on if we want pagemode reading or not. So we can use the
>  > pagemode for sampling methods where it's feasible (like system) and not
>  > use pagemode where it's not (like bernoulli) and then either use the
>  > rs_vistuples or call HeapTupleSatisfiesVisibility individually again
>  > depending if the method is using pagemode or not.
>  >
>
> Yeah, but as mentioned above, this has some downside, but go
> for it only if you feel that above suggestion is making code complex,
> which I think should not be the case as we are doing something similar
> in acquire_sample_rows().
>
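
For readers following the rs_vistuples side of the discussion above, here is a minimal sketch of the lookup being proposed: a binary search over an ordered vector of visible tuple offsets. It is not code from the patch. The function name `offset_is_visible` and the use of plain `uint16_t` in place of PostgreSQL's OffsetNumber are assumptions made to keep the example standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true if `target`
 * appears in vistuples[0 .. ntup-1]. The array must be sorted in
 * ascending order, as an ordered vector of visible offsets would be.
 */
static bool
offset_is_visible(const uint16_t *vistuples, int ntup, uint16_t target)
{
    int lo = 0;
    int hi = ntup - 1;

    assert(ntup >= 0);

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */

        if (vistuples[mid] == target)
            return true;
        if (vistuples[mid] < target)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;                       /* offset is not in the visible set */
}
```

With pagemode reading, a sampling method would call something like this once per offset it picks, instead of running a visibility check on that tuple itself.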

I think your suggestion is actually simpler code-wise; I am just
somewhat worried by the fact that no other scan node uses that kind of
caching, and there is probably a reason for that.
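
To make the caching idea concrete, here is a rough standalone sketch of it, assuming a toy page model rather than real heap pages. Every name in it (`Page`, `SampleCache`, `cache_page`, `cache_next`, `every_other`) is hypothetical and only illustrates the shape of the loop; in the real scan the visibility flag would come from HeapTupleSatisfiesVisibility and the loop would run under the buffer lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TUPLES_PER_PAGE 32  /* illustrative limit, not the real constant */

/* Toy page: visible[off] says whether the tuple at 1-based offset off is visible. */
typedef struct Page
{
    int  ntuples;
    bool visible[MAX_TUPLES_PER_PAGE + 1];
} Page;

/* Per-page cache of sampled offsets that passed the visibility check. */
typedef struct SampleCache
{
    int cached[MAX_TUPLES_PER_PAGE];
    int ncached;
    int next;
} SampleCache;

/* Stand-in for tsmnexttuple: next sampled offset on the page, or 0 when done. */
typedef int (*next_tuple_fn) (void *state, int ntuples);

/* Walk the page once, keeping only the sampled offsets that are visible. */
static void
cache_page(SampleCache *cache, const Page *page,
           next_tuple_fn next_tuple, void *state)
{
    int off;

    cache->ncached = 0;
    cache->next = 0;

    /* The sampler is assumed to return each offset at most once. */
    while ((off = next_tuple(state, page->ntuples)) != 0)
    {
        if (off >= 1 && off <= page->ntuples && page->visible[off])
        {
            assert(cache->ncached < MAX_TUPLES_PER_PAGE);
            cache->cached[cache->ncached++] = off;
        }
    }
}

/* Hand out cached offsets one by one; 0 means the page is exhausted. */
static int
cache_next(SampleCache *cache)
{
    if (cache->next >= cache->ncached)
        return 0;
    return cache->cached[cache->next++];
}

/* Toy sampler state and callback: picks offsets 2, 4, 6, ... */
typedef struct EveryOther
{
    int last;
} EveryOther;

static int
every_other(void *state, int ntuples)
{
    EveryOther *s = state;

    s->last += 2;
    return (s->last <= ntuples) ? s->last : 0;
}
```

The point of the split is that only tuples the sampling method actually picked get a visibility check, and the executor then pulls them from the cache without touching the page again.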


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
