Re: TABLESAMPLE patch
| От | Petr Jelinek |
|---|---|
| Тема | Re: TABLESAMPLE patch |
| Дата | |
| Msg-id | 54FEC1EB.9070503@2ndquadrant.com обсуждение исходный текст |
| Ответ на | Re: TABLESAMPLE patch (Amit Kapila <amit.kapila16@gmail.com>) |
| Ответы |
Re: TABLESAMPLE patch
|
| Список | pgsql-hackers |
On 10/03/15 10:54, Amit Kapila wrote: > On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr@2ndquadrant.com > <mailto:petr@2ndquadrant.com>> wrote: > > > > Ok now I think I finally understand what you are suggesting - you are > saying let's go over whole page while tsmnexttuple returns something, > and do the visibility check and other stuff in that code block under the > buffer lock and cache the resulting valid tuples in some array and then > return those tuples one by one from that cache? > > > > Yes, this is what I am suggesting. > > >> > And if the caller will try to do it in one step and cache the > >> visibility info then we'll end up with pretty much same structure as > >> rs_vistuples - there isn't saner way to cache this info other than > >> ordered vector of tuple offsets, unless we assume that most pages have > >> close to MaxOffsetNumber of tuples which they don't, so why not just use > >> the heapgetpage directly and do the binary search over rs_vistuples. > >> > > >> > >> The downside of doing it via heapgetpage is that it will do > >> visibility test for tuples which we might not even need (I think > >> we should do visibility test for tuples retrurned by tsmnexttuple). > >> > > > > Well, heapgetpage can either read visibility data for whole page or > not, depending on if we want pagemode reading or not. So we can use the > pagemode for sampling methods where it's feasible (like system) and not > use pagemode where it's not (like bernoulli) and then either use the > rs_vistuples or call HeapTupleSatisfiesVisibility individually again > depending if the method is using pagemode or not. > > > > Yeah, but as mentioned above, this has some downside, but go > for it only if you feel that above suggestion is making code complex, > which I think should not be the case as we are doing something similar > in acquire_sample_rows(). > I think your suggestion is actually simpler code wise, I am just somewhat worried by the fact that no other scan node uses that kind of caching and there is probably reason for that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: