Re: tablesample performance
From | Simon Riggs |
---|---|
Subject | Re: tablesample performance |
Date | |
Msg-id | CANP8+jJ=Mct7a6jD5iXLO2rTBrKeu+0dBEX5u_kTZ7NGKLRCyg@mail.gmail.com |
In reply to | Re: tablesample performance (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
On 18 October 2016 at 22:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 18 October 2016 at 19:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If you don't want to have an implicit bias towards earlier blocks,
>>> I don't think that either standard tablesample method is really what
>>> you want.
>>>
>>> The contrib/tsm_system_rows tablesample method is a lot closer, in
>>> that it will start at a randomly chosen block, but if you just do
>>> "tablesample system_rows(1)" then you will always get the first row
>>> in whichever block it lands on, so it's still not exactly unbiased.
>
>> Is there a reason why we can't fix the behaviours of the three methods
>> mentioned above by making them all start at a random block and a
>> random item between min and max?
>
> The standard tablesample methods are constrained by other requirements,
> such as repeatability. I am not sure that loading this one on top of
> that is a good idea. The bias I referred to above is *not* the fault
> of the sample methods, rather it's the fault of using "LIMIT 1".

Hmm, yeh, that would make it a little too much of a special case.

> It does seem like maybe it'd be nice for tsm_system_rows to start at a
> randomly chosen entry in the first block it visits, rather than always
> dumping that entire block. Then "tablesample system_rows(1)" would
> actually give you a pretty random row, and I think we aren't giving up
> any useful properties it has now.

OK, will patch that.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
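
The bias discussed in this message can be illustrated with a small toy model. The sketch below is not PostgreSQL code and does not reflect how tsm_system_rows is implemented; it only assumes a table laid out as a list of equally sized blocks, and compares "pick a random block, return its first row" (the behaviour described for "tablesample system_rows(1)") with "pick a random block, then a random entry in it" (roughly the proposed change for the single-row case). All function names here are hypothetical.

```python
import random
from collections import Counter


def make_table(n_blocks, rows_per_block):
    """Toy table: each row is identified by (block number, offset in block)."""
    return [[(b, i) for i in range(rows_per_block)] for b in range(n_blocks)]


def sample_first_in_block(blocks, rng):
    """Model of the described behaviour: random block, always its first row."""
    block = rng.choice(blocks)
    return block[0]


def sample_random_entry(blocks, rng):
    """Model of the proposal for one row: random block, random entry in it."""
    block = rng.choice(blocks)
    return rng.choice(block)


def rows_seen(sampler, blocks, trials, seed=0):
    """Count how often each row is returned over repeated one-row samples."""
    rng = random.Random(seed)
    return Counter(sampler(blocks, rng) for _ in range(trials))


if __name__ == "__main__":
    table = make_table(n_blocks=4, rows_per_block=5)
    first = rows_seen(sample_first_in_block, table, trials=2000)
    entry = rows_seen(sample_random_entry, table, trials=2000)
    print("distinct rows, first-in-block:", len(first))
    print("distinct rows, random entry:  ", len(entry))
```

In this model the first sampler can only ever return rows at offset 0, so at most one row per block is reachable, while the second can reach every row. Real tables have variable numbers of rows per block, which this sketch ignores.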