Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
От | Petr Jelinek |
---|---|
Тема | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Дата | |
Msg-id | 55C3C207.8040608@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows (Josh Berkus <josh@agliodbs.com>) |
Список | pgsql-hackers |
On 2015-08-06 22:17, Josh Berkus wrote: > On 08/06/2015 01:14 PM, Josh Berkus wrote: >> On 08/06/2015 01:10 PM, Simon Riggs wrote: >>> Given, user-stated probability of accessing a block of P and N total >>> blocks, there are a few ways to implement block sampling. >>> >>> 1. Test P for each block individually. This gives a range of possible >>> results, with 0 blocks being possible outcome, though decreasing in >>> probability as P increases for fixed N. This is the same way BERNOULLI >>> works, we just do it for blocks rather than rows. >>> >>> 2. We calculate P/N at start of scan and deliver this number blocks by >>> random selection from N available blocks. >>> >>> At present we do (1), exactly as documented. (2) is slightly harder >>> since we'd need to track which blocks have been selected already so we >>> can use a random selection with no replacement algorithm. On a table >>> with uneven distribution of rows this would still return a variable >>> sample size, so it didn't seem worth changing. >> >> Aha, thanks! >> >> So, seems like this is just a doc issue? That is, we just need to >> document that using SYSTEM on very small sample sizes may return >> unexpected numbers of results ... and maybe also how the algorithm >> actually works. > Yes, it's expected behavior on very small sample size so doc patch seems best fix. > Following up on this ... where is TABLESAMPLE documented other than in > the SELECT command? Doc search on the website is having issues right > now. I'm happy to write a doc patch. > The user documentation is only in SELECT page, the rest is API docs. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: