Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
От | Josh Berkus |
---|---|
Тема | Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Дата | |
Msg-id | 55C3B222.1060301@agliodbs.com обсуждение исходный текст |
Ответы |
Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Список | pgsql-hackers |
Version: 9.5alpha2 Issue: when requesting small samples, SYSTEM often returns zero rows, and sometimes returns unexpected numbers of rows. Example: create table thous ( id int, val text ); insert into thous select i, i::text || '-val' from generate_series(1,100000) as gs(i); analyze; This is a 100,000 row table, so a 0.01% sample should be 10 rows. Since 10 rows is far less than what's on one data page, the documentation suggests that I should get 1 data page worth of rows. However: postgres=# select * from thous tablesample system ( 0.01 );id | val ----+----- (0 rows) Time: 0.636 ms ... this query consistently returns 0 rows, in 20 runs. Ok,let's try a million-row table, which is the example we have in the docs. postgres=# select * from mil tablesample system ( 0.01 );id | val ----+----- (0 rows) Hmmm? On testing, the query against the million-row table returns 0 rows around 50% of the time. This table has around 185 rows per page. As the sample size goes up, the number times I get zero rows goes down, but those results seem to still include data pages with zero rows. For example, here's a series of results from a 0.04 sample against the million-row table. 370 370 370 555 555 185 0 925 Since this is a synthetic table I just generated, it contains almost exactly 185 rows per data page for every data page. So on a 0.04% sample, the variation between 370 rows and 555 rows (whether we have 2 or 3 data pages) is expected, since 0.04% of 5406 data pages is 2.16 pages. The results of 0, 185 and 925 are not. It really seems like SYSTEM is treating 0.04% as a maximum, but taking a random number of data pages somewhere around that maximum, using math which can choose numbers of pages far outside of the % requested by the user, and which includes 0. Speaking from a user perspective, SYSTEM seems broken to me. I can't imagine using it for anything with a that degree of variation in the number of results returned, especially if it's possible to return zero rows from a populated table. BERNOULLI works as expected. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
В списке pgsql-hackers по дате отправления: