Re: TABLESAMPLE patch
От | Petr Jelinek |
---|---|
Тема | Re: TABLESAMPLE patch |
Дата | |
Msg-id | 54B18493.5030303@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: TABLESAMPLE patch (Michael Paquier <michael.paquier@gmail.com>) |
Ответы |
Re: TABLESAMPLE patch
|
Список | pgsql-hackers |
On 09/01/15 09:27, Michael Paquier wrote: > > Some comments about the 1st patch: > 1) Nitpicking: > + * Block sampling routines shared by ANALYZE and TABLESAMPLE. > TABLESAMPLE is not added yet. You may as well mention simplify this > description with something like "Sampling routines for relation > blocks". > Changed. > 2) file_fdw and postgres_fdw do not compile correctly as they still > use anl_random_fract. This function is still mentioned in vacuum.h as > well. Gah, didn't notice this, fixed. And also since the Vitter's reservoir sampling methods are used by other component than just analyze, I moved those to sampling.c/h. > > 3) Not really an issue of this patch, but I'd think that this comment > should be reworked: > + * BlockSampler is used for stage one of our new two-stage tuple > + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject > + * "Large DB"). It selects a random sample of samplesize blocks out of > + * the nblocks blocks in the table. If the table has less than > + * samplesize blocks, all blocks are selected. > I changed the wording slightly, it's still not too great though. > 4) As a refactoring patch, why is the function providing a random > value changed? Shouldn't sample_random_fract be consistent with > anl_random_fract? > Yeah I needed that for TABLESAMPLE and it should not really affect the randomness but you are correct it should be part of second patch of the patch-set. > 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and > RAND48_SEED_2 instead of being hardcoded? > Regards, > Removed this part from the first patch as it's not needed there anymore. In second patch which implements the TABLESAMPLE itself I changed the implementation of random generator because when I looked at the code again I realized the old one would produce wrong results if there were multiple TABLESAMPLE statements in same query or multiple cursors in same transaction. In addition to the above changes I added test for cursors and test for the issue with random generator I mentioned above. Also fixed some typos in comments and function name. And finally I added note to docs saying that same REPEATABLE might produce different results in subsequent queries. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: