Re: PATCH: pgbench - random sampling of transaction written into log
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: pgbench - random sampling of transaction written into log |
Date | |
Msg-id | 503A5723.20703@fuzzy.cz |
In reply to | Re: PATCH: pgbench - random sampling of transaction written into log (Tomas Vondra <tv@fuzzy.cz>) |
Responses | Re: PATCH: pgbench - random sampling of transaction written into log |
List | pgsql-hackers |
On 26.8.2012 02:48, Tomas Vondra wrote:
> On 26.8.2012 00:19, Jeff Janes wrote:
>> On Fri, Aug 24, 2012 at 2:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> Hi,
>>>
>>> attached is a patch that adds support for random sampling in pgbench, when
>>> it's executed with "-l" flag. You can do for example this:
>>>
>>> $ pgbench -l -T 120 -R 1 db
>>>
>>> and then only 1% of transactions will be written into the log file. If you
>>> omit the tag, all the transactions are written (i.e. it's backward
>>> compatible).
>>
>> Hi Tomas,
>>
>> You use the rand() function. Isn't that function not thread-safe?
>> Or, if it is thread-safe, does it accomplish that with a mutex? That
>> was a problem with a different rand function used in pgbench that
>> Robert Haas fixed a while ago, 4af43ee3f165c8e4b332a7e680.
>
> Hi Jeff,
>
> Aha! Good catch. I've used rand() which seems to be neither reentrant nor
> thread-safe (unless the man page is incorrect). Anyway, using pg_erand48
> or getrand seems like an appropriate fix.
>
>> Also, what benefit is had by using modulus on rand(), rather than just
>> modulus on an incrementing counter?
>
> Hmm, I was thinking about that too, but I wasn't really sure how would
> that behave with multiple SQL files etc. But now I see the files are
> actually chosen randomly, so using a counter seems like a good idea.

Attached is an improved patch, with the call to rand() replaced with
getrand().

I was thinking about the counter, but I'm not really sure how to handle
cases like "39%". I'm not sure a plain (counter % 100 < 39) is a good
sampling, because it always keeps continuous sequences of transactions.

Maybe there's a clever way to use a counter, but let's stick to getrand()
unless we can prove it's causing issues, especially considering that a
lot of data won't be written at all with low sampling rates.

kind regards
Tomas
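For illustration, the two approaches compared in this message might be sketched roughly as follows. This is not code from the patch: the helper names are hypothetical, and POSIX erand48() with caller-owned state is used here only as a stand-in for pgbench's getrand()/pg_erand48, on the assumption that each thread keeps its own seed array.

```c
#define _XOPEN_SOURCE 700   /* expose erand48() under strict -std modes */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>         /* erand48 (POSIX) */

/*
 * Hypothetical helper: counter-based sampling.
 * Keeps the first rate_percent transactions out of every block of 100,
 * so the kept transactions always form one contiguous run per block.
 */
static bool
keep_by_counter(unsigned long counter, int rate_percent)
{
    assert(rate_percent >= 0 && rate_percent <= 100);
    return (counter % 100) < (unsigned long) rate_percent;
}

/*
 * Hypothetical helper: random sampling.
 * The generator state lives in xseed, which the caller owns (for example
 * one array per thread), so no global state is shared between threads.
 */
static bool
keep_by_random(unsigned short xseed[3], int rate_percent)
{
    assert(rate_percent >= 0 && rate_percent <= 100);
    /* erand48() returns a double in [0.0, 1.0) */
    return erand48(xseed) * 100.0 < (double) rate_percent;
}

/* Count how many of the first n transactions the counter rule keeps. */
static unsigned long
count_kept_counter(unsigned long n, int rate_percent)
{
    unsigned long kept = 0;

    for (unsigned long i = 0; i < n; i++)
        if (keep_by_counter(i, rate_percent))
            kept++;
    return kept;
}

/* Count how many of n transactions the random rule keeps. */
static unsigned long
count_kept_random(unsigned long n, int rate_percent, unsigned short xseed[3])
{
    unsigned long kept = 0;

    for (unsigned long i = 0; i < n; i++)
        if (keep_by_random(xseed, rate_percent))
            kept++;
    return kept;
}
```

With a 39% rate, the counter rule keeps exactly transactions 0..38 of each block of 100, which is the "continuous sequences" concern raised above. The random rule keeps about 39% on average, with the kept transactions scattered across the run.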