Re: PATCH: pgbench - random sampling of transaction written into log

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: PATCH: pgbench - random sampling of transaction written into log
Дата	26 августа 2012 г. 17:04:41
Msg-id	503A5723.20703@fuzzy.cz обсуждение исходный текст
Ответ на	Re: PATCH: pgbench - random sampling of transaction written into log (Tomas Vondra <tv@fuzzy.cz>)
Ответы	Re: PATCH: pgbench - random sampling of transaction written into log
Список	pgsql-hackers

Дерево обсуждения

On 26.8.2012 02:48, Tomas Vondra wrote:
> On 26.8.2012 00:19, Jeff Janes wrote:
>> On Fri, Aug 24, 2012 at 2:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> Hi,
>>>
>>> attached is a patch that adds support for random sampling in pgbench, when
>>> it's executed with "-l" flag. You can do for example this:
>>>
>>>   $ pgbench -l -T 120 -R 1 db
>>>
>>> and then only 1% of transactions will be written into the log file. If you
>>> omit the tag, all the transactions are written (i.e. it's backward
>>> compatible).
>>
>> Hi Tomas,
>>
>> You use the rand() function.  Isn't that function not thread-safe?
>> Or, if it is thread-safe, does it accomplish that with a mutex?  That
>> was a problem with a different rand function used in pgbench that
>> Robert Haas fixed a while ago, 4af43ee3f165c8e4b332a7e680.
>
> Hi Jeff,
>
> Aha! Good catch. I've used rand() which seems to be neither reentrant or
> thread-safe (unless the man page is incorrect). Anyway, using pg_erand48
> or getrand seems like an appropriate fix.
>
>> Also, what benefit is had by using modulus on rand(), rather than just
>> modulus on an incrementing counter?
>
> Hmm, I was thinking about that too, but I wasn't really sure how would
> that behave with multiple SQL files etc. But now I see the files are
> actually chosen randomly, so using a counter seems like a good idea.

Attached is an improved patch, with a call to rand() replaced with
getrand().

I was thinking about the counter but I'm not really sure how to handle
cases like "39%" - I'm not sure a plain (counter % 100 < 37) is not a
good sampling, because it always keeps continuous sequences of
transactions. Maybe there's a clever way to use a counter, but let's
stick to a getrand() unless we can prove is't causing issues. Especially
considering that a lot of data won't be be written at all with low
sampling rates.

kind regards
Tomas

Вложения

pgbench-sampling-v2.diff

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: PATCH: pgbench - random sampling of transaction written into log

Вложения