Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Дата
Msg-id 547632CD.9080609@fuzzy.cz
обсуждение исходный текст
Ответ на Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)  (Maxim Boguk <maxim.boguk@gmail.com>)
Список pgsql-bugs
On 26.11.2014 19:48, Maxim Boguk wrote:
>
>
> On Wed, Nov 26, 2014 at 9:08 PM, Maxim Boguk <maxim.boguk@gmail.com
> <mailto:maxim.boguk@gmail.com>> wrote:
>
>
> On one of my servers stat collector went crasy on idle system. It
> constantly write around 50MB/s on idle database (no activity at
> all).
>
> OK there are results of some low-level gdb kung-fu:
>
> problem point is:
>             /*
>              * Write the stats file if a new request has arrived that is not
>              * satisfied by existing file.
>              */
>             if (last_statwrite < last_statrequest)
>                 pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?

Maybe, although I'd expect much smaller difference in such case. 20-30ms
is a bit too much, IMHO. Which clock source are you using?

cat /sys/devices/system/clocksource/clocksource0/available_clocksource

Do you have NTP on the machine? Is it working correctly?

There's code in pgstat.c that should handle such clock skew cases,
around line 3560:

https://github.com/postgres/postgres/blob/REL9_2_STABLE/src/backend/postmaster/pgstat.c#L3564

Do you get those messages in the log?

Can you perform more gdb-fu? Are the last_statwrite / last_statrequest
timestamps advance "equally"?

Do you know what processes are sending the requests to write the file?

The only source of 'write statfile' requests I can think of at the
moment is the autovacuum launcher (who needs stats for all databases).
Let's assume there really is ~30ms clock skew between the postmaster and
the launcher. If I read the code correctly, then:

 1) the launcher accepts only stats not older than 10ms

 2) the timestamp is read from the file, and is representing timestamp
    when writing the file started

 3) if the clock skew is >10ms, or if writing the file takes >10ms (or
    combination of those two times), a request is sent

 4) this is done repeatedly for 10 seconds (requests 10ms apart)

Maybe the stats file grew a bit, and now the difference got slightly
over 10ms?

The other thing that might happen is inability to write the new file.
But that's inconsistent with small file size (I've seen that e.g. for
files exceeding the tmpfs size limit, but if you only have ~5MB
pgstat.stat file that's unlikely).

Also, it would cause "could not write temporary statistics" in the log,
and the difference between the timestamps would grow (while you're
saying it's ~30ms).

regards
Tomas

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Maxim Boguk
Дата:
Сообщение: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Следующее
От: Maxim Boguk
Дата:
Сообщение: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)