Re: BUG #16331: segfault in checkpointer with full disk

From: Julien Rouhaud
Subject: Re: BUG #16331: segfault in checkpointer with full disk
Date:
Msg-id: 20200401090455.GB82418@nol
In reply to: BUG #16331: segfault in checkpointer with full disk (PG Bug reporting form <noreply@postgresql.org>)
Replies: Re: BUG #16331: segfault in checkpointer with full disk (Jozef Mlich <jmlich83@gmail.com>)
List: pgsql-bugs

Hi,
Hi,

On Wed, Apr 01, 2020 at 08:51:56AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      16331
> Logged by:          Jozef Mlich
> Email address:      jmlich83@gmail.com
> PostgreSQL version: 12.2
> Operating system:   CentOS
> Description:        
> 
> I can see segfaults on CentOS 7 with postgresql 12.2-2PGDG.rhel7 (from
> yum.postgresql.org). I am using multiple extensions (cstore, postgres_fdw,
> pgcrypto, dblink, etc.). It seems the crash is related to the disk running
> out of space (I am using separate partitions for / and for /var/lib/pgsql).
> It occurs a few times a day. According to the backtrace, it seems to be
> related to the checkpointer. Replication is not configured.
> 
> 
> [New LWP 26290]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: checkpointer                               
>  '.
> Program terminated with signal 6, Aborted.
> #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> 55      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> 
> Thread 1 (Thread 0x7fe462e148c0 (LWP 26290)):
> #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
>         resultvar = 0
>         pid = 26290
>         selftid = 26290
> #1  0x00007fe4604c28f8 in __GI_abort () at abort.c:90
>         save_stage = 2
>         act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
> sa_mask = {__val = {0, 0, 0, 0, 0, 9268713, 70403103920717,
> 39808819211026438, 20126216749056, 70394513997832, 9268713, 70403103920719,
> 17316096998686159616, 20134806683648, 140618848608704, 140618848592800}},
> sa_flags = 1615828275, sa_restorer = 0x0}
>         sigs = {__val = {32, 0 <repeats 15 times>}}
> #2  0x000000000087840a in errfinish (dummy=<optimized out>) at elog.c:552
>         edata = 0xd47040 <errordata>
>         elevel = 22
>         oldcontext = 0x171a6d0
>         econtext = 0x0
>         __func__ = "errfinish"
> #3  0x0000000000706b24 in CheckPointReplicationOrigin () at origin.c:562
>         tmppath = 0x9e6fa8 "pg_logical/replorigin_checkpoint.tmp"
>         path = 0x9e6fd0 "pg_logical/replorigin_checkpoint"
>         tmpfd = <optimized out>
>         i = <optimized out>
>         magic = 307747550
>         crc = 4294967295
>         __func__ = "CheckPointReplicationOrigin"


That's not a bug (nor a segfault) but the expected behavior when the checkpointer
is unable to do its work.  As data durability can't be guaranteed in such a
case, the checkpointer raises a PANIC-level error (elevel = 22 in frame #2 is
PANIC), and errfinish() then calls abort(), which is why the process died with
signal 6 (SIGABRT) rather than SIGSEGV.  The whole instance then goes through
an emergency restart cycle.

Do you have monitoring for this filesystem?  Do you see spikes in disk usage or
other strange behavior?