Thread: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
From
harukat@sraoss.co.jp
Date:
The following bug has been logged on the website:
Bug reference: 8397
Logged by: TAKATSUKA Haruka
Email address: harukat@sraoss.co.jp
PostgreSQL version: 9.2.4
Operating system: Linux (CentOS6)
Description:
Hi.
I would like to report a small bug.
Running pg_basebackup -x against a newly created standby server sometimes
causes a segmentation fault.
(1) Create a new standby server directory with pg_basebackup, without -x.
(2) Start the new standby server.
(3) Run pg_basebackup with -x against the new standby server.
(!) If the new standby has no WAL files in pg_xlog,
the new standby's WAL sender crashes.
Core file from the new standby server:
Core was generated by `postgres: wal sender process postgres ::1(55210) sending backup "pg_basebackup'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
#1 0x0000003b73675990 in _IO_str_init_static_internal () from /lib64/libc.so.6
#2 0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
#3 0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
#4 0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300, tblspcdir=0xd424c0) at basebackup.c:304
#5 0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>) at basebackup.c:558
#6 0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
#7 WalSndHandshake () at walsender.c:257
#8 WalSenderMain () at walsender.c:181
#9 0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>, dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
#10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
#11 BackendStartup () at postmaster.c:3304
#12 ServerLoop () at postmaster.c:1367
#13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1127
#14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199
./backend/replication/basebackup.c:304
XLogFromFileName(walFiles[0], &tli, &logid, &logseg);
In this case nWalFiles is 0 and walFiles[] was palloc'ed with zero size,
so walFiles[0] reads past the end of the allocation.
pg_basebackup does not have to succeed in this rare case, but
we should add a check such as "if (nWalFiles <= 0) ereport(...);".
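As a rough illustration of the kind of check meant here, the standalone C sketch below guards the first-element access. It is not the PostgreSQL source: WalSegName, parse_wal_name, first_wal_segment, and both error strings are hypothetical names invented for this example, and a returned message stands in for ereport(). It assumes the 24-hex-digit WAL file name layout (timeline, log id, segment, 8 digits each) and that XLogFromFileName comes down to an sscanf call, as frame #3 of the backtrace suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three fields encoded in a WAL file name. */
typedef struct
{
    unsigned int tli;    /* timeline ID */
    unsigned int logid;  /* log file number */
    unsigned int logseg; /* segment number within the log file */
} WalSegName;

/*
 * Parse a WAL file name of exactly 24 uppercase hex digits.
 * Returns 0 on success, -1 if the name is NULL or malformed.
 */
int
parse_wal_name(const char *fname, WalSegName *out)
{
    assert(out != NULL);

    if (fname == NULL || strlen(fname) != 24 ||
        strspn(fname, "0123456789ABCDEF") != 24)
        return -1;

    if (sscanf(fname, "%8X%8X%8X", &out->tli, &out->logid, &out->logseg) != 3)
        return -1;

    return 0;
}

/*
 * Inspect the first WAL file only after checking that the list is not
 * empty. With nWalFiles == 0, walFiles[0] is never read.
 * On failure a message is written to errbuf and -1 is returned.
 */
int
first_wal_segment(const char *const *walFiles, int nWalFiles,
                  WalSegName *out, char *errbuf, size_t errlen)
{
    if (nWalFiles <= 0)
    {
        /* Stand-in for ereport(ERROR, ...) in the server. */
        snprintf(errbuf, errlen, "could not find any WAL files");
        return -1;
    }

    if (parse_wal_name(walFiles[0], out) != 0)
    {
        snprintf(errbuf, errlen, "unexpected WAL file name");
        return -1;
    }

    return 0;
}
```

A caller with an empty list gets an error message back instead of having an out-of-bounds pointer handed to sscanf. In the server the same branch would raise the error through ereport() rather than return a code.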
regards,
Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
From
Magnus Hagander
Date:
On Sat, Aug 24, 2013 at 1:46 PM, <harukat@sraoss.co.jp> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8397
> Logged by: TAKATSUKA Haruka
> Email address: harukat@sraoss.co.jp
> PostgreSQL version: 9.2.4
> Operating system: Linux (CentOS6)
> Description:
>
> Hi.
>
> I report a small bug.
> pg_basebackup -x from new standby server sometimes causes Segmentation
> fault.
>
> (1) create new standby server dir by pg_basebackup without -x
> (2) start new standby server
> (3) pg_basebackup from new standby server with -x
> (!) when new standby has no WAL files in pg_xlog,
> new standby's wal sender crash
>
> new standby server's core file:
>
> Core was generated by `postgres: wal sender process postgres ::1(55210)
> sending backup "pg_basebackup'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64
> zlib-1.2.3-27.el6.x86_64
> (gdb) bt
> #0 0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
> #1 0x0000003b73675990 in _IO_str_init_static_internal () from
> /lib64/libc.so.6
> #2 0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
> #3 0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
> #4 0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300,
> tblspcdir=0xd424c0) at basebackup.c:304
> #5 0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>)
> at basebackup.c:558
> #6 0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
> #7 WalSndHandshake () at walsender.c:257
> #8 WalSenderMain () at walsender.c:181
> #9 0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>,
> dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
> #10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
> #11 BackendStartup () at postmaster.c:3304
> #12 ServerLoop () at postmaster.c:1367
> #13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>,
> argv=<value optimized out>) at postmaster.c:1127
> #14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199
>
> ./backend/replication/basebackup.c:304
> XLogFromFileName(walFiles[0], &tli, &logid, &logseg);
>
> In this case, nWalFiles = 0 and walFiles[] palloced zero size.
>
> Though pg_basebackup does not have to work in this rare case,
> we should insert something like "if (nWalFiles <= 0) ereport(...);".

Yes, we definitely need better error checking there - a crash is never
the right answer.

Does this happen only when you take a backup "really quickly" after
setting up the new standby, or is there some scenario further in its
lifetime when it can happen? In the first case, throwing a hard error
seems quite reasonable, but if it's repeatable, perhaps there is
something better we can do?

Also, while we definitely need a sanity check at this point, might it
be worth it to put a second check earlier in the process as well -
since AFAICT this error gets thrown only after all the data has been
sent already.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
From
TAKATSUKA Haruka
Date:
Thanks for the response.

On Sat, 24 Aug 2013 17:04:21 +0200
Magnus Hagander <magnus@hagander.net> wrote:

> > (1) create new standby server dir by pg_basebackup without -x
> > (2) start new standby server
> > (3) pg_basebackup from new standby server with -x
> > (!) when new standby has no WAL files in pg_xlog,
> > new standby's wal sender crash
(snip)
> > Though pg_basebackup does not have to work in this rare case,
> > we should insert something like "if (nWalFiles <= 0) ereport(...);".
>
> Yes, we definitely need better error checking there - a crash is never
> the right answer.
>
> Does this happen only when you take a backup "really quickly" after
> setting up the new standby,

It happens only in this first case.
We therefore regard it as a usage problem.

regards,

> or is there some scenario further in its
> lifetime when it can happen? In the first case, throwing a hard error
> seems quite reasonable, but if it's repeatable, perhaps there is
> something better we can do?
>
> Also, while we definitely need a sanity check at this point, might it
> be worth it to put a second check earlier in the process as well -
> since AFAICT this error gets thrown only after all the data has been
> sent already.
>
> --
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/

______________________________________________________
harukat@sraoss.co.jp (SRA OSS, Inc. http://www.sraoss.co.jp)
Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault
From
Magnus Hagander
Date:
On Sun, Aug 25, 2013 at 9:05 AM, TAKATSUKA Haruka <harukat@sraoss.co.jp> wrote:
> Thanks for the response.
>
> On Sat, 24 Aug 2013 17:04:21 +0200
> Magnus Hagander <magnus@hagander.net> wrote:
>
>> > (1) create new standby server dir by pg_basebackup without -x
>> > (2) start new standby server
>> > (3) pg_basebackup from new standby server with -x
>> > (!) when new standby has no WAL files in pg_xlog,
>> > new standby's wal sender crash
> (snip)
>> > Though pg_basebackup does not have to work in this rare case,
>> > we should insert something like "if (nWalFiles <= 0) ereport(...);".
>>
>> Yes, we definitely need better error checking there - a crash is never
>> the right answer.
>>
>> Does this happen only when you take a backup "really quickly" after
>> setting up the new standby,
>
> It's just this first case.
> Therefore, we recognize that it is the problem of how to use.

Yeah. Ok, for now I have the patch I applied yesterday that makes it
an error instead of a crash per your suggestion.

And if I failed to mention it, thanks for the report!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/