Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption
From | TAKATSUKA Haruka |
---|---|
Subject | Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption |
Date | |
Msg-id | 20191220101952.1e07a9d6113896d3be1a31ea@sraoss.co.jp |
In response to | BUG #16172: failure of vacuum file truncation can cause permanent data corruption (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption |
List | pgsql-bugs |
I also tested PostgreSQL with the attached patch applied and confirmed that
it avoids this data corruption. The patch just removes
DropRelFileNodeBuffers() from smgrtruncate().

On Thu, 19 Dec 2019 07:14:42 +0000
PG Bug reporting form <noreply@postgresql.org> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      16172
> Logged by:          TAKATSUKA Haruka
> Email address:      harukat@sraoss.co.jp
> PostgreSQL version: 12.1
> Operating system:   Windows/Linux
> Description:
>
> Hello, pgsql hackers,
>
> I found that failure of vacuum file truncation can cause permanent data
> corruption.
> I am reporting the reproduction steps below.
>
> On Windows installations, the truncation sometimes fails with a permission
> denied error because of anti-virus software. It has caused just an ERROR,
> and people have often dismissed it.
>
> Truncation failure can also make the standby panic with the following
> messages when replaying Heap2/VISIBLE or Heap2/CLEAN, because the truncation
> WAL record is emitted even if the truncation does not actually complete on
> the primary.
>
>   WARNING: page .. of relation base/..../.... does not exist
>   CONTEXT: WAL redo at ..... for ....: cutoff xid ... flags ...
>   PANIC: WAL contains references to invalid pages
>
> I think truncation failure should be handled at a more severe level.
> Any thoughts?
>
> with best regards,
> Haruka Takatsuka / SRA OSS, Inc. Japan
>
>
> reproduction steps (PG12)
> =========================
>
> $ psql -U postgres -d db1
> Pager usage is off.
> psql (12.1)
> Type "help" for help.
>
> db1=#
>
> $ gdb -p {its backend process}
>
> (gdb) b FileTruncate
> Breakpoint 1 at 0x73d320: file fd.c, line 2057.
> (gdb) c
> Continuing.
>
> db1=# SHOW autovacuum;
>  autovacuum
> ------------
>  off
> (1 row)
>
> db1=# CREATE TABLE t1 (id int primary key, v text);
> CREATE
>
> db1=# INSERT INTO t1 SELECT g, md5(g::text) FROM generate_series(1, 10000)
> as g;
> INSERT 0 10000
>
> db1=# CHECKPOINT;
>
> Program received signal SIGUSR1, User defined signal 1.
> 0x00000036caae91a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
> (gdb) c
> Continuing.
>
> CHECKPOINT
>
> db1=# DELETE FROM t1 WHERE id > 50;
> DELETE 9950
>
> db1=# VACUUM t1;
>
> Breakpoint 1, FileTruncate (file=59, offset=8192,
> wait_event_info=167772175)
>     at fd.c:2057
> 2057 {
> (gdb) n
> 2065 returnCode = FileAccess(file);
> (gdb) n
> 2066 if (returnCode < 0)
> (gdb) p returnCode = -100
> $6 = -100
> (gdb) c
> Continuing.
>
> ERROR: could not truncate file "base/16384/16645" to 1 blocks: Success
>
> db1=# SELECT count(*) FROM t1;
>  count
> -------
>   9930
> (1 row)
> (snip)
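
The failure mode reported above can be illustrated with a small toy model. This is only a sketch and not PostgreSQL code: the class `ToyRelation`, its methods, and the helper `run` are hypothetical names invented for illustration, and a relation is reduced to a list of pages that are either "live" or "empty". It assumes that DELETE and VACUUM have emptied the tail pages only in the buffer cache, and that the file truncation then fails. The alternative ordering is included purely for contrast and is not a description of the attached patch.

```python
# Toy model, not PostgreSQL code: every name here is hypothetical.
BLOCK_LIVE = "live"    # page still holds tuples visible to a scan
BLOCK_EMPTY = "empty"  # page was emptied by DELETE + VACUUM


class ToyRelation:
    def __init__(self, nblocks):
        # On-disk image as of the last checkpoint: every page has live tuples.
        self.disk = [BLOCK_LIVE] * nblocks
        # Buffered page copies that are newer than the disk image (dirty pages).
        self.dirty = {}

    def delete_and_vacuum_pages(self, first_block):
        # DELETE + VACUUM empty the tail pages, but only in the buffer cache.
        for blk in range(first_block, len(self.disk)):
            self.dirty[blk] = BLOCK_EMPTY

    def read(self, blk):
        # A scan prefers the buffered copy and falls back to the disk image.
        return self.dirty.get(blk, self.disk[blk])

    def live_pages(self):
        return sum(1 for blk in range(len(self.disk))
                   if self.read(blk) == BLOCK_LIVE)

    def truncate(self, new_nblocks, file_truncate_ok, drop_buffers_first):
        def drop_buffers():
            # Discard buffered pages at or beyond the new end of the relation.
            for blk in [b for b in self.dirty if b >= new_nblocks]:
                del self.dirty[blk]

        if drop_buffers_first:
            drop_buffers()  # the dirty "empty" pages are thrown away here
        if not file_truncate_ok:
            raise OSError("could not truncate file")
        del self.disk[new_nblocks:]
        if not drop_buffers_first:
            drop_buffers()


def run(drop_buffers_first):
    rel = ToyRelation(nblocks=10)
    rel.delete_and_vacuum_pages(first_block=1)
    try:
        rel.truncate(1, file_truncate_ok=False,
                     drop_buffers_first=drop_buffers_first)
    except OSError:
        pass  # in the report, the backend only raises ERROR and carries on
    return rel.live_pages()
```

In this model, `run(True)` returns 10: the buffered "empty" pages are discarded before the failed truncation, so a later scan reads the stale on-disk pages and the deleted rows appear again. `run(False)` returns 1, because the failed truncation leaves the buffered state untouched.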