Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" |
Date | |
Msg-id | ac119d1e-05d1-f050-b92a-0a524d68b848@iki.fi |
In response to | BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" |
List | pgsql-bugs |
On 18/06/2021 18:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17064
> Logged by:          Alexander Lakhin
> Email address:      exclusion@gmail.com
> PostgreSQL version: 14beta1
> Operating system:   Ubuntu 20.04
> Description:
>
> The following script:
> ===
> for i in `seq 100`; do
>   createdb db$i
> done
>
> # Based on the contents of the regression test "vacuum"
> echo "
> CREATE TABLE pvactst (i INT);
> INSERT INTO pvactst SELECT i FROM generate_series(1,10000) i;
> DELETE FROM pvactst;
> VACUUM pvactst;
> DROP TABLE pvactst;
>
> VACUUM FULL pg_database;
> " >/tmp/vacuum.sql
>
> for n in `seq 10`; do
>   echo "iteration $n"
>   for i in `seq 100`; do
>     ( { for f in `seq 100`; do cat /tmp/vacuum.sql; done } | psql -d db$i ) >> psql-$i.log 2>&1 &
>   done
>   wait
>   grep -C5 FATAL psql*.log && break;
> done
> ===
> detects sporadic FATAL errors:
> iteration 1
> psql-56.log-DROP TABLE
> psql-56.log-VACUUM
> psql-56.log-CREATE TABLE
> psql-56.log-INSERT 0 10000
> psql-56.log-DELETE 10000
> psql-56.log:FATAL:  relation mapping file "global/pg_filenode.map" contains
> incorrect checksum
> psql-56.log-server closed the connection unexpectedly
> psql-56.log-    This probably means the server terminated abnormally
> psql-56.log-    before or while processing the request.
> psql-56.log-connection to server was lost

Hmm, the simplest explanation would be that the read() or write() on the
relmapper file is not atomic. We assume that it is, and don't use a lock
in load_relmap_file() because of that. Is there anything unusual about
the filesystem, mount options or the kernel you're using? I could not
reproduce this on my laptop. Does the attached patch fix it for you?

If that's the cause, it is easy to fix by taking the RelationMappingLock
in load_relmap_file(), like in the attached patch. But if the write is
not atomic, you might have a bigger problem: we also rely on the
atomicity when writing the pg_control file. If that becomes corrupt
because of a partial write, the server won't start up. If it's just a
race condition between the read/write, or only the read() is not atomic,
maybe pg_control is OK, but I'd like to understand the issue better
before just adding a lock to load_relmap_file().

- Heikki
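
The suspected failure mode can be illustrated with a small sketch. This is not PostgreSQL code and not the attached patch: the 512-byte image, the zlib CRC32 and the names `encode`, `is_valid` and `LockedFile` are all assumptions made for illustration. It shows that a reader who sees half of a new image and half of an old one fails checksum verification, and that serializing reads and writes behind one lock (roughly the role RelationMappingLock would play in load_relmap_file()) avoids that.

```python
import struct
import threading
import zlib


def encode(payload: bytes) -> bytes:
    """Append a CRC32 of the payload (a stand-in for a checksummed map file)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def is_valid(image: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    (stored,) = struct.unpack("<I", image[-4:])
    return zlib.crc32(image[:-4]) == stored


# Two consistent 512-byte images (size chosen for illustration only).
old_image = encode(b"A" * 508)
new_image = encode(b"B" * 508)

# A torn read: the first half already reflects the new write,
# the second half (including the stored checksum) is still old.
torn_image = new_image[:256] + old_image[256:]


class LockedFile:
    """Hypothetical in-memory stand-in for the file, guarded by a single lock."""

    def __init__(self, image: bytes) -> None:
        self._buf = bytearray(image)
        self._lock = threading.Lock()

    def write(self, image: bytes) -> None:
        with self._lock:
            half = len(image) // 2
            self._buf[:half] = image[:half]
            # Without the lock, a reader could observe the buffer right here.
            self._buf[half:] = image[half:]

    def read(self) -> bytes:
        with self._lock:
            return bytes(self._buf)


def run_demo(rounds: int = 200) -> bool:
    """Read concurrently with a writer; return True if every read validated."""
    f = LockedFile(old_image)

    def writer() -> None:
        for i in range(rounds):
            f.write(new_image if i % 2 == 0 else old_image)

    t = threading.Thread(target=writer)
    t.start()
    results = [is_valid(f.read()) for _ in range(rounds)]
    t.join()
    return all(results)
```

The lock only addresses a race between a reader and a writer. It does not make the write itself atomic on disk, which is the separate concern raised above for pg_control.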