Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
From | David Rowley |
---|---|
Subject | Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records |
Date | |
Msg-id | CAApHDvrDg2rJ-sqa7c=wPoHeEGrox46sQ=CFj=FkXqBx26dr0A@mail.gmail.com |
In response to | Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records (Dmitriy Kuzmin <kuzmin.db4@gmail.com>) |
Responses | Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records |
List | pgsql-bugs |
On Mon, 5 Sept 2022 at 22:38, Dmitriy Kuzmin <kuzmin.db4@gmail.com> wrote:
> One of our clients experienced a crash of startup process with an error "invalid memory alloc request size 1073741824" on a hot standby, which ended in replica reinit.
>
> According to logs, startup process crashed while trying to replay "Standby/LOCK" record with a huge list of locks (see attached replicalog_tail.tgz):
>
> FATAL: XX000: invalid memory alloc request size 1073741824
> CONTEXT: WAL redo at 7/327F9248 for Standby/LOCK: xid 1638575 db 7550635 rel 8500880 xid 1638575 db 7550635 rel 10324499...
> LOCATION: repalloc, mcxt.c:1075
> BACKTRACE:
> postgres: startup recovering 000000010000000700000033(repalloc+0x61) [0x8d7611]
> postgres: startup recovering 000000010000000700000033() [0x691c29]
> postgres: startup recovering 000000010000000700000033() [0x691c74]
> postgres: startup recovering 000000010000000700000033(lappend+0x16) [0x691e76]

This must be the repalloc() in enlarge_list(). 1073741824 / 8 is 134,217,728 (2^27). That's quite a bit more than 1 lock per your 950k tables.

I wonder why the RecoveryLockListsEntry.locks list is getting so long.

From the file you attached, I see:

$ cat replicalog_tail | grep -Eio "rel\s([0-9]+)" | wc -l
950000

So that confirms there were 950k relations in the xl_standby_locks. The contents of that message seem to be produced by standby_desc(). That should be the same WAL record that's processed by standby_redo(), which adds the 950k locks to the RecoveryLockListsEntry.

I'm not seeing why 950k becomes 134m.

David
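
For readers following the arithmetic in the message, here is a rough sketch of it in Python. It is not taken from the PostgreSQL source. The 8-byte cell size comes from the division in the message; the allocation limit of 0x3fffffff bytes and the power-of-two growth of the list's cell array are assumptions made for illustration, and all names are hypothetical.

```python
# Rough sketch of the numbers discussed above. All constants and helper
# names are illustrative assumptions, not code from PostgreSQL.

MAX_ALLOC_SIZE = 0x3FFFFFFF  # assumed single-allocation limit (1 GiB - 1)
LIST_CELL_SIZE = 8           # bytes per list cell, as implied by "1073741824 / 8"


def next_power_of_2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def request_for_length(n_elements: int) -> int:
    """Bytes requested for the cell array, assuming power-of-two growth
    with a minimum of 16 cells (an assumption about the growth policy)."""
    return next_power_of_2(max(16, n_elements)) * LIST_CELL_SIZE


failed_request = 1073741824                      # size from the FATAL message
cells_requested = failed_request // LIST_CELL_SIZE  # 134,217,728 == 2**27

# Under the assumed limit, the failed request is one byte too large.
over_limit = failed_request > MAX_ALLOC_SIZE

# A list holding 950,000 locks would only need 2**20 cells (8 MiB) ...
expected_request = request_for_length(950_000)

# ... whereas a 2**27-cell request implies the list already held more
# than 2**26 (about 67 million) entries when it tried to grow.
growth_trigger = request_for_length(2**26 + 1)
```

Under those assumptions, a single record carrying 950k locks accounts for about 8 MiB of list cells, so something would have to append to the same list many more times to reach the failing request.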