Re: START_REPLICATION SLOT causing a crash in an assert build
От | Andres Freund |
---|---|
Тема | Re: START_REPLICATION SLOT causing a crash in an assert build |
Дата | |
Msg-id | 20221008025633.5njv7ufypkggsbuf@awork3.anarazel.de обсуждение исходный текст |
Ответ на | Re: START_REPLICATION SLOT causing a crash in an assert build (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: START_REPLICATION SLOT causing a crash in an assert build
|
Список | pgsql-hackers |
Hi, On 2022-10-07 12:00:56 -0700, Andres Freund wrote: > On 2022-10-07 15:30:43 +0900, Kyotaro Horiguchi wrote: > > The key point of this is this: > > > > + * XXX: I think there cannot actually be data from an older slot > > + * here. After a crash we throw away the old stats data and if a slot is > > + * dropped while shut down, we'll not load the slot data at startup. > > > > I think this is true. Assuming that we don't recreate or rename > > objects that have stats after writing out stats, we won't have stats > > for a different object with the same name. > > Thanks for thinking through this! > > If we can rely on that fact, the existing check in pgstat_acquire_replslot() > > becomes useless. Thus we don't need to store object name in stats entry. I > > agree to that. > > I don't agree with this aspect. I think it's better to ensure that the stats > object exists when acquiring a slot, rather than later, when reporting. It's a > lot simpler to think through the lock nesting etc that way. > > I'd also eventually like to remove the stats that are currently kept > separately in ReorderBuffer, and that will be easier/cheaper if we can rely on > the stats objects to have already been created. Here's a cleaned up version of my earlier prototype. - I wrapped the access to ReplicationSlotCtl->replication_slots[i].data.name in a new function bool ReplicationSlotName(index, name). I'm not entirely happy with that name, as it sounds like a more general accessor than it is - I toyed with ReplicationSlotNameForIndex(), but that seemed somewhat unnecessary, I don't forsee a need for another name accessor. Anyone wants to weigh in? - Substantial comment and commit message polishing - I'm planning to drop PgStat_StatReplSlotEntry.slotname and a PGSTAT_FILE_FORMAT_ID bump in master and to rename slotname to slotname_unused in 15. - Self-review found a bug, I copy-pasted create=false in the call to pgstat_get_entry_ref() in pgstat_acquire_replslot(). This did *NOT* cause any test failures - clearly our test coverage in this area is woefully inadequate. - This patch does not contain a test for the fix. I think this can only be tested by a tap test starting pg_recvlogical in the background and checking pg_recvlogical's output. That type of test is notoriously hard to be reliable, so committing it shortly before the release is wrapped seems like a bad idea. I manually verified that: - a continually active slot reporting stats after pgstat_reset_replslot() works correctly (this is what crashed before) - replslot stats reporting works correctly after stats were removed - upgrading from pre-fix to post-fix preserves stats (when keeping slotname / not increasing the stats version, of course) I'm planning to push this either later tonight (if I feel up to it after cooking dinner) or tomorrow morning PST, due to the release wrap deadline. Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: