Re: [HACKERS] Restricting maximum keep segments by repslots
От | Alvaro Herrera |
---|---|
Тема | Re: [HACKERS] Restricting maximum keep segments by repslots |
Дата | |
Msg-id | 20200406221555.GA16211@alvherre.pgsql обсуждение исходный текст |
Ответ на | Re: [HACKERS] Restricting maximum keep segments by repslots (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Ответы |
Re: [HACKERS] Restricting maximum keep segments by repslots
|
Список | pgsql-hackers |
On 2020-Apr-06, Alvaro Herrera wrote: > Lastly, I noticed that we're now changing the slot's restart_lsn to > Invalid without being the slot's owner, which goes counter to what is > said in slot.h: > > * - Individual fields are protected by mutex where only the backend owning > * the slot is authorized to update the fields from its own slot. The > * backend owning the slot does not need to take this lock when reading its > * own fields, while concurrent backends not owning this slot should take the > * lock when reading this slot's data. > > What this means is that if the slot owner walsender updates the > restart_lsn to a newer value just as we (checkpointer) are trying to set > it to Invalid, the owner's value might persist and our value would be > lost. > > AFAICT if we were really stressed about getting this exactly correct, > then we would have to kill the walsender, wait for it to die, then > ReplicationSlotAcquire and *then* update > MyReplicationSlot->data.restart_lsn. So I had cold feet about the whole business of trying to write a non-owned replication slot, so I tried to implemented the "exactly correct" idea above. That's v25 here. I think there's a race condition in this: if we kill a walsender and it restarts immediately before we (checkpoint) can acquire the slot, we will wait for it to terminate on its own. Fixing this requires changing the ReplicationSlotAcquire API so that it knows not to wait but not raise error either (so we can use an infinite loop: "acquire, if busy send signal") I also include a separate diff for a change that might or might not be necessary, where xmins reserved by slots with restart_lsn=invalid are ignored. I'm not yet sure that we should include this, but we should keep an eye on it. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: