Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced
От | Michael Paquier |
---|---|
Тема | Re: pg_replication_slot_advance to return NULL instead of 0/0 ifslot not advanced |
Дата | |
Msg-id | 20180605042856.GB5840@paquier.xyz обсуждение исходный текст |
Ответ на | Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Ответы |
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced
|
Список | pgsql-hackers |
On Mon, Jun 04, 2018 at 11:51:35AM +0200, Petr Jelinek wrote: > On 01/06/18 21:13, Michael Paquier wrote: >> - startlsn =3D MyReplicationSlot->data.confirmed_flush; >> + if (OidIsValid(MyReplicationSlot->data.database)) >> + startlsn =3D MyReplicationSlot->data.confirmed_flush; >> + else >> + startlsn =3D MyReplicationSlot->data.restart_lsn; >> + >> if (moveto < startlsn) >> { >> ReplicationSlotRelease(); > > This part looks correct for the checking that we are not moving > backwards. However, there is another existing issue with this code > which > is that we are later using the confirmed_flush (via startlsn) as start > point of logical decoding (XLogReadRecord parameter in > pg_logical_replication_slot_advance) which is not correct. The > restart_lsn should be used for that. I think it would make sense to > fix > that as part of this patch as well. I am not sure I understand what you are coming at here. Could you explain your point a bit more please as 9c7d06d is yours? When creating the decoding context, all other code paths use the confirmed LSN as a start point if not explicitely defined by the caller of CreateDecodingContext, as it points to the last LSN where a transaction has been committed and decoded. The backward check is also correct to me, for which I propose to add a comment block like that: + /* + * Check if the slot is not moving backwards. Physical slots rely + * simply on restart_lsn as a minimum point, while logical slots + * have confirmed consumption up to confirmed_lsn, meaning that + * in both cases data older than that is not available anymore. + */ + if (OidIsValid(MyReplicationSlot->data.database)) + minlsn = MyReplicationSlot->data.confirmed_flush; + else + minlsn = MyReplicationSlot->data.restart_lsn; Any tests I do are showing me that using confirmed_lsn would not matter much? as we want the slot's consumer to still decode transactions whose commits happened after the point where the slot has been advanced to. So let's make sure that we are on the same page for the starting LSN used. On top of that, the locking issues in CreateInitDecodingContext() and DecodingContextFindStartpoint go back to 9.4. So I would be inclined to get 0001 applied first as a bug fix on all branches, still that's a minor issue so there could be arguments for just doing it on HEAD. I am as well fully open to suggestions for the extra comments which document the use of ReplicationSlotControlLock and mutex for in-memory slot data. Any thoughts about those two last points? -- Michael
Вложения
В списке pgsql-hackers по дате отправления: