Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Дата
Msg-id CANP8+jL4SVA7OVKL-FkH6T18O+5eAtY1MoAuLjZnj8OAXJp6XQ@mail.gmail.com
обсуждение исходный текст
Ответ на Logical decoding slots can go backwards when used from SQL, docs are wrong  (Craig Ringer <craig@2ndquadrant.com>)
Ответы Re: Logical decoding slots can go backwards when used from SQL, docs are wrong  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On 11 March 2016 at 08:19, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all

I think I found a couple of logical decoding issues while writing tests for failover slots.

Despite the docs' claim that a logical slot will replay data "exactly once", a slot's confirmed_lsn can go backwards and the SQL functions can replay the same data more than once.We don't mark a slot as dirty if only its confirmed_lsn is advanced, so it isn't flushed to disk. For failover slots this means it also doesn't get replicated via WAL. After a master crash, or for failover slots after a promote event, the confirmed_lsn will go backwards.  Users of the SQL interface must keep track of the safely locally flushed slot position themselves and throw the repeated data away. Unlike with the walsender protocol it has no way to ask the server to skip that data.

Yeh, I read that and thought it was wrong. "At least once, in order" is what we want, how it works and how it should be described.
 
Worse, because we don't dirty the slot even a *clean shutdown* causes slot confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of all slots at the shutdown checkpoint, whether dirty or not, to address it.

Given the earlier understanding, above, I'm not sure I see why that must happen. But I agree it should happen.
 
Barring objections I intend to submit a fix to:

- Document that slots can replay data more than once
- Force flush of all slots on a shutdown checkpoint

+1
 
Also, pg_logical_slot_get_changes and its _peek_ variant should have a param specifying the starting LSN to read and return. If this is lower than the restart_lsn but non-null it should ERROR; if it's greater than or equal it should use this position instead of starting at the confirmed_lsn.

Maybe. I don't really like changing APIs. Can we add new funcs? Make sure a NULL LSN can be passed as well.

Is the return type of pg_logical_slot_peek_changes() incorrect in the docs?
 
Time permitting I'd like to add a pg_logical_slot_confirm function, so you can aternate _peek_changes and _confirm, making it possible to get walsender-interface-like behaviour via the SQL interface.

I thought thats what replorigins do.
 
Right now you can't reliably use the SQL interface because _get_changes can succeed locally and advance the slot but the client can fail to receive all the changes due to network issues etc. Sure, the SQL interface is meant mainly for testing, but especially for !postgresql downstreams I strongly suspect people will want to use it for "real" work rather than have to modify each client driver to support replication protocol extensions.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Refectoring of receivelog.c
Следующее
От: Oleg Bartunov
Дата:
Сообщение: Re: The plan for FDW-based sharding