Re: Applying logical replication changes by more than one process

From: konstantin knizhnik
Subject: Re: Applying logical replication changes by more than one process
Date:
Msg-id: 54C38871-CE2C-410C-B37F-1E191F9A41F9@postgrespro.ru
In reply to: Re: Applying logical replication changes by more than one process  (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers

On Mar 22, 2016, at 10:10 AM, Craig Ringer wrote:

On 22 March 2016 at 14:32, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
Ah, you mean because with wal_log=true the origin advance is in a different WAL record than the commit? OK, yeah, you might be one transaction behind then, true.

It actually means that we cannot enforce database consistency. If we do replorigin_advance before the commit and then a crash happens, we will lose some changes.
If we call replorigin_advance after the commit but the crash happens before it, some changes can be applied multiple times. For example, we can insert some record twice (if there are no unique constraints).
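To make the two failure windows concrete, a sketch (replorigin_advance and InvalidXLogRecPtr are from the existing replication/origin.h; the surrounding apply loop is hypothetical):

    /* Ordering A: advance the origin before committing. */
    replorigin_advance(origin_id, remote_lsn, InvalidXLogRecPtr,
                       false /* go_backward */, true /* wal_log */);
    /* <-- crash here: origin already points past remote_lsn, changes lost */
    CommitTransactionCommand();

    /* Ordering B: advance the origin after committing. */
    CommitTransactionCommand();
    /* <-- crash here: origin still behind, changes re-applied on restart */
    replorigin_advance(origin_id, remote_lsn, InvalidXLogRecPtr,
                       false, true);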
Looks like the only working scenario is to set up a replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple processes, doesn't it?

Yes.

How would you expect it to work if you attempted to replorigin_advance without a session? From multiple concurrent backends?

It would not work. But I wonder why I would need to call replorigin_advance without a session.
Please excuse me: I am not thinking now about the general case of using logical replication; I am focused just on multimaster.
What I need is an efficient, durable, and atomic mechanism for applying changes.
I see only two ways to provide atomicity:
1. Tracking of origins should be done inside the xact as part of a normal commit.
2. Use custom WAL records.

1) is easier now, and it really works if I correctly synchronize access to slots. Surprisingly, it doesn't even add substantial overhead.
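For what it's worth, here is a minimal sketch of option 1 as I understand it (the replorigin_* names are from src/include/replication/origin.h and the xact calls from access/xact.h; the lock used to serialize session setup is hypothetical):

    /* Apply one remote transaction, advancing the origin atomically
     * with the local commit record. */
    static void
    apply_remote_xact(RepOriginId origin_id, XLogRecPtr remote_lsn)
    {
        /* prevent concurrent session setup for the same origin */
        LWLockAcquire(MultimasterApplyLock, LW_EXCLUSIVE);  /* hypothetical lock */
        replorigin_session_setup(origin_id);

        StartTransactionCommand();
        /* ... apply the decoded changes ... */

        /* These globals make the origin advance part of the commit
         * WAL record, so progress and data survive or vanish together. */
        replorigin_session_origin = origin_id;
        replorigin_session_origin_lsn = remote_lsn;
        CommitTransactionCommand();

        replorigin_session_reset();
        LWLockRelease(MultimasterApplyLock);
    }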


Parallel apply is complicated business. You have to make sure you apply xacts in an order that's free from deadlocks and from insert/delete anomalies - though you can at least detect those, ERROR that xact and all subsequent ones, and retry.

Well, this is exactly what our multimaster does. We do not try to enforce the order of applying xacts, but we detect global deadlocks and use 2PC to provide data consistency.
So it is not a task of logical replication; it is done by the DTM, which overrides visibility checks and the transaction commit protocol using XTM.


For progress tracking to be consistent and correct you'd have to make sure you committed strictly in the same order as upstream. Just before each commit you can set the origin LSN and advance the replication origin, which will commit atomically along with the commit it confirms. I don't really see the problem.

Sorry, I do not completely understand you. What do you mean by "will commit atomically along with the commit it confirms"? How will this atomicity be enforced?

 
I have tried it; fortunately, it doesn't cause any noticeable performance degradation. But unfortunately I can't consider such an approach elegant.
Why is it actually necessary to bind a replication slot to a process? Why is it not possible to have multiple concurrent sessions for the same slot?

Especially since most slot changes are LWLock- and/or spinlock-protected already.
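For reference, the existing protection looks roughly like this (simplified from LogicalConfirmReceivedLocation in src/backend/replication/logical/logical.c):

    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.confirmed_flush = lsn;
    SpinLockRelease(&MyReplicationSlot->mutex);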

The client would have to manage replay confirmations appropriately so that it doesn't confirm past the point where some other connection still needs it.
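A minimal sketch of that rule, assuming a hypothetical client-side array tracking the last LSN each connection has durably applied:

    /* Only confirm the minimum LSN every consumer has applied; confirming
     * anything later could let the server discard WAL a peer still needs. */
    static XLogRecPtr
    min_applied_lsn(const XLogRecPtr *applied, int nconsumers)
    {
        XLogRecPtr  minlsn = applied[0];
        int         i;

        for (i = 1; i < nconsumers; i++)
            if (applied[i] < minlsn)
                minlsn = applied[i];
        return minlsn;  /* value to send in the standby status update */
    }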

We'd have to expose a "slot" column in pg_stat_replication and remove the "pid" column from pg_replication_slots to handle the 1:n relationship between slot clients and slots, and it'd be a pain to show which normal user backends were using a slot. Not really sure how to handle that.

To actually make this useful would require a lot more though. A way to request that replay start from a new LSN without a full disconnect/reconnect each time. Client-side parallel consume/apply. Inter-transaction ordering information so the client can work out a viable xact apply order (possibly using SSI information per the discussion with Kevin?). Etc.

I haven't really looked into this and I suspect there are some hairy areas involved in replaying a slot from more than one client. The reason I'm interested in it personally is for initial replica state setup as Oleksandr prototyped and described earlier. We could attach to the slot's initial snapshot then issue a new replication command that, given a table name or oid, scans the table from the snapshot and passes each tuple to a new callback (like, but not the same as, the insert callback) on the output plugin.

That way clients could parallel-copy the initial state of the DB across the same replication protocol they then consume new changes from, with no need to make normal libpq connections and COPY initial state.

I'm interested in being able to do parallel receive of new changes from the slot too, but suspect that'd be a bunch harder.

  
Also, I am concerned about the sequential search used to locate a slot in replorigin_session_setup and many other functions - there is a loop over all max_replication_slots entries.
It does not seem to be a problem while the number of slots is less than 10. For multimaster this assumption holds - even Oracle RAC rarely has a two-digit number of nodes.
But if we want to perform sharding and use logical replication to provide redundancy, the number of nodes and slots can be substantially larger.
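The loop in question looks roughly like this (simplified from replorigin_session_setup in src/backend/replication/origin.c):

    for (i = 0; i < max_replication_slots; i++)
    {
        ReplicationState *curstate = &replication_states[i];

        if (curstate->roident != node)
            continue;
        /* ... found the state for this origin; claim it for the session ... */
    }
    /* O(max_replication_slots) per session setup: fine for ~10 slots,
     * possibly noticeable with hundreds of nodes under sharding */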

Sounds like premature optimisation. Deal with it if it comes up in profiles in scale testing with 100 clients. I'll be surprised if it does.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
