RE: Logical replication timeout problem
От | wangw.fnst@fujitsu.com |
---|---|
Тема | RE: Logical replication timeout problem |
Дата | |
Msg-id | OS3PR01MB6275FEB9F83081F1C87539B99E019@OS3PR01MB6275.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Logical replication timeout problem (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
RE: Logical replication timeout problem
|
Список | pgsql-hackers |
On Wed, Feb 22, 2022 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Thanks for your review. > On Tue, Feb 22, 2022 at 9:17 AM wangw.fnst@fujitsu.com > <wangw.fnst@fujitsu.com> wrote: > > > > On Fri, Feb 18, 2022 at 10:51 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > Some comments: > > Thanks for your review. > > > > > I see you only track skipped Inserts/Updates and Deletes. What about > > > DDL operations that are skipped, what about truncate. > > > What about changes made to unpublished tables? I wonder if you could > > > create a test script that only did DDL operations > > > and truncates, would this timeout happen? > > According to your suggestion, I tested with DDL and truncate. > > While testing, I ran only 20,000 DDLs and 10,000 truncations in one > > transaction. > > If I set wal_sender_timeout and wal_receiver_timeout to 30s, it will time out. > > And if I use the default values, it will not time out. > > IMHO there should not be long transactions that only contain DDL and > > truncation. I'm not quite sure, do we need to handle this kind of use case? > > > > I think it is better to handle such cases as well and changes related > to unpublished tables as well. BTW, it seems Kuroda-San has also given > some comments [1] which I am not sure are addressed. Add handling of related use cases. > I think instead of keeping the skipping threshold w.r.t > wal_sender_timeout, we can use some conservative number like 10000 or > so which we are sure won't impact performance and won't lead to > timeouts. Yes, it would be better. Set the threshold conservatively to 10000. > * > + /* > + * skipped_changes_count is reset when processing changes that do not need > to > + * be skipped. > + */ > + skipped_changes_count = 0 > > When the skipped_changes_count is reset, the sendTime should also be > reset. Can we reset it whenever the UpdateProgress function is called > with send_keep_alive as false? Fixed. Attached a new patch that addresses following improvements I have got so far as comments: 1. Consider other changes that need to be skipped(truncate, DDL and function calls pg_logical_emit_message). [suggestion by Ajin, Amit] (Add a new function SendKeepaliveIfNecessary for trying to send keepalive message.) 2. Set the threshold conservatively to a static value of 10000.[suggestion by Amit, Kuroda-San] 3. Reset sendTime in function WalSndUpdateProgress when send_keep_alive is false. [suggestion by Amit] Regards, Wang wei
Вложения
В списке pgsql-hackers по дате отправления: