Re: Logical Replica ReorderBuffer Size Accounting Issues
From: Alex Richman
Subject: Re: Logical Replica ReorderBuffer Size Accounting Issues
Msg-id: CAMnUB3pARWPi0Gq6ZYOKvfkNGOAU9xTYq1R69e37T=qdxD9WJg@mail.gmail.com
In response to: RE: Logical Replica ReorderBuffer Size Accounting Issues ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List: pgsql-bugs
> I think I reproduced this problem as you suggested (updating the entire
> table in parallel), and I can reproduce it on both current HEAD and
> REL_15_1. The memory used in rb->tup_context can reach 350MB on HEAD and
> 600MB on REL_15_1.
Great, thanks for your help in reproducing this.
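For anyone following along, a minimal sketch of that style of reproduction might look roughly like the below. The table, publication, and subscription names are placeholders, not the exact scripts we exchanged; the point is simply concurrent full-table UPDATEs against a table in a logical publication while watching the walsender's memory:

    -- Publisher: a table large enough that parallel full-table UPDATEs
    -- build sizable transactions in the walsender's reorder buffer.
    CREATE TABLE repro_t (id int PRIMARY KEY, payload text);
    INSERT INTO repro_t
    SELECT i, repeat('x', 100) FROM generate_series(1, 1000000) AS i;

    CREATE PUBLICATION repro_pub FOR TABLE repro_t;
    -- Subscriber:
    --   CREATE SUBSCRIPTION repro_sub
    --     CONNECTION 'host=... dbname=...' PUBLICATION repro_pub;

    -- Then, from several sessions at once on the publisher, update the
    -- whole table in parallel and watch the walsender's RSS grow:
    UPDATE repro_t SET payload = repeat('y', 100);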
> But there's one more thing I'm not sure about. You mentioned in [2] that
> pg_stat_replication_slots shows 0 spilled or streamed bytes for any slots.
> I think this may be due to the timing of viewing pg_stat_replication_slots.
> In ReorderBufferCheckMemoryLimit, after invoking ReorderBufferSerializeTXN,
> even without actually freeing any used memory in rb->tup_context, I could
> see spill-related records in pg_stat_replication_slots. Could you please
> help confirm this point if possible?
So with the local reproduction using the test scripts from the last two emails, I do see some streamed bytes on the test slot. However, in production I still see 0 streamed or spilled bytes, and the walsenders there regularly reach several gigabytes of RSS. I think it is the same root bug, just at a far greater scale in production (millions of tiny updates instead of 16 large ones). I should also note that in production we have ~40 subscriptions/walsenders rather than the single one in the test reproduction here, so there's a lot of extra CPU churning through the work.
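For reference, the per-slot counters I'm checking are the ones exposed by pg_stat_replication_slots (available since v14), with something like:

    -- Per-slot logical decoding statistics; the spill_* and stream_*
    -- columns stay at 0 on the production slots even while the
    -- corresponding walsenders grow to gigabytes of RSS.
    SELECT slot_name,
           spill_txns, spill_count, spill_bytes,
           stream_txns, stream_count, stream_bytes,
           total_txns, total_bytes
    FROM pg_stat_replication_slots;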
Thanks for your continued analysis of the GenerationAlloc/Free stuff - I'm afraid I'm out of my depth there, but let me know if you need any more information on reproducing the issue, testing patches, etc.
Thanks,
- Alex.