Re: Reduce useless changes before reassembly during logical replication
От | Andy Fan |
---|---|
Тема | Re: Reduce useless changes before reassembly during logical replication |
Дата | |
Msg-id | 87cysoevyv.fsf@163.com обсуждение исходный текст |
Ответ на | Re: Reduce useless changes before reassembly during logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Reduce useless changes before reassembly during logical replication
|
Список | pgsql-hackers |
Hi, > > This is all true but note that in successful cases (where the table is > published) all the work done by FilterByTable(accessing caches, > transaction-related stuff) can add noticeable overhead as anyway we do > that later in pgoutput_change(). I think I gave the same comment > earlier as well but didn't see any satisfactory answer or performance > data for successful cases to back this proposal. I did some benchmark yesterday at [1] and found it adds 20% cpu time. then come out a basic idea, I think it deserves a share. "transaction related stuff" comes from the syscache/systable access except the HistorySansphot. and the syscache is required in the following sistuations: 1. relfilenode (from wal) -> relid. 2. relid -> namespaceid (to check if the relid is a toast relation). 3. if toast, get its origianl relid. 4. access the data from pg_publication_tables. 5. see if the relid is a partition, if yes, we may get its root relation. Acutally we already has a RelationSyncCache for #4, and it *only* need to access syscache when replicate_valid is false, I think this case should be rare, but the caller doesn't know it, so the caller must prepare the transaction stuff in advance even in the most case they are not used. So I think we can get a optimization here. then the attached patch is made. Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com> Date: Wed Feb 21 18:40:03 2024 +0800 Make get_rel_sync_entry less depending on transaction state. get_rel_sync_entry needs transaction only a replicate_valid = false entry is found, this should be some rare case. However the caller can't know if a entry is valid, so they have to prepare the transaction state before calling this function. Such preparation is expensive. This patch makes the get_rel_sync_entry can manage a transaction stage only if necessary. so the callers don't need to prepare it blindly. Then comes to #1, acutally we have RelfilenumberMapHash as a cache, when the cache is hit (suppose this is a usual case), no transaction stuff related. I have two ideas then: 1. Optimize the cache hit sistuation like what we just did for get_rel_sync_entry for the all the 5 kinds of data and only pay the effort for cache miss case. for the data for #2, #3, #5, all the keys are relid, so I think a same HTAB should be OK. 2. add the content for #1, #2, #3, #5 to wal when wal_level is set to logical. In either case, the changes for get_rel_sync_entry should be needed. > Note, users can > configure to stream_in_progress transactions in which case they > shouldn't see such a big problem. People would see the changes is spilled to disk, but the CPU cost for Reorder should be still paid I think. [1] https://www.postgresql.org/message-id/87o7cadqj3.fsf%40163.com -- Best Regards Andy Fan
Вложения
В списке pgsql-hackers по дате отправления: