Re: POC: Cleaning up orphaned files using undo logs
| From | Amit Kapila |
|---|---|
| Subject | Re: POC: Cleaning up orphaned files using undo logs |
| Date | |
| Msg-id | CAA4eK1LXuzg7MN0=ws2tZYnraQ8EXzpEF1=EW3qUDA5RUWG4vQ@mail.gmail.com |
| In response to | Re: POC: Cleaning up orphaned files using undo logs (Antonin Houska <ah@cybertec.at>) |
| Responses | Re: POC: Cleaning up orphaned files using undo logs |
| List | pgsql-hackers |
On Fri, Nov 13, 2020 at 6:02 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > If you want to track at undo record level, then won't it lead to
> > performance overhead and probably additional WAL overhead, considering
> > this action needs to be WAL-logged? I think recording at page-level
> > might be a better idea.
>
> I'm not worried about WAL because the undo execution needs to be WAL-logged
> anyway - see smgr_undo() in the 0005- part of the patch set. What needs to be
> evaluated regarding performance is the (exclusive) locking of the page that
> carries the progress information.
>

That is just for one kind of smgr; think about how you would do it for
something like zheap. The idea there is to collect all the undo records
for one zheap page (unless the undo for a transaction is very large) and
apply them together, so maintaining the status at the level of each undo
record will surely lead to a large amount of additional WAL. See below
for how and why we decided to do it differently.

> I'm still not sure whether this info should
> be on every page or only in the chunk header. In either case, we have a
> problem if there are two or more chunks created by different transactions on
> the same page, and if more than one of these transactions need to perform
> undo. I tend to believe that this should happen rarely though.
>

I think we need to maintain this information at the transaction level
and update it after processing a few blocks; at least, that is what was
decided and implemented earlier. We also need to update it when the log
is switched or when all the actions of the transaction have been
applied. The reasoning is that for short transactions it won't matter,
and for larger transactions it is good to update it only after a few
pages to avoid WAL and locking overhead.
Also, it is better to collect the undo in bulk; this has proved
beneficial for large transactions. An earlier version of the patch
that implements all these ideas is attached
(Infrastructure-to-execute-pending-undo-actions and
Provide-interfaces-to-store-and-fetch-undo-records). The second one
has some APIs used by the first one, but the main concepts were
implemented in the first one
(Infrastructure-to-execute-pending-undo-actions). I see that these
can't be used as-is in the current version, but they can still give us
a good starting point, and we might be able to reuse some code and/or
ideas from these patches.

-- 
With Regards,
Amit Kapila.