Re: .ready and .done files considered harmful
From | Bossart, Nathan |
---|---|
Subject | Re: .ready and .done files considered harmful |
Date | |
Msg-id | 65F427BD-6390-47E3-8F6C-2872BCFEE005@amazon.com |
In response to | Re: .ready and .done files considered harmful (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
List | pgsql-hackers |
On 9/20/21, 1:42 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> On 2021-Sep-20, Robert Haas wrote:
>
>> I was thinking that this might increase the number of directory scans
>> by a pretty large amount when we repeatedly catch up, then 1 new file
>> gets added, then we catch up, etc.
>
> I was going to say that perhaps we can avoid repeated scans by having a
> bitmap of future files that were found by a scan; so if we need to do
> one scan, we keep track of the presence of the next (say) 64 files in
> our timeline, and then we only have to do another scan when we need to
> archive a file that wasn't present the last time we scanned.  However:

This sounds a bit like the other approach discussed earlier in this
thread [0].

>> But I guess your thought process is that such directory scans, even if
>> they happen many times per second, can't really be that expensive,
>> since the directory can't have much in it. Which seems like a fair
>> point. I wonder if there are any situations in which there's not much
>> to archive but the archive_status directory still contains tons of
>> files.
>
> (If we take this stance, which seems reasonable to me, then we don't
> need to optimize.)  But perhaps we should complain if we find extraneous
> files in archive_status -- Then it'd be on the users' heads not to leave
> tons of files that would slow down the scan.

The simplest situation I can think of that might be a problem is when
checkpointing is stuck and the .done files are adding up.  However,
after the lengthy directory scan, you should still be able to archive
several files without a scan of archive_status.  And if you are
repeatedly catching up, the extra directory scans probably aren't
hurting anything.  At the very least, this patch doesn't make things
any worse in this area.

BTW I attached a new version of the patch with a couple of small
changes.  Specifically, I adjusted some of the comments and moved the
assignment of last_dir_scan to after the directory scan completes.
Before, we were resetting it before the directory scan, so if the
directory scan took too long, you'd still end up scanning
archive_status for every file.  I think that's still possible if your
archive_command is especially slow, but archiving isn't going to keep
up anyway in that case.

Nathan

[0] https://www.postgresql.org/message-id/attachment/125980/0001-Improve-performance-of-pgarch_readyXlog-with-many-st.patch
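
The effect of moving the last_dir_scan assignment can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the patch itself: it assumes a directory scan is forced whenever at least scan_interval time units have passed since last_dir_scan, and that files found in between are archived without a scan. The function count_dir_scans and all of its parameters are hypothetical names chosen for illustration; only last_dir_scan comes from the message above.

```python
def count_dir_scans(num_files, scan_cost, archive_cost, scan_interval,
                    stamp_after_scan):
    """Count directory scans needed to archive num_files files.

    Hypothetical model (an assumption, not the patch's actual code):
    a scan of archive_status is forced once scan_interval time units
    have elapsed since last_dir_scan; otherwise the next file is
    archived without scanning. Times are plain integers.
    """
    now = 0
    last_dir_scan = float("-inf")  # guarantees a scan for the first file
    scans = 0
    for _ in range(num_files):
        if now - last_dir_scan >= scan_interval:
            if not stamp_after_scan:
                last_dir_scan = now   # old behavior: stamp before scanning
            now += scan_cost          # the scan itself takes time
            if stamp_after_scan:
                last_dir_scan = now   # new behavior: stamp once scan is done
            scans += 1
        now += archive_cost           # run archive_command for one file
    return scans


if __name__ == "__main__":
    # Slow scan (5 units), fast archive_command (1 unit), interval of 5.
    print(count_dir_scans(10, 5, 1, 5, stamp_after_scan=False))  # 10
    print(count_dir_scans(10, 5, 1, 5, stamp_after_scan=True))   # 2
    # Slow archive_command: a scan per file even with the later stamp.
    print(count_dir_scans(10, 5, 5, 5, stamp_after_scan=True))   # 10
```

In this model, stamping before a slow scan means the interval has already expired by the time the next file is considered, so every file triggers a scan. Stamping after the scan lets several files go through between scans, unless the archive step alone takes as long as the interval.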