Re: Making pg_rewind faster
| From | Robert Haas |
|---|---|
| Subject | Re: Making pg_rewind faster |
| Date | |
| Msg-id | CA+TgmobtNkXDMSi+n0TO2O4cpSGZTVn9414xsHOfeiJc_gwPuA@mail.gmail.com |
| In response to | Re: Making pg_rewind faster (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |

On Tue, Oct 28, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
> I was thinking about this argument over the weekend, and I am
> wondering if we could not do better here to detect if a file should be
> copied or not. What if we included a checksum of each file if both
> exist on the target and source, and just not copy them if the
> checksums match?

Well, that would require reading the entire file on both sides to compute the checksum, which sounds pretty expensive. I mean, a copy would only be a read on one side and a write on the other. Even granting that writes are more expensive than reads, a read of both sides would still be a substantial percentage of the total cost, I think.

Also, I don't think we really want to reinvent a worse version of rsync. If you want to use checksums or file timestamps to decide what to copy, there are already good tools for that which probably handle that task better than our code ever will. What we can bring to the table is PG-specific logic, where we're able to reason about the behavior of PG in a way that a general-purpose tool can't. That's why for example we use the WAL to decide what data blocks need to be copied, rather than checksums -- it's an optimization that rsync can't do, and we can. The rule implemented here is similar: rsync can't know that WAL from before the divergence point should be the same on both sides, but we can.

Now, of course, if in a specific situation the assumptions on which pg_rewind relies are not valid, e.g. because manual data directory modification has occurred, then pg_rewind should not be used. And if on the other hand we find some flaw that will keep pg_rewind from delivering correct results even when nothing strange has happened, then that's a bug or a design problem that we need to fix. But if we just start second-guessing ourselves by adding overhead to protect against can't-happen scenarios, we'll end up making pg_rewind useless.

--
Robert Haas
EDB: http://www.enterprisedb.com
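
To make the "WAL from before the divergence point" rule concrete, here is a minimal Python sketch of how such a test could look. It is not taken from pg_rewind or from the patch under discussion. The helper names (`parse_lsn`, `segment_number`, `ends_before_divergence`) are hypothetical, the 16 MB segment size is only the PostgreSQL default, and the sketch ignores timeline IDs entirely, which a real implementation would have to account for.

```python
# Illustrative only: decide whether a WAL segment lies entirely before a
# divergence LSN, using nothing but the segment file name.
# Assumes the default 16 MB wal_segment_size and ignores the timeline ID.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: PostgreSQL default


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (two hex numbers) to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_number(filename, segment_size=WAL_SEGMENT_SIZE):
    """Return the segment number encoded in a 24-hex-digit WAL file name.

    The name is three 8-digit hex fields: timeline, "log" number, and
    segment-within-log. The timeline field is skipped here.
    """
    if len(filename) != 24:
        raise ValueError("not a WAL segment file name: %r" % filename)
    log = int(filename[8:16], 16)
    seg = int(filename[16:24], 16)
    segments_per_log = 0x100000000 // segment_size
    return log * segments_per_log + seg


def ends_before_divergence(filename, divergence_lsn,
                           segment_size=WAL_SEGMENT_SIZE):
    """True if every byte of the segment precedes the divergence LSN."""
    segno = segment_number(filename, segment_size)
    segment_end = (segno + 1) * segment_size  # first LSN after the segment
    return segment_end <= divergence_lsn


divergence = parse_lsn("0/3000060")
for name in ("000000010000000000000002", "000000010000000000000003"):
    print(name, ends_before_divergence(name, divergence))
```

In this sketch the segment that contains the divergence point is never reported as skippable; only segments that end at or before it are. Unlike a checksum comparison, the test reads no file contents on either side.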