Re: [HACKERS] Fix mdsync never-ending loop problem
От | Heikki Linnakangas |
---|---|
Тема | Re: [HACKERS] Fix mdsync never-ending loop problem |
Дата | |
Msg-id | 461E4C5E.8050509@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Fix mdsync never-ending loop problem (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [HACKERS] Fix mdsync never-ending loop problem
|
Список | pgsql-patches |
Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> Ok. Perhaps we should not use the canceled-flag but just remove the >> entry from pendingOpsTable like we used to when mdsync_in_progress isn't >> set. > > I'm not thrilled about that; it seems overly intricate, and won't the > LDC patch make it mostly useless anyway (because of time-extended > checkpointing)? Not quite useless, but definitely less useful, depending on how long the checkpoints are stretched and how much of the time is allocated to fsyncing (the defaults in the latest LDC patch was 20%). OTOH, this doesn't seem like a very performance sensitive codepath anyway, so we should just stick to the simplest thing that works. >> I think there's one little bug in the patch: > >> 1. AbsorbFsyncRequests is called. A FORGET message is received, and an >> entry in the hash table is marked as canceled >> 2. Another relation with the same relfilenode is created. This can >> happen after OID wrap-around >> 3. RememberFsyncRequest is called for the new relation. The old entry is >> still in the hash table, marked with the canceled-flag, so it's not touched. > > Good point. I was wondering what to do with an already-canceled entry, > but didn't think of that scenario. I think your fix is not quite right: > if we clear a pre-existing cancel flag then we do need to set cycle_ctr, > because this is effectively an all-new request. Hmm, I guess. Though not setting it just makes us fsync the file earlier than necessary, which isn't too bad either. I believe Itagaki-san's motivation for tackling this in the LDC patch was the fact that it can fsync the same file many times, and in the worst case go to an endless loop, and adding delays inside the loop makes it much more likely. After that is fixed, I doubt any of the optimizations of trying to avoid extra fsyncs make any difference in real applications, and we should just keep it simple, especially if we back-patch it. That said, I'm getting tired of this piece of code :). I'm happy to have any of the discussed approaches committed soon and move on. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
В списке pgsql-patches по дате отправления: