Re: the s_lock_stuck on perform_spin_delay
От | Andy Fan |
---|---|
Тема | Re: the s_lock_stuck on perform_spin_delay |
Дата | |
Msg-id | 871qaxp3ly.fsf@163.com обсуждение исходный текст |
Ответ на | Re: the s_lock_stuck on perform_spin_delay (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Список | pgsql-hackers |
Hi Matthias and Robert, Matthias van de Meent <boekewurm+postgres@gmail.com> writes: > On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213@163.com> wrote: >> >> My question is if someone doesn't obey the rule by mistake (everyone >> can make mistake), shall we PANIC on a production environment? IMO I >> think it can be a WARNING on a production environment and be a stuck >> when 'ifdef USE_ASSERT_CHECKING'. >> [...] >> I think a experienced engineer like Thomas can make this mistake and the >> patch was reviewed by 3 peoples, the bug is still there. It is not easy >> to say just don't do it. >> >> the attached code show the prototype in my mind. Any feedback is welcome. > > While I understand your point and could maybe agree with the change > itself (a crash isn't great), It's great that both of you agree that the crash is not great. > I don't think it's an appropriate fix > for the problem of holding a spinlock while waiting for a LwLock, as > spin.h specifically mentions the following (and you quoted the same): > > """ > Keep in mind the coding rule that spinlocks must not be held for more > than a few instructions. > """ Yes, I agree that the known [LW]LockAcquire after holding a Spin lock should be fixed at the first chance rather than pander to it with my previous patch. My previous patch just take care of the *unknown* cases (and I cced thomas in the hope that he knows the bug). I also agree that the special case about [LW]LockAcquire should be detected more effective as you suggested below. So v2 comes and commit 2 is for this suggestion. > > I suspect that we'd be better off with stronger protections against > waiting for LwLocks while we hold any spin lock. More specifically, > I'm thinking about something like tracking how many spin locks we > hold, and Assert()-ing that we don't hold any such locks when we start > to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with > potential manual contextual overrides in specific areas of code where > specific care has been taken to make it safe to hold spin locks while > doing those operations - although I consider their existence unlikely > I can't rule them out as I've yet to go through all lock-touching > code). This would probably work in a similar manner as > START_CRIT_SECTION/END_CRIT_SECTION. > > Kind regards, > > Matthias van de Meent > Neon (https://neon.tech) -- Best Regards Andy Fan
Вложения
В списке pgsql-hackers по дате отправления: