the s_lock_stuck on perform_spin_delay
От | Andy Fan |
---|---|
Тема | the s_lock_stuck on perform_spin_delay |
Дата | |
Msg-id | 87plyhvama.fsf@163.com обсуждение исходный текст |
Ответы |
Re: the s_lock_stuck on perform_spin_delay
Re: the s_lock_stuck on perform_spin_delay Re: the s_lock_stuck on perform_spin_delay |
Список | pgsql-hackers |
Hi, from src/backend/storage/lmgr/README: """ Spinlocks. These are intended for *very* short-term locks. If a lock is to be held more than a few dozen instructions, or across any sort of kernel call (or even a call to a nontrivial subroutine), don't use a spinlock. Spinlocks are primarily used as infrastructure for lightweight locks. """ I totally agree with this and IIUC spin lock is usually used with the following functions. #define init_local_spin_delay(status) .. void perform_spin_delay(SpinDelayStatus *status); void finish_spin_delay(SpinDelayStatus *status); During the perform_spin_delay, we have the following codes: void perform_spin_delay(SpinDelayStatus *status) /* Block the process every spins_per_delay tries */ if (++(status->spins) >= spins_per_delay) { if (++(status->delays) > NUM_DELAYS) s_lock_stuck(status->file, status->line, status->func); the s_lock_stuck will PAINC the entire system. My question is if someone doesn't obey the rule by mistake (everyone can make mistake), shall we PANIC on a production environment? IMO I think it can be a WARNING on a production environment and be a stuck when 'ifdef USE_ASSERT_CHECKING'. People may think spin lock may consume too much CPU, but it is not true in the discussed scene since perform_spin_delay have pg_usleep in it, and the MAX_DELAY_USEC is 1 second and MIN_DELAY_USEC is 0.001s. I notice this issue actually because of the patch "Cache relation sizes?" from Thomas Munro [1], where the latest patch[2] still have the following code. + sr = smgr_alloc_sr(); <-- HERE a spin lock is hold + + /* Upgrade to exclusive lock so we can create a mapping. */ + LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex operation is needed. it may take a long time. Our internal testing system found more misuses on our own PG version. I think a experienced engineer like Thomas can make this mistake and the patch was reviewed by 3 peoples, the bug is still there. It is not easy to say just don't do it. the attached code show the prototype in my mind. Any feedback is welcome. [1] https://www.postgresql.org/message-id/CA%2BhUKGJg%2BgqCs0dgo94L%3D1J9pDp5hKkotji9A05k2nhYQhF4%2Bw%40mail.gmail.com [2] https://www.postgresql.org/message-id/attachment/123659/v5-0001-WIP-Track-relation-sizes-in-shared-memory.patch -- Best Regards Andy Fan
Вложения
В списке pgsql-hackers по дате отправления: