Re: group locking: incomplete patch, just for discussion
From | Robert Haas |
---|---|
Subject | Re: group locking: incomplete patch, just for discussion |
Date | |
Msg-id | CA+TgmoZZRz61LdjyT129=w_UJq+7QXHHwvSQsQAxLaK5anfBCg@mail.gmail.com |
In reply to | Re: group locking: incomplete patch, just for discussion (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: group locking: incomplete patch, just for discussion
Re: group locking: incomplete patch, just for discussion Re: group locking: incomplete patch, just for discussion |
List | pgsql-hackers |
On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The procgloballist stuff should be the subject of a separate patch
> which I agree with.

Yes, I think that's probably a net improvement in robustness quite apart from what we decide to do about any of the rest of this. I've attached it here as revise-procglobal-tracking.patch and will commit that bit if nobody objects. The remainder is reattached without change as group-locking-v0.1.patch.

Per your other comment, I've developed the beginnings of a testing framework which I attached here as test_group_locking-v0.patch. That doesn't look to have much hope of evolving into something we'd want even in contrib, but I think it'll be rather useful for debugging. It works like this:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select test_group_locking('1.0:start,2.0:start,1.0:lock:AccessExclusiveLock:foo,2.0:lock:AccessExclusiveLock:foo');
NOTICE: starting worker 1.0
NOTICE: starting worker 2.0
NOTICE: instructing worker 1.0 to acquire AccessExclusiveLock on relation with OID 16387
NOTICE: instructing worker 2.0 to acquire AccessExclusiveLock on relation with OID 16387
ERROR: could not obtain AccessExclusiveLock on relation with OID 16387
CONTEXT: background worker, group 2, task 0

The syntax is a little arcane, I guess, but it's "documented" in the comments within. In this case I asked it to start up two background workers and have them both try to take AccessExclusiveLock on table foo. As expected, the second one fails. The idea is that workers are identified by a pair of numbers X.Y; two workers with the same X-value are in the same locking group. So if I call the second worker 1.1 rather than 2.0, it'll join the same locking group as worker 1.0 and ... then it does the wrong thing, and then it crashes the server, because my completely-untested code is unsurprisingly riddled with bugs.

Eventually, this needs to be generalized a bit so that we can use it to test deadlock detection.
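To make the X.Y syntax above concrete, here is a minimal Python sketch that parses a spec string like the one passed to test_group_locking. It is only an illustration inferred from the single example in this message: the names `Step`, `parse_spec` and `same_group` are hypothetical, and the real grammar is whatever the comments in test_group_locking-v0.patch say, which may allow more than the two step kinds ("start" and "lock") handled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One instruction from the spec string (hypothetical representation)."""
    group: int                       # X in "X.Y": same X means same locking group
    task: int                        # Y in "X.Y": distinguishes workers in a group
    action: str                      # "start" or "lock" (the only kinds shown above)
    lock_mode: Optional[str] = None  # e.g. "AccessExclusiveLock"
    relation: Optional[str] = None   # e.g. "foo"


def parse_spec(spec: str) -> List[Step]:
    """Split a comma-separated spec into steps, in the order given."""
    steps = []
    for item in spec.split(","):
        fields = item.strip().split(":")
        if len(fields) < 2:
            raise ValueError(f"malformed step: {item!r}")
        worker, action, args = fields[0], fields[1], fields[2:]
        group_text, dot, task_text = worker.partition(".")
        if not dot:
            raise ValueError(f"worker must look like X.Y: {worker!r}")
        group, task = int(group_text), int(task_text)
        if action == "start" and not args:
            steps.append(Step(group, task, "start"))
        elif action == "lock" and len(args) == 2:
            steps.append(Step(group, task, "lock", args[0], args[1]))
        else:
            raise ValueError(f"unsupported step: {item!r}")
    return steps


def same_group(a: Step, b: Step) -> bool:
    """Workers with the same X-value belong to the same locking group."""
    return a.group == b.group
```

With this sketch, workers 1.0 and 2.0 from the example land in different groups, while renaming the second worker to 1.1 puts it in the same group as 1.0.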
Testing deadlock detection is tricky, because what you really want to do is tell worker A to wait for some lock and then, once you're sure it's on the wait queue, tell worker B to go take some other lock and check that you see the resulting deadlock. There doesn't seem to be a good API for the user backend to find out whether some background worker is waiting for some particular lock, so I may have to resort to the hacky expedient of having the driver process wait for a few seconds and assume that's long enough that the background worker will be on the wait queue by then. Or maybe I can drum up some better solution, but anyway it's not done yet.

The value of this test code is that we can easily reproduce locking scenarios which would be hard to reproduce in a real workload - e.g. because they're timing-dependent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
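As a rough illustration of the alternative to the fixed sleep discussed above, the following Python sketch polls a condition until it holds or a timeout expires. It is not part of any of the attached patches. `wait_until` is a hypothetical helper, and the predicate stands in for a check the message says has no good API yet (for example, "is worker A on the wait queue for this lock?").

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True as soon as the predicate holds, False on timeout.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Compared with sleeping for a few seconds unconditionally, a loop like this returns as soon as the condition is observed and reports a timeout explicitly instead of silently assuming the worker got there in time. It only helps once some way to observe the wait state exists.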