Re: BF mamba failure
From | Michael Paquier |
---|---|
Subject | Re: BF mamba failure |
Date | |
Msg-id | Zm-8Xo93K9yD9fy7@paquier.xyz |
In reply to | Re: BF mamba failure (Michael Paquier <michael@paquier.xyz>) |
Replies |
Re: BF mamba failure
|
List | pgsql-hackers |
On Fri, Jun 14, 2024 at 02:31:37PM +0900, Michael Paquier wrote:
> I don't think that this is going to fly far except if we introduce a
> concept of "generation" or "age" in the stats entries. The idea is
> simple: when a stats entry is reinitialized because of a drop&create,
> increment a counter to tell that this is a new generation, and keep
> track of it in *both* PgStat_EntryRef (local backend reference to the
> shmem stats entry) *and* PgStatShared_HashEntry (the central one).
> When releasing an entry, if we know that the shared entry we are
> pointing at is not of the same generation as the local reference, it
> means that the entry has been reused for something else with the same
> hash key, so give up. It should not be that invasive, still it means
> ABI breakage in the two pgstats internal structures I am mentioning,
> which is OK for a backpatch as this stuff is internal. On top of
> that, this problem means that we can silently and randomly lose stats,
> which is not cool.
>
> I'll try to give it a go on Monday.

Here you go, the patch introduces what I've named an "age" counter
attached to the shared entry references, and copied over to the local
references. The counter is initialized at 0 and incremented each time
an entry is reused, then when attempting to drop an entry we
cross-check the version held locally with the shared one.
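
The cross-check described above can be illustrated with a small standalone C model. This is only a sketch, not the attached patch: `SharedEntry` and `LocalRef` are hypothetical stand-ins for PgStatShared_HashEntry and PgStat_EntryRef, every function name below is invented for illustration, and locking, reference counting and the hash table itself are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared (central) hash entry. */
typedef struct SharedEntry
{
    uint32_t key;      /* hash key, unchanged across drop & create */
    bool     dropped;  /* entry has been marked as dropped */
    uint32_t age;      /* bumped each time the entry is reused */
} SharedEntry;

/* Hypothetical stand-in for a backend-local reference. */
typedef struct LocalRef
{
    SharedEntry *shared;
    uint32_t     age;  /* copy of shared->age taken at acquire time */
} LocalRef;

/* A brand-new entry starts at age 0. */
void
shared_entry_init(SharedEntry *e, uint32_t key)
{
    e->key = key;
    e->dropped = false;
    e->age = 0;
}

/* Mark the entry as dropped; its memory is still in place. */
void
shared_entry_drop(SharedEntry *e)
{
    e->dropped = true;
}

/* Re-create an object with the same hash key: reuse the dropped entry. */
void
shared_entry_reuse(SharedEntry *e)
{
    assert(e->dropped);
    e->dropped = false;
    e->age++;          /* new generation */
}

/* Take a local reference, remembering the generation seen. */
LocalRef
local_ref_acquire(SharedEntry *e)
{
    LocalRef ref = {e, e->age};

    return ref;
}

/*
 * Release a local reference.  Returns true only if this backend may go on
 * to free the shared entry: the generations match and the entry is dropped.
 * On a mismatch the entry was reused by someone else, so give up.
 */
bool
local_ref_release(LocalRef *ref)
{
    SharedEntry *e = ref->shared;

    ref->shared = NULL;
    if (e->age != ref->age)
        return false;
    return e->dropped;
}
```

In this model, a backend that acquired its reference before a drop and re-create holds age 0 while the shared entry is at age 1, so its release returns false and the reused entry is left alone.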

While looking at the whole, this is close to a concept patch sent
previously, where a counter is used in the shared entry with a
cross-check done with the local reference, that was posted here
(noticed that today):
https://www.postgresql.org/message-id/20230603.063418.767871221863527769.horikyota.ntt@gmail.com

The logic is different though, as we don't need to care about the
contents of the local cache when cross-checking the "age" count when
retrieving the contents, just the case where a backend would attempt
to drop an entry it thinks is OK to operate on, that got reused
because of the effect of other backends doing creates and drops with
the same hash key.

This idea needs more eyes, so I am adding that to the next CF for now.
I've played with it for a few hours with concurrent replication slot
drops/creates, without breaking it. I have not implemented an
isolation test for this case, as it depends on where we are going with
their integration with injection points.
--
Michael