Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
From | Alvaro Herrera |
---|---|
Subject | Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby |
Date | |
Msg-id | 20131209190031.GK6777@eldon.alvh.no-ip.org |
In reply to | Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby |
List | pgsql-bugs |
Andres Freund wrote:
> Hi,
>
> On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> > On master there are files from 0000 to 14078
> >
> > On slave there were absent files from A1xx to FFFF
> > They were the oldest ones. (October, November)
>
> Some analysis later, I am pretty sure that the origin is a longstanding
> problem and not connected to 9.3.[01] vs 9.3.2.
>
> The above referenced 14078 file is exactly the last page before a
> members wraparound:
> (gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
> $10 = 0x14078

As a note, the SlruScanDirectory code has a flaw because it only looks
at four-digit files; the reason only files up to 0xFFFF are missing and
not the following ones is that those got ignored.  This needs a fix
as well.

> So, what happened is that enough multixacts were created that the
> members slru wrapped around. It's not unreasonable for the members slru
> to wrap around faster than the offsets one - after all we create at
> least two entries into members for every offset entry. Also, in 9.3+
> more xids fit on an offset page than on a members page.
> When truncating, we first read the offset, to know where we currently
> are in members, and then truncate both from their respective
> point. Since we've wrapped around in members we very well might remove
> content we actually need.

Yeah, on 9.3 each member Xid occupies five bytes in
pg_multixact/members, whereas each offset only occupies four bytes in
pg_multixact/offsets.  It's rare that a multixact only contains one
member; typically they will have at least two (so for each multixact we
would have 4 bytes in offsets and a minimum of 10 bytes in members).
So wrapping around is easy for members, even with the protections we
have in place for offsets.

> I've recently remarked that I find it dangerous that we only do
> anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> here we have the proof that that's bad.

It's hard to see how to add this post-facto, though.  I mean, I am
thinking we would need some additional pg_control info etc.  We'd better
figure out a way to add such controls without having to add that.

> This is an issue in <9.3 as well. It might, in some sense, even be worse
> there, because we never vacuum old multis away. But on the other hand,
> the growth of multis is slower there and we look into old multis less
> frequently.
>
> The only reason that you saw the issue on the standby first is that the
> truncation code is called more frequently there. Afaics it will happen,
> sometime in the future, on the master as well.
>
> I think problems should be preventable if you issue a systemwide VACUUM
> FREEZE, but please let others chime in before you execute it.

I wouldn't freeze anything just yet, at least until the patch to fix
multixact freezing is in.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
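
For anyone who wants to check the arithmetic in this message without a debugger, here is a rough sketch in Python. It is not taken from the PostgreSQL source. The layout constants are assumptions based on a default 9.3 build (8192-byte blocks, members stored in 20-byte groups of four xids plus four flag bytes, 32 pages per SLRU segment), and the helper names are made up for illustration. It reproduces the 0x14078 figure from the gdb expression, the five-bytes-per-member figure, and the effect of a directory scan that only accepts four-digit file names.

```python
# Assumed layout constants for a default 9.3 build (not stated in the message).
BLCKSZ = 8192                 # block size in bytes
XID_BYTES = 4                 # size of one transaction id
MEMBERS_PER_GROUP = 4         # members are stored in groups of four
FLAG_BYTES_PER_GROUP = 4      # one flag byte per member in the group
SLRU_PAGES_PER_SEGMENT = 32   # pages per SLRU segment file

GROUP_SIZE = XID_BYTES * MEMBERS_PER_GROUP + FLAG_BYTES_PER_GROUP  # 20 bytes
MEMBERS_PER_PAGE = (BLCKSZ // GROUP_SIZE) * MEMBERS_PER_GROUP      # 1636
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT    # 52352


def last_members_segment() -> int:
    """Segment number reached when the 32-bit member offset space is used up."""
    return (1 << 32) // MEMBERS_PER_SEGMENT


def is_four_digit_segment_name(name: str) -> bool:
    """Hypothetical stand-in for a scan that only accepts 4-hex-digit names."""
    return len(name) == 4 and all(c in "0123456789ABCDEF" for c in name)


def multixacts_until_members_wrap(members_per_multixact: int) -> int:
    """Number of multixacts needed to consume all 2**32 member slots."""
    return (1 << 32) // members_per_multixact


if __name__ == "__main__":
    print(format(last_members_segment(), "X"))            # 14078
    print(GROUP_SIZE // MEMBERS_PER_GROUP)                # 5 bytes per member
    print(is_four_digit_segment_name("FFFF"))             # True
    print(is_four_digit_segment_name("14078"))            # False
    print(multixacts_until_members_wrap(2) == 1 << 31)    # True
```

Under these assumptions, multixacts with two members each exhaust the member space after 2**31 multixacts, half of the 32-bit multixact id space, which is consistent with members wrapping around before offsets do.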