Thread: Chronic performance issue with Replication Failover and FSM
All,

I've discovered a built-in performance issue with replication failover
at one site, which I couldn't find searching the archives.  I don't
really see what we can do to fix it, so I'm posting it here in case
others might have clever ideas.

1. The Free Space Map is not replicated between servers.

2. Thus, when we fail over to a replica, it starts with a blank FSM.

3. I believe the replica also starts with zero counters for autovacuum.

4. On a high-UPDATE workload, this means that the replica assumes tables
have no free space until it starts to build a new FSM or autovacuum
kicks in on some of the tables, much later on.

5. If your hosting is such that you fail over a lot (such as on AWS),
then this causes cumulative table bloat which can only be cured by a
VACUUM FULL.

I can't see any way around this which wouldn't also bog down
replication.  Clever ideas, anyone?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
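A quick way to check point 3 above on a freshly promoted replica, using
only the standard statistics views (the ordering and LIMIT here are
arbitrary):

  -- Expect near-zero dead-tuple counts and NULL last_autovacuum
  -- if the counters really do start from scratch after promotion:
  SELECT relname, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;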
On Tue, Mar 13, 2012 at 4:53 PM, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> I've discovered a built-in performance issue with replication failover
> at one site, which I couldn't find searching the archives.  I don't
> really see what we can do to fix it, so I'm posting it here in case
> others might have clever ideas.
>
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.
>
> 3. I believe the replica also starts with zero counters for autovacuum.
>
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.
>
> 5. If your hosting is such that you fail over a lot (such as on AWS),
> then this causes cumulative table bloat which can only be cured by a
> VACUUM FULL.
>
> I can't see any way around this which wouldn't also bog down
> replication.  Clever ideas, anyone?

Would it bog it down by "much"?  (1 byte per 8 kB page) * 2 TB ≈ 250 MB.
Even if you doubled or tripled it for pointer-overhead reasons, it's
still pretty modest, whereas VACUUM traffic is already pretty intense.

Still, it's clearly...work.

--
fdr
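Spelling out that back-of-envelope figure (8192-byte pages, one FSM byte
per heap page; 256 MB is the exact binary result behind the rounded
250 MB):

  -- 2 TiB of heap / 8192 bytes per page = 268,435,456 pages,
  -- at roughly one FSM byte apiece:
  SELECT pg_size_pretty(2::bigint * 1024 * 1024 * 1024 * 1024 / 8192);
  -- => 256 MB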
On Wed, Mar 14, 2012 at 8:53 AM, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> I've discovered a built-in performance issue with replication failover
> at one site, which I couldn't find searching the archives.  I don't
> really see what we can do to fix it, so I'm posting it here in case
> others might have clever ideas.
>
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.
>
> 3. I believe the replica also starts with zero counters for autovacuum.
>
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.

If it's really a high-UPDATE workload, wouldn't autovacuum start soon?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
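To put a number on "soon": autovacuum picks up a table once its dead
tuples exceed autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * reltuples, which with the default
settings of 50 and 0.2 means churning through roughly 20% of a table's
rows first.  A rough per-table sketch of that trigger point (reltuples
is only the planner's estimate, so treat the numbers as approximate):

  SELECT relname,
         round(50 + 0.2 * reltuples) AS av_dead_tuple_trigger
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY reltuples DESC
  LIMIT 10;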
On Tue, Mar 13, 2012 at 7:05 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> If it's really a high-UPDATE workload, wouldn't autovacuum start soon?

Also, while vacuum cleanup records are applied, could not the standby
also update its free space map, without having to send the actual FSM
updates?  I guess that's bogging down of another variety.

--
fdr
On 14.03.2012 01:53, Josh Berkus wrote:
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.

The FSM is included in the base backup, and it is updated when VACUUM
records are replayed.

It is also updated when insert/update/delete records are replayed,
although there's some fuzziness there: records with full-page images
don't update the FSM, and the FSM is only updated when the page has
less than 20% of free space left.  But that would cause an error in
the other direction, with the FSM claiming that some pages have more
free space than they do in reality.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
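One can spot-check what a standby's FSM actually contains with the
contrib module pg_freespacemap ('mytable' is a placeholder; note the
extension has to be created on the primary, since a standby is
read-only):

  CREATE EXTENSION pg_freespacemap;  -- on the primary; the catalog
                                     -- change replicates to standbys
  -- Then, on the standby, the FSM's recorded free bytes per heap page:
  SELECT blkno, avail
  FROM pg_freespace('mytable')
  WHERE avail > 0
  LIMIT 10;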
Heikki,

> The FSM is included in the base backup, and it is updated when VACUUM
> records are replayed.

Oh?  Hmmmm.  In that case, the issue I'm seeing in production is
something else.  Unless that was a change for 9.1?

> It is also updated when insert/update/delete records are replayed,
> although there's some fuzziness there: records with full-page images
> don't update the FSM, and the FSM is only updated when the page has
> less than 20% of free space left.  But that would cause an error in
> the other direction, with the FSM claiming that some pages have more
> free space than they do in reality.

Thanks.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 20.03.2012 23:41, Josh Berkus wrote:
> Heikki,
>
>> The FSM is included in the base backup, and it is updated when VACUUM
>> records are replayed.
>
> Oh?  Hmmmm.  In that case, the issue I'm seeing in production is
> something else.  Unless that was a change for 9.1?

No, it's been like that since 8.4, when the FSM was rewritten.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Tue, Mar 13, 2012 at 4:53 PM, Josh Berkus <josh@agliodbs.com> wrote:
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.
>
> 5. If your hosting is such that you fail over a lot (such as on AWS),
> then this causes cumulative table bloat which can only be cured by a
> VACUUM FULL.

I'd like to revive this thread.  Like other people, I thought this was
not a huge problem -- or at least maybe not directly from the mechanism
proposed -- but sometimes it's a pretty enormous one, and I've started
to notice it.

I filed a bug report here
(http://archives.postgresql.org/pgsql-bugs/2012-08/msg00108.php, plots
in http://archives.postgresql.org/pgsql-performance/2012-08/msg00181.php),
but just today we promoted another system via streaming replication to
pick up the planner fix in 9.1.5 (did you know: that planner bug seems
to make GIN FTS indexes unused in non-exotic cases, so one falls back
to a seqscan?), and then a 40MB GIN index bloated to two gigabytes on a
1.5GB table over the course of maybe six hours.

In addition, the thread on pgsql-performance that has the plot I linked
to indicates someone having the same problem with 8.3 after a
warm-standby promotion.

So I think there are some devils at work here, and I am not even sure
if they are hard to reproduce -- yet, people use standby promotion
("unfollow") on Heroku all the time and we have not been plagued
mightily by support issues involving such incredible bloating, so
there's something about the access pattern.

In my two cases, there is a significant number of UPDATEs versus the
actual number of INSERTs/DELETEs of records (the ratio is probably
10000+ to 1), even though neither of these systems would be close to
what one could consider a large or even medium-sized database in terms
of TPS or database size.  In fact, the latter system bloated even
though it comfortably fits entirely in memory.

--
fdr
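The UPDATE-heavy shape described above can be read straight out of the
statistics views; a sketch (the exact ratio computation is just
illustrative):

  SELECT relname,
         round(n_tup_upd::numeric / NULLIF(n_tup_ins + n_tup_del, 0), 0)
           AS updates_per_insert_or_delete
  FROM pg_stat_user_tables
  ORDER BY updates_per_insert_or_delete DESC NULLS LAST
  LIMIT 10;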
Daniel Farina <daniel@heroku.com> writes:
> but just today we promoted another system via streaming replication to
> pick up the planner fix in 9.1.5 (did you know: that planner bug seems
> to make GIN FTS indexes unused in non-exotic cases, so one falls back
> to a seqscan?), and then a 40MB GIN index bloated to two gigabytes on
> a 1.5GB table over the course of maybe six hours.

I think this is probably unrelated to what Josh was griping about: even
granting that the system forgot any free space that had been available
within the original 40MB, that couldn't in itself lead to eating more
than another 40MB, no?

My guess is that something is broken about the oldest-xmin-horizon
mechanism, such that VACUUM is failing to recover space.  Can you put
together a self-contained test case that exhibits similar growth?

			regards, tom lane
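In the spirit of that request, a minimal sketch of a starting point for
such a test case (table name, row count, and payload are all arbitrary;
whether this actually reproduces the growth is precisely the open
question):

  CREATE TABLE bloat_test (id int PRIMARY KEY, payload text);
  INSERT INTO bloat_test
  SELECT g, repeat('x', 100) FROM generate_series(1, 100000) g;

  -- Repeat in a loop -- ideally while a long-lived transaction holds
  -- back the oldest-xmin horizon -- and watch whether the size grows
  -- without bound:
  UPDATE bloat_test SET payload = repeat('y', 100);
  SELECT pg_size_pretty(pg_relation_size('bloat_test'));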