Allow "snapshot too old" error, to prevent bloat
От | Kevin Grittner |
---|---|
Тема | Allow "snapshot too old" error, to prevent bloat |
Дата | |
Msg-id | 136937748.3364317.1423964815320.JavaMail.yahoo@mail.yahoo.com обсуждение исходный текст |
Ответы |
Re: Allow "snapshot too old" error, to prevent bloat
|
Список | pgsql-hackers |
This patch is related to the "Reduce pinning in btree indexes" patch submitted here: http://www.postgresql.org/message-id/721615179.3351449.1423959585771.JavaMail.yahoo@mail.yahoo.com That describes how they evolved and how they relate; I won't duplicate that here. Unlike the other patch, this one is more at the "proof of concept" phase, because it requires support in the heap and each index AM to work correctly; so far I have only had time to cover the heap and btree indexes. In spite of that, I have thrown the worst test cases I could think of at it (and only succeeded in uncovering a bug which was already out there in production), and it has shown its value in a two-day test test simulating a 300 user load with complex real-world applications (although the only indexes it used were btree indexes). Without the patches the database growth was 39GB per day; with the patches it was 28.5GB per day. (The test does involve more inserts than deletes, so some growth is expected.) At the end of the tests, pgstattuple reported eight times as many dead tuples in the database without the patches. More importantly, without the patches the CPU load started at 60% and showed linear growth to 92% over the course of the first day; with the patches it stayed at a stable 60% throughout the test. What this patch does is add a GUC call old_snapshot_threshold. It defaults to -1, which leaves behavior matching unpatched code. Above that it allows tuples to be vacuumed away after the number of transaction IDs specified by the GUC have been consumed. It also saves the current insertion LSN into every snapshot when it is created. When reading from the heap or any index, if the snapshot is vulnerable to showing incorrect data because the threshold has been crossed since it was generated, reading any page with an LSN past the snapshot LSN causes a "snapshot too old" error to be thrown. Since this is LSN-based, the new logic is not used for any relation which is not WAL-logged. Note that if you don't read data from a page which has been modified after your snapshot was taken, the threshold doesn't matter. All `make installcheck` tests succeed with any setting. With a setting of 0 (the most extreme), `make installcheck-world` sees four isolation tests fail. Those all pass if you raise the setting to 2. The postgres_fdw test needs a setting of 4 to succeed. I would expect most shops would want to tune this to something in the six-digit to eight-digit range. In the tests mentioned above it was set to 150000 (which corresponded to just under 4 minutes of txid consumption) and there were no "snapshot too old" errors, even though some cursors were left open for the entire two-day run. The patch still lacks (as mentioned above) support for index AMs other than btree, and lacks documentation for the new GUC. I'm sure that there are some comments and README files that need adjustment, too. As I said, this is still POC. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: