Обсуждение: Re: [HACKERS] WARM and indirect indexes
On Tue, Jan 10, 2017 at 2:24 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > The big advantage of WARM is that it works automatically, like HOT: the > user doesn't need to do anything different than today to get the > benefit. With indirect indexes, the user needs to create the index as > indirect explicitely. However, this cuts both ways. If the WARM implementation has bugs -- either data-corrupting bugs or crash bugs or returns-wrong-answer bugs or performance-in-corner-cases bugs -- everyone will be exposed to them. The kinds of things that could go wrong here are in my view very similar to the sorts of things that went wrong with multixacts. If indirect indexes turn out to have similar problems, that's sad, but people can decide not to use them and be no worse off than before. (This is exactly why parallel query is now provoking periodic griping rather than howls of agony - it's off by default in 9.6, and even if you turn it on, the costing defaults are set fairly conservatively, and even if it goes completely haywire due to some horrible bug, it's very unlikely to break anything for longer than it takes you to shut it back off. It's possible that WARM bugs would only mess up your indexes and not your table data, which is a lot less bad, but things that touch the on-disk format are near the top of my "this is scary" list.) More broadly, I don't share Bruce's negativity about indirect indexes. My estimate of what needs to be done for them to be really useful is - I think - higher than your estimate of what needs to be done, but I think the concept is great. I also think that some of the concepts - like allowing the tuple pointers to have widths other than 6 byes - could turn out to be a good foundation for global indexes in the future. In fact, it might be considerably easier to make an indirect index span a partitioning hierarchy than it would be to do the same for a regular index. But regardless of that, the feature is good for what it offers today. As to WARM, it scares me a lot. If zero serious data-corrupting bugs survive to final release, or even if no more than one or two do, and if it improves performance in general in the way some of the offered benchmarks indicate, fantastic. But a feature of this type is by its nature prone to eating your data in varied and subtle ways. Caveat committer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 12, 2017 at 3:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 10, 2017 at 2:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> The big advantage of WARM is that it works automatically, like HOT: the
> user doesn't need to do anything different than today to get the
> benefit. With indirect indexes, the user needs to create the index as
> indirect explicitely.
However, this cuts both ways. If the WARM implementation has bugs --
either data-corrupting bugs or crash bugs or returns-wrong-answer bugs
or performance-in-corner-cases bugs -- everyone will be exposed to
them.
IMHO WARM is way less complicated or intrusive than HOT was. It doesn't change any of the MVCC mechanics or doesn't change when and how tuples are marked dead or when and how dead tuples are removed. What it changes is how tuples are indexed and accessed via index methods. So I believe bugs in this area can possibility corrupt indexes or return wrong results, which is bad but may have happened with many other patches we did in recent past. The other thing the patch changes is how update-chain is maintained. In order to quickly find the root offset while updating a tuple, we now store the root offset in the t_ctid field of the last tuple in the chain and use a separate bit to mark end-of-the-chain (instead of relying of t_ctid = t_self check). That can lead to problems if chains are not maintained or followed correctly. These changes are in the first patch of the patch series and if you've any suggestions on how to improve that or solidify chain following, please let me know. I was looking for some way to hide t_ctid field to ensure that the links are only accessed via some standard API.
I think as a developer of the patch, what I would like to know is what can we do address concerns raised by you? What kind of tests you would like to do to get confidence in the patch? What I've done so far is to rely on the existing tests such as regression, isolation and pgbench. After adding support for system tables, the code gets exercised even more during regression tests, which is good. I also performed a few tests where I would turn sequential scan off and then run "make installcheck" and compare regression diffs between master and patched code. That helps because the index access paths are used even more often. I did not find any bugs in those tests.
My favourite test during HOT development was to run pgbench with large number of clients and periodically check for data consistency while tests are running, by comparing sum(tbalance), sum(bbalance) and sum(abalance) values. I'm yet to do that kind of test with WARM because that would require a slightly different test setup (more indexes and more update statements), but I intend to do those tests too. I have also started writing regression test cases which could lead to some corner cases and share them for inclusion irrespective of WARM.
Please share your thoughts on what more can be and should be done.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 11, 2017 at 11:09 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > I think as a developer of the patch, what I would like to know is what can > we do address concerns raised by you? What kind of tests you would like to > do to get confidence in the patch? Well, off the top of my head, I'd say that there are basically four things that can be done to improve the quality of patches. These things are, of course, not a secret: 1. Detailed manual testing. 2. Stress testing. 3. Regression testing (pg_regress, isolation, TAP) perhaps driven by code coverage reports. 4. Detailed code review by highly-qualified individuals. For a patch of this type, I highly recommend stress testing. Amit and his colleagues have done a huge amount of stress testing of the hash index patches and that's turned up quite a few bugs; sometimes the tests had to run for many hours before any failure was provoked. But my real point here is not that reviewing or testing the WARM patch is any different or more difficult than for any other patch. Rather, the issue is that the stakes are higher. Whatever a committer would do (or ask to have done) for a patch of ordinary consequence should be done ten times over for that one, not because it's more likely to have bugs but because any bugs that it does have will hurt more. There's a time for committing and moving on and a time for extreme paranoia. IMHO, this is the latter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/11/17 8:09 PM, Pavan Deolasee wrote: > The other thing the patch changes is how update-chain is maintained. In > order to quickly find the root offset while updating a tuple, we now > store the root offset in the t_ctid field of the last tuple in the chain > and use a separate bit to mark end-of-the-chain (instead of relying of > t_ctid = t_self check). That can lead to problems if chains are not > maintained or followed correctly. These changes are in the first patch > of the patch series and if you've any suggestions on how to improve that > or solidify chain following, please let me know. I was looking for some > way to hide t_ctid field to ensure that the links are only accessed via > some standard API. AIUI, that's going to affect every method of heap access except for index scans that can skip the heap due to being all-visible. That means the risk here is comparable to the MXID changes. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Wed, Jan 11, 2017 at 04:38:10PM -0500, Robert Haas wrote: > More broadly, I don't share Bruce's negativity about indirect indexes. > My estimate of what needs to be done for them to be really useful is - > I think - higher than your estimate of what needs to be done, but I > think the concept is great. I also think that some of the concepts - > like allowing the tuple pointers to have widths other than 6 byes - > could turn out to be a good foundation for global indexes in the > future. In fact, it might be considerably easier to make an indirect > index span a partitioning hierarchy than it would be to do the same > for a regular index. But regardless of that, the feature is good for > what it offers today. I am worried that indirect indexes might have such limited usefulness with a well-designed WARM feature that the syntax/feature would be useless for 99% of users. In talking to Alexander Korotkov, he mentioned that indirect indexes could be used for global/cross-partition indexes, and for index-organized tables (heap and index together in a single file). This would greatly expand the usefulness of indirect indexes and would be exciting. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +