Обсуждение: Re: [HACKERS] WARM and indirect indexes

Поиск
Список
Период
Сортировка

Re: [HACKERS] WARM and indirect indexes

От
Robert Haas
Дата:
On Tue, Jan 10, 2017 at 2:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> The big advantage of WARM is that it works automatically, like HOT: the
> user doesn't need to do anything different than today to get the
> benefit.  With indirect indexes, the user needs to create the index as
> indirect explicitely.

However, this cuts both ways.  If the WARM implementation has bugs --
either data-corrupting bugs or crash bugs or returns-wrong-answer bugs
or performance-in-corner-cases bugs -- everyone will be exposed to
them.  The kinds of things that could go wrong here are in my view
very similar to the sorts of things that went wrong with multixacts.
If indirect indexes turn out to have similar problems, that's sad, but
people can decide not to use them and be no worse off than before.

(This is exactly why parallel query is now provoking periodic griping
rather than howls of agony - it's off by default in 9.6, and even if
you turn it on, the costing defaults are set fairly conservatively,
and even if it goes completely haywire due to some horrible bug, it's
very unlikely to break anything for longer than it takes you to shut
it back off.  It's possible that WARM bugs would only mess up your
indexes and not your table data, which is a lot less bad, but things
that touch the on-disk format are near the top of my "this is scary"
list.)

More broadly, I don't share Bruce's negativity about indirect indexes.
My estimate of what needs to be done for them to be really useful is -
I think - higher than your estimate of what needs to be done, but I
think the concept is great.  I also think that some of the concepts -
like allowing the tuple pointers to have widths other than 6 byes -
could turn out to be a good foundation for global indexes in the
future.  In fact, it might be considerably easier to make an indirect
index span a partitioning hierarchy than it would be to do the same
for a regular index.  But regardless of that, the feature is good for
what it offers today.

As to WARM, it scares me a lot.  If zero serious data-corrupting bugs
survive to final release, or even if no more than one or two do, and
if it improves performance in general in the way some of the offered
benchmarks indicate, fantastic.  But a feature of this type is by its
nature prone to eating your data in varied and subtle ways.  Caveat
committer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WARM and indirect indexes

От
Pavan Deolasee
Дата:


On Thu, Jan 12, 2017 at 3:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 10, 2017 at 2:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> The big advantage of WARM is that it works automatically, like HOT: the
> user doesn't need to do anything different than today to get the
> benefit.  With indirect indexes, the user needs to create the index as
> indirect explicitely.

However, this cuts both ways.  If the WARM implementation has bugs --
either data-corrupting bugs or crash bugs or returns-wrong-answer bugs
or performance-in-corner-cases bugs -- everyone will be exposed to
them.  


IMHO WARM is way less complicated or intrusive than HOT was. It doesn't change any of the MVCC mechanics or doesn't change when and how tuples are marked dead or when and how dead tuples are removed. What it changes is how tuples are indexed and accessed via index methods. So I believe bugs in this area can possibility corrupt indexes or return wrong results, which is bad but may have happened with many other patches we did in recent past. The other thing the patch changes is how update-chain is maintained. In order to quickly find the root offset while updating a tuple, we now store the root offset in the t_ctid field of the last tuple in the chain and use a separate bit to mark end-of-the-chain (instead of relying of t_ctid = t_self check). That can lead to problems if chains are not maintained or followed correctly. These changes are in the first patch of the patch series and if you've any suggestions on how to improve that or solidify chain following, please let me know. I was looking for some way to hide t_ctid field to ensure that the links are only accessed via some standard API.

I think as a developer of the patch, what I would like to know is what can we do address concerns raised by you? What kind of tests you would like to do to get confidence in the patch? What I've done so far is to rely on the existing tests such as regression, isolation and pgbench. After adding support for system tables, the code gets exercised even more during regression tests, which is good. I also performed a few tests where I would turn sequential scan off and then run "make installcheck" and compare regression diffs between master and patched code. That helps because the index access paths are used even more often. I did not find any bugs in those tests.

My favourite test during HOT development was to run pgbench with large number of clients and periodically check for data consistency while tests are running, by comparing sum(tbalance), sum(bbalance) and sum(abalance) values. I'm yet to do that kind of test with WARM because that would require a slightly different test setup (more indexes and more update statements), but I intend to do those tests too. I have also started writing regression test cases which could lead to some corner cases and share them for inclusion irrespective of WARM.

Please share your thoughts on what more can be and should be done.

Thanks,
Pavan
--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] WARM and indirect indexes

От
Robert Haas
Дата:
On Wed, Jan 11, 2017 at 11:09 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> I think as a developer of the patch, what I would like to know is what can
> we do address concerns raised by you? What kind of tests you would like to
> do to get confidence in the patch?

Well, off the top of my head, I'd say that there are basically four
things that can be done to improve the quality of patches.  These
things are, of course, not a secret:

1. Detailed manual testing.
2. Stress testing.
3. Regression testing (pg_regress, isolation, TAP) perhaps driven by
code coverage reports.
4. Detailed code review by highly-qualified individuals.

For a patch of this type, I highly recommend stress testing.  Amit and
his colleagues have done a huge amount of stress testing of the hash
index patches and that's turned up quite a few bugs; sometimes the
tests had to run for many hours before any failure was provoked.

But my real point here is not that reviewing or testing the WARM patch
is any different or more difficult than for any other patch.  Rather,
the issue is that the stakes are higher.  Whatever a committer would
do (or ask to have done) for a patch of ordinary consequence should be
done ten times over for that one, not because it's more likely to have
bugs but because any bugs that it does have will hurt more.  There's a
time for committing and moving on and a time for extreme paranoia.
IMHO, this is the latter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WARM and indirect indexes

От
Jim Nasby
Дата:
On 1/11/17 8:09 PM, Pavan Deolasee wrote:
> The other thing the patch changes is how update-chain is maintained. In
> order to quickly find the root offset while updating a tuple, we now
> store the root offset in the t_ctid field of the last tuple in the chain
> and use a separate bit to mark end-of-the-chain (instead of relying of
> t_ctid = t_self check). That can lead to problems if chains are not
> maintained or followed correctly. These changes are in the first patch
> of the patch series and if you've any suggestions on how to improve that
> or solidify chain following, please let me know. I was looking for some
> way to hide t_ctid field to ensure that the links are only accessed via
> some standard API.

AIUI, that's going to affect every method of heap access except for 
index scans that can skip the heap due to being all-visible. That means 
the risk here is comparable to the MXID changes.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] WARM and indirect indexes

От
Bruce Momjian
Дата:
On Wed, Jan 11, 2017 at 04:38:10PM -0500, Robert Haas wrote:
> More broadly, I don't share Bruce's negativity about indirect indexes.
> My estimate of what needs to be done for them to be really useful is -
> I think - higher than your estimate of what needs to be done, but I
> think the concept is great.  I also think that some of the concepts -
> like allowing the tuple pointers to have widths other than 6 byes -
> could turn out to be a good foundation for global indexes in the
> future.  In fact, it might be considerably easier to make an indirect
> index span a partitioning hierarchy than it would be to do the same
> for a regular index.  But regardless of that, the feature is good for
> what it offers today.

I am worried that indirect indexes might have such limited usefulness
with a well-designed WARM feature that the syntax/feature would be
useless for 99% of users.   In talking to Alexander Korotkov, he
mentioned that indirect indexes could be used for global/cross-partition
indexes, and for index-organized tables (heap and index together in a
single file).  This would greatly expand the usefulness of indirect
indexes and would be exciting.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +