Thread: Re: [SQL] Duplicate tuples with unique index


Re: [SQL] Duplicate tuples with unique index

From:
Palle Girgensohn
Date:
Bruce Momjian wrote:
> 
> > Nope. pg_upgrade was "disabled in this release because the
> > internal blahblahblah disk layout changed from previous versions".
> 
> Sorry, you have to edit the script to reenable it.

That's OK. I didn't bother to check; the dump/restore sequence was pretty quick.

The layout hasn't really changed since 6.5, right?

Cheers
/Palle


Re: [SQL] Duplicate tuples with unique index

From:
Tom Lane
Date:
Palle Girgensohn <girgen@partitur.se> writes:
>>>> Nope. pg_upgrade was "disabled in this release because the
>>>> internal blahblahblah disk layout changed from previous versions".
>> 
>> Sorry, you have to edit the script to reenable it.

> That's OK. I didn't bother to check; the dump/restore sequence was pretty quick.
> The layout hasn't really changed since 6.5, right?

No; the disk layout is the same.  There is room for trouble nonetheless.
There is some doubt about whether pg_upgrade will work right under MVCC
semantics: in MVCC, whether a tuple stored in a table is considered
valid or not will depend on whether its creator transaction is marked
committed in pg_log (and, if it is marked with a deletor transaction,
whether the deletor is NOT committed).

pg_upgrade tries to deal with this by copying the old database
installation's pg_log into the new.  OK, that ensures that all the
user-table tuples preserve their commit state; but what about tuples
in the system tables?  The idea behind pg_upgrade is to paste together
a set of user tables with a set of system tables --- but unless the
transaction number history of the user tables is exactly the same as
the transaction number history of the system tables, there is clearly
a risk that committed tuples will suddenly be considered not-committed
or vice versa.  And ordinarily those histories will *not* be identical.

The only reason pg_upgrade has any chance at all of working is that
for efficiency reasons we don't want to go back and consult pg_log
for every single tuple we read.  So, there are really six states of
a tuple on disk, which may be described as:

1. I was created by transaction N, but I dunno if it committed
2. I was created by a transaction that definitely committed,
   so I'm good; no need to look at pg_log
3. I was created by a transaction that definitely aborted,
   so I'm dead; no need to look at pg_log
4. I was deleted by transaction N, but I dunno if it committed
5. I was deleted by a transaction that definitely committed,
   so I'm dead; no need to look at pg_log
6. I was deleted by a transaction that definitely aborted,
   so I'm still good; no need to look at pg_log

(I'm fuzzing over some fine points that arise when multiple transactions
try to delete the same tuple, but this level of detail will do for now.)
Whenever any backend examines a tuple in state 1 or 4, it will consult
pg_log to discover the state of the source transaction.  If the source
transaction has now committed or aborted, the tuple will be rewritten
with the new state (2,3,5,6 as appropriate) so that future readers of
the tuple don't have to look at pg_log again.
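The caching behavior described above can be sketched as a toy model (hypothetical Python, not actual PostgreSQL source; the `State`, `Tuple`, and `is_valid` names, and modeling pg_log as a dict, are all illustrative):

```python
from enum import Enum

class State(Enum):
    CREATED_UNKNOWN = 1    # created by xact N, commit status unknown
    CREATED_COMMITTED = 2  # creator definitely committed: tuple good
    CREATED_ABORTED = 3    # creator definitely aborted: tuple dead
    DELETED_UNKNOWN = 4    # deleted by xact N, commit status unknown
    DELETED_COMMITTED = 5  # deleter definitely committed: tuple dead
    DELETED_ABORTED = 6    # deleter definitely aborted: tuple still good

class Tuple:
    def __init__(self, state, xact):
        self.state = state
        self.xact = xact   # the transaction whose fate may be in doubt

def is_valid(tup, pg_log):
    """Return True if the tuple is visible, resolving uncertain states
    against pg_log (a dict: xact id -> committed?) and caching the
    resolved state on the tuple so pg_log is never consulted again."""
    if tup.state == State.CREATED_UNKNOWN:
        tup.state = (State.CREATED_COMMITTED if pg_log.get(tup.xact)
                     else State.CREATED_ABORTED)
    elif tup.state == State.DELETED_UNKNOWN:
        tup.state = (State.DELETED_COMMITTED if pg_log.get(tup.xact)
                     else State.DELETED_ABORTED)
    return tup.state in (State.CREATED_COMMITTED, State.DELETED_ABORTED)
```

After the first `is_valid` call on an uncertain tuple, its state is one of 2, 3, 5, or 6, and subsequent reads never touch pg_log at all.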

Now states 2,3,5,6 do not depend on what pg_log says.  Therefore,
pg_upgrade's copy of an old pg_log file into a new database will
work *if and only if* all the tuples in the system tables are in
one of these states, and none of them are in states 1 or 4.  We
can survive with user-table tuples that are in the uncertain states,
because the associated pg_log info will be copied over with them.
We cannot afford to have any uncertainly-committed system-table
tuples, because we will overwrite the pg_log data about their status.
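The hazard can be illustrated with a small sketch (hypothetical Python, not how pg_upgrade is actually implemented; `visible_after_log_swap` and the dict-based pg_log are assumptions for illustration): only a tuple whose commit status is still "unknown" is at risk when pg_log is overwritten, because a resolved state is frozen on disk and ignores pg_log entirely.

```python
def visible_after_log_swap(state, xact, pg_log):
    """Visibility of a 'created' tuple after pg_log has been replaced.
    'committed'/'aborted' states are stored on the tuple itself and
    never consult pg_log; 'unknown' must trust the (swapped-in) log."""
    if state == "committed":
        return True             # frozen state: pg_log never consulted
    if state == "aborted":
        return False
    return pg_log.get(xact, False)  # unknown: trusts whatever log is there

# In the new cluster, xact 5 committed when creating a catalog row, but
# the old cluster's pg_log (copied in by pg_upgrade) says xact 5 aborted.
old_pg_log = {5: False}

# A resolved row survives the swap; an uncertain row silently dies.
assert visible_after_log_swap("committed", 5, old_pg_log) is True
assert visible_after_log_swap("unknown", 5, old_pg_log) is False
```

This is exactly why the scheme can only work if every system-table tuple has been forced into a resolved state before the old pg_log is copied over.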

Bruce thinks that the pg_upgrade script will ensure that the system-
table tuples are all in frozen states (by VACUUMing them).  I don't
trust it worth a dime, myself.  Maybe it will work, but it hasn't been
proven in the field.  So, if you'd like to try it, by all means do so
--- but make a pg_dump backup first!  And let us know whether you have
problems or not!
        regards, tom lane


Re: [SQL] Duplicate tuples with unique index

From:
Bruce Momjian
Date:
> Bruce thinks that the pg_upgrade script will ensure that the system-
> table tuples are all in frozen states (by VACUUMing them).  I don't
> trust it worth a dime, myself.  Maybe it will work, but it hasn't been
> proven in the field.  So, if you'd like to try it, by all means do so
> --- but make a pg_dump backup first!  And let us know whether you have
> problems or not!

I see what you are saying.  If a table created as part of the new
pg_dump schema restore happens to share a transaction number with a
transaction that aborted in the old database, the table would be
invalid.  However, as you mentioned, once a tuple is marked as
definitely committed or rolled back, pg_log is no longer consulted.

Basically, I was surprised pg_upgrade worked at all in any of the
releases.  It seems too good to be true.  However, I have received very
few problem reports about its use, and it does get used.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026