Обсуждение: Transaction ISOLATION LEVEL - have I missed something?

Поиск
Список
Период
Сортировка

Transaction ISOLATION LEVEL - have I missed something?

От
Pól Ua Laoínecháin
Дата:
Hi all, not sure if this is for novice or general - however...

Fiddle here:

> https://dbfiddle.uk/?rdbms=postgres_12&fiddle=49de306c75e85ce4ba900719e2545c32

> SHOW TRANSACTION ISOLATION LEVEL;

Result:

> transaction_isolation
> read committed

So far, so good, then

> CREATE TABLE x (y INT, z VARCHAR (25));
> INSERT INTO x VALUES (1, 'One'), (2, 'Two'), (3, 'Three');

and now for a transaction:

> START TRANSACTION;

> SELECT CURRENT_TIMESTAMP(6);

> SELECT * FROM x;

> SELECT CURRENT_TIMESTAMP(6);

> COMMIT;

Result:

> current_timestamp 2021-05-13 15:51:11.194766+01

> y  z
> 1  One
> 2  Two
> 3  Three

> current_timestamp 2021-05-13 15:51:11.194766+01

Now, my understanding of ISOLATION LEVEL = read-committed is that the
current_timestamp should change going through the transaction and that
it should not change when the level is serializable.

Am I wrong about this?

I did a similar experiment with MariaDB and none of the times are the same:

> https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=ad2d3b2f360c3dbe59fa00f623a3db6b

Have I misunderstood transactions and isolation levels...

URLs, references and pointers to same appreciated.

TIA and rgs,

Pól...



Re: Transaction ISOLATION LEVEL - have I missed something?

От
"David G. Johnston"
Дата:
On Thursday, May 13, 2021, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
Am I wrong about this?

Transaction isolation levels are immaterial here.  The functions define what time they return:


David J.

Re: Transaction ISOLATION LEVEL - have I missed something?

От
Pól Ua Laoínecháin
Дата:
Hi David, and thanks for getting back to me.

>> Am I wrong about this?
> Transaction isolation levels are immaterial here.  The functions define what time they return:

Ah, yes, it now makes perfect sense...

All statements in a transaction which is READ COMMITTED sees the data
as of CLOCK_TIMESTAMP() and all statements in a transaction that is
SERIALIZABLE see the data as of TRANSACTION_TIMESTAMP().

At least, that's how I understand it now - my error was not
double-checking my function definitions.

Thanks again and rgs,


Pól Ua Laoínecháin...


> David J.



Re: Transaction ISOLATION LEVEL - have I missed something?

От
"David G. Johnston"
Дата:
On Thursday, May 13, 2021, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
Hi David, and thanks for getting back to me.

>> Am I wrong about this?
> Transaction isolation levels are immaterial here.  The functions define what time they return:

Ah, yes, it now makes perfect sense...

All statements in a transaction which is READ COMMITTED sees the data
as of CLOCK_TIMESTAMP() and all statements in a transaction that is
SERIALIZABLE see the data as of TRANSACTION_TIMESTAMP().

At least, that's how I understand it now - my error was not
double-checking my function definitions.

No.  The isolation level is immaterial to which times are returned.  I welcome being proven wrong but nothing I’ve read in that link suggests otherwise so I’m disinclined to prove it correct.

David J.
 

Re: Transaction ISOLATION LEVEL - have I missed something?

От
Pól Ua Laoínecháin
Дата:
Hi again,

>> All statements in a transaction which is READ COMMITTED sees the data
>> as of CLOCK_TIMESTAMP() and all statements in a transaction that is
>> SERIALIZABLE see the data as of TRANSACTION_TIMESTAMP().

> No.  The isolation level is immaterial to which times are returned.  I welcome being proven wrong but nothing I’ve
readin that link suggests otherwise so I’m disinclined to prove it correct. 

(maybe I expressed myself poorly previously)


I didn't say that the times returned would vary depending on the
isolation level - I don't see why they would. Or, at least, that's not
what I meant to say!

Take this scenario.

START TRANSACTION ; -- either TA (READ COMMITTED - RC) or TB (SERIALIZABLE - S)

SELECT TRANSACTION_TIMESTAMP AS tt_1, CLOCK_TIMESTAMP AS ct_1;  --
tt_1 and ct_1 are (almost) identical, give or take a few microseconds
- i.e. the transaction start time

SELECT * FROM x;   -- s1 @t1

-- TA and TB perform some long-running DW query.. the actual length of
time isn't important - what's important is what happens in the
meantime - could be < 0.1s...
--
-- In the meantime, x gets modified by a totally different transaction
by a totally different user - user_x...
--

SELECT TRANSACTION_TIMESTAMP AS tt_2, CLOCK_TIMESTAMP AS ct_2;  --
now, there is a big difference between tt_2 and ct_2 for both TA and
TB. However, tt_1 and tt_2 are identical, again for both transactions
- fixed at the start of either TA or TB.

SELECT * FROM x;  -- s2 @t2
COMMIT;

For TA (RC), the result of s1 will be the state of x at time = t1 (~ =
tt_1 and ~ = ct_1)  the result of s2 will be the state of x at time =
ct_2 and there will be a difference due to the user_x's modfications.

For TB (S), the result of s1 @t1 will be the same as the result of s2
@t2 because of SERIALIZE-ability.


I hope that clears up what I meant - and also that what I have written
above demonstrates a reasonable understanding of the difference
between the potential outcomes of an RC transaction and an S one?

Thanks again for your input - I think that putting all of that down on
paper (well, screen...) has helped crystallise the concepts for me
(any corrections, addenda... appreciated - any references to useful
URLs in this respect likewise).


Pól...


> David J.



Re: Transaction ISOLATION LEVEL - have I missed something?

От
"David G. Johnston"
Дата:
On Thu, May 13, 2021 at 1:31 PM Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
I hope that clears up what I meant - and also that what I have written
above demonstrates a reasonable understanding of the difference
between the potential outcomes of an RC transaction and an S one?


It does.  I see now you are trying to describe snapshots of data available to statements in terms of database timestamps as opposed to referring to transactional boundaries.

I've never seen isolation levels discussed in those terms before and do not know whether it is accurate or missing some important nuances.  With locks, snapshots, and valid interleaving versus invalid interleaving (for serializable) all coming into play the notion of timestamp isn't really in scope.  But if it helps for basic comprehension then it does have merit and indeed the links you describe seem correct at first glance.

David J.

Re: Transaction ISOLATION LEVEL - have I missed something?

От
Pól Ua Laoínecháin
Дата:
On Thu, 13 May 2021 at 22:02, David G. Johnston
<david.g.johnston@gmail.com> wrote:


> It does.  I see now you are trying to describe snapshots of data available to statements in terms of database
timestampsas opposed to referring to transactional boundaries. 

> I've never seen isolation levels discussed in those terms before and do not know whether it is accurate or missing
someimportant nuances.  With locks, snapshots, and valid interleaving versus invalid interleaving (for serializable)
allcoming into play the notion of timestamp isn't really in scope.  But if it helps for basic comprehension then it
doeshave merit and indeed the links you describe seem correct at first glance. 



I was explaining to somebody who doesn't have your (or even my basic)
grasp of isolation levels by using TRANSACTION_TIMESTAMP and
CLOCK_TIMESTAMP as proxies for transaction ids...

Below hopefully is the discussion reframed in terms of transaction ids
- hopefully it's correct and demonstrates a complete understanding of
the topic in terms of PostgreSQL's implementation of the TRANSACTION
ISOLATION paradigm and the distinction between READ COMMITTED (RC) and
SERIALIZABLE (S).

I have put in material below about txid_current() and how RC
statements can see the results of transactions where the transaction
id is greater than txid_current(), but that when the transaction is S,
it can only see the results of committed transactions < txid_current()
and any changes made inside txid_current() itself...

I found this helpful in understanding what I was seeing in the various
pseudo-colum fields

https://www.cybertec-postgresql.com/en/is-update-the-same-as-delete-insert-in-postgresql
(I've seen this guy on the general list...)

I'll have to read that article in-depth to fully grasp all of the
subtleties - but it was good.

So, for a translation of a time-based explanation of the difference
between RC and S to a more PostgreSQL based tables, fields and tx ids
explanation - read on!


From a session - 1, I do this:

===============================================

Session 1 -> START TRANSACTION;
START TRANSACTION
Session 1 -> SHOW TRANSACTION ISOLATION LEVEL;
 transaction_isolation
-----------------------
 read committed
(1 row)


Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |   y
--------------+-------------+------+------+------+------+-------+---+-------
          578 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          578 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          578 | in progress |  552 |    0 |    0 |    0 | (0,3) | 3 | Three
(3 rows)

================================================

So, I've started the READ COMMITTED transaction (txn) no. 578 - and in
table t, there are three records (tuples) (1, 'One'), (2, 'Two') & (3,
'Three').

Txn 527 inserted the first tuple, txn 528 inserted the second one and
the third txn was inserted by txn 529, but I updated it, so
essentially, it was last modified by txn 552 as we learned from the
cybertec article referenced above.


So, now in session 2, I update the table by changing 'Three' to 'Trí'
in tuple 3.

===================================================

Session 2 -> UPDATE t SET y = 'Trí' WHERE x = 3;
UPDATE 1

====================================================

So, that's automatically committed - but, just to check, again from
session 2, I do:

========================================================

Session 2 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          580 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          580 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          580 | in progress |  579 |    0 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)
===========================================================

So, the txid_current has been incremented - to 579 by the UPDATE and
then by the SELECT here in session 2 to 580.

Still in session 2, I run this query:

======================================================

SELECT
  pid,
  age(clock_timestamp(), query_start),
  usename,
  substring(query FROM 70 FOR 25),
  state,
  backend_xid
FROM pg_stat_activity
WHERE datname = 'test' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

and we get

  pid  |       age       | usename  |         substring         |
  state        | backend_xid
-------+-----------------+----------+---------------------------+---------------------+-------------
 10468 | 00:36:57.592779 | postgres |  cmax, ctid, x, y FROM t; | idle
in transaction |         578
(1 row)

=========================================================

Now, we see that our txn 578 (which is now > 36 mins old) is still
sitting there in an idle state.

So, everything's rosy in the garden... now, back to session 1... I
rerun my SELECT within txn 578

===========================================================

Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;

 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          578 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          578 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          578 | in progress |  579 |    0 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)

==========================================================

So, now the tuple where x = 3 has a value of 'Trí' for y.

This is the crux of the TRANSACTION ISOLATION LEVEL being RC -  txn
578 **can see** the result of txn 579 (from session 2)!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

We change the TRANSACTION ISOLATION LEVEL to S in session 1.

I start by issuing the command:

SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;

and then start a txn and SELECT from t as follows:

=======================================================

Session 1 -> START TRANSACTION;
START TRANSACTION
Session 1 -> SHOW TRANSACTION ISOLATION LEVEL;
 transaction_isolation
-----------------------
 serializable
(1 row)


Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          582 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          582 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          582 | in progress |  579 |    0 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)

======================================================================

Txn 582 (S) has been started and has tuple 3 as (3, 'Trí').

Back to session 2 and I issue the pg_stats_activity command

=============================================================

SELECT
  pid,
  age(clock_timestamp(), query_start),
  usename,
  substring(query FROM 70 FOR 25),
  state,
  backend_xid
FROM pg_stat_activity
WHERE datname = 'test' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

which shows

  pid  |       age       | usename  |         substring         |
  state        | backend_xid
-------+-----------------+----------+---------------------------+---------------------+-------------
 10468 | 00:01:09.705916 | postgres |  cmax, ctid, x, y FROM t; | idle
in transaction |         582
(1 row)

====================================================================================

So, we can see that our session 1 transaction (582, S) is happily
sitting there idling away...

Again from session 2, I UPDATE the (3, 'Trí') tuple back to (3,
'Three') - no START TRANSACTION - the session 2 txn and the COMMIT;
are implicit.

================================================================================

Session 2 -> UPDATE t SET y = 'Three' WHERE x = 3;
UPDATE 1

then check my UPDATE from within session 2

Session 2 ->  SELECT txid_current(), txid_status(txid_current()),
xmin, xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |   y
--------------+-------------+------+------+------+------+-------+---+-------
          584 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          584 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          584 | in progress |  583 |    0 |    0 |    0 | (0,5) | 3 | Three
(3 rows)

=================================================================================

Within session 2, I can now see my changed data - (3, 'Trí') has
reverted to (3, 'Three').

I go back to session 1 and reissue the command above

================================================================================

Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          582 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          582 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          582 | in progress |  579 |  583 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)

================================================================================

So, despite the fact that the S txn 582 is > 2 mins old and has been
modified by the UPDATE in session 2, the data has remained unchanged
from the perspective of the 582 S txn in session 1

Except for xmax which has incremented to 583, the tx_id of the UPDATE
txn from session 2 - serves as notice that the tuple has been updated
by a later txn!

This is the distinction between READ COMMITTED and SERIALIZABLE.

Now, I appreciate that there are performance/resource issues involved
in escalating from RC to S levels - but that's a different day's work.

I think that the first important thing to realise (at least it took me
a while) is that transaction isolation levels only apply to
multi-statement transactions - a single SQL statement txn only has one
view of the database at the instant it is run and it will see all
committed data at that point - txn isolation doesn't apply (I'm
ignoring the possibility of dirty-reads which are the spawn of
Satan...).

Again, thanks for having made me think about my explanation and about
the topic generally - I've heard it said and seen it written that you
can't say that you understand something unless you can explain it
clearly - apparently the Einstein quote is misattributed - I like
Rutherford's one though:

> https://www.linkedin.com/pulse/albert-einsteins-birthday-ode-misattributions-peter-smirniotopoulos

I realise that there are potential added complications if session 1
decides to update table t, tuple (3, 'Trí') - not sure what happens
then? Watch this space...


Hopefully I have explained it clearly...

Rgs,


Pól...


> David J.