Обсуждение: Performance improvements of INSERTs to a partitioned table

Поиск

Список

Период

Сортировка

Performance improvements of INSERTs to a partitioned table

От

"Kato, Sho"

Дата:

06 ноября 2018 г., 12:35:38

Hi,

I want to discuss performance improvements of INSERTs to a partitioned table.

When an application inserts records into a table partitioned into thousands, find_all_inheritors() locks all of the partitions.

Updating partition key locks in the same way.

So, Execution time becomes longer as the number of partition increases.

* nparts 8

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) values(8, 5000, current_timestamp);

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Insert on accounts_history (cost=0.00..0.02 rows=1 width=20) (actual time=0.281..0.281 rows=0 loops=1)

-> Result (cost=0.00..0.02 rows=1 width=20) (actual time=0.079..0.080 rows=1 loops=1)

Planning Time: 0.080 ms

Execution Time: 0.362 ms

(4 rows)

* nparts 8192

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) values(8192, 5000, current_timestamp);

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Insert on accounts_history (cost=0.00..0.02 rows=1 width=20) (actual time=0.058..0.059 rows=0 loops=1)

-> Result (cost=0.00..0.02 rows=1 width=20) (actual time=0.032..0.034 rows=1 loops=1)

Planning Time: 0.032 ms

Execution Time: 12.508 ms

(4 rows)

Locking only the target partitions like the patch previously proposed by David[1], the performance will improve greatly.

* nparts 8192 (patched)

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) values(8192, 5000, current_timestamp);

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Insert on accounts_history (cost=0.00..0.02 rows=1 width=20) (actual time=0.415..0.416 rows=0 loops=1)

-> Result (cost=0.00..0.02 rows=1 width=20) (actual time=0.140..0.141 rows=1 loops=1)

Planning Time: 0.120 ms

Execution Time: 1.694 ms

(4 rows)

However, I am concerned that "unsafe" is included in the name of this patch.

If locking only target partitions and locking all of partitions are executed at the same time, a problem may occurs.

But, I'm not sure what kind of problem will occur.

Is it enough to lock only the target partitions?

If a problem occurs in above case, I think it is safer to divide the steps to acquire the lock into two.

In first step, locking only the parent table in share or exclusive mode.

In second step, locking only the target partitions after locking the parent table.

Thoughts?

[1]: https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com

regards,

Sho Kato

RE: Performance improvements of INSERTs to a partitioned table

От

"Kato, Sho"

Дата:

07 ноября 2018 г., 11:31:29

AFAIK, When CREATE INDEX on a partition and INSERT to a parent table are executed at the same time, this patch causes deadlock.

* partitions information

Partition key: RANGE (a)

Partitions: a_1 FOR VALUES FROM (1) TO (100),

a_2 FOR VALUES FROM (100) TO (200)

T1: create index a_1_ix on a_1(a);

T2: insert into a values(101),(1); locking a_2 and waiting releasing a_1’s lock

T1: create index a_2_ix on a_2(a); ERROR: deadlock detected |

I think this situation does not mean unsafe because similar situation will occurs at no partitioned tables and DBMS could not prevent this situation.

But, I'm not sure this behavior is correct.

Does similar deadlock occur in other DBMS like Oracle?

From: Kato, Sho [mailto:kato-sho@jp.fujitsu.com]
Sent: Tuesday, November 6, 2018 6:36 PM
To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Performance improvements of INSERTs to a partitioned table

Hi,

I want to discuss performance improvements of INSERTs to a partitioned table.

When an application inserts records into a table partitioned into thousands, find_all_inheritors() locks all of the partitions.

Updating partition key locks in the same way.

So, Execution time becomes longer as the number of partition increases.

* nparts 8

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) values(8, 5000, current_timestamp);

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Insert on accounts_history (cost=0.00..0.02 rows=1 width=20) (actual time=0.281..0.281 rows=0 loops=1)

-> Result (cost=0.00..0.02 rows=1 width=20) (actual time=0.079..0.080 rows=1 loops=1)

Planning Time: 0.080 ms

Execution Time: 0.362 ms

(4 rows)

* nparts 8192

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) values(8192, 5000, current_timestamp);

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Insert on accounts_history (cost=0.00..0.02 rows=1 width=20) (actual time=0.058..0.059 rows=0 loops=1)

-> Result (cost=0.00..0.02 rows=1 width=20) (actual time=0.032..0.034 rows=1 loops=1)

Planning Time: 0.032 ms

Execution Time: 12.508 ms

(4 rows)

Locking only the target partitions like the patch previously proposed by David[1], the performance will improve greatly.

* nparts 8192 (patched)

testdb=# explain analyze insert into test.accounts_history(aid, delta, mtime) values(8192, 5000, current_timestamp);

QUERY PLAN

---------------------------------------------------------------------------------------------------------

Insert on accounts_history (cost=0.00..0.02 rows=1 width=20) (actual time=0.415..0.416 rows=0 loops=1)

-> Result (cost=0.00..0.02 rows=1 width=20) (actual time=0.140..0.141 rows=1 loops=1)

Planning Time: 0.120 ms

Execution Time: 1.694 ms

(4 rows)

However, I am concerned that "unsafe" is included in the name of this patch.

If locking only target partitions and locking all of partitions are executed at the same time, a problem may occurs.

But, I'm not sure what kind of problem will occur.

Is it enough to lock only the target partitions?

If a problem occurs in above case, I think it is safer to divide the steps to acquire the lock into two.

In first step, locking only the parent table in share or exclusive mode.

In second step, locking only the target partitions after locking the parent table.

Thoughts?

[1]: https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com

regards,

Sho Kato

Re: Performance improvements of INSERTs to a partitioned table

От

David Rowley

Дата:

09 ноября 2018 г., 05:19:38

On 7 November 2018 at 21:31, Kato, Sho <kato-sho@jp.fujitsu.com> wrote:
> AFAIK, When CREATE INDEX on a partition and INSERT to a parent table are
> executed at the same time, this patch causes deadlock.
>
> * partitions information
>
> Partition key: RANGE (a)
> Partitions: a_1 FOR VALUES FROM (1) TO (100),
>             a_2 FOR VALUES FROM (100) TO (200)
>
> T1: create index a_1_ix on a_1(a);
> T2: insert into a values(101),(1);  locking a_2 and waiting releasing a_1’s
> lock
> T1: create index a_2_ix on a_2(a); ERROR:  deadlock detected
>
> I think this situation does not mean unsafe because similar situation will
> occurs at no partitioned tables and DBMS could not prevent this situation.
>
> But, I'm not sure this behavior is correct.

That's one case that will deadlock with the 0002 patch in [1]. Another
command that takes a conflicting lock on the partition only is
TRUNCATE. Many other commands that normally take an AEL can't be run
on a partition, for example, ALTER TABLE ADD COLUMN.

The same would have deadlocked on master if you'd created the indexes
in the reverse order, but I guess the point is that today there's some
defined order that you can create the indexes in that won't cause a
deadlock, but with [1] there is no defined order since the tables are
locked in order that tuples are routed to them.

Robert pointed out something interesting in [2] about UPDATEs of a
partition key perhaps routing a tuple to a partition that's been
excluded by constraint exclusion, so the lock is not taken initially,
but would be taken later in ExecSetupPartitionTupleRouting(). I ran
through that scenario today and discovered it can't happen as excluded
partitions are still in the rangetable, so are still locked either in
the planner or by AcquireExecutorLocks() and the order those are
locked in is well defined from the planner. However, I believe that's
something Amit plans to change in [3], where he proposes to only lock
partitions that survive partition pruning (among many other things).

There are probably a few things we could do here:

a. Have all DDL on partitions obtain the same lock level on all
ancestor partitioned tables taking a lock on the root first and
working down to the leaf.
b. Just lock what we touch and increase the likelihood of deadlocks occurring.
c. Reject [1] and [3] because we don't want to increase the chances of
deadlocks.

I'm highly against doing 'c' since both [1] and [3] are very useful
patches which scale partitioning to work well with very large numbers
of partitions. At the moment we just can bearly do half a dozen
partitions before the performance starts going south. 'a' is not that
ideal a solution either as it means that anyone doing CREATE INDEX or
TRUNCATE on a partition would conflict with queries that are querying
an unrelated part of the partitioned table.  While I don't really like
'b' either, I wonder how many people are going to hit deadlocks here.
To hit that, I believe that you'd have to be doing the TRUNCATE or
CREATE INDEX inside a transaction and to hit it where you couldn't hit
it before you'd have had to carefully written your CREATE INDEX /
TRUNCATE statements to ensure they're executed in partition order. I'm
unsure how likely that is, but I certainly can't rule out that doing
'b' won't upset anyone.

Perhaps a GUC is needed to choose between 'a' and 'b'?

I'm adding Amit to this email because [3] is his and Robert because
he's recently been talking about partition locking too.

[1] https://www.postgresql.org/message-id/CAKJS1f-vYAHqqaU878Q4cdZYHwPcv1J_C-mG%3DCs2UwhsD6cqwg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BTgmoZGJsy-nRFnzurhZQJtHdDh5fzJKvbvhS0byN6_46pB9Q%40mail.gmail.com
[3] https://commitfest.postgresql.org/20/1778/

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance improvements of INSERTs to a partitioned table

От

Amit Langote

Дата:

09 ноября 2018 г., 08:45:18

On 2018/11/09 11:19, David Rowley wrote:
> On 7 November 2018 at 21:31, Kato, Sho <kato-sho@jp.fujitsu.com> wrote:
>> AFAIK, When CREATE INDEX on a partition and INSERT to a parent table are
>> executed at the same time, this patch causes deadlock.
>>
>> * partitions information
>>
>> Partition key: RANGE (a)
>> Partitions: a_1 FOR VALUES FROM (1) TO (100),
>>             a_2 FOR VALUES FROM (100) TO (200)
>>
>> T1: create index a_1_ix on a_1(a);
>> T2: insert into a values(101),(1);  locking a_2 and waiting releasing a_1’s
>> lock
>> T1: create index a_2_ix on a_2(a); ERROR:  deadlock detected
>>
>> I think this situation does not mean unsafe because similar situation will
>> occurs at no partitioned tables and DBMS could not prevent this situation.
>>
>> But, I'm not sure this behavior is correct.
> 
> That's one case that will deadlock with the 0002 patch in [1]. Another
> command that takes a conflicting lock on the partition only is
> TRUNCATE. Many other commands that normally take an AEL can't be run
> on a partition, for example, ALTER TABLE ADD COLUMN.
>
> The same would have deadlocked on master if you'd created the indexes
> in the reverse order, but I guess the point is that today there's some
> defined order that you can create the indexes in that won't cause a
> deadlock, but with [1] there is no defined order since the tables are
> locked in order that tuples are routed to them.

We cannot control the order in which a user performs operations directly
on partitions.  What we do have to worry about is the order in which
partitions are locked by a particular piece of backend code when a certain
operation is applied to the parent table.  However, if the user had
applied the operation to the parent table instead, then it would've
blocked for any concurrent queries that already had a lock on the parent,
or vice versa, because obviously the table named in the command is locked
before any of its children.

> Robert pointed out something interesting in [2] about UPDATEs of a
> partition key perhaps routing a tuple to a partition that's been
> excluded by constraint exclusion, so the lock is not taken initially> but would be taken later in
ExecSetupPartitionTupleRouting().I ran

> through that scenario today and discovered it can't happen as excluded
> partitions are still in the rangetable, so are still locked either in
> the planner or by AcquireExecutorLocks() and the order those are
> locked in is well defined from the planner.

Yes, expand_inherited_rtentry() would lock *all* partitions even if some
of them might not end up in the final plan due to some being excluded.

> However, I believe that's
> something Amit plans to change in [3], where he proposes to only lock
> partitions that survive partition pruning (among many other things).

With my patch, while the planner would take the locks in deterministic
order on the partitions that it adds to the plan, the executor time
locking order is non-deterministic considering that tuple routing may
encounter multiple, not yet seen, partitions.

> There are probably a few things we could do here:
> 
> a. Have all DDL on partitions obtain the same lock level on all
> ancestor partitioned tables taking a lock on the root first and
> working down to the leaf.
> b. Just lock what we touch and increase the likelihood of deadlocks occurring.
> c. Reject [1] and [3] because we don't want to increase the chances of
> deadlocks.
>
> I'm highly against doing 'c' since both [1] and [3] are very useful
> patches which scale partitioning to work well with very large numbers
> of partitions. At the moment we just can bearly do half a dozen
> partitions before the performance starts going south.

Agreed.

> 'a' is not that
> ideal a solution either as it means that anyone doing CREATE INDEX or
> TRUNCATE on a partition would conflict with queries that are querying
> an unrelated part of the partitioned table.

As you pointed out, many of the commands that require locks that would
conflict with concurrent queries are not allowed to run directly on
partitions anyway.  For operations that *are* allowed, doing 'a' is same
as asking the user to perform the operation on the root parent instead, at
least as far as the concurrency is concerned (user may not want to
*truncate* all partitions, only some.)

> While I don't really like
> 'b' either, I wonder how many people are going to hit deadlocks here.
> To hit that, I believe that you'd have to be doing the TRUNCATE or
> CREATE INDEX inside a transaction and to hit it where you couldn't hit
> it before you'd have had to carefully written your CREATE INDEX /
> TRUNCATE statements to ensure they're executed in partition order. I'm
> unsure how likely that is, but I certainly can't rule out that doing
> 'b' won't upset anyone.

As long as queries involve tuple routing that touches multiple not yet
seen partitions, someone doing conflicting operations directly on multiple
partitions in a transaction will have to be ready to see deadlocks.
Maybe, we can document that.

> Perhaps a GUC is needed to choose between 'a' and 'b'?

I guess we'll have a hard time explaining such configuration option to
users though.

Thanks,
Amit

RE: Performance improvements of INSERTs to a partitioned table

От

"Kato, Sho"

Дата:

09 ноября 2018 г., 12:36:41

Thanks David.

> I'm highly against doing 'c' since both [1] and [3] are very useful patches which scale partitioning to work well
withvery large numbers of partitions.
 

Agreed.

>'a' is not that ideal a solution either as it means that anyone doing CREATE INDEX or TRUNCATE on a partition would
conflictwith queries that are querying an unrelated part of the partitioned table. 
 

If only deadlock is a problem, I think intention lock is needed.
All SQL on a partitioned tables take a intention exclusive or shared lock on the root first and take a conventional
lockon target leaf next.
 

>  While I don't really like 'b' either, I wonder how many people are going to hit deadlocks here.
>To hit that, I believe that you'd have to be doing the TRUNCATE or CREATE INDEX inside a transaction and to hit it
whereyou couldn't hit it before you'd have had to carefully written your CREATE INDEX / TRUNCATE statements to ensure
they'reexecuted in partition order.
 

In case of CREATE INDEX on the leaf, users have to ensure to statements executed in partition order since similar case
couldoccur in no partitioned table.
 
But, DBMS have to prevent deadlock occurred by CREATE INDEX/TRUNCATE on the root.

>Perhaps a GUC is needed to choose between 'a' and 'b'?

I agree with Amit. It is difficult to choose for users.

regards,
Sho Kato
> -----Original Message-----
> From: David Rowley [mailto:david.rowley@2ndquadrant.com]
> Sent: Friday, November 9, 2018 11:20 AM
> To: Kato, Sho/加藤 翔 <kato-sho@jp.fujitsu.com>; Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp>; Robert Haas <robertmhaas@gmail.com>
> Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
> Subject: Re: Performance improvements of INSERTs to a partitioned table
> 
> On 7 November 2018 at 21:31, Kato, Sho <kato-sho@jp.fujitsu.com> wrote:
> > AFAIK, When CREATE INDEX on a partition and INSERT to a parent table
> > are executed at the same time, this patch causes deadlock.
> >
> > * partitions information
> >
> > Partition key: RANGE (a)
> > Partitions: a_1 FOR VALUES FROM (1) TO (100),
> >             a_2 FOR VALUES FROM (100) TO (200)
> >
> > T1: create index a_1_ix on a_1(a);
> > T2: insert into a values(101),(1);  locking a_2 and waiting releasing
> > a_1’s lock
> > T1: create index a_2_ix on a_2(a); ERROR:  deadlock detected
> >
> > I think this situation does not mean unsafe because similar situation
> > will occurs at no partitioned tables and DBMS could not prevent this
> situation.
> >
> > But, I'm not sure this behavior is correct.
> 
> That's one case that will deadlock with the 0002 patch in [1]. Another
> command that takes a conflicting lock on the partition only is TRUNCATE.
> Many other commands that normally take an AEL can't be run on a partition,
> for example, ALTER TABLE ADD COLUMN.
> 
> The same would have deadlocked on master if you'd created the indexes
> in the reverse order, but I guess the point is that today there's some
> defined order that you can create the indexes in that won't cause a
> deadlock, but with [1] there is no defined order since the tables are
> locked in order that tuples are routed to them.
> 
> Robert pointed out something interesting in [2] about UPDATEs of a
> partition key perhaps routing a tuple to a partition that's been excluded
> by constraint exclusion, so the lock is not taken initially, but would
> be taken later in ExecSetupPartitionTupleRouting(). I ran through that
> scenario today and discovered it can't happen as excluded partitions are
> still in the rangetable, so are still locked either in the planner or
> by AcquireExecutorLocks() and the order those are locked in is well
> defined from the planner. However, I believe that's something Amit plans
> to change in [3], where he proposes to only lock partitions that survive
> partition pruning (among many other things).
> 
> There are probably a few things we could do here:
> 
> a. Have all DDL on partitions obtain the same lock level on all ancestor
> partitioned tables taking a lock on the root first and working down to
> the leaf.
> b. Just lock what we touch and increase the likelihood of deadlocks
> occurring.
> c. Reject [1] and [3] because we don't want to increase the chances of
> deadlocks.
> 
> I'm highly against doing 'c' since both [1] and [3] are very useful patches
> which scale partitioning to work well with very large numbers of
> partitions. At the moment we just can bearly do half a dozen partitions
> before the performance starts going south. 'a' is not that ideal a solution
> either as it means that anyone doing CREATE INDEX or TRUNCATE on a
> partition would conflict with queries that are querying an unrelated part
> of the partitioned table.  While I don't really like 'b' either, I wonder
> how many people are going to hit deadlocks here.
> To hit that, I believe that you'd have to be doing the TRUNCATE or CREATE
> INDEX inside a transaction and to hit it where you couldn't hit it before
> you'd have had to carefully written your CREATE INDEX / TRUNCATE
> statements to ensure they're executed in partition order. I'm unsure how
> likely that is, but I certainly can't rule out that doing 'b' won't upset
> anyone.
> 
> Perhaps a GUC is needed to choose between 'a' and 'b'?
> 
> I'm adding Amit to this email because [3] is his and Robert because he's
> recently been talking about partition locking too.
> 
> [1]
> https://www.postgresql.org/message-id/CAKJS1f-vYAHqqaU878Q4cdZYHwPcv
> 1J_C-mG%3DCs2UwhsD6cqwg%40mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/CA%2BTgmoZGJsy-nRFnzurhZQJtHdD
> h5fzJKvbvhS0byN6_46pB9Q%40mail.gmail.com
> [3] https://commitfest.postgresql.org/20/1778/
> 
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: Performance improvements of INSERTs to a partitioned table

От

David Rowley

Дата:

10 ноября 2018 г., 12:29:09

On 9 November 2018 at 18:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> As long as queries involve tuple routing that touches multiple not yet
> seen partitions, someone doing conflicting operations directly on multiple
> partitions in a transaction will have to be ready to see deadlocks.
> Maybe, we can document that.

Perhaps it's good enough. A guess there's at least a workaround of
doing LOCK TABLE <root_partitioned_table> in the CREATE INDEX /
TRUNCATE session.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Performance improvements of INSERTs to a partitioned table

Performance improvements of INSERTs to a partitioned table

RE: Performance improvements of INSERTs to a partitioned table

Re: Performance improvements of INSERTs to a partitioned table

Re: Performance improvements of INSERTs to a partitioned table

RE: Performance improvements of INSERTs to a partitioned table

Re: Performance improvements of INSERTs to a partitioned table