Обсуждение: Incorrect number of rows inserted into partitioned table

Поиск
Список
Период
Сортировка

Incorrect number of rows inserted into partitioned table

От
Роман Осипов
Дата:
Good afternoon

When reloading data from a simple table into a partitioned (through inheritance) using a query like: insert into [new_partition_table] select * from [old_table] limit xxxxxxx;

There is an insertion not of the amount specified in limit,, but a little more or less than it.

SQL queries in attachment
Вложения

Re: Incorrect number of rows inserted into partitioned table

От
David Rowley
Дата:
On Tue, 11 Apr 2023 at 08:50, Роман Осипов
<osipovromanvladimirovich@gmail.com> wrote:
> When reloading data from a simple table into a partitioned (through inheritance) using a query like: insert into
[new_partition_table]select * from [old_table] limit xxxxxxx; 
>
> There is an insertion not of the amount specified in limit,, but a little more or less than it.

It would be good to see what your output actually is.  I don't
currently see a way for the number of rows inserted to be *more* than
the LIMIT.  If it's less, then that'll be because of the rule you have
defined to insert into another table instead.

With your example SQLs, you'll probably find that the reported number
of inserted rows matches: select count(*) from ONLY book_inherit_test;

Also, you might want to look at declarative partitioning [1] and use
that instead.

David

[1] https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE



Re: Incorrect number of rows inserted into partitioned table

От
Tomas Vondra
Дата:
On 4/10/23 20:42, Роман Осипов wrote:
> Good afternoon
> 
> When reloading data from a simple table into a partitioned (through
> inheritance) using a query like:*insert into [new_partition_table]
> select * from [old_table] limit xxxxxxx*;
> 
> There is an insertion not of the amount specified in *limit*,, but a
> little more or less than it.
> 

Which version? I did try this on master, and I can't reproduce it - the
count at the end returns the correct value (1M).

The rule affects the insert status, which looks e.g. like this:

  INSERT 0 611532

instead of 1M, but that's expected because of the rule, I think.

FWIW I wonder why you use rules, it's rather tricky to get that right,
which is why the usual recommendation is not to use this if there's an
alternative way to do stuff (which for partitioning there is).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect number of rows inserted into partitioned table

От
Роман Осипов
Дата:
Good afternoon David.

Thanks for the quick response.
I know about declarative partitioning and just use it. But unfortunately that doesn't fix the problem.

In this case, the problem is not how many returns insert with limit. And the fact that the tables always have a different number of rows. For example:

1) truncate table book_inherit_test;

2) insert into book_inherit_test select * from book_test limit 1000000;

3) select count(*) from book_inherit_test;

4) select count(*) from only book_inherit_test;

5) select count(*) from only book_inherit_test_before_1950;

Attempt 1

1) Updated Rows 0

2) Updated Rows 612528

3) count(*) 1001227

4) count(*) 612528

5) count(*) 388699

(4)+(5)=(3)

612528 + 388699 = 1001227 That's right, but expected 1000000

Attempt 2

1) Updated Rows 0

2) Updated Rows 612602

3) count(*) 1001320

4) count(*) 612602

5) count(*) 388718

(4)+(5)=(3)

612602 388718 1001320  That's right, but expected 1000000


вт, 11 апр. 2023 г. в 00:39, Tomas Vondra <tomas.vondra@enterprisedb.com>:
On 4/10/23 20:42, Роман Осипов wrote:
> Good afternoon
>
> When reloading data from a simple table into a partitioned (through
> inheritance) using a query like:*insert into [new_partition_table]
> select * from [old_table] limit xxxxxxx*;
>
> There is an insertion not of the amount specified in *limit*,, but a
> little more or less than it.
>

Which version? I did try this on master, and I can't reproduce it - the
count at the end returns the correct value (1M).

The rule affects the insert status, which looks e.g. like this:

  INSERT 0 611532

instead of 1M, but that's expected because of the rule, I think.

FWIW I wonder why you use rules, it's rather tricky to get that right,
which is why the usual recommendation is not to use this if there's an
alternative way to do stuff (which for partitioning there is).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Incorrect number of rows inserted into partitioned table

От
Tom Lane
Дата:
=?UTF-8?B?0KDQvtC80LDQvSDQntGB0LjQv9C+0LI=?= <osipovromanvladimirovich@gmail.com> writes:
> There is an insertion not of the amount specified in *limit*,, but a little
> more or less than it.

I think you're getting burnt by the fact that a rule is a macro,
combined with the fact that your query is underspecified:

insert into book_inherit_test select * from book_test limit 1000000;

This doesn't constrain *which* 1000000 rows of book_test get inserted.
If we EXPLAIN it we get:

=# explain insert into book_inherit_test select * from book_test limit 1000000;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on book_inherit_test  (cost=0.00..32353.00 rows=0 width=0)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..32353.00 rows=617176 width=31)
         Filter: ((("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951)) IS NOT TRUE)
         ->  Limit  (cost=0.00..17353.00 rows=1000000 width=31)
               ->  Seq Scan on book_test  (cost=0.00..34706.00 rows=2000000 width=31)

 Insert on book_inherit_test_before_1950  (cost=0.00..32353.00 rows=0 width=0)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..32353.00 rows=382824 width=31)
         Filter: (("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951))
         ->  Limit  (cost=0.00..17353.00 rows=1000000 width=31)
               ->  Seq Scan on book_test  (cost=0.00..34706.00 rows=2000000 width=31)
(11 rows)

from which we can see that the insertions into book_inherit_test and
those into book_inherit_test_before_1950 are driven off completely
separate scans of book_test.  And that table is big enough that
the synchronize_seqscans feature kicks in, meaning that indeed this
will scan two different million-row subsets of book_test, producing
results fundamentally unlike what you expected.

I get stable results after disabling synchronize_seqscans, but
a more correct way would be to add ORDER BY to fully determine which
rows of book_test are considered.

Advice: don't use a RULE for this sort of thing.  If you really want to,
you can get the same effects more predictably with a trigger.
But as David said, the whole thing looks like a poor reimplementation
of partitioning.

            regards, tom lane



Re: Incorrect number of rows inserted into partitioned table

От
Роман Осипов
Дата:
Good afternoon Thomas.

I am using the PostgreSQL 15.2, compiled by Visual C++ build 1914, 64-bit.

If you answer the question why I use this option.
That answer will be simple. This is just the result of experiments in which I stumbled upon this behavior.

вт, 11 апр. 2023 г. в 00:58, Роман Осипов <osipovromanvladimirovich@gmail.com>:
Good afternoon David.

Thanks for the quick response.
I know about declarative partitioning and just use it. But unfortunately that doesn't fix the problem.

In this case, the problem is not how many returns insert with limit. And the fact that the tables always have a different number of rows. For example:

1) truncate table book_inherit_test;

2) insert into book_inherit_test select * from book_test limit 1000000;

3) select count(*) from book_inherit_test;

4) select count(*) from only book_inherit_test;

5) select count(*) from only book_inherit_test_before_1950;

Attempt 1

1) Updated Rows 0

2) Updated Rows 612528

3) count(*) 1001227

4) count(*) 612528

5) count(*) 388699

(4)+(5)=(3)

612528 + 388699 = 1001227 That's right, but expected 1000000

Attempt 2

1) Updated Rows 0

2) Updated Rows 612602

3) count(*) 1001320

4) count(*) 612602

5) count(*) 388718

(4)+(5)=(3)

612602 388718 1001320  That's right, but expected 1000000


вт, 11 апр. 2023 г. в 00:39, Tomas Vondra <tomas.vondra@enterprisedb.com>:
On 4/10/23 20:42, Роман Осипов wrote:
> Good afternoon
>
> When reloading data from a simple table into a partitioned (through
> inheritance) using a query like:*insert into [new_partition_table]
> select * from [old_table] limit xxxxxxx*;
>
> There is an insertion not of the amount specified in *limit*,, but a
> little more or less than it.
>

Which version? I did try this on master, and I can't reproduce it - the
count at the end returns the correct value (1M).

The rule affects the insert status, which looks e.g. like this:

  INSERT 0 611532

instead of 1M, but that's expected because of the rule, I think.

FWIW I wonder why you use rules, it's rather tricky to get that right,
which is why the usual recommendation is not to use this if there's an
alternative way to do stuff (which for partitioning there is).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company