Обсуждение: Re: speedup COPY TO for partitioned table.

Поиск

Список

Период

Сортировка

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

27 января, 04:47:18

On Wed, Jan 22, 2025 at 6:54 AM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> Hi Jian,
>
> Thanks for the patch.
>
> jian he <jian.universality@gmail.com>, 19 Ara 2024 Per, 15:03 tarihinde şunu yazdı:
>>
>> attached copy_par_regress_test.sql is a simple benchmark sql file,
>> a partitioned table with 10 partitions, 2 levels of indirection.
>> The simple benchmark shows around 7.7% improvement in my local environment.
>
>
> I confirm that the patch introduces some improvement in simple cases like the one you shared. I looked around a bit
tounderstand whether there is an obvious reason why copying from a partitioned table is not allowed, but couldn't find
one.It seems ok to me. 

hi. melih mutlu
thanks for confirmation.

> I realized that while both "COPY <partitioned_table> TO..." and "COPY (SELECT..) TO..." can return the same set of
rows,their orders may not be the same. I guess that it's hard to guess in which order find_all_inheritors() would
returntables, and that might be something we should be worried about with the patch. What do you think? 
>

in the
find_all_inheritors->find_inheritance_children->find_inheritance_children_extended

find_inheritance_children_extended we have
"""
    if (numoids > 1)
        qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
"""

so the find_all_inheritors output order is deterministic?

Re: speedup COPY TO for partitioned table.

От

Melih Mutlu

Дата:

10 февраля, 22:09:42

Hi,

jian he <jian.universality@gmail.com>, 27 Oca 2025 Pzt, 04:47 tarihinde şunu yazdı:

in the
find_all_inheritors->find_inheritance_children->find_inheritance_children_extended

find_inheritance_children_extended we have
"""
if (numoids > 1)
qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
"""

so the find_all_inheritors output order is deterministic?

You're right that order in find_all_inheritors is deterministic. But it's not always the same with the order of SELECT output. You can quickly see what I mean by running a slightly modified version of the example that you shared in your first email:

CREATE TABLE t3 (a INT, b int ) PARTITION BY RANGE (a);

-- change the order. first create t3_2 then t3_1
create table t3_2 partition of t3 for values from (11) to (15);
create table t3_1 partition of t3 for values from (1) to (11);
insert into t3 select g from generate_series(1, 3) g;
insert into t3 select g from generate_series(11, 11) g;

And the results of the two different COPY approaches would be:

postgres=# COPY t3 TO STDOUT;
11 \N
1 \N
2 \N
3 \N
postgres=# COPY (SELECT * FROM t3) TO STDOUT;
1 \N
2 \N
3 \N
11 \N

Notice that "COPY t3 TO STDOUT" changes the order since the partition t3_2 has been created first, hence it has a smaller OID. On the other hand, SELECT sorts the partitions based on partition boundaries, not OIDs. That's why we should always see the same order regardless of the OIDs of partitions (you can see create_range_bounds() in partbounds.c if interested in more details). One thing that might be useful in the COPY case would be using a partition descriptor to access the correct order of partitions. I believe something like (PartitionDesc) partdesc->oid should give us the partition OIDs in order.

Thanks,

Melih Mutlu

Microsoft

Re: speedup COPY TO for partitioned table.

От

David Rowley

Дата:

11 февраля, 03:25:52

On Tue, 11 Feb 2025 at 08:10, Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> jian he <jian.universality@gmail.com>, 27 Oca 2025 Pzt, 04:47 tarihinde şunu yazdı:
>> so the find_all_inheritors output order is deterministic?
>
> You're right that order in find_all_inheritors is deterministic. But it's not always the same with the order of
SELECToutput. You can quickly see what I mean by running a slightly modified version of the example that you shared in
yourfirst email: 

I think it's fine to raise the question as to whether the order
changing matters, however, I don't personally think there should be
any concerns with this. The main reason I think this is that the
command isn't the same, so the user shouldn't expect the same
behaviour. They'll need to adjust their commands to get the new
behaviour and possibly a different order.

Another few reasons are:

1) In the subquery version, the Append children are sorted by cost, so
the order isn't that predictable in the first place. (See
create_append_path() -> list_sort(subpaths,
append_total_cost_compare))
2) The order tuples are copied with COPY TO on non-partitioned tables
isn't that well defined in the first place. Two reasons for this, a)
the heap is a heap and has no defined order; and b) sync scans might
be used and the scan might start at any point in the heap and circle
back around again to the page prior to the page where the scan
started. See (table_beginscan() adds SO_ALLOW_SYNC to the flags).

I think the main thing to be concerned about regarding order is to
ensure that all rows from the same partition are copied consecutively,
and that does not seem to be at risk of changing here. This is
important as 3592e0ff9 added caching of the last found partition when
partition lookups continually find the same partition.

David

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

07 марта, 13:41:10

hi.

rebased and polished patch attached, test case added.
However there is a case (the following) where
``COPY(partitioned_table)`` is much slower
(around 25% in some cases) than ``COPY (select * from partitioned_table)``.

If the partition attribute order is not the same as the partitioned table,
then for each output row, we need to create a template TupleTableSlot
and call execute_attr_map_slot,
i didn't find a work around to reduce the inefficiency.

Since the master doesn't have ``COPY(partitioned_table)``,
I guess this slowness case is allowed?


----------- the follow case is far slower than ``COPY(select * From pp) TO ``
drop table if exists pp;
CREATE TABLE pp (id  INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);

Вложения

v2-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

11 марта, 15:53:14

On Fri, Mar 7, 2025 at 6:41 PM jian he <jian.universality@gmail.com> wrote:
>
> hi.
>
> rebased and polished patch attached, test case added.
hi.

I realized I need to change the doc/src/sgml/ref/copy.sgml
<title>Notes</title> section.

current doc note section:
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables or child partitions.
For example, COPY table TO copies the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, partitioned table, or view.

after my change:
------------
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables,
however COPY TO can be used with partitioned tables.
For example, in a table inheritance hierarchy, COPY table TO copies
the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, or view.
------------

Вложения

v3-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

newtglobal postgresql_contributors

Дата:

17 марта, 11:02:09

Hi Jian,
Tested this patch with COPY sales TO STDOUT; ~ 1.909ms, improving performance over the older COPY (SELECT * FROM sales)
TOSTDOUT; ~ 3.80ms method. This eliminates query planning overhead and significantly speeds up data export from
partitionedtables. 
 
Our test setup involved creating a partitioned table(sales), inserted 500 records, and comparing execution times.

-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
    id SERIAL NOT NULL,
    sale_date DATE NOT NULL,
    region TEXT NOT NULL,
    amount NUMERIC(10,2) NOT NULL,
    category TEXT NOT NULL,
    PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);

-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
    PARTITION BY HASH (region);

CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY HASH (region);

-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT 
    ('2023-01-01'::DATE + (random() * 730)::int) AS sale_date,  -- Random date in 2023-2024 range
    CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region,  -- Random region
    (random() * 1000)::NUMERIC(10,2) AS amount,  -- Random amount (0 to 1000)
    CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category  -- Random category
FROM generate_series(1, 500);

COPY (SELECT * FROM SALES) TO STDOUT;  ~ 1.909ms

COPY SALES TO STDOUT; ~ 3.80ms

This change is recommended for better performance in PostgreSQL partitioned tables.

Re: speedup COPY TO for partitioned table.

От

vignesh C

Дата:

21 марта, 13:12:50

On Tue, 11 Mar 2025 at 18:24, jian he <jian.universality@gmail.com> wrote:
>
> after my change:
> ------------
> COPY TO can be used only with plain tables, not views, and does not
> copy rows from child tables,
> however COPY TO can be used with partitioned tables.
> For example, in a table inheritance hierarchy, COPY table TO copies
> the same rows as SELECT * FROM ONLY table.
> The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
> of the rows in an inheritance hierarchy, or view.
> ------------

I find an issue with the patch:

-- Setup
CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'testdb', port '5432');
CREATE TABLE t1(id int) PARTITION BY RANGE(id);
CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
PARTITION BY RANGE(id);
CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
TO (15) SERVER myserver;

-- Create table in testdb
create table part2_1(id int);

-- Copy partitioned table data
postgres=#  copy t1 to stdout(header);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Stack trace for the same is:
#0  table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
#1  0x00005daadf89eb9b in DoCopyTo (cstate=0x5daafa71e278) at copyto.c:1105
#2  0x00005daadf8913f4 in DoCopy (pstate=0x5daafa6c5fc0,
stmt=0x5daafa6f20c8, stmt_location=0, stmt_len=25,
processed=0x7ffd3799c2f0) at copy.c:316
#3  0x00005daadfc7a770 in standard_ProcessUtility
(pstmt=0x5daafa6f21e8, queryString=0x5daafa6f15c0 "copy t1 to
stdout(header);", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL,
    params=0x0, queryEnv=0x0, dest=0x5daafa6f25a8, qc=0x7ffd3799c660)
at utility.c:738

(gdb) f 0
#0  table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
883 return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);

The table access method is not available in this care
(gdb) p *rel->rd_tableam
Cannot access memory at address 0x0

This failure happens when we do table_beginscan on scan part2_1 table

Regards,
Vignesh

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

28 марта, 06:03:29

On Fri, Mar 21, 2025 at 6:13 PM vignesh C <vignesh21@gmail.com> wrote:
>
> I find an issue with the patch:
>
> -- Setup
> CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
> (dbname 'testdb', port '5432');
> CREATE TABLE t1(id int) PARTITION BY RANGE(id);
> CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
> CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
> PARTITION BY RANGE(id);
> CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
> TO (15) SERVER myserver;
>
> -- Create table in testdb
> create table part2_1(id int);
>
> -- Copy partitioned table data
> postgres=#  copy t1 to stdout(header);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>

I manually tested:
sequence, temp table, materialized view, index, view,
composite types, partitioned indexes.
all these above can not attach to partitioned tables.

We should care about the unlogged table, foreign table attached to the
partition.
an unlogged table should work just fine.
we should error out foreign tables.

so:
copy t1 to stdout(header);
ERROR:  cannot copy from foreign table "t1"
DETAIL:  partition "t1" is a foreign table
HINT:  Try the COPY (SELECT ...) TO variant.

Вложения

v4-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

28 марта, 06:09:15

hi.

I made a mistake.
The regress test sql file should have a new line at the end of the file.

Вложения

v5-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

vignesh C

Дата:

28 марта, 16:02:57

On Fri, 28 Mar 2025 at 08:39, jian he <jian.universality@gmail.com> wrote:
>
> hi.
>
> I made a mistake.
> The regress test sql file should have a new line at the end of the file.

Couple of suggestions:
1) Can you add some comments here, this is the only code that is
different from the regular table handling code:
+                       scan_tupdesc = RelationGetDescr(scan_rel);
+                       map = build_attrmap_by_name_if_req(tupDesc,
scan_tupdesc, false);

2) You can see if you can try to make a function add call it from both
the partitioned table and regular table case, that way you could
reduce the duplicate code:
+                       while (table_scan_getnextslot(scandesc,
ForwardScanDirection, slot))
+                       {
+                               CHECK_FOR_INTERRUPTS();
+
+                               /* Deconstruct the tuple ... */
+                               if (map != NULL)
+                               {
+                                       original_slot = slot;
+                                       root_slot =
MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
+                                       slot =
execute_attr_map_slot(map, slot, root_slot);
+                               }
+                               else
+                                       slot_getallattrs(slot);
+
+                               /* Format and send the data */
+                               CopyOneRowTo(cstate, slot);
+
+
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
+
                  ++processed);
+
+                               if (original_slot != NULL)
+
ExecDropSingleTupleTableSlot(original_slot);
+                       };

Regards,
Vignesh

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

29 марта, 09:37:48

On Fri, Mar 28, 2025 at 9:03 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, 28 Mar 2025 at 08:39, jian he <jian.universality@gmail.com> wrote:
> >
> > hi.
> >
> > I made a mistake.
> > The regress test sql file should have a new line at the end of the file.
>
> Couple of suggestions:
> 1) Can you add some comments here, this is the only code that is
> different from the regular table handling code:
> +                       scan_tupdesc = RelationGetDescr(scan_rel);
> +                       map = build_attrmap_by_name_if_req(tupDesc,
> scan_tupdesc, false);
>

I have added the following comments around build_attrmap_by_name_if_req.

    /*
     * partition's rowtype might differ from the root table's.  We must
     * convert it back to the root table's rowtype as we are export
     * partitioned table data here.
    */


> 2) You can see if you can try to make a function add call it from both
> the partitioned table and regular table case, that way you could
> reduce the duplicate code:
> +                       while (table_scan_getnextslot(scandesc,
> ForwardScanDirection, slot))
> +                       {
> +                               CHECK_FOR_INTERRUPTS();
> +
> +                               /* Deconstruct the tuple ... */
> +                               if (map != NULL)
> +                               {
> +                                       original_slot = slot;
> +                                       root_slot =
> MakeSingleTupleTableSlot(tupDesc, &TTSOpsBufferHeapTuple);
> +                                       slot =
> execute_attr_map_slot(map, slot, root_slot);
> +                               }
> +                               else
> +                                       slot_getallattrs(slot);
> +
> +                               /* Format and send the data */
> +                               CopyOneRowTo(cstate, slot);
> +
> +
> pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
> +
>                   ++processed);
> +
> +                               if (original_slot != NULL)
> +
> ExecDropSingleTupleTableSlot(original_slot);
> +                       };
>

I consolidated it into a new function: CopyThisRelTo.

Вложения

v6-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

vignesh C

Дата:

30 марта, 04:14:18

On Sat, 29 Mar 2025 at 12:08, jian he <jian.universality@gmail.com> wrote:
>
>
> I consolidated it into a new function: CopyThisRelTo.

Few comments:
1) Here the error message is not correct, we are printing the original
table from where copy was done which is a regular table and not a
foreign table, we should use childreloid instead of rel.

+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from foreign table \"%s\"",
+
 RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

In the error detail you can include the original table too.

postgres=# copy t1 to stdout(header);
ERROR:  cannot copy from foreign table "t1"
DETAIL:  partition "t1" is a foreign table
HINT:  Try the COPY (SELECT ...) TO variant.

2) 2.a) I felt the comment should be "then copy partitioned rel to
destionation":
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY TO partitioned rel.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
uint64 *processed)
+{
+       TupleTableSlot *slot;

2.b) you can have processed argument in the next line for better readability

3) There is a small indentation issue here:
+       /*
+        * partition's rowtype might differ from the root table's.  We must
+        * convert it back to the root table's rowtype as we are export
+        * partitioned table data here.
+       */
+       if (root_rel != NULL)

Regards,
Vignesh

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

30 марта, 16:53:22

On Sun, Mar 30, 2025 at 9:14 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Sat, 29 Mar 2025 at 12:08, jian he <jian.universality@gmail.com> wrote:
> >
> >
> > I consolidated it into a new function: CopyThisRelTo.
>
> Few comments:
> 1) Here the error message is not correct, we are printing the original
> table from where copy was done which is a regular table and not a
> foreign table, we should use childreloid instead of rel.
>
> +                               if (relkind == RELKIND_FOREIGN_TABLE)
> +                                       ereport(ERROR,
> +
> errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                                                       errmsg("cannot
> copy from foreign table \"%s\"",
> +
>  RelationGetRelationName(rel)),
> +
> errdetail("partition \"%s\" is a foreign table",
> RelationGetRelationName(rel)),
> +                                                       errhint("Try
> the COPY (SELECT ...) TO variant."));
>
> In the error detail you can include the original table too.
>
I changed it to:
                if (relkind == RELKIND_FOREIGN_TABLE)
                {
                    char       *relation_name;

                    relation_name = get_rel_name(childreloid);
                    ereport(ERROR,
                            errcode(ERRCODE_WRONG_OBJECT_TYPE),
                            errmsg("cannot copy from foreign table
\"%s\"", relation_name),
                            errdetail("Partition \"%s\" is a foreign
table in the partitioned table \"%s.%s\"",
                                      relation_name,
RelationGetRelationName(rel),

get_namespace_name(rel->rd_rel->relnamespace)),
                            errhint("Try the COPY (SELECT ...) TO variant."));
                }

>
> 2) 2.a) I felt the comment should be "then copy partitioned rel to
> destionation":
> + * rel: the relation to be copied to.
> + * root_rel: if not null, then the COPY TO partitioned rel.
> + * processed: number of tuple processed.
> +*/
> +static void
> +CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
> uint64 *processed)
> +{
> +       TupleTableSlot *slot;
>

i changed it to:
+/*
+ * rel: the relation to be copied to.
+ * root_rel: if not null, then the COPY partitioned relation to destination.
+ * processed: number of tuple processed.
+*/
+static void
+CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+              uint64 *processed)


> 3) There is a small indentation issue here:
> +       /*
> +        * partition's rowtype might differ from the root table's.  We must
> +        * convert it back to the root table's rowtype as we are export
> +        * partitioned table data here.
> +       */
> +       if (root_rel != NULL)
>
I am not so sure.
can you check if the attached still has the indentation issue.

Вложения

v7-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Kirill Reshke

Дата:

31 марта, 11:05:44

Hi!
I reviewed v7. Maybe we should add a multi-level partitioning case
into copy2.sql regression test?

I also did quick benchmarking for this patch:

==== DDL

 create table ppp(i int) partition by range (i);

genddl.sh:

for i in `seq 0 200`; do echo "create table p$i partition of ppp for
values from ( $((10 * i)) ) to ( $((10 * (i + 1))) ); "; done

=== insert data data:
insert into ppp select i / 1000  from generate_series(0, 2000000)i;

=== results:


for 2000001 rows speedup is 1.40 times : 902.604 ms (patches) vs
1270.648 ms (unpatched)

for 4000002 rows speedup is 1.20 times : 1921.724 ms (patches) vs
2343.393 ms (unpatched)

for 8000004 rows speedup is 1.10 times : 3932.361 ms (patches) vs
4358.489ms (unpatched)

So, this patch indeed speeds up some cases, but with larger tables
speedup becomes negligible.

-- 
Best regards,
Kirill Reshke

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

01 апреля, 04:01:10

On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> Hi!
> I reviewed v7. Maybe we should add a multi-level partitioning case
> into copy2.sql regression test?
>

sure.

> I also did quick benchmarking for this patch:
>
> ==== DDL
>
>  create table ppp(i int) partition by range (i);
>
> genddl.sh:
>
> for i in `seq 0 200`; do echo "create table p$i partition of ppp for
> values from ( $((10 * i)) ) to ( $((10 * (i + 1))) ); "; done
>
> === insert data data:
> insert into ppp select i / 1000  from generate_series(0, 2000000)i;
>
> === results:
>
>
> for 2000001 rows speedup is 1.40 times : 902.604 ms (patches) vs
> 1270.648 ms (unpatched)
>
> for 4000002 rows speedup is 1.20 times : 1921.724 ms (patches) vs
> 2343.393 ms (unpatched)
>
> for 8000004 rows speedup is 1.10 times : 3932.361 ms (patches) vs
> 4358.489ms (unpatched)
>
> So, this patch indeed speeds up some cases, but with larger tables
> speedup becomes negligible.
>

Thanks for doing the benchmark.

Вложения

v8-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

vignesh C

Дата:

01 апреля, 08:38:41

On Tue, 1 Apr 2025 at 06:31, jian he <jian.universality@gmail.com> wrote:
>
> On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> Thanks for doing the benchmark.

Few comments to improve the comments, error message and remove
redundant assignment:
1) How about we change below:
/*
 * partition's rowtype might differ from the root table's.  We must
 * convert it back to the root table's rowtype as we are export
 * partitioned table data here.
*/
To:
/*
 * A partition's row type might differ from the root table's.
 * Since we're exporting partitioned table data, we must
 * convert it back to the root table's row type.
 */

2) How about we change below:
+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from foreign table \"%s\"",
+
 RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

To:
+                               if (relkind == RELKIND_FOREIGN_TABLE)
+                                       ereport(ERROR,
+
errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                                                       errmsg("cannot
copy from a partitioned table having foreign table partition \"%s\"",
+
 RelationGetRelationName(rel)),
+
errdetail("partition \"%s\" is a foreign table",
RelationGetRelationName(rel)),
+                                                       errhint("Try
the COPY (SELECT ...) TO variant."));

3) How about we change below:
/*
 * rel: the relation to be copied to.
 * root_rel: if not NULL, then the COPY partitioned relation to destination.
 * processed: number of tuples processed.
*/
To:
/*
 * rel: the relation from which data will be copied.
 * root_rel: If not NULL, indicates that rel's row type must be
 *           converted to root_rel's row type.
 * processed: number of tuples processed.
 */

 4)  You can initialize processed to 0 along with declaration in
DoCopyTo function and remove the below:
 +       if (cstate->rel && cstate->rel->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE)
        {
...
...
                processed = 0;
-               while (table_scan_getnextslot(scandesc,
ForwardScanDirection, slot))
...
...
-
-               ExecDropSingleTupleTableSlot(slot);
-               table_endscan(scandesc);
+       }
+       else if (cstate->rel)
+       {
+               processed = 0;
+               CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
        }

Regards,
Vignesh

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

04 апреля, 06:20:07

On Tue, Apr 1, 2025 at 1:38 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 1 Apr 2025 at 06:31, jian he <jian.universality@gmail.com> wrote:
> >
> > On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> >
> > Thanks for doing the benchmark.
>
> Few comments to improve the comments, error message and remove
> redundant assignment:
> 1) How about we change below:
> /*
>  * partition's rowtype might differ from the root table's.  We must
>  * convert it back to the root table's rowtype as we are export
>  * partitioned table data here.
> */
> To:
> /*
>  * A partition's row type might differ from the root table's.
>  * Since we're exporting partitioned table data, we must
>  * convert it back to the root table's row type.
>  */
>
i changed it to
    /*
     * A partition's rowtype might differ from the root table's.
     * Since we are export partitioned table data here,
     * we must convert it back to the root table's rowtype.
    */
Since many places use "rowtype",
using "rowtype" instead of "row type" should be fine.


> 2) How about we change below:
> +                               if (relkind == RELKIND_FOREIGN_TABLE)
> +                                       ereport(ERROR,
> +
> errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                                                       errmsg("cannot
> copy from foreign table \"%s\"",
> +
>  RelationGetRelationName(rel)),
> +
> errdetail("partition \"%s\" is a foreign table",
> RelationGetRelationName(rel)),
> +                                                       errhint("Try
> the COPY (SELECT ...) TO variant."));
>
> To:
> +                               if (relkind == RELKIND_FOREIGN_TABLE)
> +                                       ereport(ERROR,
> +
> errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                                                       errmsg("cannot
> copy from a partitioned table having foreign table partition \"%s\"",
> +
>  RelationGetRelationName(rel)),
> +
> errdetail("partition \"%s\" is a foreign table",
> RelationGetRelationName(rel)),
> +                                                       errhint("Try
> the COPY (SELECT ...) TO variant."));
>
i am not so sure.
since the surrounding code we have

else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
            ereport(ERROR,
                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                     errmsg("cannot copy from foreign table \"%s\"",
                            RelationGetRelationName(rel)),
                     errhint("Try the COPY (SELECT ...) TO variant.")));

let's see what others think.
> 3) How about we change below:
> /*
>  * rel: the relation to be copied to.
>  * root_rel: if not NULL, then the COPY partitioned relation to destination.
>  * processed: number of tuples processed.
> */
> To:
> /*
>  * rel: the relation from which data will be copied.
>  * root_rel: If not NULL, indicates that rel's row type must be
>  *           converted to root_rel's row type.
>  * processed: number of tuples processed.
>  */
>
i changed it to

/*
 * rel: the relation from which the actual data will be copied.
 * root_rel: if not NULL, it indicates that we are copying partitioned relation
 * data to the destination, and "rel" is the partition of "root_rel".
 * processed: number of tuples processed.
*/

>  4)  You can initialize processed to 0 along with declaration in
> DoCopyTo function and remove the below:
>  +       if (cstate->rel && cstate->rel->rd_rel->relkind ==
> RELKIND_PARTITIONED_TABLE)
>         {
> ...
> ...
>                 processed = 0;
> -               while (table_scan_getnextslot(scandesc,
> ForwardScanDirection, slot))
> ...
> ...
> -
> -               ExecDropSingleTupleTableSlot(slot);
> -               table_endscan(scandesc);
> +       }
> +       else if (cstate->rel)
> +       {
> +               processed = 0;
> +               CopyThisRelTo(cstate, cstate->rel, NULL, &processed);
>         }
ok.

Вложения

v9-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Kirill Reshke

Дата:

04 апреля, 15:17:59

Hi!

First of all, a commit message does not need to contain SQL examples
of what it does. We should provide human-readable explanations and
that's it.

Next, about changes to src/test/regress/sql/copy2.sql. I find the sql
you used to test really unintuitive. How about CREATE TABLE ...
PARTITION OF syntax? It is also one command instead of two (create +
alter). It is also hard to say what partition structure is, because
column names on different partition levels are the same, just order is
switched. Let's change it to something more intuitive too?



-- 
Best regards,
Kirill Reshke

Re: speedup COPY TO for partitioned table.

От

Kirill Reshke

Дата:

07 апреля, 19:54:13

On Fri, 4 Apr 2025, 15:17 Kirill Reshke, <reshkekirill@gmail.com> wrote:

Hi!

First of all, a commit message does not need to contain SQL examples
of what it does. We should provide human-readable explanations and
that's it.

Next, about changes to src/test/regress/sql/copy2.sql. I find the sql
you used to test really unintuitive. How about CREATE TABLE ...
PARTITION OF syntax? It is also one command instead of two (create +
alter). It is also hard to say what partition structure is, because
column names on different partition levels are the same, just order is
switched. Let's change it to something more intuitive too?

--
Best regards,
Kirill Reshke

Maybe we can tab-complete here if prefix matches pg_% ? Does that makes good use case?

Re: speedup COPY TO for partitioned table.

От

Kirill Reshke

Дата:

07 апреля, 19:55:10

Sorry, wrong thread

Best regards,

Kirill Reshke

On Mon, 7 Apr 2025, 19:54 Kirill Reshke, <reshkekirill@gmail.com> wrote:

On Fri, 4 Apr 2025, 15:17 Kirill Reshke, <reshkekirill@gmail.com> wrote:
Hi!

First of all, a commit message does not need to contain SQL examples
of what it does. We should provide human-readable explanations and
that's it.

Next, about changes to src/test/regress/sql/copy2.sql. I find the sql
you used to test really unintuitive. How about CREATE TABLE ...
PARTITION OF syntax? It is also one command instead of two (create +
alter). It is also hard to say what partition structure is, because
column names on different partition levels are the same, just order is
switched. Let's change it to something more intuitive too?

--
Best regards,
Kirill Reshke

Maybe we can tab-complete here if prefix matches pg_% ? Does that makes good use case?

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

10 апреля, 05:44:56

hi.

rebase and simplify regress tests.

Вложения

v10-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Kirill Reshke

Дата:

10 апреля, 11:25:00

On Thu, 10 Apr 2025 at 07:45, jian he <jian.universality@gmail.com> wrote:
>
> hi.
>
> rebase and simplify regress tests.

HI!
You used CREATE TABLE PARTITION OF syntax for the second level of
partitioning scheme, but not for the first level. Is there any reason?

Also about column names. how about

+CREATE TABLE pp (year int, day int) PARTITION BY RANGE (year);
+CREATE TABLE pp_1 (year int, day int) PARTITION BY RANGE (day);
+CREATE TABLE pp_2 (year int, day int) PARTITION BY RANGE (day);

??

-- 
Best regards,
Kirill Reshke

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

10 апреля, 15:37:16

On Thu, Apr 10, 2025 at 4:25 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Thu, 10 Apr 2025 at 07:45, jian he <jian.universality@gmail.com> wrote:
> >
> > hi.
> >
> > rebase and simplify regress tests.
>
> HI!
> You used CREATE TABLE PARTITION OF syntax for the second level of
> partitioning scheme, but not for the first level. Is there any reason?
>
hi.

I want the partitioned table and partition column position to be different.
Here, the partitioned table column order is "(id int,val int) ",
but the actual partition column order is "(val int, id int)".

> Also about column names. how about
>
> +CREATE TABLE pp (year int, day int) PARTITION BY RANGE (year);
> +CREATE TABLE pp_1 (year int, day int) PARTITION BY RANGE (day);
> +CREATE TABLE pp_2 (year int, day int) PARTITION BY RANGE (day);
>
> ??

I think the current test example is fine.

Re: speedup COPY TO for partitioned table.

От

Kirill Reshke

Дата:

15 апреля, 17:00:21

On Thu, 10 Apr 2025 at 17:37, jian he <jian.universality@gmail.com> wrote:
>
>
> I think the current test example is fine.

Ok, let it be so. I changed status to RFQ as I have no more input
here, and other reviewers in thread remain silent (so I assume they
are fine with v10)

-- 
Best regards,
Kirill Reshke

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

05 июня, 03:45:10

hi.

In the V10 patch, there will be some regression if the partition column
ordering is different from the root partitioned table.

because in V10 CopyThisRelTo
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ if (map != NULL)
+ {
+ original_slot = slot;
+ root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple);
+ slot = execute_attr_map_slot(map, slot, root_slot);
+ }
+ else
+ slot_getallattrs(slot);
+
+ if (original_slot != NULL)
+ ExecDropSingleTupleTableSlot(original_slot);
+}
as you can see, for each slot in the partition, i called
MakeSingleTupleTableSlot to get the dumpy root_slot
and ExecDropSingleTupleTableSlot too.
that will cause overhead.

we can call produce root_slot before the main while loop.
like the following:
+ if (root_rel != NULL)
+ {
+ rootdesc = RelationGetDescr(root_rel);
+ root_slot = table_slot_create(root_rel, NULL);
+ map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+ }
....
+ while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
+ {
+ TupleTableSlot *copyslot;
+ if (map != NULL)
+ copyslot = execute_attr_map_slot(map, slot, root_slot);
+ else
+ {
+ slot_getallattrs(slot);
+ copyslot = slot;
+ }
+
please check CopyThisRelTo in v11. so, with v11, there is no
regression for case when
partition column ordering differs from partitioned.

I have tested 30 partitions, 10 columns, all the column ordering
is different with the root partitioned table.

copy pp to '/tmp/2.txt'
is still faster than
copy (select * from pp) to '/tmp/1.txt';
(359.463 ms versus  376.371 ms)

I am using -Dbuildtype=release
PostgreSQL 18beta1_release_build on x86_64-linux, compiled by gcc-14.1.0, 64-bit

you may see the attached test file.

Вложения

Re: speedup COPY TO for partitioned table.

От

torikoshia

Дата:

26 июня, 04:43:52

On 2025-06-05 09:45, jian he wrote:
> hi.
> 
> In the V10 patch, there will be some regression if the partition column
> ordering is different from the root partitioned table.
> 
> because in V10 CopyThisRelTo
> + while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
> + {
> + if (map != NULL)
> + {
> + original_slot = slot;
> + root_slot = MakeSingleTupleTableSlot(rootdesc, 
> &TTSOpsBufferHeapTuple);
> + slot = execute_attr_map_slot(map, slot, root_slot);
> + }
> + else
> + slot_getallattrs(slot);
> +
> + if (original_slot != NULL)
> + ExecDropSingleTupleTableSlot(original_slot);
> +}
> as you can see, for each slot in the partition, i called
> MakeSingleTupleTableSlot to get the dumpy root_slot
> and ExecDropSingleTupleTableSlot too.
> that will cause overhead.
> 
> we can call produce root_slot before the main while loop.
> like the following:
> + if (root_rel != NULL)
> + {
> + rootdesc = RelationGetDescr(root_rel);
> + root_slot = table_slot_create(root_rel, NULL);
> + map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
> + }
> ....
> + while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
> + {
> + TupleTableSlot *copyslot;
> + if (map != NULL)
> + copyslot = execute_attr_map_slot(map, slot, root_slot);
> + else
> + {
> + slot_getallattrs(slot);
> + copyslot = slot;
> + }
> +
> please check CopyThisRelTo in v11. so, with v11, there is no
> regression for case when
> partition column ordering differs from partitioned.

Thanks for working on this improvement.

Here are some minor comments on v11 patch:

> +    For example, if <replaceable class="parameter">table</replaceable> 
> is not partitioned table,
>      <literal>COPY <replaceable class="parameter">table</replaceable>
>      TO</literal> copies the same rows as
>      <literal>SELECT * FROM ONLY <replaceable 
> class="parameter">table</replaceable></literal>

This describes the behavior when the table is not partitioned, but would 
it also be helpful to mention the behavior when the table is a 
partitioned table?
For example:

   If table is a partitioned table, then COPY table TO copies the same 
rows as SELECT * FROM table.


> +        * if COPY TO source table is a partitioned table, then open 
> each

if -> If


> +                       scan_rel = table_open(scan_oid, 
> AccessShareLock);
> 
> -                       /* Format and send the data */
> -                       CopyOneRowTo(cstate, slot);
> +                       CopyThisRelTo(cstate, scan_rel, cstate->rel, 
> &processed);
> 
> -                       /*
> -                        * Increment the number of processed tuples, 
> and report the
> -                        * progress.
> -                        */
> -                       
> pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
> -                                                                       
>          ++processed);
> +                       table_close(scan_rel, AccessShareLock)


After applying the patch, blank lines exist between these statements as 
below. Do we really need these blank lines?

```
                          scan_rel = table_open(scan_oid, 
AccessShareLock);

                          CopyThisRelTo(cstate, scan_rel, cstate->rel, 
&processed);

                          table_close(scan_rel, AccessShareLock);
``

> +/*
> + * rel: the relation from which the actual data will be copied.
> + * root_rel: if not NULL, it indicates that we are copying partitioned 
> relation
> + * data to the destination, and "rel" is the partition of "root_rel".
> + * processed: number of tuples processed.
> +*/
> +static void
> +CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,

This comment only describes the parameters. Wouldn't it better to add a 
brief summary of what this function does overall?


  +        * A partition's rowtype might differ from the root table's.  
Since we are
  +        * export partitioned table data here, we must convert it back 
to the root

  are export -> are exporting?


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: speedup COPY TO for partitioned table.

От

torikoshia

Дата:

30 июня, 10:57:30

On 2025-06-27 16:14, jian he wrote:
Thanks for updating the patch!

> On Thu, Jun 26, 2025 at 9:43 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> After applying the patch, blank lines exist between these statements 
>> as
>> below. Do we really need these blank lines?
>> 
>> ```
>>                           scan_rel = table_open(scan_oid,
>> AccessShareLock);
>> 
>>                           CopyThisRelTo(cstate, scan_rel, cstate->rel,
>> &processed);
>> 
>>                           table_close(scan_rel, AccessShareLock);
>> ``
>> 
> we can remove these empty new lines.
> actually, I realized we don't need to use AccessShareLock here—we can 
> use NoLock
> instead, since BeginCopyTo has already acquired AccessShareLock via
> find_all_inheritors.

That makes sense.
I think it would be helpful to add a comment explaining why NoLock is 
safe here — for example:

   /* We already got the needed lock */

In fact, in other places where table_open(..., NoLock) is used, similar 
explanatory comments are often included(Above comment is one of them).

>> > +/*
>> > + * rel: the relation from which the actual data will be copied.
>> > + * root_rel: if not NULL, it indicates that we are copying partitioned
>> > relation
>> > + * data to the destination, and "rel" is the partition of "root_rel".
>> > + * processed: number of tuples processed.
>> > +*/
>> > +static void
>> > +CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
>> 
>> This comment only describes the parameters. Wouldn't it better to add 
>> a
>> brief summary of what this function does overall?
>> 
> 
> what do you think the following
> 
> /*
>  * CopyThisRelTo:
>  * This will scanning a single table (which may be a partition) and 
> exporting
>  * its rows to a COPY destination.
>  *
>  * rel: the relation from which the actual data will be copied.
>  * root_rel: if not NULL, it indicates that we are copying partitioned 
> relation
>  * data to the destination, and "rel" is the partition of "root_rel".
>  * processed: number of tuples processed.
> */
> static void
> CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
>               uint64 *processed)

I think it would be better to follow the style of nearby functions in 
the same file. For example:

   /*
    * Scan a single table (which may be a partition) and export
    * its rows to the COPY destination.
    */

Also, regarding the function name CopyThisRelTo() — I wonder if the 
"This" is really necessary?
Maybe something simpler like CopyRelTo() is enough.

What do you think?


Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

02 июля, 07:10:21

On Mon, Jun 30, 2025 at 3:57 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> >> ```
> >>                           scan_rel = table_open(scan_oid,
> >> AccessShareLock);
> >>
> >>                           CopyThisRelTo(cstate, scan_rel, cstate->rel,
> >> &processed);
> >>
> >>                           table_close(scan_rel, AccessShareLock);
> >> ``
> >>
> > we can remove these empty new lines.
> > actually, I realized we don't need to use AccessShareLock here—we can
> > use NoLock
> > instead, since BeginCopyTo has already acquired AccessShareLock via
> > find_all_inheritors.
>
> That makes sense.
> I think it would be helpful to add a comment explaining why NoLock is
> safe here — for example:
>
>    /* We already got the needed lock */
>
> In fact, in other places where table_open(..., NoLock) is used, similar
> explanatory comments are often included(Above comment is one of them).
>

hi.

I changed it to:
        foreach_oid(scan_oid, cstate->partitions)
        {
            Relation        scan_rel;
            /* We already got the needed lock in BeginCopyTo */
            scan_rel = table_open(scan_oid, NoLock);
            CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
            table_close(scan_rel, NoLock);
        }

> > what do you think the following
> >
> > /*
> >  * CopyThisRelTo:
> >  * This will scanning a single table (which may be a partition) and
> > exporting
> >  * its rows to a COPY destination.
> >  *
> >  * rel: the relation from which the actual data will be copied.
> >  * root_rel: if not NULL, it indicates that we are copying partitioned
> > relation
> >  * data to the destination, and "rel" is the partition of "root_rel".
> >  * processed: number of tuples processed.
> > */
> > static void
> > CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
> >               uint64 *processed)
>
> I think it would be better to follow the style of nearby functions in
> the same file. For example:
>
>    /*
>     * Scan a single table (which may be a partition) and export
>     * its rows to the COPY destination.
>     */
>

now it is:
/*
 * Scan a single table (which may be a partition) and export its rows to the
 * COPY destination.
 *
 * rel: the relation from which the actual data will be copied.
 * root_rel: if not NULL, it indicates that we are copying partitioned relation
 * data to the destination, and "rel" is the partition of "root_rel".
 * processed: number of tuples processed.
*/
static void
CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
          uint64 *processed)


> Also, regarding the function name CopyThisRelTo() — I wonder if the
> "This" is really necessary?
> Maybe something simpler like CopyRelTo() is enough.
>
> What do you think?
>
sure. CopyRelTo looks good to me.

while at it.
I found that in BeginCopyTo:
            ereport(ERROR,
                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                     errmsg("cannot copy from foreign table \"%s\"",
                            RelationGetRelationName(rel)),
                     errhint("Try the COPY (SELECT ...) TO variant.")));

                    ereport(ERROR,
                            errcode(ERRCODE_WRONG_OBJECT_TYPE),
                            errmsg("cannot copy from foreign table
\"%s\"", relation_name),
                            errdetail("Partition \"%s\" is a foreign
table in the partitioned table \"%s\"",
                                      relation_name,
RelationGetRelationName(rel)),
                            errhint("Try the COPY (SELECT ...) TO variant."));

don't have any regress tests on it.
see https://coverage.postgresql.org/src/backend/commands/copyto.c.gcov.html
So I added some tests on contrib/postgres_fdw/sql/postgres_fdw.sql

Вложения

v13-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

torikoshia

Дата:

14 июля, 16:38:21

On 2025-07-02 13:10, jian he wrote:
Thanks for updating the patch.

> now it is:
> /*
>  * Scan a single table (which may be a partition) and export its rows
> to the
>  * COPY destination.

Based on the explanations in the glossary, should 'parition' be
partitioned table/relation?

|  -- https://www.postgresql.org/docs/devel/glossary.html
|  partition: One of several disjoint (not overlapping) subsets of a
larger set
|  Partitioned table(relation): A relation that is in semantic terms the
same as a table, but whose storage is distributed across several
partitions

Also, the terms "table" and "relation" seem to be used somewhat
interchangeably in this patch.
For consistency, perhaps it's better to pick one term and use it
consistently throughout the comments.

249 + * root_rel: if not NULL, it indicates that we are copying
partitioned relation
270 +    * exporting partitioned table data here, we must convert it
back to the

>  *
>  * rel: the relation from which the actual data will be copied.
>  * root_rel: if not NULL, it indicates that we are copying partitioned
> relation
>  * data to the destination, and "rel" is the partition of "root_rel".
>  * processed: number of tuples processed.
> */
> static void
> CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
>           uint64 *processed)
>
>
> > Also, regarding the function name CopyThisRelTo() — I wonder if the
> > "This" is really necessary?
> > Maybe something simpler like CopyRelTo() is enough.
> >
> > What do you think?
> >
> sure. CopyRelTo looks good to me.
>
> while at it.
> I found that in BeginCopyTo:
>             ereport(ERROR,
>                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>                      errmsg("cannot copy from foreign table \"%s\"",
>                             RelationGetRelationName(rel)),
>                      errhint("Try the COPY (SELECT ...) TO
> variant.")));
>
>                     ereport(ERROR,
>                             errcode(ERRCODE_WRONG_OBJECT_TYPE),
>                             errmsg("cannot copy from foreign table
> \"%s\"", relation_name),
>                             errdetail("Partition \"%s\" is a foreign
> table in the partitioned table \"%s\"",
>                                       relation_name,
> RelationGetRelationName(rel)),
>                             errhint("Try the COPY (SELECT ...) TO
> variant."));
>
> don't have any regress tests on it.

Hmm, I agree there are no regression tests for this, but is it about
copying foreign table, isn't it?

Since this patch is primarily about supporting COPY on partitioned
tables, I’m not sure adding regression tests for foreign tables is in
scope here.
It might be better handled in a follow-up patch focused on improving
test coverage for such unsupported cases, if we decide that's
worthwhile.



--https://coverage.postgresql.org/src/backend/commands/copyto.c.gcov.html
   670           0 :         else if (rel->rd_rel->relkind ==
RELKIND_SEQUENCE)
   671           0 :             ereport(ERROR,
   672             :
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
   673             :                      errmsg("cannot copy from
sequence \"%s\"",
   674             :
RelationGetRelationName(rel))));

Also, I’m not entirely sure how much value such tests would bring,
especially if the error paths are straightforward and unlikely to
regress.


Regarding performance: it's already confirmed that COPY
partitioned_table performs better than COPY (SELECT * FROM
partitioned_table) as expected [1].
I was a bit curious, though, whether this patch might introduce any
performance regression when copying a regular (non-partitioned) table.
To check this, I ran a simple benchmark and did not observe any
degradation.
To minimize I/O overhead, I used a tmpfs mount:

   % mkdir /tmp/mem
   % sudo mount_tmpfs -s 500M /tmp/mem
   % pgbench -i

Then I ran the following command several times on both patched and
unpatched builds:
   =# COPY pgbench_accounts TO '/tmp/mem/accounts';


[1]
https://www.postgresql.org/message-id/174219852967.294107.6195385625494034792.pgcf%40coridan.postgresql.org


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: speedup COPY TO for partitioned table.

От

Álvaro Herrera

Дата:

14 июля, 17:02:35

On 2025-Jul-02, jian he wrote:

> @@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
>                       errmsg("cannot copy from sequence \"%s\"",
>                              RelationGetRelationName(rel))));
>          else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> -            ereport(ERROR,
> -                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> -                     errmsg("cannot copy from partitioned table \"%s\"",
> -                            RelationGetRelationName(rel)),
> -                     errhint("Try the COPY (SELECT ...) TO variant.")));
> +        {
> +            children = find_all_inheritors(RelationGetRelid(rel),
> +                                           AccessShareLock,
> +                                           NULL);
> +
> +            foreach_oid(childreloid, children)
> +            {
> +                char         relkind = get_rel_relkind(childreloid);
> +
> +                if (relkind == RELKIND_FOREIGN_TABLE)
> +                {
> +                    char       *relation_name;
> +
> +                    relation_name = get_rel_name(childreloid);
> +                    ereport(ERROR,
> +                            errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                            errmsg("cannot copy from foreign table \"%s\"", relation_name),
> +                            errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
> +                                      relation_name, RelationGetRelationName(rel)),
> +                            errhint("Try the COPY (SELECT ...) TO variant."));
> +                }

This code looks like it's duplicating what you could obtain by using
RelationGetPartitionDesc and then observe the ->isleaf bits.  Maybe have
a look at the function RelationHasForeignPartition() in the patch at
https://postgr.es/m/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
which looks very similar to what you need here.  I think that would also
have the (maybe dubious) advantage that the rows will be output in
partition bound order rather than breadth-first (partition hierarchy)
OID order.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The saddest aspect of life right now is that science gathers knowledge faster
 than society gathers wisdom."  (Isaac Asimov)

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

15 июля, 06:31:58

On Mon, Jul 14, 2025 at 10:02 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-02, jian he wrote:
>
> > @@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
> >                                        errmsg("cannot copy from sequence \"%s\"",
> >                                                       RelationGetRelationName(rel))));
> >               else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > -                     ereport(ERROR,
> > -                                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > -                                      errmsg("cannot copy from partitioned table \"%s\"",
> > -                                                     RelationGetRelationName(rel)),
> > -                                      errhint("Try the COPY (SELECT ...) TO variant.")));
> > +             {
> > +                     children = find_all_inheritors(RelationGetRelid(rel),
> > +                                                                                AccessShareLock,
> > +                                                                                NULL);
> > +
> > +                     foreach_oid(childreloid, children)
> > +                     {
> > +                             char             relkind = get_rel_relkind(childreloid);
> > +
> > +                             if (relkind == RELKIND_FOREIGN_TABLE)
> > +                             {
> > +                                     char       *relation_name;
> > +
> > +                                     relation_name = get_rel_name(childreloid);
> > +                                     ereport(ERROR,
> > +                                                     errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > +                                                     errmsg("cannot copy from foreign table \"%s\"",
relation_name),
> > +                                                     errdetail("Partition \"%s\" is a foreign table in the
partitionedtable \"%s\"", 
> > +                                                                       relation_name,
RelationGetRelationName(rel)),
> > +                                                     errhint("Try the COPY (SELECT ...) TO variant."));
> > +                             }
>
> This code looks like it's duplicating what you could obtain by using
> RelationGetPartitionDesc and then observe the ->isleaf bits.  Maybe have
> a look at the function RelationHasForeignPartition() in the patch at
> https://postgr.es/m/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
> which looks very similar to what you need here.  I think that would also
> have the (maybe dubious) advantage that the rows will be output in
> partition bound order rather than breadth-first (partition hierarchy)
> OID order.
>
hi.

        else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        {
            PartitionDesc pd = RelationGetPartitionDesc(rel, true);
            for (int i = 0; i < pd->nparts; i++)
            {
                Relation    partRel;
                if (!pd->is_leaf[i])
                    continue;
                partRel = table_open(pd->oids[i], AccessShareLock);
                if (partRel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
                    ereport(ERROR,
                            errcode(ERRCODE_WRONG_OBJECT_TYPE),
                            errmsg("cannot copy from foreign table
\"%s\"", RelationGetRelationName(partRel)),
                            errdetail("Partition \"%s\" is a foreign
table in the partitioned table \"%s\"",

RelationGetRelationName(partRel), RelationGetRelationName(rel)),
                            errhint("Try the COPY (SELECT ...) TO
variant."));
                table_close(partRel, NoLock);
                scan_oids = lappend_oid(scan_oids, RelationGetRelid(partRel));
            }
        }

I tried the above code, but it doesn't work because RelationGetPartitionDesc
only retrieves the immediate partition descriptor of a partitioned relation, it
doesn't recurse to the lowest level.

Actually Melih Mutlu raised this question at
https://postgr.es/m/CAGPVpCQou3hWQYUqXNTLKdcuO6envsWJYSJqbZZQnRCjZA6nkQ%40mail.gmail.com
I kind of ignored it...
I guess we have to stick with find_all_inheritors here?

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

15 июля, 07:16:11

On Mon, Jul 14, 2025 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Based on the explanations in the glossary, should 'parition' be
> partitioned table/relation?
>
I think "Scan a single table (which may be a partition) and export its
rows to the...."
the word "partition" is correct.


> |  -- https://www.postgresql.org/docs/devel/glossary.html
> |  partition: One of several disjoint (not overlapping) subsets of a
> larger set
> |  Partitioned table(relation): A relation that is in semantic terms the
> same as a table, but whose storage is distributed across several
> partitions
>
> Also, the terms "table" and "relation" seem to be used somewhat
> interchangeably in this patch.
> For consistency, perhaps it's better to pick one term and use it
> consistently throughout the comments.
>
> 249 + * root_rel: if not NULL, it indicates that we are copying
> partitioned relation
> 270 +    * exporting partitioned table data here, we must convert it
> back to the
>

now it's:

+/*
+ * Scan a single table (which may be a partition) and export its rows to the
+ * COPY destination.
+ *
+ * rel: the table from which the actual data will be copied.
+ * root_rel: if not NULL, it indicates that COPY TO command copy partitioned
+ * table data to the destination, and "rel" is the partition of "root_rel".
+ * processed: number of tuples processed.
+*/
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+          uint64 *processed)
+{
+
+    tupdesc = RelationGetDescr(rel);
+    scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+    slot = table_slot_create(rel, NULL);
+
+    /*
+     * A partition's rowtype might differ from the root table's. If we are
+     * exporting partition data here, we must convert it back to the root
+     * table's rowtype.
+    */
+    if (root_rel != NULL)
+    {
+        rootdesc = RelationGetDescr(root_rel);
+        root_slot = table_slot_create(root_rel, NULL);
+        map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+    }
+

> >
> > while at it.
> > I found that in BeginCopyTo:
> >             ereport(ERROR,
> >                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> >                      errmsg("cannot copy from foreign table \"%s\"",
> >                             RelationGetRelationName(rel)),
> >                      errhint("Try the COPY (SELECT ...) TO
> > variant.")));
> >
> >                     ereport(ERROR,
> >                             errcode(ERRCODE_WRONG_OBJECT_TYPE),
> >                             errmsg("cannot copy from foreign table
> > \"%s\"", relation_name),
> >                             errdetail("Partition \"%s\" is a foreign
> > table in the partitioned table \"%s\"",
> >                                       relation_name,
> > RelationGetRelationName(rel)),
> >                             errhint("Try the COPY (SELECT ...) TO
> > variant."));
> >
> > don't have any regress tests on it.
>
> Hmm, I agree there are no regression tests for this, but is it about
> copying foreign table, isn't it?
>
> Since this patch is primarily about supporting COPY on partitioned
> tables, I’m not sure adding regression tests for foreign tables is in
> scope here.
> It might be better handled in a follow-up patch focused on improving
> test coverage for such unsupported cases, if we decide that's
> worthwhile.
>
i guess it should  be fine.
since we are only adding one somehow related test case.

+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+COPY async_pt TO stdout; --error

Вложения

v14-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

torikoshia

Дата:

28 июля, 04:22:50

On 2025-07-15 12:31, jian he wrote:
> On Mon, Jul 14, 2025 at 10:02 PM Álvaro Herrera <alvherre@kurilemu.de>
> wrote:
>>
>> On 2025-Jul-02, jian he wrote:
>>
>> > @@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
>> >                                        errmsg("cannot copy from sequence \"%s\"",
>> >                                                       RelationGetRelationName(rel))));
>> >               else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> > -                     ereport(ERROR,
>> > -                                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> > -                                      errmsg("cannot copy from partitioned table \"%s\"",
>> > -                                                     RelationGetRelationName(rel)),
>> > -                                      errhint("Try the COPY (SELECT ...) TO variant.")));
>> > +             {
>> > +                     children = find_all_inheritors(RelationGetRelid(rel),
>> > +                                                                                AccessShareLock,
>> > +                                                                                NULL);
>> > +
>> > +                     foreach_oid(childreloid, children)
>> > +                     {
>> > +                             char             relkind = get_rel_relkind(childreloid);
>> > +
>> > +                             if (relkind == RELKIND_FOREIGN_TABLE)
>> > +                             {
>> > +                                     char       *relation_name;
>> > +
>> > +                                     relation_name = get_rel_name(childreloid);
>> > +                                     ereport(ERROR,
>> > +                                                     errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> > +                                                     errmsg("cannot copy from foreign table \"%s\"",
relation_name),
>> > +                                                     errdetail("Partition \"%s\" is a foreign table in the
partitionedtable \"%s\"", 
>> > +                                                                       relation_name,
RelationGetRelationName(rel)),
>> > +                                                     errhint("Try the COPY (SELECT ...) TO variant."));
>> > +                             }
>>
>> This code looks like it's duplicating what you could obtain by using
>> RelationGetPartitionDesc and then observe the ->isleaf bits.  Maybe
>> have
>> a look at the function RelationHasForeignPartition() in the patch at
>> https://postgr.es/m/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg@mail.gmail.com
>> which looks very similar to what you need here.  I think that would
>> also
>> have the (maybe dubious) advantage that the rows will be output in
>> partition bound order rather than breadth-first (partition hierarchy)
>> OID order.
>>
> hi.
>
>         else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>         {
>             PartitionDesc pd = RelationGetPartitionDesc(rel, true);
>             for (int i = 0; i < pd->nparts; i++)
>             {
>                 Relation    partRel;
>                 if (!pd->is_leaf[i])
>                     continue;
>                 partRel = table_open(pd->oids[i], AccessShareLock);
>                 if (partRel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
>                     ereport(ERROR,
>                             errcode(ERRCODE_WRONG_OBJECT_TYPE),
>                             errmsg("cannot copy from foreign table
> \"%s\"", RelationGetRelationName(partRel)),
>                             errdetail("Partition \"%s\" is a foreign
> table in the partitioned table \"%s\"",
>
> RelationGetRelationName(partRel), RelationGetRelationName(rel)),
>                             errhint("Try the COPY (SELECT ...) TO
> variant."));
>                 table_close(partRel, NoLock);
>                 scan_oids = lappend_oid(scan_oids,
> RelationGetRelid(partRel));
>             }
>         }
>
> I tried the above code, but it doesn't work because
> RelationGetPartitionDesc
> only retrieves the immediate partition descriptor of a partitioned
> relation, it
> doesn't recurse to the lowest level.
>
> Actually Melih Mutlu raised this question at
> https://postgr.es/m/CAGPVpCQou3hWQYUqXNTLKdcuO6envsWJYSJqbZZQnRCjZA6nkQ%40mail.gmail.com
> I kind of ignored it...
> I guess we have to stick with find_all_inheritors here?

That might be the case.
I thought we could consider using RelationHasForeignPartition() instead,
if [1] gets committed.
However, since that function only tells us whether any foreign
partitions exist, whereas the current patch outputs the specific
problematic partitions or foreign tables in the log, I think the current
approach is more user-friendly.

  >      <command>COPY TO</command> can be used with plain
  > -    tables and populated materialized views.
  > -    For example,
  > +    tables, populated materialized views and partitioned tables.
  > +    For example, if <replaceable
class="parameter">table</replaceable> is not partitioned table,
  >      <literal>COPY <replaceable class="parameter">table</replaceable>
  >      TO</literal> copies the same rows as
  >      <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.

I believe "is not a partitioned table" here is intended to refer to both
plain tables and materialized views.
However, as far as I understand, using ONLY with a materialized view has
no effect.
So, wouldn’t it be better and clearer to say "if the table is a plain
table" instead?
I think the behavior for materialized views can be described along with
that for partitioned tables. For example:

      <command>COPY TO</command> can be used with plain
      tables, populated materialized views and partitioned tables.
      For example, if <replaceable class="parameter">table</replaceable>
is a plain table,
      <literal>COPY <replaceable class="parameter">table</replaceable>
      TO</literal> copies the same rows as
      <literal>SELECT * FROM ONLY <replaceable
class="parameter">table</replaceable></literal>.

      If <replaceable class="parameter">table</replaceable> is a
partitioned table or a materialized view,
      <literal>COPY <replaceable class="parameter">table</replaceable>
TO</literal>
      copies the same rows as <literal>SELECT * FROM <replaceable
class="parameter">table</replaceable></literal>.


  +   List       *children = NIL;
  ...

  @@ -673,11 +680,34 @@ BeginCopyTo(ParseState *pstate,
                       errmsg("cannot copy from sequence \"%s\"",
                              RelationGetRelationName(rel))));
          else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
  -           ereport(ERROR,
  -                   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  -                    errmsg("cannot copy from partitioned table
\"%s\"",
  -                           RelationGetRelationName(rel)),
  -                    errhint("Try the COPY (SELECT ...) TO
variant.")));
  +       {
  +           children = find_all_inheritors(RelationGetRelid(rel),

Since 'children' is only used inside the else if block, I think we don't
need the separate "List *children = NIL;" declaration.
Instead, it could just be  "List *children = find_all_inheritors(...)".


[1]

https://www.postgresql.org/message-id/flat/CANhcyEW_s2LD6RiDSMHtWQnpYB67EWXqf7N8mn7dOrnaKMfROg%40mail.gmail.com#7bfbe70576e7a3f7162ca268f08bd395


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

30 июля, 06:21:55

On Mon, Jul 28, 2025 at 9:22 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> I think the behavior for materialized views can be described along with
> that for partitioned tables. For example:
>
>       <command>COPY TO</command> can be used with plain
>       tables, populated materialized views and partitioned tables.
>       For example, if <replaceable class="parameter">table</replaceable>
> is a plain table,
>       <literal>COPY <replaceable class="parameter">table</replaceable>
>       TO</literal> copies the same rows as
>       <literal>SELECT * FROM ONLY <replaceable
> class="parameter">table</replaceable></literal>.
>
>       If <replaceable class="parameter">table</replaceable> is a
> partitioned table or a materialized view,
>       <literal>COPY <replaceable class="parameter">table</replaceable>
> TO</literal>
>       copies the same rows as <literal>SELECT * FROM <replaceable
> class="parameter">table</replaceable></literal>.
>
Your description seems ok to me.
Let's see if anyone else has a different take.

>   +   List       *children = NIL;
>   ...
>   +       {
>   +           children = find_all_inheritors(RelationGetRelid(rel),
>
> Since 'children' is only used inside the else if block, I think we don't
> need the separate "List *children = NIL;" declaration.
> Instead, it could just be  "List *children = find_all_inheritors(...)".
>
you are right.
""List *children = find_all_inheritors(...)"."  should be ok.

Re: speedup COPY TO for partitioned table.

От

torikoshia

Дата:

03 октября, 03:31:31

On 2025-07-30 12:21, jian he wrote:

Hi, Jian

> On Mon, Jul 28, 2025 at 9:22 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> I think the behavior for materialized views can be described along 
>> with
>> that for partitioned tables. For example:
>> 
>>       <command>COPY TO</command> can be used with plain
>>       tables, populated materialized views and partitioned tables.
>>       For example, if <replaceable 
>> class="parameter">table</replaceable>
>> is a plain table,
>>       <literal>COPY <replaceable class="parameter">table</replaceable>
>>       TO</literal> copies the same rows as
>>       <literal>SELECT * FROM ONLY <replaceable
>> class="parameter">table</replaceable></literal>.
>> 
>>       If <replaceable class="parameter">table</replaceable> is a
>> partitioned table or a materialized view,
>>       <literal>COPY <replaceable class="parameter">table</replaceable>
>> TO</literal>
>>       copies the same rows as <literal>SELECT * FROM <replaceable
>> class="parameter">table</replaceable></literal>.
>> 
> Your description seems ok to me.
> Let's see if anyone else has a different take.

It’s been about two months since this discussion, and there don’t seem 
to be any further comments.
How about updating the patch accordingly?
If there are no new remarks, I’d like to mark the patch as Ready for 
Committer.

>>   +   List       *children = NIL;
>>   ...
>>   +       {
>>   +           children = find_all_inheritors(RelationGetRelid(rel),
>> 
>> Since 'children' is only used inside the else if block, I think we 
>> don't
>> need the separate "List *children = NIL;" declaration.
>> Instead, it could just be  "List *children = 
>> find_all_inheritors(...)".
>> 
> you are right.
> ""List *children = find_all_inheritors(...)"."  should be ok.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

06 октября, 12:48:53

On Fri, Oct 3, 2025 at 8:31 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> It’s been about two months since this discussion, and there don’t seem
> to be any further comments.
> How about updating the patch accordingly?
> If there are no new remarks, I’d like to mark the patch as Ready for
> Committer.
>
> >>   +   List       *children = NIL;
> >>   ...
> >>   +       {
> >>   +           children = find_all_inheritors(RelationGetRelid(rel),
> >>
> >> Since 'children' is only used inside the else if block, I think we
> >> don't
> >> need the separate "List *children = NIL;" declaration.
> >> Instead, it could just be  "List *children =
> >> find_all_inheritors(...)".
> >>
> > you are right.
> > ""List *children = find_all_inheritors(...)"."  should be ok.
>

hi.

please check the attached v15.

only minor adjustment based on comments in
https://postgr.es/m/c507919d8c8219ab6cfd8376a4f9a887@oss.nttdata.com

Вложения

v15-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

09 октября, 04:13:30

Hi,

On Mon, Oct 6, 2025 at 2:49 AM jian he <jian.universality@gmail.com> wrote:
>
> On Fri, Oct 3, 2025 at 8:31 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > It’s been about two months since this discussion, and there don’t seem
> > to be any further comments.
> > How about updating the patch accordingly?
> > If there are no new remarks, I’d like to mark the patch as Ready for
> > Committer.
> >
> > >>   +   List       *children = NIL;
> > >>   ...
> > >>   +       {
> > >>   +           children = find_all_inheritors(RelationGetRelid(rel),
> > >>
> > >> Since 'children' is only used inside the else if block, I think we
> > >> don't
> > >> need the separate "List *children = NIL;" declaration.
> > >> Instead, it could just be  "List *children =
> > >> find_all_inheritors(...)".
> > >>
> > > you are right.
> > > ""List *children = find_all_inheritors(...)"."  should be ok.
> >
>
> hi.
>
> please check the attached v15.
>

Thank you for working on this!  I've reviewed the v15 patch, and here
are review comments:

---
+           children = find_all_inheritors(RelationGetRelid(rel),
+                                          AccessShareLock,
+                                          NULL);
+
+           foreach_oid(childreloid, children)
+           {
+               char         relkind = get_rel_relkind(childreloid);

I think it's better to write some comments summarizing what we're
doing in the loop.

---
+                   relation_name = get_rel_name(childreloid);
+                   ereport(ERROR,
+                           errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                           errmsg("cannot copy from foreign table
\"%s\"", relation_name),
+                           errdetail("Partition \"%s\" is a foreign
table in the partitioned table \"%s\"",
+                                     relation_name,
RelationGetRelationName(rel)),
+                           errhint("Try the COPY (SELECT ...) TO variant."));

I think we don't need "the" in the error message.

It's conventional to put all err*() macros in parentheses (i.e.,
"(errcode(), ...)", it's technically omittable though.

---
+               if (RELKIND_HAS_PARTITIONS(relkind))
+                   continue;
+
+               scan_oids = lappend_oid(scan_oids, childreloid);

find_all_inheritors() returns a list of OIDs of child relations. I
think we can delete relations whose kind is RELKIND_HAS_PARTITIONS()
from the list instead of creating a new list scan_oids. Then, we can
set cstate->partition to the list.

---
        tupDesc = RelationGetDescr(cstate->rel);
+       cstate->partitions = list_copy(scan_oids);
    }

Why do we need to copy the list here?

---
With the patch we have:

   /*
    * If COPY TO source table is a partitioned table, then open each
    * partition and process each individual partition.
    */
   if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
   {
       foreach_oid(scan_oid, cstate->partitions)
       {
           Relation        scan_rel;

           /* We already got the needed lock in BeginCopyTo */
           scan_rel = table_open(scan_oid, NoLock);
           CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
           table_close(scan_rel, NoLock);
       }
   }
   else if (cstate->rel)
       CopyRelTo(cstate, cstate->rel, NULL, &processed);
   else
   {
       /* run the plan --- the dest receiver will send tuples */

I think we can refactor the code structure as follow:

if (cstate->rel)
{
    if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    {
        do CopyRelTo() for each OIDs in cstate->partition here.
    }
    else
        CopyRelTo(cstate, cstate->rel, NULL, &processed);
}
else
{
 ...

---
+   if (root_rel != NULL)
+   {
+       rootdesc = RelationGetDescr(root_rel);
+       root_slot = table_slot_create(root_rel, NULL);
+       map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
+   }

rootdesc can be declared inside this if statement or we can directly
pass 'RelationGetDescr(root_rel)' to build_attrmap_by_name_if_req().

---
+       /* Deconstruct the tuple ... */
+       if (map != NULL)
+           copyslot = execute_attr_map_slot(map, slot, root_slot);
+       else
+       {
+           slot_getallattrs(slot);
+           copyslot = slot;
+       }

ISTM that the comment "Deconstruct the tuple" needs to move to before
slot_getallattrs(slot).

How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
instead? (i.e., no need to have 'copyslot')

---
+   if (root_slot != NULL)
+       ExecDropSingleTupleTableSlot(root_slot);
+   table_endscan(scandesc);

We might want to pfree 'map' if we create it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: speedup COPY TO for partitioned table.

От

torikoshia

Дата:

09 октября, 08:59:30

<jian.universality@gmail.com> wrote:
>> 
>> On Fri, Oct 3, 2025 at 8:31 AM torikoshia <torikoshia@oss.nttdata.com> 
>> wrote:
>> >
>> > It’s been about two months since this discussion, and there don’t seem
>> > to be any further comments.
>> > How about updating the patch accordingly?
>> > If there are no new remarks, I’d like to mark the patch as Ready for
>> > Committer.
>> >
>> > >>   +   List       *children = NIL;
>> > >>   ...
>> > >>   +       {
>> > >>   +           children = find_all_inheritors(RelationGetRelid(rel),
>> > >>
>> > >> Since 'children' is only used inside the else if block, I think we
>> > >> don't
>> > >> need the separate "List *children = NIL;" declaration.
>> > >> Instead, it could just be  "List *children =
>> > >> find_all_inheritors(...)".
>> > >>
>> > > you are right.
>> > > ""List *children = find_all_inheritors(...)"."  should be ok.
>> >
>> 
>> hi.
>> 
>> please check the attached v15.

Thanks for updating the patch!
Here are some minor comments.

>   #include "access/tableam.h"
>  +#include "access/table.h"

As in partbounds.c, I think table.h should come before tableam.h in the 
include order.

> +               char         relkind = get_rel_relkind(childreloid);
> +
> +               if (relkind == RELKIND_FOREIGN_TABLE)
> +               {
> +                   char       *relation_name;
> +
> +                   relation_name = get_rel_name(childreloid);

Similar to how relkind is declared, it might be simpler to combine the 
declaration and assignment here.


On 2025-10-09 10:13, Masahiko Sawada wrote:

Thanks for your review!

> RelationGetRelationName(rel)),
> +                           errhint("Try the COPY (SELECT ...) TO 
> variant."));
> 
> I think we don't need "the" in the error message.

I agree. However, I noticed that some existing messages use “the” in 
similar contexts. For example:

           if (rel->rd_rel->relkind == RELKIND_VIEW)
               ereport(ERROR,
                       (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                        errmsg("cannot copy from view \"%s\"",
                               RelationGetRelationName(rel)),
                        errhint("Try the COPY (SELECT ...) TO 
variant.")));

If we want to fix it, I think we should update all similar messages 
together for consistency.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

09 октября, 10:10:16

On Thu, Oct 9, 2025 at 9:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > please check the attached v15.
> >
> Thank you for working on this!  I've reviewed the v15 patch, and here
> are review comments:
>
> ---
> +           children = find_all_inheritors(RelationGetRelid(rel),
> +                                          AccessShareLock,
> +                                          NULL);
> +
> +           foreach_oid(childreloid, children)
> +           {
> +               char         relkind = get_rel_relkind(childreloid);
>
> I think it's better to write some comments summarizing what we're
> doing in the loop.

sure.

>
> ---
> +                   relation_name = get_rel_name(childreloid);
> +                   ereport(ERROR,
> +                           errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                           errmsg("cannot copy from foreign table
> \"%s\"", relation_name),
> +                           errdetail("Partition \"%s\" is a foreign
> table in the partitioned table \"%s\"",
> +                                     relation_name,
> RelationGetRelationName(rel)),
> +                           errhint("Try the COPY (SELECT ...) TO variant."));
>
> I think we don't need "the" in the error message.
>
> It's conventional to put all err*() macros in parentheses (i.e.,
> "(errcode(), ...)", it's technically omittable though.
>

https://www.postgresql.org/docs/current/error-message-reporting.html
QUOTE:
<<<<>>>>>
The extra parentheses were required before PostgreSQL version 12, but
are now optional.
Here is a more complex example:
.....
<<<<>>>>>

related commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
Less parenthesis is generally more readable, I think.


> ---
> +               if (RELKIND_HAS_PARTITIONS(relkind))
> +                   continue;
> +
> +               scan_oids = lappend_oid(scan_oids, childreloid);
>
> find_all_inheritors() returns a list of OIDs of child relations. I
> think we can delete relations whose kind is RELKIND_HAS_PARTITIONS()
> from the list instead of creating a new list scan_oids. Then, we can
> set cstate->partition to the list.
>

yech, we can use foreach_delete_current to delete list elements on the fly.

> ---
>         tupDesc = RelationGetDescr(cstate->rel);
> +       cstate->partitions = list_copy(scan_oids);
>     }
>
> Why do we need to copy the list here?
>

yech, list_copy is not needed.

> ---
> With the patch we have:
>
>    /*
>     * If COPY TO source table is a partitioned table, then open each
>     * partition and process each individual partition.
>     */
>    if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>    {
>        foreach_oid(scan_oid, cstate->partitions)
>        {
>            Relation        scan_rel;
>
>            /* We already got the needed lock in BeginCopyTo */
>            scan_rel = table_open(scan_oid, NoLock);
>            CopyRelTo(cstate, scan_rel, cstate->rel, &processed);
>            table_close(scan_rel, NoLock);
>        }
>    }
>    else if (cstate->rel)
>        CopyRelTo(cstate, cstate->rel, NULL, &processed);
>    else
>    {
>        /* run the plan --- the dest receiver will send tuples */
>
> I think we can refactor the code structure as follow:
>
> if (cstate->rel)
> {
>     if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>     {
>         do CopyRelTo() for each OIDs in cstate->partition here.
>     }
>     else
>         CopyRelTo(cstate, cstate->rel, NULL, &processed);
> }
> else
> {
>  ...

sure, this may increase readability.

> +   if (root_rel != NULL)
> +   {
> +       rootdesc = RelationGetDescr(root_rel);
> +       root_slot = table_slot_create(root_rel, NULL);
> +       map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
> +   }
>
> rootdesc can be declared inside this if statement or we can directly
> pass 'RelationGetDescr(root_rel)' to build_attrmap_by_name_if_req().
>
sure. good idea.

> ---
> +       /* Deconstruct the tuple ... */
> +       if (map != NULL)
> +           copyslot = execute_attr_map_slot(map, slot, root_slot);
> +       else
> +       {
> +           slot_getallattrs(slot);
> +           copyslot = slot;
> +       }
>
> ISTM that the comment "Deconstruct the tuple" needs to move to before
> slot_getallattrs(slot).
>
ok.

> How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
> instead? (i.e., no need to have 'copyslot')
>

I tried but it seems not possible.
table_scan_getnextslot function require certain type of "slot", if we do
"slot = execute_attr_map_slot(map, slot, root_slot);"
then pointer "slot" type becomes virtual slot, then
it will fail on second time call table_scan_getnextslot


> ---
> +   if (root_slot != NULL)
> +       ExecDropSingleTupleTableSlot(root_slot);
> +   table_endscan(scandesc);
>
> We might want to pfree 'map' if we create it.
>
ok.


Please check the attached v16.

Вложения

v16-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Chao Li

Дата:

09 октября, 11:14:09

Hi Jian,

Thanks for the patch. After reviewing it, I got a few small comments:

On Oct 9, 2025, at 15:10, jian he <jian.universality@gmail.com> wrote:

Please check the attached v16.
<v16-0001-support-COPY-partitioned_table-TO.patch>

```

+ List *partitions; /* oid list of partition oid for copy to */

```

The comment doesn’t look very good. First, it repeats “oid”; second, as “List *partitions” implies multiple partitions, the comment should use plural OIDs. Maybe change the comment to “/* list of partition OIDs for COPY TO */"

```

+ /*

+ * Collect a list of partitions containing data, so that later

+ * DoCopyTo can copy the data from them.

+ */

+ children = find_all_inheritors(RelationGetRelid(rel),

+ AccessShareLock,

+ NULL);

+ foreach_oid(childreloid, children)

+ {

+ char relkind = get_rel_relkind(childreloid);

+ if (relkind == RELKIND_FOREIGN_TABLE)

+ {

+ char *relation_name;

+ relation_name = get_rel_name(childreloid);

+ ereport(ERROR,

+ errcode(ERRCODE_WRONG_OBJECT_TYPE),

+ errmsg("cannot copy from foreign table \"%s\"", relation_name),

+ errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",

+ relation_name, RelationGetRelationName(rel)),

+ errhint("Try the COPY (SELECT ...) TO variant."));

+ }

+ if (RELKIND_HAS_PARTITIONS(relkind))

+ children = foreach_delete_current(children, childreloid);

+ }

```

Is it better to move the RELKIND_HAS_PARTIONS() check to before FOREIGH_TABLE check and continue after foreach_delete_current()? Now every childreloid goes through the both checks, if we do the movement, then HAS_PARTIONS child will go through 1 check. This is a tiny optimization.

```

+ if (RELKIND_HAS_PARTITIONS(relkind))

+ children = foreach_delete_current(children, childreloid);

+ }

```

I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the function comment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks for both relations and views. Logically using this macro works, but it may lead to some confusion to code readers.

```

@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,

DestReceiver *dest;

cstate->rel = NULL;

+ cstate->partitions = NIL;

```

Both NULL assignment are not needed as cstate is allocated by palloc0().

```

+static void

+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,

+ uint64 *processed)

```

Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand the current implementation allows continuous increment while calling this function in a loop. However, it’s a bit error-prone, a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is still simple:

```

processed += CopyRelTo(cstate, …);

```

6. In BeginCopyTo(), “children” list is created before “cstate” is created, it is not allocated under “cstate->copycontext”, so in EndCopyTo(), we should also free memory of “cstate->partitions”.

Best regards,

Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

09 октября, 17:50:34

On Thu, Oct 9, 2025 at 4:14 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi Jian,
>
> Thanks for the patch. After reviewing it, I got a few small comments:
>
> On Oct 9, 2025, at 15:10, jian he <jian.universality@gmail.com> wrote:
>
> Please check the attached v16.
> <v16-0001-support-COPY-partitioned_table-TO.patch>
> 1
> ```
> + List   *partitions; /* oid list of partition oid for copy to */
> ```
>
> The comment doesn’t look very good. First, it repeats “oid”; second, as “List *partitions” implies multiple
partitions,the comment should use plural OIDs. Maybe change the comment to “/* list of partition OIDs for COPY TO */" 
>

yech, I need to improve these comments.

> 2
> ```
> + /*
> + * Collect a list of partitions containing data, so that later
> + * DoCopyTo can copy the data from them.
> + */
> + children = find_all_inheritors(RelationGetRelid(rel),
> +   AccessShareLock,
> +   NULL);
> +
> + foreach_oid(childreloid, children)
> + {
> + char relkind = get_rel_relkind(childreloid);
> +
> + if (relkind == RELKIND_FOREIGN_TABLE)
> + {
> + char   *relation_name;
> +
> + relation_name = get_rel_name(childreloid);
> + ereport(ERROR,
> + errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("cannot copy from foreign table \"%s\"", relation_name),
> + errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"",
> +  relation_name, RelationGetRelationName(rel)),
> + errhint("Try the COPY (SELECT ...) TO variant."));
> + }
> +
> + if (RELKIND_HAS_PARTITIONS(relkind))
> + children = foreach_delete_current(children, childreloid);
> + }
> ```
>
> Is it better to move the RELKIND_HAS_PARTIONS() check to before FOREIGH_TABLE check and continue after
foreach_delete_current()?Now every childreloid goes through the both checks, if we do the movement, then HAS_PARTIONS
childwill go through 1 check. This is a tiny optimization. 
>
I think the current handling is fine.

> 3
> ```
> + if (RELKIND_HAS_PARTITIONS(relkind))
> + children = foreach_delete_current(children, childreloid);
> + }
> ```
>
> I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the
functioncomment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks
forboth relations and views. Logically using this macro works, but it may lead to some confusion to code readers. 
>

find_all_inheritors comments says:
 *        Returns a list of relation OIDs including the given rel plus
 *        all relations that inherit from it, directly or indirectly.

CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);

If we copy partitioned table "pp" data out, but partitioned table "pp_1"
don't have storage, so we have to skip it, using RELKIND_HAS_PARTITIONS
to skip it should be fine.

> 4
> ```
> @@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
>   DestReceiver *dest;
>
>   cstate->rel = NULL;
> + cstate->partitions = NIL;
> ```
>
> Both NULL assignment are not needed as cstate is allocated by palloc0().
>
I guess this is just a code convention. Such not necessary is quite common
within the codebase.

> 5
> ```
> +static void
> +CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
> +  uint64 *processed)
> ```
>
> Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand
thecurrent implementation allows continuous increment while calling this function in a loop. However, it’s a bit
error-prone,a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is
stillsimple: 
>
> ```
> processed += CopyRelTo(cstate, …);
> ```
>
pgstat_progress_update_param was within CopyRelTo.
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?

> 6. In BeginCopyTo(), “children” list is created before “cstate” is created, it is not allocated under
“cstate->copycontext”,so in EndCopyTo(), we should also free memory of “cstate->partitions”. 
>
I think so.

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

10 октября, 01:02:32

On Thu, Oct 9, 2025 at 12:10 AM jian he <jian.universality@gmail.com> wrote:
>
> On Thu, Oct 9, 2025 at 9:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > ---
> > +                   relation_name = get_rel_name(childreloid);
> > +                   ereport(ERROR,
> > +                           errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > +                           errmsg("cannot copy from foreign table
> > \"%s\"", relation_name),
> > +                           errdetail("Partition \"%s\" is a foreign
> > table in the partitioned table \"%s\"",
> > +                                     relation_name,
> > RelationGetRelationName(rel)),
> > +                           errhint("Try the COPY (SELECT ...) TO variant."));
> >
> > I think we don't need "the" in the error message.
> >
> > It's conventional to put all err*() macros in parentheses (i.e.,
> > "(errcode(), ...)", it's technically omittable though.
> >
>
> https://www.postgresql.org/docs/current/error-message-reporting.html
> QUOTE:
> <<<<>>>>>
> The extra parentheses were required before PostgreSQL version 12, but
> are now optional.
> Here is a more complex example:
> .....
> <<<<>>>>>
>
> related commit:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
> Less parenthesis is generally more readable, I think.

Yes, but I think it's more consistent given that we use the
parentheses in all other places in copyto.c.

>
> > How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
> > instead? (i.e., no need to have 'copyslot')
> >
>
> I tried but it seems not possible.
> table_scan_getnextslot function require certain type of "slot", if we do
> "slot = execute_attr_map_slot(map, slot, root_slot);"
> then pointer "slot" type becomes virtual slot, then
> it will fail on second time call table_scan_getnextslot

Right. Let's keep as it is.

I've attached a patch for cosmetic changes including comment updates,
indent fixes by pgindent, and renaming variable names. Some fixes are
just my taste, so please check the changes.

Also I have a few comments on new tests:

+-- Tests for COPY TO with partitioned tables.
+CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
+CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
+CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
+ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
+ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
+
+CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
+CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
+
+INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
+

I think it's better to have both cases: partitions' rowtype match the
root's rowtype and partition's rowtype doesn't match the root's
rowtype.

---
+-- Test COPY TO with a foreign table or when the foreign table is a partition
+COPY async_p3 TO stdout; --error
+ERROR:  cannot copy from foreign table "async_p3"
+HINT:  Try the COPY (SELECT ...) TO variant.

async_p3 is a foreign table so it seems not related to this patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v16_fix_masahiko.patch.txt

Re: speedup COPY TO for partitioned table.

От

Chao Li

Дата:

10 октября, 04:02:26

On Oct 9, 2025, at 22:50, jian he <jian.universality@gmail.com> wrote:
3
```
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```

I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the function comment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks for both relations and views. Logically using this macro works, but it may lead to some confusion to code readers.

find_all_inheritors comments says:
* Returns a list of relation OIDs including the given rel plus
* all relations that inherit from it, directly or indirectly.

CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);

If we copy partitioned table "pp" data out, but partitioned table "pp_1"
don't have storage, so we have to skip it, using RELKIND_HAS_PARTITIONS
to skip it should be fine.

My point is that RELKIND_HAS_PARTITIONS is defined as:

#define RELKIND_HAS_PARTITIONS(relkind) \
 ((relkind) == RELKIND_PARTITIONED_TABLE || \
 (relkind) == RELKIND_PARTITIONED_INDEX)

It just checks relkind to be table or index. The example in your explanation seems to not address my concern. Why do we need to check against index?

4
```
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;

cstate->rel = NULL;
+ cstate->partitions = NIL;
```

Both NULL assignment are not needed as cstate is allocated by palloc0().

I guess this is just a code convention. Such not necessary is quite common
within the codebase.

I don’t agree. cstate has a lot of more fields with pointer types, why don’t set NULL to them?

5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```

Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand the current implementation allows continuous increment while calling this function in a loop. However, it’s a bit error-prone, a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is still simple:

```
processed += CopyRelTo(cstate, …);
```

pgstat_progress_update_param was within CopyRelTo.
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?

Make sense. I didn’t notice postage_progress_update_param. So, “processed” is both input and output. In that case, I think the comment for parameter “processed” should be enhanced, for example:

```

* processed: on entry, contains the current count of processed count;

* this function increments it by the number of rows copied

* from this relation and writes back the updated total.

```

Or a short version:

```

* processed: input/output; cumulative count of tuples processed, incremented here.

```

Best regards,

Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

10 октября, 05:54:47

On Fri, Oct 10, 2025 at 9:02 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> On Oct 9, 2025, at 22:50, jian he <jian.universality@gmail.com> wrote:
>
> 3
> ```
> + if (RELKIND_HAS_PARTITIONS(relkind))
> + children = foreach_delete_current(children, childreloid);
> + }
> ```
>
> I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the
functioncomment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks
forboth relations and views. Logically using this macro works, but it may lead to some confusion to code readers. 
>
>
> find_all_inheritors comments says:
> *        Returns a list of relation OIDs including the given rel plus
> *        all relations that inherit from it, directly or indirectly.
>
> CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
> CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
> ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
>
> If we copy partitioned table "pp" data out, but partitioned table "pp_1"
> don't have storage, so we have to skip it, using RELKIND_HAS_PARTITIONS
> to skip it should be fine.
>
> My point is that RELKIND_HAS_PARTITIONS is defined as:
>
> #define RELKIND_HAS_PARTITIONS(relkind) \
> ((relkind) == RELKIND_PARTITIONED_TABLE || \
> (relkind) == RELKIND_PARTITIONED_INDEX)
>
> It just checks relkind to be table or index. The example in your explanation seems to not address my concern. Why do
weneed to check against index? 
>

the macro name RELKIND_HAS_PARTITIONS improves the readability, I think.
also we don't need to worry about partitioned index here, because
we are in the
``
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
}
``
loop.

sure we can change it ``if (relkind == RELKIND_PARTITIONED_TABLE)``.

> 5
> ```
> +static void
> +CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
> +  uint64 *processed)
> ```
>
> Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand
thecurrent implementation allows continuous increment while calling this function in a loop. However, it’s a bit
error-prone,a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is
stillsimple: 
>
> ```
> processed += CopyRelTo(cstate, …);
> ```
>
> pgstat_progress_update_param was within CopyRelTo.
> so we have to pass (uint64 *processed) to CopyRelTo.
> Am I missing something?
>
>
> Make sense. I didn’t notice postage_progress_update_param. So, “processed” is both input and output. In that case, I
thinkthe comment for parameter “processed” should be enhanced, for example: 
>
if your function is:

static processed CopyRelationTo(CopyToState cstate, Relation rel,
Relation root_rel, uint64 *processed);

where function return value is also passed as function argument,
I think it will lead to more confusion.

Re: speedup COPY TO for partitioned table.

От

Chao Li

Дата:

10 октября, 06:29:01

On Oct 10, 2025, at 10:54, jian he <jian.universality@gmail.com> wrote:

5
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```

Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand the current implementation allows continuous increment while calling this function in a loop. However, it’s a bit error-prone, a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is still simple:

```
processed += CopyRelTo(cstate, …);
```

pgstat_progress_update_param was within CopyRelTo.
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?

Make sense. I didn’t notice postage_progress_update_param. So, “processed” is both input and output. In that case, I think the comment for parameter “processed” should be enhanced, for example:

if your function is:

static processed CopyRelationTo(CopyToState cstate, Relation rel,
Relation root_rel, uint64 *processed);

where function return value is also passed as function argument,
I think it will lead to more confusion.

I am not suggesting add a return value to the function. My comment was just to enhance the parameter comment of “processed”.

Best regards,

Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

10 октября, 10:10:11

On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > > ---
> > > +                   relation_name = get_rel_name(childreloid);
> > > +                   ereport(ERROR,
> > > +                           errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > > +                           errmsg("cannot copy from foreign table
> > > \"%s\"", relation_name),
> > > +                           errdetail("Partition \"%s\" is a foreign
> > > table in the partitioned table \"%s\"",
> > > +                                     relation_name,
> > > RelationGetRelationName(rel)),
> > > +                           errhint("Try the COPY (SELECT ...) TO variant."));
> > >
> > > I think we don't need "the" in the error message.
> > >
> > > It's conventional to put all err*() macros in parentheses (i.e.,
> > > "(errcode(), ...)", it's technically omittable though.
> > >
> >
> > https://www.postgresql.org/docs/current/error-message-reporting.html
> > QUOTE:
> > <<<<>>>>>
> > The extra parentheses were required before PostgreSQL version 12, but
> > are now optional.
> > Here is a more complex example:
> > .....
> > <<<<>>>>>
> >
> > related commit:
> > https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
> > Less parenthesis is generally more readable, I think.
>
> Yes, but I think it's more consistent given that we use the
> parentheses in all other places in copyto.c.
>

If you look at tablecmds.c, like ATExecSetNotNull, there are
parentheses and no parentheses cases.
Overall, I think less parentheses improves readability and makes the
code more future-proof.


> >
> > > How about doing "slot = execute_attr_map_slot(map, slot, root_slot);"
> > > instead? (i.e., no need to have 'copyslot')
> > >
> >
> > I tried but it seems not possible.
> > table_scan_getnextslot function require certain type of "slot", if we do
> > "slot = execute_attr_map_slot(map, slot, root_slot);"
> > then pointer "slot" type becomes virtual slot, then
> > it will fail on second time call table_scan_getnextslot
>
> Right. Let's keep as it is.
>

> I've attached a patch for cosmetic changes including comment updates,
> indent fixes by pgindent, and renaming variable names. Some fixes are
> just my taste, so please check the changes.
>
thanks!
I have applied most of it. expect points I mentioned in this email.

> Also I have a few comments on new tests:
>
> +-- Tests for COPY TO with partitioned tables.
> +CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
> +CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
> +CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id);
> +ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
> +ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
> +
> +CREATE TABLE pp_15 PARTITION OF pp_1 FOR VALUES FROM (1) TO (5);
> +CREATE TABLE pp_510 PARTITION OF pp_2 FOR VALUES FROM (5) TO (10);
> +
> +INSERT INTO pp SELECT g, 10 + g FROM generate_series(1,6) g;
> +
>
> I think it's better to have both cases: partitions' rowtype match the
> root's rowtype and partition's rowtype doesn't match the root's
> rowtype.
>
sure.

> ---
> +-- Test COPY TO with a foreign table or when the foreign table is a partition
> +COPY async_p3 TO stdout; --error
> +ERROR:  cannot copy from foreign table "async_p3"
> +HINT:  Try the COPY (SELECT ...) TO variant.
>
> async_p3 is a foreign table so it seems not related to this patch.
>
I replied in
https://postgr.es/m/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ%40mail.gmail.com
I kind of doubt anyone would submit a patch just to rewrite a coverage test for
the sake of coverage itself.  While we're here, adding nearby coverage tests
should be fine?

i just found out I ignored the case when partitioned tables have RLS.
when exporting a partitioned table,
find_all_inheritors will sort the returned partition by oid.
in DoCopy, we can do the same:
make a SortBy node for SelectStmt->sortClause also mark the
RangeVar->inh as true.
OR
ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.

please see the change I made in DoCopy.

Вложения

v17-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Álvaro Herrera

Дата:

10 октября, 12:26:50

On 2025-Oct-10, jian he wrote:

> On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> > Yes, but I think it's more consistent given that we use the
> > parentheses in all other places in copyto.c.
> 
> If you look at tablecmds.c, like ATExecSetNotNull, there are
> parentheses and no parentheses cases.
> Overall, I think less parentheses improves readability and makes the
> code more future-proof.

I strive to remove those extra parentheses when I edit some part of the
code (though I may forget at times), but leave them alone from other
places that I'm not editing.  I don't add them in new code.  Most
likely, this is why ATExecSetNotNull has some cases with them and other
cases without.

Given sufficient time, the Postgres of Theseus would eventually have
zero of those extra parens.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Niemand ist mehr Sklave, als der sich für frei hält, ohne es zu sein."
       Nadie está tan esclavizado como el que se cree libre no siéndolo
                                           (Johann Wolfgang von Goethe)

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

13 октября, 20:43:22

On Fri, Oct 10, 2025 at 3:04 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Oct-10, jian he wrote:
>
> > On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > > Yes, but I think it's more consistent given that we use the
> > > parentheses in all other places in copyto.c.
> >
> > If you look at tablecmds.c, like ATExecSetNotNull, there are
> > parentheses and no parentheses cases.
> > Overall, I think less parentheses improves readability and makes the
> > code more future-proof.
>
> I strive to remove those extra parentheses when I edit some part of the
> code (though I may forget at times), but leave them alone from other
> places that I'm not editing.  I don't add them in new code.  Most
> likely, this is why ATExecSetNotNull has some cases with them and other
> cases without.
>
> Given sufficient time, the Postgres of Theseus would eventually have
> zero of those extra parens.

Thank you for the input. I didn't know some files or functions already
have mixed style. I'll use this style for future changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

13 октября, 23:07:43

On Fri, Oct 10, 2025 at 12:10 AM jian he <jian.universality@gmail.com> wrote:
>
> On Fri, Oct 10, 2025 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > > ---
> > > > +                   relation_name = get_rel_name(childreloid);
> > > > +                   ereport(ERROR,
> > > > +                           errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > > > +                           errmsg("cannot copy from foreign table
> > > > \"%s\"", relation_name),
> > > > +                           errdetail("Partition \"%s\" is a foreign
> > > > table in the partitioned table \"%s\"",
> > > > +                                     relation_name,
> > > > RelationGetRelationName(rel)),
> > > > +                           errhint("Try the COPY (SELECT ...) TO variant."));
> > > >
> > > > I think we don't need "the" in the error message.
> > > >
> > > > It's conventional to put all err*() macros in parentheses (i.e.,
> > > > "(errcode(), ...)", it's technically omittable though.
> > > >
> > >
> > > https://www.postgresql.org/docs/current/error-message-reporting.html
> > > QUOTE:
> > > <<<<>>>>>
> > > The extra parentheses were required before PostgreSQL version 12, but
> > > are now optional.
> > > Here is a more complex example:
> > > .....
> > > <<<<>>>>>
> > >
> > > related commit:
> > > https://git.postgresql.org/cgit/postgresql.git/commit/?id=e3a87b4991cc2d00b7a3082abb54c5f12baedfd1
> > > Less parenthesis is generally more readable, I think.
> >
> > Yes, but I think it's more consistent given that we use the
> > parentheses in all other places in copyto.c.
> >
>
> If you look at tablecmds.c, like ATExecSetNotNull, there are
> parentheses and no parentheses cases.
> Overall, I think less parentheses improves readability and makes the
> code more future-proof.
>

Understood.

>
> > ---
> > +-- Test COPY TO with a foreign table or when the foreign table is a partition
> > +COPY async_p3 TO stdout; --error
> > +ERROR:  cannot copy from foreign table "async_p3"
> > +HINT:  Try the COPY (SELECT ...) TO variant.
> >
> > async_p3 is a foreign table so it seems not related to this patch.
> >
> I replied in
> https://postgr.es/m/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ%40mail.gmail.com
> I kind of doubt anyone would submit a patch just to rewrite a coverage test for
> the sake of coverage itself.  While we're here, adding nearby coverage tests
> should be fine?

For me, it's perfectly fine to have patches just for improving the
test coverage and I think we have had such patches ever. Given this
patch expands the supported relation kind, I guess it makes sense to
cover other cases as well in this patch (i.e., foreign tables and
sequences) or to have a separate patch to increase the overall test
coverage of copyto.c.

>
> i just found out I ignored the case when partitioned tables have RLS.
> when exporting a partitioned table,
> find_all_inheritors will sort the returned partition by oid.
> in DoCopy, we can do the same:
> make a SortBy node for SelectStmt->sortClause also mark the
> RangeVar->inh as true.
> OR
> ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
>
> please see the change I made in DoCopy.

Good catch. However, I guess adding a SortBy node with "tableoid"
doesn't necessarily work in the same way as the 'COPY TO' using
find_all_inheritors():

+           /*
+            * To COPY data from multiple partitions, we rely on the order of
+            * the partitions' tableoids, which matches the order produced by
+            * find_all_inheritors.
+            */

The table list returned by find_all_inheritors() is deterministic, but
it doesn't sort the whole list by their OIDs. If I understand
correctly, it retrieves all descendant tables in a BFS order. For
example, if I create the tables in the following sequence:

create table p (i int) partition by list (i);
create table c12 partition of p for values in (1, 2) partition by list (i);
create table c12_1 partition of c12 for values in (1);
create table c12_2 partition of c12 for values in (2);
create table c3 partition of p for values in (3);
insert into p values (1), (2), (3);
alter table p enable row level security;
create policy policy_p on p using (i > 0);
create user test_user;
grant select on table p to test_user;

I got the result without RLS:

copy p to stdout;
3
1
2

whereas I got the results with RLS:

copy p to stdout;
1
2
3

I think that adding SortBy doesn't help more than making the results
deterministic. Or we can re-sort the OID list returned by
find_all_inheritors() to match it. However, I'm not sure that we need
to make COPY TO results deterministic in the first place. It's not
guaranteed that the order of tuples returned from 'COPY TO rel' where
rel is not a partitioned table is sorted nor even deterministic (e.g.,
due to sync scans). If 'rel' is a partitioned table without RLS, the
order of tables to scan is deterministic but returned tuples within a
single partition is not. Given that sorting the whole results is not
cost free, I'm not sure that guaranteeing this ordering also for
partitioned tables with RLS would be useful for users.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

15 октября, 04:53:12

On Tue, Oct 14, 2025 at 4:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > > +-- Test COPY TO with a foreign table or when the foreign table is a partition
> > > +COPY async_p3 TO stdout; --error
> > > +ERROR:  cannot copy from foreign table "async_p3"
> > > +HINT:  Try the COPY (SELECT ...) TO variant.
> > >
> > > async_p3 is a foreign table so it seems not related to this patch.
> > >
> > I replied in
> > https://postgr.es/m/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ%40mail.gmail.com
> > I kind of doubt anyone would submit a patch just to rewrite a coverage test for
> > the sake of coverage itself.  While we're here, adding nearby coverage tests
> > should be fine?
>
> For me, it's perfectly fine to have patches just for improving the
> test coverage and I think we have had such patches ever. Given this
> patch expands the supported relation kind, I guess it makes sense to
> cover other cases as well in this patch (i.e., foreign tables and
> sequences) or to have a separate patch to increase the overall test
> coverage of copyto.c.
>

Let's have a  seperate patch to handle COPY test coverage.

> >
> > i just found out I ignored the case when partitioned tables have RLS.
> > when exporting a partitioned table,
> > find_all_inheritors will sort the returned partition by oid.
> > in DoCopy, we can do the same:
> > make a SortBy node for SelectStmt->sortClause also mark the
> > RangeVar->inh as true.
> > OR
> > ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
> >
> > please see the change I made in DoCopy.
>
> Good catch. However, I guess adding a SortBy node with "tableoid"
> doesn't necessarily work in the same way as the 'COPY TO' using
> find_all_inheritors():
>
> +           /*
> +            * To COPY data from multiple partitions, we rely on the order of
> +            * the partitions' tableoids, which matches the order produced by
> +            * find_all_inheritors.
> +            */
>
> The table list returned by find_all_inheritors() is deterministic, but
> it doesn't sort the whole list by their OIDs. If I understand
> correctly, it retrieves all descendant tables in a BFS order. For
> example, if I create the tables in the following sequence:
>
> create table p (i int) partition by list (i);
> create table c12 partition of p for values in (1, 2) partition by list (i);
> create table c12_1 partition of c12 for values in (1);
> create table c12_2 partition of c12 for values in (2);
> create table c3 partition of p for values in (3);
> insert into p values (1), (2), (3);
> alter table p enable row level security;
> create policy policy_p on p using (i > 0);
> create user test_user;
> grant select on table p to test_user;
>
> I got the result without RLS:
>
> copy p to stdout;
> 3
> 1
> 2
>
> whereas I got the results with RLS:
>
> copy p to stdout;
> 1
> 2
> 3
>
> I think that adding SortBy doesn't help more than making the results
> deterministic. Or we can re-sort the OID list returned by
> find_all_inheritors() to match it. However, I'm not sure that we need
> to make COPY TO results deterministic in the first place. It's not
> guaranteed that the order of tuples returned from 'COPY TO rel' where
> rel is not a partitioned table is sorted nor even deterministic (e.g.,
> due to sync scans). If 'rel' is a partitioned table without RLS, the
> order of tables to scan is deterministic but returned tuples within a
> single partition is not. Given that sorting the whole results is not
> cost free, I'm not sure that guaranteeing this ordering also for
> partitioned tables with RLS would be useful for users.
>

I removed the "SortBy node", and also double checked the patch again.
Please check the attached v18.

Вложения

v18-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

16 октября, 04:21:10

On Tue, Oct 14, 2025 at 6:53 PM jian he <jian.universality@gmail.com> wrote:
>
> On Tue, Oct 14, 2025 at 4:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > > +-- Test COPY TO with a foreign table or when the foreign table is a partition
> > > > +COPY async_p3 TO stdout; --error
> > > > +ERROR:  cannot copy from foreign table "async_p3"
> > > > +HINT:  Try the COPY (SELECT ...) TO variant.
> > > >
> > > > async_p3 is a foreign table so it seems not related to this patch.
> > > >
> > > I replied in
> > > https://postgr.es/m/CACJufxGkkMtRUJEbLczRnWp7x2YWqu4r1gEJEv9Po1UPxS6kGQ%40mail.gmail.com
> > > I kind of doubt anyone would submit a patch just to rewrite a coverage test for
> > > the sake of coverage itself.  While we're here, adding nearby coverage tests
> > > should be fine?
> >
> > For me, it's perfectly fine to have patches just for improving the
> > test coverage and I think we have had such patches ever. Given this
> > patch expands the supported relation kind, I guess it makes sense to
> > cover other cases as well in this patch (i.e., foreign tables and
> > sequences) or to have a separate patch to increase the overall test
> > coverage of copyto.c.
> >
>
> Let's have a  seperate patch to handle COPY test coverage.
>
> > >
> > > i just found out I ignored the case when partitioned tables have RLS.
> > > when exporting a partitioned table,
> > > find_all_inheritors will sort the returned partition by oid.
> > > in DoCopy, we can do the same:
> > > make a SortBy node for SelectStmt->sortClause also mark the
> > > RangeVar->inh as true.
> > > OR
> > > ereport(ERRCODE_FEATURE_NOT_SUPPORTED...) for partitioned tables with RLS.
> > >
> > > please see the change I made in DoCopy.
> >
> > Good catch. However, I guess adding a SortBy node with "tableoid"
> > doesn't necessarily work in the same way as the 'COPY TO' using
> > find_all_inheritors():
> >
> > +           /*
> > +            * To COPY data from multiple partitions, we rely on the order of
> > +            * the partitions' tableoids, which matches the order produced by
> > +            * find_all_inheritors.
> > +            */
> >
> > The table list returned by find_all_inheritors() is deterministic, but
> > it doesn't sort the whole list by their OIDs. If I understand
> > correctly, it retrieves all descendant tables in a BFS order. For
> > example, if I create the tables in the following sequence:
> >
> > create table p (i int) partition by list (i);
> > create table c12 partition of p for values in (1, 2) partition by list (i);
> > create table c12_1 partition of c12 for values in (1);
> > create table c12_2 partition of c12 for values in (2);
> > create table c3 partition of p for values in (3);
> > insert into p values (1), (2), (3);
> > alter table p enable row level security;
> > create policy policy_p on p using (i > 0);
> > create user test_user;
> > grant select on table p to test_user;
> >
> > I got the result without RLS:
> >
> > copy p to stdout;
> > 3
> > 1
> > 2
> >
> > whereas I got the results with RLS:
> >
> > copy p to stdout;
> > 1
> > 2
> > 3
> >
> > I think that adding SortBy doesn't help more than making the results
> > deterministic. Or we can re-sort the OID list returned by
> > find_all_inheritors() to match it. However, I'm not sure that we need
> > to make COPY TO results deterministic in the first place. It's not
> > guaranteed that the order of tuples returned from 'COPY TO rel' where
> > rel is not a partitioned table is sorted nor even deterministic (e.g.,
> > due to sync scans). If 'rel' is a partitioned table without RLS, the
> > order of tables to scan is deterministic but returned tuples within a
> > single partition is not. Given that sorting the whole results is not
> > cost free, I'm not sure that guaranteeing this ordering also for
> > partitioned tables with RLS would be useful for users.
> >
>
> I removed the "SortBy node", and also double checked the patch again.
> Please check the attached v18.

Thank you for updating the patch!

I've reviewed the patch and here is one review comment:

            from->inh = false;  /* apply ONLY */
+           if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
+               from->inh = true;

It's better to check rel->rd_rel->relkind instead of calling
get_rel_relkind() as it checks syscache.

I've attached a patch to fix the above and includes some cosmetic
changes. Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v18_masahiko_fix.patch

Re: speedup COPY TO for partitioned table.

От

jian he

Дата:

16 октября, 05:56:28

On Thu, Oct 16, 2025 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Please check the attached v18.
>
> Thank you for updating the patch!
>
> I've reviewed the patch and here is one review comment:
>
>             from->inh = false;  /* apply ONLY */
> +           if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
> +               from->inh = true;
>
> It's better to check rel->rd_rel->relkind instead of calling
> get_rel_relkind() as it checks syscache.
>
> I've attached a patch to fix the above and includes some cosmetic
> changes. Please review it.
>

hi.

overall looks good to me, thanks for polishing it.

+ * However, when copying data from a partitioned table, we don't
+ * not use "ONLY", since we need to retrieve rows from its
+ * descendant tables too.

I guess here it should be
"we don't use "ONLY"
?

I’ve incorporated your changes into v19.

Вложения

v19-0001-support-COPY-partitioned_table-TO.patch

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

17 октября, 01:01:22

On Wed, Oct 15, 2025 at 7:57 PM jian he <jian.universality@gmail.com> wrote:
>
> On Thu, Oct 16, 2025 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > Please check the attached v18.
> >
> > Thank you for updating the patch!
> >
> > I've reviewed the patch and here is one review comment:
> >
> >             from->inh = false;  /* apply ONLY */
> > +           if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
> > +               from->inh = true;
> >
> > It's better to check rel->rd_rel->relkind instead of calling
> > get_rel_relkind() as it checks syscache.
> >
> > I've attached a patch to fix the above and includes some cosmetic
> > changes. Please review it.
> >
>
> hi.
>
> overall looks good to me, thanks for polishing it.
>
> + * However, when copying data from a partitioned table, we don't
> + * not use "ONLY", since we need to retrieve rows from its
> + * descendant tables too.
>
> I guess here it should be
> "we don't use "ONLY"
> ?

Right, thank you for pointing it out.

> I’ve incorporated your changes into v19.

Thank you!

I think the patch is in good shape. I've slightly changed the
documentation changes and updated the commit message. I'm going to
push the attached patch, if there are no objections or further review
comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v20-0001-Support-COPY-TO-for-partitioned-tables.patch

Re: speedup COPY TO for partitioned table.

От

Masahiko Sawada

Дата:

20 октября, 23:06:23

On Thu, Oct 16, 2025 at 3:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Oct 15, 2025 at 7:57 PM jian he <jian.universality@gmail.com> wrote:
> >
> > On Thu, Oct 16, 2025 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > Please check the attached v18.
> > >
> > > Thank you for updating the patch!
> > >
> > > I've reviewed the patch and here is one review comment:
> > >
> > >             from->inh = false;  /* apply ONLY */
> > > +           if (get_rel_relkind(relid) == RELKIND_PARTITIONED_TABLE)
> > > +               from->inh = true;
> > >
> > > It's better to check rel->rd_rel->relkind instead of calling
> > > get_rel_relkind() as it checks syscache.
> > >
> > > I've attached a patch to fix the above and includes some cosmetic
> > > changes. Please review it.
> > >
> >
> > hi.
> >
> > overall looks good to me, thanks for polishing it.
> >
> > + * However, when copying data from a partitioned table, we don't
> > + * not use "ONLY", since we need to retrieve rows from its
> > + * descendant tables too.
> >
> > I guess here it should be
> > "we don't use "ONLY"
> > ?
>
> Right, thank you for pointing it out.
>
> > I’ve incorporated your changes into v19.
>
> Thank you!
>
> I think the patch is in good shape. I've slightly changed the
> documentation changes and updated the commit message. I'm going to
> push the attached patch, if there are no objections or further review
> comments.

Pushed.

Regards

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: speedup COPY TO for partitioned table.

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения