Thread: In-order pg_dump (or in-order COPY TO)
Hello list,

I am storing dumps of a database (pg_dump custom format) in a de-duplicating backup server. Each dump is many terabytes in size, so deduplication is very important. And de-duplication itself is based on rolling checksums, which is pretty flexible: it can compensate for blocks moving by some offset.

Unfortunately, after I did pg_restore to a new server, I noticed that the dumps from the new server are not being de-duplicated; all blocks are considered new. This means that the data has been significantly altered. The new dumps contain the same rows, but probably in a very different order. Could the row order have changed when doing COPY FROM with pg_restore? No idea, but now that I think about it, this can happen through many operations (CLUSTER, VACUUM FULL, etc.), so the question still applies.

A *logical* dump of the data shouldn't be affected by on-disk order; the internal representation shouldn't affect the output. This makes me wonder: is there a way to COPY TO in primary-key order? If that is possible, then pg_dump could make use of it.

Thanks in advance,
Dimitris
On Tue, Aug 26, 2025 at 12:43 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
Could the
row-order have changed when doing COPY FROM with pg_restore?
There is no reliable, meaningful, row ordering when it comes to the physical files. Sure, cluster does make an attempt, but it is quite limited in practice.
A *logical* dump of data shouldn't be affected by on-disk order.
Internal representation shouldn't affect the output.
The logical dump has no ordering - it will come out however it comes out. "COPY <table> TO ..." doesn't have an ORDER BY clause - there is no way to make or communicate to it that ordering is important. For ad hoc work you can use "COPY <query> TO ..." and put an ORDER BY in the query.
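For illustration, an ad hoc ordered export of a single table along those lines might look like the following sketch (database, table and key names are placeholders):

    # export one table in primary-key order; not something pg_dump can do itself
    psql -d db_name -c "COPY (SELECT * FROM my_table ORDER BY id) TO STDOUT" > my_table.copy

On a multi-terabyte table the ORDER BY of course forces a sort or a full index scan, which is presumably part of why pg_dump offers no such option.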
David J.
On Tue, Aug 26, 2025 at 3:44 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
Hello list,
I am storing dumps of a database (pg_dump custom format) in a
de-duplicating backup server. Each dump is many terabytes in size, so
deduplication is very important. And de-duplication itself is based on
rolling checksums which is pretty flexible, it can compensate for blocks
moving by some offset.
This might be a silly question, but why are you using -Fc to create multi-TB dumps instead of -Fd? If nothing else, the parallelization granted by -Fd will greatly speed up the dumps.
Dimitrios Apostolou <jimis@gmx.net> writes:
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.
> This means that the data has been significantly altered. The new dumps
> contain the same rows but probably in very different order. Could the
> row-order have changed when doing COPY FROM with pg_restore?

I'd expect pg_dump/pg_restore to preserve the physical row ordering,
simply because it doesn't do anything that would change that.

However, restoring into an empty table would result in a table with
minimal free space, whereas the original table probably had a
meaningful amount of free space thanks to updates and deletes. Thus
for example TIDs would not be the same. If your "rolling checksum"
methodology is at all sensitive to page boundaries, the table would
look quite different to it.

			regards, tom lane
On Tue, Aug 26, 2025 at 4:31 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tue, Aug 26, 2025 at 12:43 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
Could the row-order have changed when doing COPY FROM with pg_restore?

There is no reliable, meaningful, row ordering when it comes to the physical files. Sure, cluster does make an attempt, but it is quite limited in practice.

A *logical* dump of data shouldn't be affected by on-disk order.
Internal representation shouldn't affect the output.

The logical dump has no ordering - it will come out however it comes out. "COPY <table> TO ..." doesn't have an ORDER BY clause - there is no way to make or communicate to it that ordering is important.
Doesn't COPY TO copy out records in the order they appeared in the physical files? That _seems_ to mean that the records laid down by COPY FROM should be in the same order as they were in the old dump files.
On Tue, Aug 26, 2025 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dimitrios Apostolou <jimis@gmx.net> writes:
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.
> This means that the data has been significantly altered. The new dumps
> contain the same rows but probably in very different order. Could the
> row-order have changed when doing COPY FROM with pg_restore?
I'd expect pg_dump/pg_restore to preserve the physical row ordering,
simply because it doesn't do anything that would change that.
However, restoring into an empty table would result in a table with
minimal free space, whereas the original table probably had a
meaningful amount of free space thanks to updates and deletes. Thus
for example TIDs would not be the same. If your "rolling checksum"
methodology is at all sensitive to page boundaries, the table would
look quite different to it.
But the rolling checksums are against a pg_dump file, not a pg_basebackup file.
What probably changed are table OIDs. Would that change the ordering of COPY data in post-restore dump files?
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Tue, Aug 26, 2025 at 4:31 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
>> The logical dump has no ordering - it will come out however it comes out.
>> "COPY <table> TO ..." doesn't have an ORDER BY clause - there is no way to
>> make or communicate to it that ordering is important.
> Doesn't COPY TO copy out records in the order they appeared in the physical
> files?

It emits whatever a sequential-scan plan would emit. If you set
synchronize_seqscans = off (which pg_dump does), that will match
physical row order. At least with our standard table AM. If you're
using Aurora or one of those other PG forks with proprietary storage
layers, you'd have to ask them.

I suspect the OP's problem is not row order per se, but differing
TIDs or XIDs, which are things pg_dump does not endeavor to
replicate. Or, given that he said something about blocks, maybe
he's actually sensitive to where the free space is.

			regards, tom lane
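To make that concrete, a rough sketch of reproducing this behaviour for a single table outside pg_dump (database and table names are placeholders):

    # emit one table in physical row order, roughly as pg_dump's data section does
    psql -d db_name -c "SET synchronize_seqscans = off; COPY my_table TO STDOUT" > my_table.copy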
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Tue, Aug 26, 2025 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
>> simply because it doesn't do anything that would change that.
> But the rolling checksums are against a pg_dump file, not a pg_basebackup
> file.

Oh, that wasn't clear to me.

> What probably changed are table OIDs. Would that change the ordering of
> COPY data in post-restore dump files?

It would not change the order of data within any one table. There are
corner cases in which different OID assignments can cause pg_dump to
emit database objects in a different order, see this recent thread:

https://www.postgresql.org/message-id/flat/20250707192654.9e.nmisch%40google.com

			regards, tom lane
On 8/26/25 12:43, Dimitrios Apostolou wrote:
> Hello list,
>
> I am storing dumps of a database (pg_dump custom format) in a de-
> duplicating backup server. Each dump is many terabytes in size, so
> deduplication is very important. And de-duplication itself is based on
> rolling checksums which is pretty flexible, it can compensate for blocks
> moving by some offset.
>
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.

What are the pg_dump/pg_restore commands?

What are the Postgres versions involved?

Are they community versions of Postgres or something else?

What is the deduplication program?

> Thanks in advance,
> Dimitris

--
Adrian Klaver
adrian.klaver@aklaver.com
On Wednesday 2025-08-27 00:54, Adrian Klaver wrote:
> What are the pg_dump/pg_restore commands?
>
> What are the Postgres versions involved?
>
> Are they community versions of Postgres or something else?
>
> What is the deduplication program?

Dump is from PostgreSQL 16; it's pg_dump writing to stdout:

pg_dump -v --format=custom --compress=none --no-toast-compression --serializable-deferrable db_name | borg create ...

As you can see, the backup (and deduplicating) program is borgbackup.

Restore is into PostgreSQL 17: I first create the empty tables by running the DDL commands in version control to set up the database, and then I do pg_restore --data-only:

pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --section=data dump_file

Worth noting is that the above pg_restore goes through the WAL, i.e. all writes are done by walwriter, not the backend directly.

Postgres is standard open source running on our own server. It has a couple of custom patches that shouldn't matter in this code path.
On Wednesday 2025-08-27 00:08, Tom Lane wrote:
> However, restoring into an empty table would result in a table with
> minimal free space, whereas the original table probably had a
> meaningful amount of free space thanks to updates and deletes. Thus
> for example TIDs would not be the same. If your "rolling checksum"
> methodology is at all sensitive to page boundaries, the table would
> look quite different to it.

Thanks Tom. I'm not following how the empty space in the tables could affect the custom-format dumps. Where can I read more about these TIDs? Are they stored in the pg_dump custom format archive?

The rolling checksum method should iron out shifting of data chunks that are around a couple of MB in size. Shorter shifts will not be caught, and I assume the "page boundary" changes you mentioned would happen every 8KB, so that is definitely too fine-grained for the deduplication algorithm.

FYI, something I forgot to mention is that pg_restore is --data-only and writes go through walwriter. The database with the tables has been created from scratch, so every table is empty before the pg_restore. Not sure how this affects the above.

> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
> simply because it doesn't do anything that would change that.

Regardless of my specific case, it's scary to think that doing VACUUM FULL, CLUSTER, or who knows what other maintenance command will modify the logical dumps. Some implicit ordering could be enforced by pg_dump if possible, for example when a primary key exists. Does it make sense? Is it even possible?

Thanks,
Dimitris
On Tuesday 2025-08-26 22:31, David G. Johnston wrote:
> The logical dump has no ordering - it will come out however it comes out.
> "COPY <table> TO ..." doesn't have an ORDER BY clause - there is no way to
> make or communicate to it that ordering is important. For ad hoc work you
> can use "COPY <query> TO ..." and put an ORDER BY in the query.

Thank you, so it's not possible currently. How would "COPY <query> TO" behave for copying very large tables? Would it make sense to optionally have that in pg_dump?

Or would it make sense as a new feature, to optionally order "COPY <table> TO ..." by the primary key where available, and expose that as a pg_dump option?

Dimitris
On Wednesday 2025-08-27 00:00, Ron Johnson wrote:
> This might be a silly question, but why are you using -Fc to create
> multi-TB dumps instead of -Fd? If nothing else, the parallelization
> granted by -Fd will greatly speed up the dumps.

Hi Ron, the primary reason is space. With -Fc I don't have to store the 10TB archive locally, and I don't have to wait for pg_dump to finish: I pipe it directly to the "borg create" command, which does compression and deduplication and sends new chunks to a remote borgbackup server.

Regards,
Dimitris

P.S. Please use "Reply-all" when replying; I've missed quite a few replies that I found on the list archives. :-)
On Tue, Aug 26, 2025 at 3:44 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
I am storing dumps of a database (pg_dump custom format) in a
de-duplicating backup server. Each dump is many terabytes in size, so
deduplication is very important. And de-duplication itself is based on
rolling checksums which is pretty flexible, it can compensate for blocks
moving by some offset.
I suggest looking into pgBackRest and its block incremental feature, which sounds similar to what you are doing. But it also does it with parallel processes, and can do things like grab backup files from your replicas, plus a lot of other features.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
Dimitrios Apostolou <jimis@gmx.net> writes:
> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
> pg_dump -v --format=custom --compress=none --no-toast-compression --serializable-deferrable db_name | borg create ...

Don't use --format=custom (and not -v either). That causes pg_dump to
include the OIDs and pg_dump object IDs of all the tables and other
objects, which will all be different in a dump from the new server.
The actual data contents of the tables should be the same, but
apparently the differences in the entry headers are enough to mislead
borgbackup.

You might be well advised to manually examine the data you are
stuffing into borgbackup. Right now we seem to be operating on
hypotheses, not facts, about what that looks like and how it's
different between your old and new server.

			regards, tom lane
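For what it's worth, one reading of that advice is a plain-format dump, sketched below with the rest of the pipeline left as in the original command. Note that a plain-format dump is restored with psql, so pg_restore's parallel and selective restore features are lost:

    # plain text format and no -v; the borg side of the pipeline is unchanged
    pg_dump --format=plain --compress=none --no-toast-compression \
            --serializable-deferrable db_name | borg create ...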
On Wed, Aug 27, 2025 at 10:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dimitrios Apostolou <jimis@gmx.net> writes:
> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
> pg_dump -v --format=custom --compress=none --no-toast-compression --serializable-deferrable db_name | borg create ...
Don't use --format=custom (and not -v either). That causes pg_dump to
include the OIDs and pg_dump object IDs of all the tables and other
objects,
That's interesting. Why? (Since isn't it supposed to be Bad to rely on OIDs?)
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Wed, Aug 27, 2025 at 10:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Don't use --format=custom (and not -v either). That causes pg_dump to
>> include the OIDs and pg_dump object IDs of all the tables and other
>> objects,
> That's interesting. Why? (Since isn't it supposed to be Bad to rely on
> OIDs?)

-v in a text-format dump includes that data for debugging purposes:

--
-- TOC entry 1401 (class 1255 OID 16499)
-- Name: fipshash(text); Type: FUNCTION; Schema: public; Owner: postgres
--

(The "TOC entry" comment line wouldn't be there without -v.)
Then custom format has to store the same info so that pg_restore
can produce this identical text output on demand.

Yeah, this is all pretty historical, but nobody wants to change it
at this point.

			regards, tom lane
On Wed, Aug 27, 2025 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Wed, Aug 27, 2025 at 10:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Don't use --format=custom (and not -v either). That causes pg_dump to
>> include the OIDs and pg_dump object IDs of all the tables and other
>> objects,
> That's interesting. Why? (Since isn't it supposed to be Bad to rely on
> OIDs?)
-v in a text-format dump includes that data for debugging purposes:
--
-- TOC entry 1401 (class 1255 OID 16499)
-- Name: fipshash(text); Type: FUNCTION; Schema: public; Owner: postgres
--
(The "TOC entry" comment line wouldn't be there without -v.)
Then custom format has to store the same info so that pg_restore
can produce this identical text output on demand.
Ah, so the culprit is "-v". I like using -v, redirecting it to a log file (more info is almost always better), but then I rarely use pg_dump, and never pipe it to de-duplicators. (ExaGrid is supposed to deduplicate, but that's not going to stop me from using pgbackrest, compression and encryption; PCI auditors care about that, not deduplication.)
On 8/27/25 05:09, Dimitrios Apostolou wrote:
> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
>
> pg_dump -v --format=custom --compress=none --no-toast-compression --serializable-deferrable db_name | borg create ...
>
> As you can see, the backup (and deduplicating) program is borgbackup.

Ok, I use BorgBackup and it is fairly forgiving of normal changes.

FYI, if you ever want to use compression check out gzip --rsyncable; I have found it plays well with Borg. For more information see:

https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/

> Restore is into PostgreSQL 17:
>
> I first create the empty tables by running the DDL commands in version
> control to set up the database, and then I do pg_restore --data-only:
>
> pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --section=data dump_file

If you are using only the --data section, why not --data-only in the pg_dump? Or is the pg_dump output used for other purposes?

> Worth noting is that the above pg_restore goes through the WAL, i.e. all
> writes are done by walwriter, not the backend directly.

Please explain the above further.

The problem occurs when you do the pg_dump after this restore, correct?

Is it the same pg_dump command as you show above?

> Postgres is standard open source running on our own server. It has a couple
> of custom patches that shouldn't matter in this code path.

For completeness, and just in case they may affect the output, what do the patches do?

--
Adrian Klaver
adrian.klaver@aklaver.com
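For illustration, the --rsyncable idea would slot into the existing pipeline roughly like the sketch below (flag availability depends on the gzip build; zstd offers the same option):

    # compress with dedup-friendly block boundaries before handing the stream to borg
    pg_dump --format=custom --compress=none --no-toast-compression \
            --serializable-deferrable db_name | gzip --rsyncable | borg create ...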
On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:
> Ok, I use BorgBackup and it is fairly forgiving of normal changes.
>
> FYI, if you ever want to use compression check out gzip --rsyncable; I have
> found it plays well with Borg. For more information see:
>
> https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/

Yes, zstd also has --rsyncable. In this case I let borg do per-chunk compression after deduplication; it has worked well so far.

> If you are using only the --data section, why not --data-only in the pg_dump?

I want the dump to be as complete as possible. I didn't think it would create issues.

> Or is the pg_dump output used for other purposes?

It has happened that I have selectively restored user schemas from that dump.

>> Worth noting is that the above pg_restore goes through the WAL, i.e. all
>> writes are done by walwriter, not the backend directly.
>
> Please explain the above further.

The COPY FROM data is going through the WAL, as usual INSERTs do. The writes to disk happen by the walwriter process.

On the other hand, if you have configured the server with wal_level=minimal and you BEGIN a transaction, CREATE or TRUNCATE a table, and then COPY FROM into that table, then the backend process writes directly to the table without logging to the WAL. This can be much faster, but most importantly it avoids situations of WAL overflow that are very difficult to predict and can mess your server up completely. [1]

[1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net

My patches are for activating that codepath in pg_restore, but they were deliberately not used here, and I took note that the writes went via WAL.

> The problem occurs when you do the pg_dump after this restore, correct?

Correct. The first pg_dump from the restored pg17 is not deduplicated at all. Most of the tables have not changed (logically at least; apparently they have changed physically).

> Is it the same pg_dump command as you show above?

Yes.

> For completeness, and just in case they may affect the output, what do the
> patches do?

Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter; they only build the TOC in pg_restore's memory.

https://commitfest.postgresql.org/patch/5809/
https://commitfest.postgresql.org/patch/5817/

And two patches for speeding up pg_restore like mentioned above, under specific arguments that I didn't provide (one speedup needs --clean, and the other needs --freeze).

https://commitfest.postgresql.org/patch/5821/
https://commitfest.postgresql.org/patch/5826/

IIRC I did not activate them (via --clean) because TRUNCATE fails when foreign keys exist. See the discussion threads.

Dimitris
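For reference, the WAL-skipping load path described above looks roughly like the following sketch (it assumes wal_level = minimal; the database, table and file names are placeholders):

    # truncate and load in the same transaction so the backend can skip WAL-logging
    # the bulk data (requires wal_level = minimal); \copy reads the file client-side
    psql -d db_name <<'SQL'
    BEGIN;
    TRUNCATE my_table;
    \copy my_table FROM 'data.copy'
    COMMIT;
    SQL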
On 8/27/25 09:10, Dimitrios Apostolou wrote:
> On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:
>> For completeness, and just in case they may affect the output, what do
>> the patches do?
>
> Two patches for speeding up scanning an archive without TOC, like the
> one I'm having (because it is piped into borg, instead of written to
> file). These were activated, but shouldn't matter. They only build the
> TOC in pg_restore's memory.

Are you sure about that?

I just did:

pg_dump -Fc --compress=none --no-toast-compression -d test -U postgres | borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest -

Then:

borg mount borg_test/ mnt_tmp/
cd mnt_tmp/PgTest/

and then:

pg_restore -l pg_file

and I got a TOC.

Or are you streaming the data out of the Borg archive?

> https://commitfest.postgresql.org/patch/5809/
> https://commitfest.postgresql.org/patch/5817/
>
> And two patches for speeding up pg_restore like mentioned above, under
> specific arguments that I didn't provide (one speedup needs --clean,
> and the other needs --freeze).
>
> https://commitfest.postgresql.org/patch/5821/
> https://commitfest.postgresql.org/patch/5826/
>
> IIRC I did not activate them (via --clean) because TRUNCATE fails when
> foreign keys exist. See the discussion threads.

--
Adrian Klaver
adrian.klaver@aklaver.com
Sorry, I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance and fill in the TOC offsets in memory.

This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours on a huge dump file, before even starting any actual restoration.

Thank you for testing.
Dimitris

On 30 August 2025 20:19:13 CEST, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>On 8/27/25 09:10, Dimitrios Apostolou wrote:
>> Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory.
>
>Are you sure about that?
>
>I just did:
>
>pg_dump -Fc --compress=none --no-toast-compression -d test -U postgres | borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest -
>
>Then:
>
>borg mount borg_test/ mnt_tmp/
>cd mnt_tmp/PgTest/
>
>and then:
>
>pg_restore -l pg_file
>
>and I got a TOC.
>
>Or are you streaming the data out of the Borg archive?
On 8/30/25 18:21, Dimitrios Apostolou wrote:
> Sorry, I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance and fill in the TOC offsets in memory.
>
> This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours on a huge dump file, before even starting any actual restoration.

It may be that my coffee is not strong enough, but I don't understand what you are trying to say.

Are you using, from the previous post, the following?:

"Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory."

The part I don't see is how you get a dump file without a TOC?

When I do the pg_dump and pipe it to Borg the resulting file has a TOC.

Can you show the rest of the | borg create ... command?

--
Adrian Klaver
adrian.klaver@aklaver.com
When I first said "dump file without TOC" I actually meant "without offsets in the TOC".

The fact that you get a TOC printed does not prove that the dump file includes a TOC with offsets.

All pg_dump -Fc commands that write to stdout produce a file without offsets in the TOC. It has nothing to do with borg. TOC offsets must be filled in right before streaming each table, but this is impossible when the whole TOC has already been written to stdout at the beginning.

Dimitris

On 31 August 2025 17:41:34 CEST, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>It may be that my coffee is not strong enough, but I don't understand what you are trying to say.
>
>The part I don't see is how you get a dump file without a TOC?
>
>When I do the pg_dump and pipe it to Borg the resulting file has a TOC.
>
>Can you show the rest of the | borg create ... command?
On 8/31/25 10:52, Dimitrios Apostolou wrote:
> When I first said "dump file without TOC" I actually meant "without offsets in the TOC".
>
> The fact that you get a TOC printed does not prove that the dump file includes a TOC with offsets.

I did some digging in the code and see that the TOC is more than that; it stores a range of data. I still have not found the part where the offsets are ignored for writes to stdout, but will keep on digging.

Getting back to your OP:

"Unfortunately after I did pg_restore to a new server, I notice that the dumps from the new server are not being de-duplicated, all blocks are considered new."

I ran a test on a much smaller database and I do not see the above. The commands and the Borg info for the archive are in the attached file.

> All pg_dump -Fc commands that write to stdout produce a file without offsets in the TOC. It has nothing to do with borg. TOC offsets must be filled in right before streaming each table, but this is impossible when the whole TOC has already been written to stdout at the beginning.
>
> Dimitris

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> I did some digging in the code and see that the TOC is more than that;
> it stores a range of data. I still have not found the part where the
> offsets are ignored for writes to stdout, but will keep on digging.

The TOC is initially written out with zeroes for the offsets.
Then the per-table data parts are written out, tracking where
each one begins. At the end, if the output file is seekable,
pg_dump seeks back to the start and re-writes the whole TOC
section, now with data offsets populated. But output to a
pipe won't be seekable.

			regards, tom lane
On 9/1/25 10:54, Tom Lane wrote:
> The TOC is initially written out with zeroes for the offsets.
> Then the per-table data parts are written out, tracking where
> each one begins. At the end, if the output file is seekable,
> pg_dump seeks back to the start and re-writes the whole TOC
> section, now with data offsets populated. But output to a
> pipe won't be seekable.

Got it, thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com
On 2025-Aug-26, Dimitrios Apostolou wrote:
> I am storing dumps of a database (pg_dump custom format) in a de-duplicating
> backup server. Each dump is many terabytes in size, so deduplication is very
> important. And de-duplication itself is based on rolling checksums which is
> pretty flexible, it can compensate for blocks moving by some offset.

Hello,

It's generally considered nowadays that pg_dump is not the best option to create backups of very large databases. You may be better served by using a binary backup tool -- something like Barman. With current Postgres releases you can create incremental backups, which would probably be more effective at deduplicating than playing with pg_dump's TOC, because it's based on what actually happens to the data. Barman provides support for hook scripts, which perhaps can be used to transfer the backup files to Borg. (I haven't actually tried to do this, but the Barman developers talk about using them to transfer the backups to tape, so I imagine getting them to play with Borg is a Simple Matter of Programming.)

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
On 9/4/25 05:02, Álvaro Herrera wrote:
> It's generally considered nowadays that pg_dump is not the best option
> to create backups of very large databases. You may be better served by
> using a binary backup tool -- something like Barman. With current
> Postgres releases you can create incremental backups, which would
> probably be more effective at deduplicating than playing with pg_dump's
> TOC, because it's based on what actually happens to the data.

As I understand it, the TOC issue was with pg_restore having to generate the offsets, as they were not included in the backup file because it was streamed rather than written to a file.

The deduplication became an issue when changing Postgres versions, per:

https://www.postgresql.org/message-id/4ss66r31-558o-qq24-332q-no351p7n5osr%40tzk.arg

"
> The problem occurs when you do the pg_dump after this restore, correct?

Correct. The first pg_dump from the restored pg17 is not deduplicated at all. Most of the tables have not changed (logically at least; apparently they have changed physically).
"

--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Álvaro and Greg,

On Thursday 2025-09-04 14:02, Álvaro Herrera wrote:
> It's generally considered nowadays that pg_dump is not the best option
> to create backups of very large databases. You may be better served by
> using a binary backup tool -- something like Barman. With current
> Postgres releases you can create incremental backups, which would
> probably be more effective at deduplicating than playing with pg_dump's
> TOC, because it's based on what actually happens to the data. Barman
> provides support for hook scripts, which perhaps can be used to transfer
> the backup files to Borg. (I haven't actually tried to do this, but the
> Barman developers talk about using them to transfer the backups to tape,
> so I imagine getting them to play with Borg is a Simple Matter of
> Programming.)

On Wed, 27 Aug 2025, Greg Sabino Mullane wrote:
> I suggest looking into pgBackRest and its block incremental feature,
> which sounds similar to what you are doing. But it also does it with
> parallel processes, and can do things like grab backup files from your
> replicas, plus a lot of other features.

If I'm not mistaken, both Barman and pgBackRest are based on physical dumps of the database (pg_basebackup). At the start of this project I had evaluated pg_basebackup, but decided logical backup fitted my needs better:

+ pg_basebackup was slower, measuring speeds of around 10MB/s, because of issues with the 8KB page size and compressed btrfs (see [1]; the situation has since improved both on the postgres side and the kernel side; I'm not sure how pg_basebackup fares today).

+ pg_basebackup was much bigger, because it includes indices etc. As a result of size and speed, pg_basebackup was also taking a longer time.

+ Physical dumps would change a lot during maintenance (VACUUM FULL, CLUSTER etc.) while the data would remain the same. This would reduce the effect of deduplication and increase size requirements even further. At that point in time I did not expect logical dumps to change too, when the data hasn't changed.

+ I use logical dumps as a tool, not only as a backup, to copy the database to other servers with different PostgreSQL versions.

+ I also use it to verify the VCS-committed SQL schema: doing pg_restore --data-only on an already created database will fail if the SQL schema had been modified on the original server without committing the changes.

+ Finally, I don't really need all the advanced features that physical replication offers, like HA, PITR and load balancing. It's a non-mission-critical service that can take a little time off in case of disaster recovery.

[1] https://www.postgresql.org/message-id/flat/218fa2e0-bc58-e469-35dd-c5cb35906064%40gmx.net

Regards,
Dimitris
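As an aside, the schema-verification flow from the list above can be sketched like this (database, schema file and dump names are placeholders):

    # recreate the schema from version control, then load only the data;
    # pg_restore errors out if the live schema has drifted from what is committed
    createdb verify_db
    psql -d verify_db -f schema_from_vcs.sql
    pg_restore --data-only --schema=public -d verify_db dump_file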
On 9/4/25 19:08, Dimitrios Apostolou wrote:
> If I'm not mistaken, both Barman and pgBackRest are based on physical
> dumps of the database (pg_basebackup). At the start of this project I
> had evaluated pg_basebackup, but decided logical backup fitted my needs
> better.

If that helps, this is a writeup I did some years ago comparing pgBackRest, Barman and pg_probackup:

https://severalnines.com/blog/current-state-open-source-backup-management-postgresql/