Thread: Adding REPACK [concurrently]


Adding REPACK [concurrently]

From:
Alvaro Herrera
Date:
Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
This is coming from [1].  The ultimate goal is to have an in-core tool
to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
FULL does that, but it's not concurrent.  Users have resorted to using
the pg_repack third-party tool, which is ancient and uses a weird
internal implementation, as well as pg_squeeze, which uses logical
decoding to capture changes that occur during the table rewrite.  The
patch submitted here, largely by Antonin Houska with some changes by me,
is based on the pg_squeeze code which he authored, and first
introduces a new command called REPACK to absorb both VACUUM FULL and
CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
of REPACK to operate online using logical decoding.

Essentially, this first patch just reshuffles the CLUSTER code to create
the REPACK command.

I made a few changes from Antonin's original at [2].  First, I modified
the grammar to support "REPACK [tab] USING INDEX" without specifying the
index name.  With this change, all possibilities of the old commands are
covered, which gives us the chance to flag them as obsolete.  (This is
good, because having VACUUM FULL do something completely different from
regular VACUUM confuses users all the time; and on the other hand,
having a command called CLUSTER which is at odds with what most people
think of as a "database cluster" is also confusing.)

Here's a list of existing commands, and how to write them in the current
patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER                     -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx       -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index clustered
CLUSTER tab                 -> REPACK tab USING INDEX

-- vacuum-full all tables
VACUUM FULL                 -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab             -> REPACK tab


My other change to Antonin's patch is that I made REPACK USING INDEX set
the 'indisclustered' flag on the index being used, so REPACK behaves
identically to CLUSTER.  We can discuss whether we really want this.
For instance we could add an option so that by default REPACK omits
persisting the clustered index, and instead it only does that when you
give it some special option, say something like
  "REPACK (persist_clustered_index=true) tab USING INDEX idx"
Overall I'm not sure this is terribly interesting, since clustered
indexes are not very useful for most users anyway.
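
To illustrate the behavior with the current patch: the flag persists, so a
later invocation can rely on it, just like with CLUSTER:

-- rewrites tab and sets indisclustered on idx
REPACK tab USING INDEX idx;
-- reuses idx, because it is still marked clustered
REPACK tab USING INDEX;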

I made a few other minor changes not worthy of individual mention, and
there are a few others pending, such as updates to the
pg_stat_progress_repack view infrastructure, as well as phasing out
pg_stat_progress_cluster (maybe the latter would offer a subset of the
former; not yet sure about this.)  Also, I'd like to work on adding a
`repackdb` command for completeness.

On repackdb: I think it is going to be very similar to vacuumdb, mostly in
that it is going to need to be able to run tasks in parallel; but there
are things it doesn't have to deal with, such as analyze-in-stages,
which I think is a large burden.  I estimate about 1k LOC there,
extremely similar to vacuumdb.  Maybe it makes sense to share the source
code and make the new executable a symlink instead, with some additional
code to support the two different modes.  Again, I'm not sure about
this -- I like the idea, but I'd have to see the implementation.

I'll be rebasing the rest of Antonin's patch series afterwards,
including the logical decoding changes necessary for CONCURRENTLY.  In
the meantime, if people want to review those, which would be very
valuable, they can go back to branch master from around the time he
submitted it and apply the old patches there.

[1] https://postgr.es/m/76278.1724760050@antos
[2] https://postgr.es/m/152010.1751307725@localhost

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Attachments

Re: Adding REPACK [concurrently]

From:
Robert Treat
Date:
On Sat, Jul 26, 2025 at 5:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello,
>
> Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
> This is coming from [1].  The ultimate goal is to have an in-core tool
> to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
> FULL does that, but it's not concurrent.  Users have resorted to using
> the pg_repack third-party tool, which is ancient and uses a weird
> internal implementation, as well as pg_squeeze, which uses logical
> decoding to capture changes that occur during the table rewrite.  The
> patch submitted here, largely by Antonin Houska with some changes by me,
> is based on the pg_squeeze code which he authored, and first
> introduces a new command called REPACK to absorb both VACUUM FULL and
> CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
> of REPACK to operate online using logical decoding.
>
> Essentially, this first patch just reshuffles the CLUSTER code to create
> the REPACK command.
>

Thanks for keeping this ball rolling.

>
> My other change to Antonin's patch is that I made REPACK USING INDEX set
> the 'indisclustered' flag on the index being used, so REPACK behaves
> identically to CLUSTER.  We can discuss whether we really want this.
> For instance we could add an option so that by default REPACK omits
> persisting the clustered index, and instead it only does that when you
> give it some special option, say something like
>   "REPACK (persist_clustered_index=true) tab USING INDEX idx"
> Overall I'm not sure this is terribly interesting, since clustered
> indexes are not very useful for most users anyway.
>

I think I would lean towards having it work like CLUSTER (preserve the
index), since that helps people making the transition, and it doesn't
feel terribly useful to invent new syntax for a feature that I would
agree isn't very useful for most people.

> I made a few other minor changes not worthy of individual mention, and
> there are a few others pending, such as updates to the
> pg_stat_progress_repack view infrastructure, as well as phasing out
> pg_stat_progress_cluster (maybe the latter would offer a subset of the
> former; not yet sure about this.)  Also, I'd like to work on adding a
> `repackdb` command for completeness.
>
> On repackdb: I think it is going to be very similar to vacuumdb, mostly in
> that it is going to need to be able to run tasks in parallel; but there
> are things it doesn't have to deal with, such as analyze-in-stages,
> which I think is a large burden.  I estimate about 1k LOC there,
> extremely similar to vacuumdb.  Maybe it makes sense to share the source
> code and make the new executable a symlink instead, with some additional
> code to support the two different modes.  Again, I'm not sure about
> this -- I like the idea, but I'd have to see the implementation.
>
> I'll be rebasing the rest of Antonin's patch series afterwards,
> including the logical decoding changes necessary for CONCURRENTLY.  In
> the meantime, if people want to review those, which would be very
> valuable, they can go back to branch master from around the time he
> submitted it and apply the old patches there.
>

For clarity, are you intending to commit this patch before having the
other parts ready? (If that sounds like an objection, it isn't) After
a first pass, I think there are some confusing bits in the new docs that
could use straightening out, but there are likely going to be overlapping
changes once CONCURRENTLY is brought in, so it might make sense to hold
off on those.  Either way I definitely want to dive into this a bit deeper
with some fresh eyes, there's a lot to digest... speaking of, for this
bit in src/backend/commands/cluster.c

+    switch (cmd)
+    {
+        case REPACK_COMMAND_REPACK:
+            return "REPACK";
+        case REPACK_COMMAND_VACUUMFULL:
+            return "VACUUM";
+        case REPACK_COMMAND_CLUSTER:
+            return "VACUUM";
+    }
+    return "???";

The last one should return "CLUSTER", no?


Robert Treat
https://xzilla.net



Re: Adding REPACK [concurrently]

From:
Fujii Masao
Date:
On Sun, Jul 27, 2025 at 6:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello,
>
> Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
> This is coming from [1].  The ultimate goal is to have an in-core tool
> to allow concurrent table rewrite to get rid of bloat;

+1


> right now, VACUUM
> FULL does that, but it's not concurrent.  Users have resorted to using
> the pg_repack third-party tool, which is ancient and uses a weird
> internal implementation, as well as pg_squeeze, which uses logical
> decoding to capture changes that occur during the table rewrite.  The
> patch submitted here, largely by Antonin Houska with some changes by me,
> is based on the pg_squeeze code which he authored, and first
> introduces a new command called REPACK to absorb both VACUUM FULL and
> CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
> of REPACK to operate online using logical decoding.

Does this mean REPACK CONCURRENTLY requires wal_level = logical,
while plain REPACK (without CONCURRENTLY) works with any wal_level
setting? If we eventually deprecate VACUUM FULL and CLUSTER,
I think plain REPACK should still be allowed with wal_level = minimal
or replica, so users with those settings can perform equivalent
processing.
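
For example, I'd expect something like this (just a sketch of the expected
behavior, not something I've tested):

-- with wal_level = minimal or replica
REPACK tab;                  -- should keep working
REPACK (CONCURRENTLY) tab;   -- would require logical decoding, i.e.
                             -- wal_level = logical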


+ if (!cluster_is_permitted_for_relation(tableOid, userid,
+    CLUSTER_COMMAND_CLUSTER))

As for the patch you attached, it seems to be an early WIP and
might not be ready for review yet?? BTW, I got the following
compilation failure about CLUSTER_COMMAND_CLUSTER, and the
'userid' in the above should probably be GetUserId().

-----------------
cluster.c:455:14: error: use of undeclared identifier 'CLUSTER_COMMAND_CLUSTER'
  455 |                         CLUSTER_COMMAND_CLUSTER))
      |                         ^
1 error generated.
-----------------

Regards,

--
Fujii Masao



Re: Adding REPACK [concurrently]

From:
Alvaro Herrera
Date:
On 2025-Jul-26, Robert Treat wrote:

> For clarity, are you intending to commit this patch before having the
> other parts ready? (If that sounds like an objection, it isn't) After
> a first pass, I think there are some confusing bits in the new docs that
> could use straightening out, but there are likely going to be overlapping
> changes once CONCURRENTLY is brought in, so it might make sense to hold
> off on those.

I'm aiming at getting 0001 committed during the September commitfest,
and the CONCURRENTLY flag addition later in the pg19 cycle.  But I'd
rather have good-enough docs at every step of the way.  They don't have
to be *perfect* if we want to get everything in pg19, but I'd rather not
leave anything openly confusing even transiently.

That said, I did not review the docs this time around, so here they are,
the same as they were in the previous post.  But if you want to suggest
changes for the docs in 0001, please do.  Just don't get too carried
away.

> speaking of, for this bit in src/backend/commands/cluster.c
> 
> +    switch (cmd)
> +    {
> +        case REPACK_COMMAND_REPACK:
> +            return "REPACK";
> +        case REPACK_COMMAND_VACUUMFULL:
> +            return "VACUUM";
> +        case REPACK_COMMAND_CLUSTER:
> +            return "VACUUM";
> +    }
> +    return "???";
> 
> The last one should return "CLUSTER", no?

Absolutely -- my blunder.

On 2025-Jul-27, Fujii Masao wrote:

> > The patch submitted here, largely by Antonin Houska with some
> > changes by me, is based on the pg_squeeze code which he
> > authored, and first introduces a new command called REPACK to absorb
> > both VACUUM FULL and CLUSTER, followed by addition of a CONCURRENTLY
> > flag to allow some forms of REPACK to operate online using logical
> > decoding.
> 
> Does this mean REPACK CONCURRENTLY requires wal_level = logical, while
> plain REPACK (without CONCURRENTLY) works with any wal_level setting?
> If we eventually deprecate VACUUM FULL and CLUSTER, I think plain
> REPACK should still be allowed with wal_level = minimal or replica, so
> users with those settings can perform equivalent processing.

Absolutely.

One of the later patches in the series, which I have not included yet,
intends to implement the idea of transiently enabling wal_level=logical
for the table being repacked concurrently, so that you can still use
the concurrent mode if you have a non-logical-wal_level instance.

> + if (!cluster_is_permitted_for_relation(tableOid, userid,
> +    CLUSTER_COMMAND_CLUSTER))
> 
> As for the patch you attached, it seems to be an early WIP and
> might not be ready for review yet?? BTW, I got the following
> compilation failure about CLUSTER_COMMAND_CLUSTER, and the
> 'userid' in the above should probably be GetUserId().

This was a silly merge mistake, caused by my squashing Antonin's 0004
(trivial code restructuring) into 0001 at the last minute and failing to
"git add" the compile fixes before doing git-format-patch.

Here's v17.  (I decided that calling my previous one "v1" after Antonin
had gone all the way to v15 was stupid on my part.)  The important part
here is that I rebased Antonin's 0004, that is, the addition of the
CONCURRENTLY flag, plus his 0005 regression tests.

The only interesting change here is that I decided to not mess with the
grammar by allowing an unparenthesized CONCURRENTLY keyword; if you want
concurrent, you have to say "REPACK (CONCURRENTLY)".  This is at odds
with the way we use the keyword in other commands, but ISTM we don't
_need_ to support that legacy syntax.  Anyway, this is easy to put back
afterwards, if enough people find it not useless.
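
In other words:

REPACK (CONCURRENTLY) tab;   -- accepted
REPACK CONCURRENTLY tab;     -- syntax error with the current grammar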

I've not reviewed 0003 in depth yet, just rebased it.  But it works to
the point that CI is happy with it.

I've not yet included Antonin's 0006 and 0007.


TODO list for 0001:

- addition of src/bin/scripts/repackdb
- clean up the progress report infrastructure
- doc review

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)

Attachments

Re: Adding REPACK [concurrently]

From:
Fujii Masao
Date:
On Fri, Aug 1, 2025 at 1:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> One of the later patches in the series, which I have not included yet,
> intends to implement the idea of transiently enabling wal_level=logical
> for the table being repacked concurrently, so that you can still use
> the concurrent mode if you have a non-logical-wal_level instance.

Sounds good to me!


> Here's v17.

I just tried REPACK command and observed a few things:

When I repeatedly ran REPACK on the regression database
while make installcheck was running, I got the following error:

        ERROR:  StartTransactionCommand: unexpected state STARTED

"REPACK (VERBOSE);" failed with the following error.

        ERROR:  syntax error at or near ";"

REPACK (CONCURRENTLY) USING INDEX failed with the following error,
while the same command without CONCURRENTLY completed successfully:

        =# REPACK (CONCURRENTLY) parallel_vacuum_table using index regular_sized_index;
        ERROR:  cannot process relation "parallel_vacuum_table"
        HINT:  Relation "parallel_vacuum_table" has no identity index.
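
(Presumably the table first needs a replica identity index for the
concurrent mode; e.g., assuming the table has some unique index uq_idx,
something like:

        ALTER TABLE parallel_vacuum_table REPLICA IDENTITY USING INDEX uq_idx;

though I haven't verified that this is the intended requirement.)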

When I ran REPACK (CONCURRENTLY) on a table that's also a logical
replication target, I saw the following log messages. Is this expected?

        =# REPACK (CONCURRENTLY) t;
        LOG:  logical decoding found consistent point at 1/00021F20
        DETAIL:  There are no running transactions.
        STATEMENT:  REPACK (CONCURRENTLY) t;

Regards,

--
Fujii Masao



Re: Adding REPACK [concurrently]

From:
Mihail Nikalayeu
Date:
Hello Álvaro,

Should we skip non-ready indexes in build_new_indexes?

Also, in the current implementation, concurrent mode is marked as
non-MVCC safe. From my point of view, this is a significant limitation
for practical use.
Should we consider an option to trade the non-MVCC issues for a short
exclusive lock on ProcArrayLock plus cancellation of some transactions
with an older xmin?

Best regards,
Mikhail



Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> I made a few changes from Antonin's original at [2].  First, I modified
> the grammar to support "REPACK [tab] USING INDEX" without specifying the
> index name.  With this change, all possibilities of the old commands are
> covered,

...

> Here's a list of existing commands, and how to write them in the current
> patch's proposal for REPACK:
> 
> -- re-clusters all tables that have a clustered index set
> CLUSTER                     -> REPACK USING INDEX
> 
> -- clusters the given table using the given index
> CLUSTER tab USING idx       -> REPACK tab USING INDEX idx
> 
> -- clusters this table using a clustered index; error if no index clustered
> CLUSTER tab                 -> REPACK tab USING INDEX
> 
> -- vacuum-full all tables
> VACUUM FULL                 -> REPACK
> 
> -- vacuum-full the specified table
> VACUUM FULL tab             -> REPACK tab
> 

Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
options of VACUUM FULL. I found two items not supported by REPACK (but also
not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
let's mention that in the user documentation of REPACK?
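
For reference, these are the invocations that would have no direct REPACK
counterpart:

VACUUM (FULL, ANALYZE) tab;
VACUUM (FULL, SKIP_DATABASE_STATS) tab;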

(Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
think these have no effect.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Mihail Nikalayeu
Date:
Hello!

One more thing - I think build_new_indexes and
index_concurrently_create_copy are very close in semantics, so it
might be a good idea to refactor them a bit.

I’m still concerned about MVCC-related issues. For many
applications, this is a dealbreaker, because in some cases correctness
is a higher priority than availability.

Possible options:

1) Terminate connections with old snapshots.

Add a flag to terminate all connections with snapshots during the
ExclusiveLock period for the swap. From the application’s perspective,
this is not a big deal - it's similar to a primary switch. We would
also need to prevent new snapshots from being taken during the swap
transaction, so a short exclusive lock on ProcArrayLock would also be
required.

2) MVCC-safe two-phase approach (inspired by CREATE INDEX).

- copy the data from T1 to the new table T2.
- apply the log.
- take a table-exclusive lock on T1
- apply the log again.
- instead of swapping, mark T2 as a kind of shadow table - any
transaction applying changes to T1 must also apply them to T2, while
reads still use T1 as the source of truth.
- commit (and record the transaction ID as XID1).
- at this point, all changes are applied to both tables with the same
XIDs because of the "shadow table" mechanism.
- wait until older snapshots no longer treat XID1 as uncommitted.
- now the tables are identical from the MVCC perspective.
- take an exclusive lock on both T1 and T2.
- perform the swap and drop T1.
- commit.

This is more complex and would require implementing some sort of
"shadow table" mechanism, so it might not be worth the effort. Option
1 feels more appealing to me.

If others think this is a good idea, I might try implementing a proof
of concept.

Best regards,
Mikhail



Re: Adding REPACK [concurrently]

From:
Alvaro Herrera
Date:
On 2025-Aug-09, Mihail Nikalayeu wrote:

> Hello!
> 
> One more thing - I think build_new_indexes and
> index_concurrently_create_copy are very close in semantics, so it
> might be a good idea to refactor them a bit.
> 
> I’m still concerned about MVCC-related issues. For many
> applications, this is a dealbreaker, because in some cases correctness
> is a higher priority than availability.

Please note that Antonin already implemented this.  See his patches
here:
https://www.postgresql.org/message-id/77690.1725610115%40antos
I proposed to leave this part out initially, which is why it hasn't been
reposted.  We can review and discuss after the initial patches are in.
Because having an MVCC-safe mode has drawbacks, IMO we should make it
optional.

But you're welcome to review that part specifically if you're so
inclined, and offer feedback on it.  (I suggest rewinding your
checked-out tree to branch master at the time that patch was posted, for
easy application.  We can deal with a rebase later.)

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")



Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> One more thing - I think build_new_indexes and
> index_concurrently_create_copy are very close in semantics, so it
> might be a good idea to refactor them a bit.

You're right. I think I even used the latter for reference when writing the
first.

0002 in the attached series tries to fix that. build_new_indexes() (in 0004)
is simpler now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachments

Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Antonin Houska <ah@cybertec.at> wrote:

> Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> 
> > One more thing - I think build_new_indexes and
> > index_concurrently_create_copy are very close in semantics, so it
> > might be a good idea to refactor them a bit.
> 
> You're right. I think I even used the latter for reference when writing the
> first.
> 
> 0002 in the attached series tries to fix that. build_new_indexes() (in 0004)
> is simpler now.

This is v18 again. Parts 0001 through 0004 are unchanged; however, 0005 is
added. It implements a new client application, pg_repackdb. (If I posted 0005
alone, its regression tests would not work. I wonder if the cfbot handles the
repeated occurrence of the 'v18-' prefix correctly.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachments

Re: Adding REPACK [concurrently]

From:
Alvaro Herrera
Date:
On 2025-Aug-15, Antonin Houska wrote:

> This is v18 again.

Thanks for this!

> Parts 0001 through 0004 are unchanged; however, 0005 is added. It
> implements a new client application, pg_repackdb. (If I posted 0005
> alone, its regression tests would not work. I wonder if the cfbot
> handles the repeated occurrence of the 'v18-' prefix correctly.)

Yeah, the cfbot is just going to take the attachments from the latest
email in the thread that has any, and assume they make up the whole
patch.  It wouldn't work to post just v18-0005 and assume
that the bot is going to grab patches 0001 through 0004 from a previous
email, if that's what you're thinking.  In short, what you did is
correct and necessary.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Adding REPACK [concurrently]

From:
Robert Treat
Date:
On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > I made a few changes from Antonin's original at [2].  First, I modified
> > the grammar to support "REPACK [tab] USING INDEX" without specifying the
> > index name.  With this change, all possibilities of the old commands are
> > covered,
>
> ...
>
> > Here's a list of existing commands, and how to write them in the current
> > patch's proposal for REPACK:
> >
> > -- re-clusters all tables that have a clustered index set
> > CLUSTER                     -> REPACK USING INDEX
> >
> > -- clusters the given table using the given index
> > CLUSTER tab USING idx       -> REPACK tab USING INDEX idx
> >
> > -- clusters this table using a clustered index; error if no index clustered
> > CLUSTER tab                 -> REPACK tab USING INDEX
> >

In the v18 patch, the docs say that repack doesn't remember the index,
but it seems we are still calling mark_index_clustered, so I think the
above is true but we need to update the docs(?).

> > -- vacuum-full all tables
> > VACUUM FULL                 -> REPACK
> >
> > -- vacuum-full the specified table
> > VACUUM FULL tab             -> REPACK tab
> >
>
> Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
> options of VACUUM FULL. I found two items not supported by REPACK (but also
> not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
> let's mention that in the user documentation of REPACK?
>

I would note that both pg_repack and pg_squeeze analyze by default,
and running "vacuum full analyze" is the recommended behavior, so not
having analyze included is a step backwards.

> (Besides that, VACUUM FULL accepts TRUNCATE and INDEX_CLEANUP options, but I
> think these have no effect.)
>

Yeah, these seem safe to ignore.

Robert Treat
https://xzilla.net



Re: Adding REPACK [concurrently]

From:
Alvaro Herrera
Date:
On 2025-Aug-16, Robert Treat wrote:

> On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:

> > Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
> > options of VACUUM FULL. I found two items not supported by REPACK (but also
> > not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
> > let's mention that in the user documentation of REPACK?
> 
> I would note that both pg_repack and pg_squeeze analyze by default,
> and running "vacuum full analyze" is the recommended behavior, so not
> having analyze included is a step backwards.

Makes sense to add ANALYZE as an option to repack, yeah.

So if I repack a single table with
  REPACK (ANALYZE) table USING INDEX;

then do you expect that this would first cluster the table under
AccessExclusiveLock, then release the lock to do the analyze step, or
would the analyze be done under the same lock?  This is significant for
a query that starts while repack is running, because if we release the
AEL then the query is planned when there are no stats for the table,
which might be bad.

I think the time to run the analyze step should be considerably shorter
than the time to run the repacking step, so running both together under
the same lock should be okay.
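
To make the hazard concrete (a hypothetical interleaving; table and column
names invented):

-- session 1:
REPACK (ANALYZE) orders USING INDEX;
--   rewrites "orders" under AccessExclusiveLock, then analyzes it
-- session 2, started while session 1 runs:
SELECT count(*) FROM orders WHERE region = 'EU';
--   blocks on the AEL; if the lock were released before the ANALYZE step,
--   this query could be planned while the table has no statistics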

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)



Re: Adding REPACK [concurrently]

From:
Alvaro Herrera
Date:
On 2025-Aug-16, Robert Treat wrote:

> In the v18 patch, the docs say that repack doesn't remember the index,
> but it seems we are still calling mark_index_clustered, so I think the
> above is true but we need to update the docs(?).

Yes, the docs are obsolete on this point, I'm in the process of updating
them.  Thanks for pointing this out.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"



Re: Adding REPACK [concurrently]

From:
Álvaro Herrera
Date:
Hello,

Here's a second cut of the initial REPACK work.  Antonin added an
implementation of pg_repackdb, and there are also fixes for a couple of
bugs reported in the thread.  I also added support for the ANALYZE
option as noted by Robert Treat, though it only works if you specify a
single non-partitioned table.  Adding it for the multi-table case is likely
easy, but I didn't try.

I purposefully do not include the CONCURRENTLY work yet -- I want to get
this part committable-clean first, then we can continue work on the
logical decoding work on top of that.


Note the choice of shell command name: though all the other programs in
src/bin/scripts do not use the "pg_" prefix, this one does; we thought
it made no sense to follow the old programs as precedent because there
seems to be a lament for the lack of a pg_ prefix in those, and we only
keep them as they are because of their long history.  This one has no
history.

Still on pg_repackdb, the implementation here is to install a symlink
called pg_repackdb which points to vacuumdb, and make the program behave
differently when called in this way.  The amount of additional code for
this is relatively small, so I think this is a worthy technique --
assuming it works.  If it doesn't, Antonin proposed a separate binary
that just calls some functions from vacuumdb.  Or maybe we could have a
common source file that both utilities call.


I edited the docs a bit, limiting the exposure of CLUSTER and VACUUM
FULL, and instead redirecting the user to the REPACK docs.  In the
REPACK docs I modified things for additional clarity.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
               Ni aún el genio más grande llegaría muy lejos si
                    quisiera sacarlo todo de su propio interior.

Attachments

Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On 2025-Aug-16, Robert Treat wrote:
>
> > On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
>
> > > Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked the
> > > options of VACUUM FULL. I found two items not supported by REPACK (but also
> > > not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe just
> > > let's mention that in the user documentation of REPACK?
> >
> > I would note that both pg_repack and pg_squeeze analyze by default,
> > and running "vacuum full analyze" is the recommended behavior, so not
> > having analyze included is a step backwards.
>
> Makes sense to add ANALYZE as an option to repack, yeah.
>
> So if I repack a single table with
>   REPACK (ANALYZE) table USING INDEX;
>
> then do you expect that this would first cluster the table under
> AccessExclusiveLock, then release the lock to do the analyze step, or
> would the analyze be done under the same lock?  This is significant for
> a query that starts while repack is running, because if we release the
> AEL then the query is planned when there are no stats for the table,
> which might be bad.
>
> I think the time to run the analyze step should be considerably shorter
> than the time to run the repacking step, so running both together under
> the same lock should be okay.

AFAICS, VACUUM FULL first releases the AEL, then it analyzes the table. If
users haven't complained so far, I'd assume that vacuum_rel() (effectively
cluster_rel() in the FULL case) does not change the stats that much.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Álvaro Herrera <alvherre@kurilemu.de> wrote:

> Still on pg_repackdb, the implementation here is to install a symlink
> called pg_repackdb which points to vacuumdb, and make the program behave
> differently when called in this way.  The amount of additional code for
> this is relatively small, so I think this is a worthy technique --
> assuming it works.  If it doesn't, Antonin proposed a separate binary
> that just calls some functions from vacuumdb.  Or maybe we could have a
> common source file that both utilities call.

There's an issue with the symlink, maybe some meson expert can help. In
particular, the CI on Windows ends up with the following error:

ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

(The reason it does not happen on other platforms might be that the build is
slower on Windows, and thus it's more prone to some specific race conditions.)

It appears that the 'point_to' argument of the 'install_symlink()' function
[1] is only a string rather than a "real target" [2]. That's likely the reason
the function does not wait for the creation of the 'vacuumdb' executable.

I could not find another symlink of this kind in the tree. (AFAICS, the
postmaster->postgres symlink had been removed before Meson was
introduced.)

Does anyone happen to have a clue? Thanks.

[1] https://mesonbuild.com/Reference-manual_functions.html#install_symlink
[2] https://mesonbuild.com/Reference-manual_returned_tgt.html

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Álvaro Herrera
Date:
On 2025-Aug-20, Antonin Houska wrote:

> There's an issue with the symlink, maybe some meson expert can help. In
> particular, the CI on Windows ends up with the following error:
> 
> ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:

https://cirrus-ci.com/task/5608274336153600

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" "/Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj" "/c" ../src/bin/scripts/vacuumdb.c

[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array.
I removed it and started a new run to see what happens.  Running now:
https://cirrus-ci.com/build/4961902171783168

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Álvaro Herrera <alvherre@kurilemu.de> wrote:

> On 2025-Aug-20, Antonin Houska wrote:
>
> > There's an issue with the symlink, maybe some meson expert can help. In
> > particular, the CI on Windows ends up with the following error:
> >
> > ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
>
> Hmm, that's not the problem I see in the CI run from the commitfest app:
>
> https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably
independent from cfbot):

https://cirrus-ci.com/build/4726227505774592

> [19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
> [19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" "/Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj" "/c" ../src/bin/scripts/vacuumdb.c
> [19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
> [19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
>
> The real problem here seems to be the empty long_options_repack array.
> I removed it and started a new run to see what happens.  Running now:
> https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
the code compiled well. The compilation failure mentioned above comes from
"Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
that the symlink issue will occur there once the compilation is fixed.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Andres Freund
Date:
Hi,

On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:
> Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> > On 2025-Aug-20, Antonin Houska wrote:
> >
> > > There's an issue with the symlink, maybe some meson expert can help. In
> > > particular, the CI on Windows ends up with the following error:
> > >
> > > ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
> >
> > Hmm, that's not the problem I see in the CI run from the commitfest app:
> >
> > https://cirrus-ci.com/task/5608274336153600
>
> I was referring to the other build that you shared off-list (probably
> independent from cfbot):
>
> https://cirrus-ci.com/build/4726227505774592
>
> > [19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
> > [19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" "/Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj" "/c" ../src/bin/scripts/vacuumdb.c
> > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
> > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
> >
> > The real problem here seems to be the empty long_options_repack array.
> > I removed it and started a new run to see what happens.  Running now:
> > https://cirrus-ci.com/build/4961902171783168
>
> The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
> the code compiled well. The compilation failure mentioned above comes from
> "Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
> that the symlink issue will occur there once the compilation is fixed.

FWIW, I don't think it's particularly wise to rely on symlinks on windows -
IIRC they will often not be enabled outside of development environments.

Greetings,

Andres Freund



Re: Adding REPACK [concurrently]

From:
Mihail Nikalayeu
Date:
Hello everyone!

Alvaro Herrera <alvherre@alvh.no-ip.org>:
> Please note that Antonin already implemented this.  See his patches
> here:
> https://www.postgresql.org/message-id/77690.1725610115%40antos
> I proposed to leave this part out initially, which is why it hasn't been
> reposted.  We can review and discuss after the initial patches are in.

I think it is worth pushing it at least in the same release cycle.

> But you're welcome to review that part specifically if you're so
> inclined, and offer feedback on it.  (I suggest to rewind back your
> checked-out tree to branch master at the time that patch was posted, for
> easy application.  We can deal with a rebase later.)

I have rebased that on top of v18 (attached).

Also, I think I found an issue (or lost something during rebase): we
must preserve xmin/cmin during the initial copy to make sure that the
data is going to be visible to the snapshots of concurrent changes
later:

static void
reform_and_rewrite_tuple(......)
.....
      /*
       * It is also crucial to stamp the new record with the exact same
       * xid and cid, because the tuple must be visible to the snapshot
       * of the applied concurrent change later.
       */
      CommandId       cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
      TransactionId   xid = HeapTupleHeaderGetXmin(tuple->t_data);

      heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);

I'll try to polish that part a little bit.

> Because having an MVCC-safe mode has drawbacks, IMO we should make it
> optional.
Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Best regards,
Mikhail.

Attachments

Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Also, I think I found an issue (or lost something during rebase): we
> must preserve xmin/cmin during the initial copy to make sure that the
> data is going to be visible to the snapshots of concurrent changes
> later:
>
> static void
> reform_and_rewrite_tuple(......)
> .....
>       /*
>        * It is also crucial to stamp the new record with the exact same
>        * xid and cid, because the tuple must be visible to the snapshot
>        * of the applied concurrent change later.
>        */
>       CommandId       cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
>       TransactionId   xid = HeapTupleHeaderGetXmin(tuple->t_data);
>
>       heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);

When posting version 12 of the patch [1] I raised a concern that the MVCC
safety is too expensive when it comes to logical decoding. Therefore, I
abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we
implement the MVCC safety, we simply rewrite the tuple like v12 did - that's
the simplest way to preserve fields like xmin, cmin, ...

[1] https://www.postgresql.org/message-id/178741.1743514291%40localhost
[2] https://www.postgresql.org/message-id/97795.1744363522%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:
> > Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >
> > > On 2025-Aug-20, Antonin Houska wrote:
> > >
> > > > There's an issue with the symlink, maybe some meson expert can help. In
> > > > particular, the CI on Windows ends up with the following error:
> > > >
> > > > ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
> > >
> > > Hmm, that's not the problem I see in the CI run from the commitfest app:
> > >
> > > https://cirrus-ci.com/task/5608274336153600
> >
> > I was referring to the other build that you shared off-list (probably
> > independent from cfbot):
> >
> > https://cirrus-ci.com/build/4726227505774592
> >
> > > [19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
> > > [19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" "/Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj" "/c" ../src/bin/scripts/vacuumdb.c
> > > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
> > > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
> > >
> > > The real problem here seems to be the empty long_options_repack array.
> > > I removed it and started a new run to see what happens.  Running now:
> > > https://cirrus-ci.com/build/4961902171783168
> >
> > The symlink issue occurred at "Windows - Server 2019, MinGW64 - Meson", where
> > the code compiled well. The compilation failure mentioned above comes from
> > "Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
> > that the symlink issue will occur there once the compilation is fixed.
>
> FWIW, I don't think it's particularly wise to rely on symlinks on windows -
> IIRC they will often not be enabled outside of development environments.

ok, installing a copy of the same executable with a different name seems more
reliable. At least that's how the postmaster->postgres link used to be
handled, if I read the Makefile correctly. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Andres Freund
Date:
Hi,

On 2025-08-21 20:14:14 +0200, Antonin Houska wrote:
> ok, installing a copy of the same executable with a different name seems more
> reliable. At least that's how the postmaster->postgres link used to be
> handled, if I read Makefile correctly. Thanks.

I have not followed this thread, but I don't think the whole thing of having a
single executable with multiple names is worth doing. Just make whatever an
option, instead of having multiple "executables".

Greetings,

Andres



Re: Adding REPACK [concurrently]

From:
Robert Treat
Date:
On Tue, Aug 19, 2025 at 2:53 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Note choice of shell command name: though all the other programs in
> src/bin/scripts do not use the "pg_" prefix, this one does; we thought
> it made no sense to follow the old programs as precedent because there
> seems to be a lament for the lack of pg_ prefix in those, and we only
> keep what they are because of their long history.  This one has no
> history.
>
> Still on pg_repackdb, the implementation here is to install a symlink
> called pg_repackdb which points to vacuumdb, and make the program behave
> differently when called in this way.  The amount of additional code for
> this is relatively small, so I think this is a worthy technique --
> assuming it works.  If it doesn't, Antonin proposed a separate binary
> that just calls some functions from vacuumdb.  Or maybe we could have a
> common source file that both utilities call.
>

What's the plan for clusterdb? It seems like we'd ideally create a
standalone pg_repackdb which replaces clusterdb and also allows us to
remove the FULL options from vacuumdb.


Robert Treat
https://xzilla.net



Re: Adding REPACK [concurrently]

From:
Álvaro Herrera
Date:
On 2025-Aug-21, Robert Treat wrote:

> What's the plan for clusterdb? It seems like we'd ideally create a
> standalone pg_repackdb which replaces clusterdb and also allows us to
> remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts
that work today.  As you say, I would create the standalone pg_repackdb
to do what we need it to do (namely: run the REPACK commands) and leave
vacuumdb and clusterdb alone.  Removing the obsolete commands and
options can be done in a few years.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Adding REPACK [concurrently]

From:
"Euler Taveira"
Date:
On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:
> On 2025-Aug-21, Robert Treat wrote:
>
>> What's the plan for clusterdb? It seems like we'd ideally create a
>> standalone pg_repackdb which replaces clusterdb and also allows us to
>> remove the FULL options from vacuumdb.
>
> I don't think we should remove clusterdb, to avoid breaking any scripts
> that work today.  As you say, I would create the standalone pg_repackdb
> to do what we need it to do (namely: run the REPACK commands) and leave
> vacuumdb and clusterdb alone.  Removing the obsolete commands and
> options can be done in a few years.
>

I would say that we need to plan the removal of these binaries (clusterdb and
vacuumdb). We can start with a warning in clusterdb saying they should use
pg_repackdb. In a few years, we can remove clusterdb. There were complaints
about binary names without a pg_ prefix in the past [1].

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
to pg_repackdb. We can add a similar warning message saying they should use
pg_repackdb if the symlink is used.


[1] https://www.postgresql.org/message-id/CAJgfmqXYYKXR%2BQUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC%3DT-%2B%3DA%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/



Re: Adding REPACK [concurrently]

From:
Michael Banck
Date:
Hi,

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
> On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:
> > On 2025-Aug-21, Robert Treat wrote:
> >> What's the plan for clusterdb? It seems like we'd ideally create a
> >> standalone pg_repackdb which replaces clusterdb and also allows us to
> >> remove the FULL options from vacuumdb.
> >
> > I don't think we should remove clusterdb, to avoid breaking any scripts
> > that work today.  As you say, I would create the standalone pg_repackdb
> > to do what we need it to do (namely: run the REPACK commands) and leave
> > vacuumdb and clusterdb alone.  Removing the obsolete commands and
> > options can be done in a few years.
> 
> I would say that we need to plan the removal of these binaries (clusterdb and
> vacuumdb). We can start with a warning into clusterdb saying they should use
> pg_repackdb. In a few years, we can remove clusterdb. There were complaints
> about binary names without a pg_ prefix in the past [1].

Yeah.
 
> I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
> to pg_repackdb. We can add a similar warning message saying they should use
> pg_repackdb if the symlink is used.

Unless pg_repackdb has the same (or a superset of) CLI and behaviour as
vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
with a symlink to pg_repackdb will lead to much more breakage in existing
scripts/automation than clusterdb, which I guess is used orders of
magnitude less frequently than vacuumdb.


Michael



Re: Adding REPACK [concurrently]

From:
Álvaro Herrera
Date:
On 2025-08-23, Michael Banck wrote:

> On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:

>> I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
>> to pg_repackdb. We can add a similar warning message saying they should use
>> pg_repackdb if the symlink is used.
>
> Unless pg_repackdb has the same (or a superset of) CLI and behaviour as
> vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
> with a symlink to pg_repackdb will lead to much more breakage in existing
> scripts/automation than clusterdb, which I guess is used orders of
> magnitude less frequently than vacuumdb.

Yeah, I completely disagree with the idea of getting rid of vacuumdb.  We
can, maybe, in a distant future, get rid of the --full option to vacuumdb.
But the rest of the vacuumdb behavior must stay, I think, because REPACK is
not VACUUM — it is only VACUUM FULL.  And we want to make that distinction
very clear.

We can also, in a few years, get rid of clusterdb.  But I don't think we need to deprecate it just yet.

--
Álvaro Herrera



Re: Adding REPACK [concurrently]

From:
Mihail Nikalayeu
Date:
Hello, Antonin!

> When posting version 12 of the patch [1] I raised a concern that the MVCC
> safety is too expensive when it comes to logical decoding. Therefore, I
> abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we
> implement the MVCC safety, we simply rewrite the tuple like v12 did - that's
> the simplest way to preserve fields like xmin, cmin, ...

Thanks for the explanation.

I was looking into catalog-related logical decoding features, and it
seems like they are clearly overkill for the repack case.
We don't need CID tracking or even a snapshot for each commit if we’re
okay with passing xmin/xmax as arguments.

What do you think about the following approach for replaying:
* use the extracted XID as the value for xmin/xmax.
* use SnapshotSelf to find the tuple for update/delete operations.

SnapshotSelf seems like a good fit here:
* it sees the last "existing" version.
* any XID set as xmin/xmax in the repacked version is already
committed - so each update/insert is effectively "committed" once
written.
* it works with multiple updates of the same tuple within a single
transaction - SnapshotSelf sees the last version.
* all updates are ordered and replayed sequentially - so the last
version is always the one we want.

If I'm not missing anything, this looks like something worth including
in the patch set.
If so, I can try implementing a test version.

Best regards,
Mikhail



Re: Adding REPACK [concurrently]

From:
Antonin Houska
Date:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> I was looking into catalog-related logical decoding features, and it
> seems like they are clearly overkill for the repack case.
> We don't need CID tracking or even a snapshot for each commit if we’re
> okay with passing xmin/xmax as arguments.

I assume you are concerned with part 0005 of the v12 patch
("Preserve visibility information of the concurrent data changes."), aren't
you?

> What do you think about the following approach for replaying:
> * use the extracted XID as the value for xmin/xmax.
> * use SnapshotSelf to find the tuple for update/delete operations.
>
> SnapshotSelf seems like a good fit here:
> * it sees the last "existing" version.
> * any XID set as xmin/xmax in the repacked version is already
> committed - so each update/insert is effectively "committed" once
> written.
> * it works with multiple updates of the same tuple within a single
> transaction - SnapshotSelf sees the last version.
> * all updates are ordered and replayed sequentially - so the last
> version is always the one we want.
>
> If I'm not missing anything, this looks like something worth including
> in the patch set.
> If so, I can try implementing a test version.

Not sure I understand in all details, but I don't think SnapshotSelf is the
correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
'snapshot' argument at all. Instead, it considers the set of running
transactions as it is at the time the function is called.

One particular problem I imagine is replaying an UPDATE to a row that some
later transaction will eventually delete, but the transaction that ran the
UPDATE obviously had to see it. When looking for the old version during the
replay, HeapTupleSatisfiesMVCC() will find the old version as long as we pass
the correct snapshot to it.

However, at the time we're replaying the UPDATE in the new table, the tuple
may have been already deleted from the old table, and the deleting transaction
may already have committed. In such a case, HeapTupleSatisfiesSelf() will
conclude the old version invisible and we'll fail to replay the UPDATE.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

From:
Mihail Nikalayeu
Date:
Hi, Antonin!

> I assume you are concerned with part 0005 of the v12 patch
> ("Preserve visibility information of the concurrent data changes."), aren't
> you?

Yes, of course. I got an idea while trying to find a way to optimize it.

> Not sure I understand in all details, but I don't think SnapshotSelf is the
> correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
> 'snapshot' argument at all. Instead, it considers the set of running
> transactions as it is at the time the function is called.

Yes, and it is almost the same behavior when a typical MVCC snapshot
encounters a tuple created by its own transaction.

So, here is how it works in the non-MVCC-safe case (current patch behaviour):

1) we have a whole initial table snapshot with all the xmin = repack XID
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because all tuples were created/deleted by the transaction itself
3) each update/delete during the replay selects the last existing
tuple version, updates its xmax and inserts a new one
4) so, there is no real MVCC involved - just find the latest
version and create a new version
5) and it works correctly because all ordering issues were resolved by
the locking mechanisms on the original table or by the reorder buffer

How it maps to the MVCC-safe case (SnapshotSelf):

1) we have a whole initial table snapshot with all xmin copied from
the original table. All such xmin are committed.
2) the applying transaction sees ALL the self-alive (no xmax) tuples in it
because their xmin/xmax are committed and SnapshotSelf is happy with that
3) each update/delete during the replay selects the last existing
tuple version, sets its xmax = the original xid and inserts a new one
with xmin = the original xid
4) --//--
5) --//--


> However, at the time we're replaying the UPDATE in the new table, the tuple
> may have been already deleted from the old table, and the deleting transaction
> may already have committed. In such a case, HeapTupleSatisfiesSelf() will
> conclude the old version invisible and we'll fail to replay the UPDATE.

No, it will see it - because its xmax will be empty in the repacked
version of the table.

From your question I feel you understood the concept - but feel free
to ask for an additional explanation/scheme.

Best regards,
Mikhail.



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> > Not sure I understand in all details, but I don't think SnapshotSelf is the
> > correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
> > 'snapshot' argument at all. Instead, it considers the set of running
> > transactions as it is at the time the function is called.
>
> Yes, and it is almost the same behavior when a typical MVCC snapshot
> encounters a tuple created by its own transaction.
>
> So, here is how it works in the non-MVCC-safe case (current patch behaviour):
>
> 1) we have a whole initial table snapshot with all the xmin = repack XID
> 2) the applying transaction sees ALL the live (no xmax) tuples in it,
> because all tuples were created/deleted by the transaction itself
> 3) each update/delete during the replay selects the last existing
> tuple version, updates its xmax and inserts a new one
> 4) so, there is no real MVCC involved - just find the latest
> version and create a new version
> 5) and it works correctly because all ordering issues were resolved by
> the locking mechanisms on the original table or by the reorder buffer

ok

> And here is how it maps to the MVCC-safe case (SnapshotSelf):
>
> 1) we have a whole initial table snapshot with all xmin copied from
> the original table. All such xmin are committed.
> 2) the applying transaction sees ALL the live (no xmax) tuples in it
> because their xmin/xmax are committed and SnapshotSelf is happy with that

How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ?

> 3) each update/delete during the replay selects the last existing
> tuple version, sets its xmax to the original xid and inserts a new one
> with xmin = the original xid
> 4) --//--
> 5) --//--

> > However, at the time we're replaying the UPDATE in the new table, the tuple
> > may have been already deleted from the old table, and the deleting transaction
> > may already have committed. In such a case, HeapTupleSatisfiesSelf() will
> > conclude that the old version is invisible, and we'll fail to replay the UPDATE.
>
> No, it will see it - because its xmax will be empty in the repacked
> version of the table.

You're right, it'll be empty in the new table.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Robert Treat
Дата:
On Sat, Aug 23, 2025 at 10:23 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> On 2025-08-23, Michael Banck wrote:
> > On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
>
> >> I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
> >> to pg_repackdb. We can add a similar warning message saying they should use
> >> pg_repackdb if the symlink is used.
> >
> > Unless pg_repack has the same (or a superset of) CLI and behaviour as
> > vacuumdb (I haven't checked, but doubt it?), I think replacing vacuumdb
> > with a symlink to pg_repack will lead to much more breakage in existing
> > scripts/automation than clusterdb, which I guess is used orders of
> > magnitude less frequently than vacuumdb.
>
> Yeah, I completely disagree with the idea of getting rid of vacuumdb. We can, maybe, in a distant future, get rid of
> the --full option to vacuumdb.  But the rest of the vacuumdb behavior must stay, I think, because REPACK is not VACUUM —
> it is only VACUUM FULL. And we want to make that distinction very clear.
>

Or to put it the other way, VACUUM FULL is not really VACUUM either,
it is really a form of "repack".

> We can also, in a few years, get rid of clusterdb.  But I don't think we need to deprecate it just yet.
>

Yeah, ISTM the long term goal should be two binaries, one of which
manages aspects of clustering/repacking type activities, and one
which manages vacuum type activities. I don't think that's different
from what Alvaro is proposing. FWIW, my original question was about
confirming that that was the end goal, but also about understanding the
coordination of when these changes would take place, because the
changes to the code, changes to the SQL commands and their docs, and
changes to the command line tools seem to be working at different
cadences. That can be fine if it's on purpose, but maybe needs to be
tightened up if not; for example, the current patchset doesn't make
any changes to clusterdb, which one might expect to emit a warning
about being deprecated in favor of pg_repackdb, if not just completely
punt to using pg_repackdb instead.

Robert Treat
https://xzilla.net



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hi, Antonin

> How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
> snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ?

Yes, TransactionIdDidCommit. Another option is just to invent a new
snapshot type - SnapshotBelieveEverythingCommitted - which should work
for that particular case, because all xmin/xmax values written into
the new table are committed by design.



Re: Adding REPACK [concurrently]

От
Robert Treat
Дата:
On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu
<mihailnikalayeu@gmail.com> wrote:
> 1) we have a whole initial table snapshot with all xmin copied from
> the original table. All such xmin are committed.
> 2) the applying transaction sees ALL the live (no xmax) tuples in it
> because their xmin/xmax are committed and SnapshotSelf is happy with that
> 3) each update/delete during the replay selects the last existing
> tuple version, sets its xmax to the original xid and inserts a new one
> with xmin = the original xid
> 4) --//--
> 5) --//--
>

Advancing the table's min xid to at least the repack XID is a pretty big
feature, but the above scenario sounds like it would result in any
non-modified pre-existing tuples ending up with their original xmin
rather than the repack XID, which seems like it could lead to weird
side-effects. Maybe I am mis-thinking it though?

Robert Treat
https://xzilla.net



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Robert Treat <rob@xzilla.net> wrote:

> On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu
> <mihailnikalayeu@gmail.com> wrote:
> > 1) we have a whole initial table snapshot with all xmin copied from
> > the original table. All such xmin are committed.
> > 2) the applying transaction sees ALL the live (no xmax) tuples in it
> > because their xmin/xmax are committed and SnapshotSelf is happy with that
> > 3) each update/delete during the replay selects the last existing
> > tuple version, sets its xmax to the original xid and inserts a new one
> > with xmin = the original xid
> > 4) --//--
> > 5) --//--
> >
>
> Advancing the tables min xid to at least repack XID is a pretty big
> feature, but the above scenario sounds like it would result in any
> non-modified pre-existing tuples ending up with their original xmin
> rather than repack XID, which seems like it could lead to weird
> side-effects. Maybe I am mis-thinking it though?

What we are discussing here is how to keep the visibility information of
tuples (xmin, xmax, ...) unchanged. Both CLUSTER and VACUUM FULL already do
that. However, it's not trivial to ensure that REPACK with the CONCURRENTLY
option does as well.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Hi, Antonin
>
> > How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
> > snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ?
>
> Yes, TransactionIdDidCommit.

I think the problem is that HeapTupleSatisfiesSelf() uses
TransactionIdIsInProgress() instead of checking the snapshot:

        ...
        else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
                return false;
        else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
        ...

When decoding (and replaying) data changes, you deal with the database state
as it was (far) in the past. However, TransactionIdIsInProgress() is not
suitable for this purpose.

And since CommitTransaction() updates the commit log before removing the
transaction from ProcArray, I can even imagine race conditions: if a
transaction is committed and decoded fast enough, TransactionIdIsInProgress()
might still return true. In such a case, HeapTupleSatisfiesSelf() returns
false instead of calling TransactionIdDidCommit().

> Another option is just to invent a new
> snapshot type - SnapshotBelieveEverythingCommitted - which should work
> for that particular case, because all xmin/xmax values written into
> the new table are committed by design.

I'd prefer optimization of the logical decoding for REPACK CONCURRENTLY, and
using the MVCC snapshots.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Antonin Houska <ah@cybertec.at>:
> I think the problem is that HeapTupleSatisfiesSelf() uses
> TransactionIdIsInProgress() instead of checking the snapshot:

Yes, some issues might be possible for SnapshotSelf.
A possible solution is to override TransactionIdIsCurrentTransactionId
to return true (like you did with nParallelCurrentXids, but just
returning true).
IIUC, in that case all checks are going to behave the same way as in
the v5 version.

> I'd prefer optimization of the logical decoding for REPACK CONCURRENTLY, and
> using the MVCC snapshots.

It is also possible, but it is much more complex and feels like overkill to me.
We just need a way to find the latest version of a row in a world of
all-committed transactions without any concurrent writers - I am
pretty sure this can be achieved in a simpler and more efficient way.



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Antonin Houska <ah@cybertec.at>:
> > I think the problem is that HeapTupleSatisfiesSelf() uses
> > TransactionIdIsInProgress() instead of checking the snapshot:
>
> Yes, some issues might be possible for SnapshotSelf.
> A possible solution is to override TransactionIdIsCurrentTransactionId
> to return true (like you did with nParallelCurrentXids, but just
> returning true).
> IIUC, in that case all checks are going to behave the same way as in
> the v5 version.

I assume you mean v12-0005. Yes, that modifies
TransactionIdIsCurrentTransactionId(), so that the transaction being
replayed recognizes if it (or its subtransaction) performed a particular
change itself.

Although it could work, I think it'd be confusing to consider the transactions
being replayed as "current" from the point of view of the backend that
executes REPACK CONCURRENTLY.

But the primary issue is that in v12-0005,
TransactionIdIsCurrentTransactionId() gets the information on "current
transactions" from snapshots - see the calls of SetRepackCurrentXids() before
each scan. It's probably not what you want.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Antonin Houska <ah@cybertec.at>:

> Although it could work, I think it'd be confusing to consider the transactions
> being replayed as "current" from the point of view of the backend that
> executes REPACK CONCURRENTLY.

I just realized SnapshotDirty is the thing that fits the role - it
respects not-yet-committed transactions, giving the caller enough
information to wait for them.
It is already used in a similar pattern in
check_exclusion_or_unique_constraint and RelationFindReplTupleByIndex.

So, it is easy to detect the race you described previously and retry,
and there is no need to hack around
TransactionIdIsCurrentTransactionId.

BTW, btree + SnapshotDirty has an issue [0], but that is a different
story and happens only with concurrent updates, which are not present
in the current scope.

[0]:
https://www.postgresql.org/message-id/flat/CADzfLwXGhH_qD6RGqPyEeKdmHgr-HpA-tASYdi5onP%2BRyP5TCw%40mail.gmail.com#77f6426ef2d282198f2d930d5334e3fa



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Antonin Houska <ah@cybertec.at>:
>
> > Although it could work, I think it'd be confusing to consider the transactions
> > being replayed as "current" from the point of view of the backend that
> > executes REPACK CONCURRENTLY.
>
> Just realized SnapshotDirty is the thing that fits into the role - it
> respects not-yet committed transactions, giving enough information to
> wait for them.
> It is already used in a similar pattern in
> check_exclusion_or_unique_constraint and RelationFindReplTupleByIndex.
>
> So, it is easy to detect the case of the race you described previously
> and retry + there is no sense to hack around
> TransactionIdIsCurrentTransactionId.

Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
visible? TransactionIdIsCurrentTransactionId() will not do w/o the
modifications that you proposed earlier [1] and TransactionIdIsInProgress() is
not suitable as I explained in [2].

I understand your idea that there are no transaction aborts in the new table,
which makes things simpler. I cannot judge if it's worth inventing a new kind
of snapshot. Anyway, I think you'd then also need to hack
HeapTupleSatisfiesUpdate(). Isn't that too invasive?

[1] https://www.postgresql.org/message-id/CADzfLwUqyOmpkLmciecBy4aBN1sohQVZ2Hgc6m-tjSUqDRHwyQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/24483.1756142534%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello, Antonin!

Antonin Houska <ah@cybertec.at>:
>
> Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
> visible? TransactionIdIsCurrentTransactionId() will not do w/o the
> modifications that you proposed earlier [1] and TransactionIdIsInProgress() is
> not suitable as I explained in [2].

HeapTupleSatisfiesDirty is designed to respect even not-yet-committed
transactions and provides additional related information.

    else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
    {
       /*
        * Return the speculative token to caller.  Caller can worry about
        * xmax, since it requires a conclusively locked row version, and
        * a concurrent update to this tuple is a conflict of its
        * purposes.
        */
       if (HeapTupleHeaderIsSpeculative(tuple))
       {
          snapshot->speculativeToken =
             HeapTupleHeaderGetSpeculativeToken(tuple);

          Assert(snapshot->speculativeToken != 0);
       }

       snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
       /* XXX shouldn't we fall through to look at xmax? */
       return true;      /* in insertion by other */
    }

So, it returns true when TransactionIdIsInProgress is true.
However, that alone is not sufficient to trust the result in the common case.

You may check check_exclusion_or_unique_constraint or
RelationFindReplTupleByIndex for that pattern:
if xmin is set in the snapshot (a special hack in SnapshotDirty to
provide additional information from the check), we wait for the
ongoing transaction (or one that is actually committed but not yet
properly reflected in the proc array), and then retry the entire tuple
search.

So, the race condition you explained in [2] will be resolved by a
retry, and the changes to TransactionIdIsCurrentTransactionId described
in [1] are not necessary (a sketch of the pattern follows).
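
For illustration, a minimal sketch of that wait-and-retry pattern,
loosely modeled on check_exclusion_or_unique_constraint();
find_latest_version() is a hypothetical stand-in for the tuple lookup
during replay, and rel/ident are assumed from the surrounding code:

    SnapshotData DirtySnapshot;
    HeapTuple    tup;
    TransactionId xwait;

  retry:
    InitDirtySnapshot(DirtySnapshot);
    tup = find_latest_version(rel, ident, &DirtySnapshot);

    /*
     * SnapshotDirty reports an in-progress writer through the snapshot's
     * xmin/xmax fields; this also covers a transaction that has already
     * committed but is still listed in the proc array.
     */
    xwait = TransactionIdIsValid(DirtySnapshot.xmin) ?
        DirtySnapshot.xmin : DirtySnapshot.xmax;

    if (TransactionIdIsValid(xwait))
    {
        /* Wait for the writer to finish, then redo the whole search. */
        XactLockTableWait(xwait, rel, &tup->t_self, XLTW_None);
        goto retry;
    }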

> I understand your idea that there are no transaction aborts in the new table,
> which makes things simpler. I cannot judge if it's worth inventing a new kind
> of snapshot. Anyway, I think you'd then also need to hack
> HeapTupleSatisfiesUpdate(). Isn't that too invasive?

It seems that HeapTupleSatisfiesUpdate is also fine as it currently
exists (we'll see the committed version after the retry).

The solution appears to be non-invasive:
* uses the existing snapshot type
* follows the existing usage pattern
* leaves TransactionIdIsInProgress and HeapTupleSatisfiesUpdate unchanged

The main change is that xmin/xmax values are forced from the arguments
- but that seems unavoidable in any case.

I'll try to make some kind of prototype this weekend and cover the race
condition you mentioned in the specs.
Maybe some corner cases will appear.

By the way, there's one more optimization we could apply in both the
MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED /
HEAP_XMAX_COMMITTED bits in the new table:
* in the MVCC-safe approach, the transaction is already committed.
* in the non-MVCC-safe case, it isn’t committed yet - but no one will
examine those bits before it commits (though this approach does feel
more fragile).

This could help avoid potential storms of full-page writes caused by
SetHintBits after the table switch; a minimal sketch follows.
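
A minimal sketch of that idea during the initial copy (assuming the copy
path has a writable HeapTuple at hand; this is not the patch's actual
code):

    /*
     * Hypothetical: pre-set the hint bits while writing the new table,
     * so that readers after the table switch do not have to consult the
     * commit log and dirty the pages just to set them.
     */
    tuple->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
    if (TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple->t_data)))
        tuple->t_data->t_infomask |= HEAP_XMAX_COMMITTED;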

Best regards,
Mikhail.



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Hello, Antonin!
>
> Antonin Houska <ah@cybertec.at>:
> >
> > Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
> > visible? TransactionIdIsCurrentTransactionId() will not do w/o the
> > modifications that you proposed earlier [1] and TransactionIdIsInProgress() is
> > not suitable as I explained in [2].
>
> HeapTupleSatisfiesDirty is designed to respect even not-yet-committed
> transactions and provides additional related information.
>
>     else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
>     {
>        /*
>         * Return the speculative token to caller.  Caller can worry about
>         * xmax, since it requires a conclusively locked row version, and
>         * a concurrent update to this tuple is a conflict of its
>         * purposes.
>         */
>        if (HeapTupleHeaderIsSpeculative(tuple))
>        {
>           snapshot->speculativeToken =
>              HeapTupleHeaderGetSpeculativeToken(tuple);
>
>           Assert(snapshot->speculativeToken != 0);
>        }
>
>        snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
>        /* XXX shouldn't we fall through to look at xmax? */
>        return true;      /* in insertion by other */
>     }
>
> So, it returns true when TransactionIdIsInProgress is true.
> However, that alone is not sufficient to trust the result in the common case.
>
> You may check check_exclusion_or_unique_constraint or
> RelationFindReplTupleByIndex for that pattern:
> if xmin is set in the snapshot (a special hack in SnapshotDirty to
> provide additional information from the check), we wait for the
> ongoing transaction (or one that is actually committed but not yet
> properly reflected in the proc array), and then retry the entire tuple
> search.
>
> So, the race condition you explained in [2] will be resolved by a
> retry, and the changes to TransactionIdIsCurrentTransactionId described
> in [1] are not necessary.

I insist that this is a misuse of TransactionIdIsInProgress(). When dealing
with logical decoding, only the WAL should tell whether a particular
transaction is still running. AFAICS this is how reorderbuffer.c works.

A new kind of snapshot seems like a (much) cleaner solution at the moment.

> I'll try to make some kind of prototype this weekend and cover the race
> condition you mentioned in the specs.
> Maybe some corner cases will appear.

No rush. First, the MVCC safety is not likely to be included in v19
[1]. Second, I think it's good to let others propose their ideas before
writing code.

> By the way, there's one more optimization we could apply in both the
> MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED /
> HEAP_XMAX_COMMITTED bits in the new table:
> * in the MVCC-safe approach, the transaction is already committed.
> * in the non-MVCC-safe case, it isn’t committed yet - but no one will
> examine those bits before it commits (though this approach does feel
> more fragile).
>
> This could help avoid potential storms of full-page writes caused by
> SetHintBits after the table switch.

Good idea, thanks.

[1] https://www.postgresql.org/message-id/202504040733.ysuy5gad55md%40alvherre.pgsql

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Antonin Houska <ah@cybertec.at>:

> I insist that this is a misuse of TransactionIdIsInProgress(). When dealing
> with logical decoding, only the WAL should tell whether a particular
> transaction is still running. AFAICS this is how reorderbuffer.c works.

Hm... maybe, but at the same time we already have SnapshotDirty used
in that way, and it even deals with the same race...
But I agree - a special kind of snapshot is a more accurate solution.

> A new kind of snapshot seems like a (much) cleaner solution at the moment.

Do you mean some kind of snapshot which only uses
TransactionIdDidCommit/Abort, ignoring
TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress?
Actually it behaves like SnapshotBelieveEverythingCommitted in that
particular case, but TransactionIdDidCommit/Abort may be used as some
kind of assert/error source to make sure everything is going as
designed.
And, yes, for the new snapshot we need HeapTupleSatisfiesUpdate
to be modified.

Also, to deal with that particular race we may just use
XactLockTableWait(xid, NULL, NULL, XLTW_None) before starting
transaction replay.

> No rush. First, the MVCC safety is not likely to be included in v19 [1].

That worries me - it is not the behaviour someone expects from a
database by default. At least the warning should be much more visible
and obvious.
I think most users will expect the same guarantees as
[CREATE|RE]INDEX CONCURRENTLY provides.



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> > A new kind of snapshot seems like a (much) cleaner solution at the moment.
> 
> Do you mean some kind of snapshot which only uses
> TransactionIdDidCommit/Abort, ignoring
> TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress?
> Actually it behaves like SnapshotBelieveEverythingCommitted in that
> particular case, but TransactionIdDidCommit/Abort may be used as some
> kind of assert/error source to make sure everything is going as
> designed.

Given that there should be no (sub)transaction aborts in the new table, I
think you only need to check that the tuple has a valid xmin and an invalid
xmax (a rough sketch follows below).

I think the XID should be in the commit log at the time the transaction is
being replayed, so it should be legal to use TransactionIdDidCommit/Abort in
Assert() statements. (And as long as REPACK CONCURRENTLY will use
ShareUpdateExclusiveLock, which conflicts with VACUUM, pg_class.relfrozenxid
for the given table should not advance during the processing, and therefore
the replayed XIDs should not be truncated from the commit log while REPACK
CONCURRENTLY is running.)
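
For illustration, here is a rough sketch of what such a check might
reduce to (the function name is invented; a real snapshot type would
plug into more machinery, including HeapTupleSatisfiesUpdate()):

    /*
     * Hypothetical visibility check for the replay into the new table:
     * every xmin/xmax ever written there is committed by design, so a
     * tuple is visible iff nobody has set its xmax yet.
     */
    static bool
    HeapTupleSatisfiesAllCommitted(HeapTuple htup, Snapshot snapshot,
                                   Buffer buffer)
    {
        HeapTupleHeader tuple = htup->t_data;

        /* Sanity check: the XID must already be in the commit log. */
        Assert(TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)));

        return !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple)) ||
            (tuple->t_infomask & HEAP_XMAX_INVALID) != 0;
    }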

> And, yes, for the new snapshot we need HeapTupleSatisfiesUpdate
> to be modified.
> 
> Also, to deal with that particular race we may just use
> XactLockTableWait(xid, NULL, NULL, XLTW_None) before starting
> transaction replay.

Do you mean the race related to TransactionIdIsInProgress()? Not sure I
understand, as you suggested above that you no longer need the function.

> > No rush. First, the MVCC safety is not likely to be included in v19 [1].
> 
> That worries me - it is not the behaviour someone expects from a
> database by default. At least the warning should be much more visible
> and obvious.
> I think most users will expect the same guarantees as
> [CREATE|RE]INDEX CONCURRENTLY provides.

It does not really worry me. The pg_squeeze extension is not MVCC-safe and I
remember there were only 1 or 2 related complaints throughout its
existence. (pg_repack isn't MVCC-safe either, but I don't keep track of its
issues.)

Of course, user documentation should warn about the problem, in a way it does
for other commands (typically ALTER TABLE).

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Antonin Houska <ah@cybertec.at>:

> Do you mean the race related to TransactionIdIsInProgress()? Not sure I
> understand, as you suggested above that you no longer need the function.

The "lightweight" approaches I see so far:
* XactLockTableWait before replay + SnapshotSelf(GetLatestSnapshot?)
* SnapshotDirty + retry logic
* SnapshotBelieveEverythingCommitted + modification of
HeapTupleSatisfiesUpdate (because it called by heap_update and looks
into TransactionIdIsInProgress)
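
A minimal sketch of the first option, assuming the replay loop knows the
decoded transaction's XID:

    /*
     * Hypothetical: before replaying a decoded transaction, wait until
     * its XID is gone from the proc array, so the commit-log/proc-array
     * race described earlier cannot bite SnapshotSelf-style checks.
     */
    if (TransactionIdIsInProgress(xid))
        XactLockTableWait(xid, NULL, NULL, XLTW_None);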

> It does not really worry me. The pg_squeeze extension is not MVCC-safe and I
> remember there were only 1 or 2 related complaints throughout its
> existence. (pg_repack isn't MVCC-safe either, but I don't keep track of its
> issues.)

But pg_squeeze and pg_repack are extensions. If we are moving those
mechanics into core, I'd expect some improvements over pg_squeeze.
The MVCC-safety of REINDEX CONCURRENTLY makes it possible to run it on a
regular basis as some kind of background job. It would be nice to have
something like this for the heap.

I agree the initial approach is too invasive, complex and
performance-heavy to push forward now.
But any of the "lightweight" options feels like a good candidate to be
shipped with the feature itself - relatively easy and non-invasive.

Best regards,
Mikhail.



Re: Adding REPACK [concurrently]

От
Alvaro Herrera
Дата:
On 2025-Aug-21, Mihail Nikalayeu wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org>:

> > I proposed to leave this part out initially, which is why it hasn't been
> > reposted.  We can review and discuss after the initial patches are in.
> 
> I think it is worth pushing it at least in the same release cycle.

If others are motivated enough to certify it, maybe we can consider it.
But I don't think I'm going to have time to get this part reviewed and
committed in time for 19, so you might need another committer.

> > Because having an MVCC-safe mode has drawbacks, IMO we should make it
> > optional.
>
> Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Yes, exactly that.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello, Álvaro!

> If others are motivated enough to certify it, maybe we can consider it.
> But I don't think I'm going to have time to get this part reviewed and
> committed in time for 19, so you might need another committer.

I don't think it is realistic to involve another committer - it is
just a well-known curse of all non-committers :)

> > > Because having an MVCC-safe mode has drawbacks, IMO we should make it
> > > optional.

As far as I can see, the proposed "lightweight" solutions don't
introduce any drawbacks - unless something has been overlooked.

> > Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?
> Yes, exactly that.

To be honest, that approach feels a little bit strange to me. I work
in the database-consumer (not database-developer) industry, and 90% of
DevOps engineers (or similar roles who deal with database maintenance
now) have no clue what MVCC is - and it is the industry standard nowadays.

From my perspective, it is better to have a less performant but
MVCC-safe approach by default, with some "more performance, less
safety" flag for those who truly understand the trade-offs.
All kinds of safety and correctness are probably the main reason to use
classic databases instead of storing data in S3/Elastic/Mongo/etc.
these days.

In case of some incident related to that (in a large well-known
company) the typical takeaway for readers of tech blogs will simply be
"some command in Postgres is broken". And maybe also "the database
with a name starting with 'O' is not affected by that flaw".

Yes, some table-rewriting ALTER commands are not MVCC-safe, but those
typically block everything for hours - so they're avoided unless
absolutely necessary, and usually performed during backend outages to
prevent systems from getting stuck waiting on millions of locks.

"CONCURRENTLY" is something that feels like a "working in background,
don't worry, do not stop your 100k RPS" thing, especially in Postgres
because of CIC.

I also have personal experience debugging incidents caused by
incorrect Postgres behavior - and it’s absolute hell.

Sorry, I made a drama here....

I fully understand resource limitations... Maybe Postgres 19 could
introduce something like REPACK (CONCURRENTLY, UNSAFE) first, and
later add a safer REPACK (CONCURRENTLY)?

Best regards,
Mikhail.



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> In case of some incident related to that (in a large well-known
> company) the typical takeaway for readers of tech blogs will simply be
> "some command in Postgres is broken".

For _responsible_ users, the message will rather be that "some tech bloggers
do not bother to read user documentation".

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Alvaro Herrera
Дата:
Hello,

Here's v19 of this patchset.  This is mostly Antonin's v18.  I added a
preparatory v19-0001 commit, which splits vacuumdb.c to create a new
file, vacuuming.c (and its header file vacuuming.h).  If you look at it
under 'git show --color-moved=zebra' you should notice that most of it
is just code movement; there are hardly any code changes.

v19-0002 has absorbed Antonin's v18-0005 (the pg_repackdb binary)
together with the introduction of the REPACK command proper; but instead
of using a symlink, I just created a separate pg_repackdb.c source file
for it and we compile that small new source file with vacuuming.c to
create a regular binary.  BTW the meson.build changes look somewhat
duplicative; maybe there's a less dumb way to go about this.  (For
instance, maybe just have libscripts.a include vacuuming.o, though it's
not used by any of the other programs in that subdir.)

I'm not wedded to the name "vacuuming.c"; happy to take suggestions.

After 0002, the pg_repackdb utility should be ready to take clusterdb's
place, and also vacuumdb --full, with one gotcha: if you try to use
pg_repackdb with an older server version, it will fail, claiming that
REPACK is not supported.  This is not ideal.  Instead, we should make it
run VACUUM FULL (or CLUSTER); so if you have a fleet including older
servers you can use the new utils there too.

All the logic for vacuumdb to select tables to operate on has been moved
to vacuuming.c verbatim.  This means this logic applies to pg_repackdb
as well.  As long as you stick to repacking a single table this is okay
(read: it won't be used at all), but if you want to use parallel mode
(say to process multiple schemas), we might need to change it.  For the
same reason, I think we should add an option to it (--index[=indexname])
to select whether to use the USING INDEX clause or not, and optionally
indicate which index to use; right now there's no way to select which
logic (cluster's or vacuum full's) to use.
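
For instance, hypothetical invocations under that proposal (the option
spelling above is a sketch, not final; database and object names are
made up) might look like:

    pg_repackdb -d mydb -n public --index          # each table's clustered index
    pg_repackdb -d mydb -t tab --index=tab_pkey    # one table, given index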


Then v19-0003 through v19-0005 are Antonin's subsequent patches to add
the CONCURRENTLY option; I have not reviewed these at all, so I'm
including them here just for completeness. I also included v18-0006 as
posted by Mihail previously, though I have little faith that we're going
to include it in this release.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)

Вложения

Re: Adding REPACK [concurrently]

От
Alvaro Herrera
Дата:
Apparently I mismerged src/bin/scripts/meson.build.  This v20 is
identical to v19 except that that mistake has been corrected.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".

Вложения

Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello!

I started an attempt to make a "lightweight" MVCC-safe prototype and
got stuck on an "it is not working" issue.
After some debugging I realized that Antonin's variant (catalog-mode based)
seems to be broken too...

And after a few more hours I realized the non-MVCC version is broken as well :)

Attached is a patch with a test that reproduces the issue with repack +
concurrent modifications.
It seems that some updates may be lost.

I hope the patch logic is clear - but feel free to ask if not.

Best regards,
Mikhail.

Вложения

Re: Adding REPACK [concurrently]

От
Alvaro Herrera
Дата:
On 2025-Aug-31, Mihail Nikalayeu wrote:

> I started an attempt to make a "lightweight" MVCC-safe prototype and
> got stuck on an "it is not working" issue.
> After some debugging I realized that Antonin's variant (catalog-mode based)
> seems to be broken too...
> 
> And after a few more hours I realized the non-MVCC version is broken as well :)

Ugh.  Well, obviously we need to get this fixed if we want CONCURRENTLY
at all :-)

Please don't post patches that aren't the commitfest item's main patch
as attachments with a .patch extension.  This confuses the CFbot into
thinking your patch is the patch-of-record (which it isn't), and it
reports that the patch fails CI.  See here:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5117
(For the same reason, it isn't useful to number them as if they were
part of the patch series).

If you want to post secondary patches, please rename them to end in
something like .txt or .nocfbot or whatever.  See here:
https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches?

Thanks for your interest in this topic,

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Adding REPACK [concurrently]

От
Michael Paquier
Дата:
On Wed, Aug 27, 2025 at 10:22:24AM +0200, Mihail Nikalayeu wrote:
> That worries me - it is not the behaviour someone expects from a
> database by default. At least the warning should be much more visible
> and obvious.
> I think most users will expect the same guarantees as
> [CREATE|RE]INDEX CONCURRENTLY provides.

Having a unified path for the handling of the waits and the locking
sounds to me like a pretty good argument in favor of a basic
implementation.

In my experience, users do not really care about the time it takes to
complete an operation involving CONCURRENTLY if we allow concurrent
reads and writes in parallel with it.  I have not looked at the proposal
in detail, but before trying a more folkloric MVCC approach, relying
on basics that we know have been working for some time seems like a
good and sufficient initial step in terms of handling the waits and
the locks with table AMs (aka heap or something else).

Just my 2c.
--
Michael

Вложения

Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Hello!
> 
> I started an attempt to make a "lightweight" MVCC-safe prototype and
> stuck into the "it is not working" issue.
> After some debugging I realized Antonin's variant (catalog-mode based)
> seems to be broken also...
> 
> And after a few more hours I realized non-MVCC is broken as well :)
> 
> This is a patch with a test to reproduce the issue related to repack +
> concurrent modifications.
> Seems like some updates may be lost.
> 
> I hope the patch logic is clear - but feel free to ask if not.

Are you sure the test is complete? I see no occurrence of the REPACK command
in it.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello!

Antonin Houska <ah@cybertec.at>:
> Are you sure the test is complete? I see no occurrence of the REPACK command
> in it.
Oops, I sent an invalid file. The correct one is attached.

Вложения

Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello, Álvaro!

Alvaro Herrera <alvherre@alvh.no-ip.org>:
> If you want to post secondary patches, please rename them to end in
> something like .txt or .nocfbot or whatever.  See here:
> https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches?

Sorry, I missed that.
But now it is possible to send ".patch" without changing the extension [0].

> It also ignores any files that start with "nocfbot".

[0]: https://discord.com/channels/1258108670710124574/1328362897189113867/1412021226528051250



Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Antonin Houska <ah@cybertec.at>:
> > Are you sure the test is complete? I see no occurrence of the REPACK command
> > in it.
> Oops, send invalid file. The correct one in attachment.

Thanks!

The problem was that when removing the original "preserve visibility" patch
v12-0005 [1] from the series, I forgot to change the value of the
'need_full_snapshot' argument of CreateInitDecodingContext().

v12 and earlier treated the repacked table like a system catalog, so it was
o.k. to pass need_full_snapshot=false. However, it must be true now, otherwise
the snapshot created for the initial copy does not see commits of transactions
that do not change regular catalogs.

The fix is as simple as

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index f481a3cec6d..7866ac01278 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -502,6 +502,7 @@ SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
        StringInfo      buf = makeStringInfo();

        Assert(builder->state == SNAPBUILD_CONSISTENT);
+       Assert(builder->building_full_snapshot);

        snap = SnapBuildBuildSnapshot(builder);


I'll apply it to the next version of the "Add CONCURRENTLY option to REPACK
command" patch.


[1] https://www.postgresql.org/message-id/flat/CAFj8pRDK89FtY_yyGw7-MW-zTaHOCY4m6qfLRittdoPocz+dMQ@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello!

Antonin Houska <ah@cybertec.at>:
> I'll apply it to the next version of the "Add CONCURRENTLY option to REPACK
> command" patch.
I have added it to the v21 patchset.

Also, I’ve updated the MVCC-safe patch:
* it uses the "XactLockTableWait before replay + SnapshotSelf" approach from [0]
* it includes a TAP test to ensure MVCC safety - not intended to be
committed in its current form (too heavy)
* documentation has been updated.

It's now much simpler and does not negatively impact performance. It
is less aggressive in tuple freezing, but can be updated to match the
non-MVCC-safe version if needed.

While testing the MVCC-safe version with the stress test
007_repack_concurrently_mvcc.pl, I encountered some random crashes with
logs like this:

25-09-02 12:24:40.039 CEST client backend[261907]
007_repack_concurrently_mvcc.pl ERROR:  relcache reference
0x7715b9f394a8 is not owned by resource owner TopTransaction
2025-09-02 12:24:40.039 CEST client backend[261907]
007_repack_concurrently_mvcc.pl STATEMENT:  REPACK (CONCURRENTLY) tbl1
USING INDEX tbl1_pkey;
TRAP: failed Assert("rel->rd_refcnt > 0"), File:
"../src/backend/utils/cache/relcache.c", Line: 6992, PID: 261907
postgres: CIC_test: nkey postgres [local]
REPACK(ExceptionalCondition+0xbe)[0x5b7ac41d79f9]
postgres: CIC_test: nkey postgres [local] REPACK(+0x852d2e)[0x5b7ac41cbd2e]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aa4a6)[0x5b7ac42234a6]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aad3b)[0x5b7ac4223d3b]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aac69)[0x5b7ac4223c69]
postgres: CIC_test: nkey postgres [local]
REPACK(ResourceOwnerRelease+0x32)[0x5b7ac4223c26]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f43bf)[0x5b7ac3b6d3bf]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f4dfa)[0x5b7ac3b6ddfa]
postgres: CIC_test: nkey postgres [local]
REPACK(AbortCurrentTransaction+0xe)[0x5b7ac3b6dd6b]
postgres: CIC_test: nkey postgres [local]
REPACK(PostgresMain+0x57d)[0x5b7ac3fd7238]
postgres: CIC_test: nkey postgres [local] REPACK(+0x654102)[0x5b7ac3fcd102]
postgres: CIC_test: nkey postgres [local]
REPACK(postmaster_child_launch+0x191)[0x5b7ac3eceb7a]
postgres: CIC_test: nkey postgres [local] REPACK(+0x55c8c1)[0x5b7ac3ed58c1]
postgres: CIC_test: nkey postgres [local] REPACK(+0x559d1e)[0x5b7ac3ed2d1e]
postgres: CIC_test: nkey postgres [local]
REPACK(PostmasterMain+0x168a)[0x5b7ac3ed25f8]
postgres: CIC_test: nkey postgres [local] REPACK(main+0x3a1)[0x5b7ac3da2bd6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7715b962a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7715b962a28b]

This time I was clever and tried to reproduce the issue on
the non-MVCC-safe version first - and it is reproducible.

Just comment out \if :p_t1 != :p_t2 (and its internals, because they
catch non-MVCC behaviour, which is expected without the 0006 patch) and
set
'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=25000'.

It takes about a minute on my PC to get the crash.

[0]:
https://www.postgresql.org/message-id/flat/CADzfLwXCTXNdxK-XGTKmObvT%3D_QnaCviwgrcGtG9chsj5sYzrg%40mail.gmail.com#6d6ccf34e6debcd386d735db593379d8

Best regards,
Mikhail.

Вложения

Re: Adding REPACK [concurrently]

От
Antonin Houska
Дата:
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> While testing the MVCC-safe version with the stress test
> 007_repack_concurrently_mvcc.pl, I encountered some random crashes with
> logs like this:
> 
> 25-09-02 12:24:40.039 CEST client backend[261907]
> 007_repack_concurrently_mvcc.pl ERROR:  relcache reference
> 0x7715b9f394a8 is not owned by resource owner TopTransaction
> ...
> This time I was clever and tried to reproduce the issue on
> the non-MVCC-safe version first - and it is reproducible.

Thanks again for the thorough testing!

I think this should be fixed separately [1].

[1] https://www.postgresql.org/message-id/119497.1756892972%40localhost

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: Adding REPACK [concurrently]

От
Alvaro Herrera
Дата:
Hello,

Barring further commentary, I intend to get 0001 committed tomorrow, and
0002 some time later -- perhaps by end of this week, or sometime next
week.

Regards

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: Adding REPACK [concurrently]

От
Álvaro Herrera
Дата:
After looking at this some more, I realized that 0001 had been written a
bit too hastily and that it could use some more cleanup -- in
particular, we don't need to export most of the function prototypes
other than vacuuming_main() (and the trivial escape_quotes helper).  I
made the other functions static.  Also, prepare_vacuum_command() needs
the encoding in order to do fmtIdEnc() on a given index name (for
`pg_repackdb -t table --index=foobar`), so I changed it to take the
PGconn instead of just the serverVersion.  I realized that it makes no
sense for objfilter to be a global variable instead of living inside
`main` and being passed as an argument where needed.  (Heck, maybe it
should be inside vacuumingOpts.)  Lastly, it seemed like weird coding
that the functions would sometimes exit(1) instead of returning a result
code, so I made them do that and have the callers react appropriately.
These are all fairly straightforward changes.

So here's v22 with those and rebased to current sources.  Only the first
two patches this time, which are the ones I would be glad to receive
input on.

I also wonder if analyze_only and analyze_in_stages should be new values
in RunMode rather than separate booleans ... I think that might make the
code simpler.  I didn't try though.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)

Вложения

Re: Adding REPACK [concurrently]

От
Marcos Pegoraro
Дата:
On Thu, Sep 25, 2025 at 15:12, Álvaro Herrera <alvherre@kurilemu.de> wrote:

Some typos I've found in the usage output of pg_repackdb.

+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
+ printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
Both options can point to a single schema, so the "(s)" should be removed:
"in the specified schema(s)" should be "in the specified schema".

The same occurs in this one, which should be "table", not "table(s)":
+ printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));

regards
Marcos

Re: Adding REPACK [concurrently]

От
Robert Treat
Дата:
On Thu, Sep 25, 2025 at 4:21 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> Em qui., 25 de set. de 2025 às 15:12, Álvaro Herrera <alvherre@kurilemu.de> escreveu:
>
> Some typos I've found on usage of pg_repackdb.
>
> + printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
> + printf(_("  -N, --exclude-schema=SCHEMA     do not repack tables in the specified schema(s)\n"));
> both options can point to a single schema, so "(s)" should be removed.
> "in the specified schema(s)" should be "in the specified schema"
>
> Same occurs on this one, which should be table, not table(s)
> + printf(_("  -t, --table='TABLE'             repack specific table(s) only\n"));
>

This pattern is used because you can pass more than one argument, for
example, something like

pg_repackdb -d pagila -v -n public -n legacy

While I agree that the wording is a little awkward (I'd prefer "repack
tables only in the specified schema(s)"), this follows the same
pattern as pg_dump and friends.

Robert Treat
https://xzilla.net



Re: Adding REPACK [concurrently]

От
Marcos Pegoraro
Дата:
On Thu, Sep 25, 2025 at 18:31, Robert Treat <rob@xzilla.net> wrote:
This pattern is used because you can pass more than one argument, for
example, something like

I know that

While I agree that the wording is a little awkward [...] this follows
the same pattern as pg_dump and friends.

Well, I think pg_dump looks wrong too, because if you explain in the docs
that it's a single table or a single schema, why write the plural in the usage output?
+        Repack or analyze all tables in
+        <replaceable class="parameter">schema</replaceable> only.  Multiple
+        schemas can be repacked by writing multiple <option>-n</option>
+        switches.

instead of 
+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema(s) only\n"));
maybe this?
+ printf(_("  -n, --schema=SCHEMA             repack tables in the specified schema, can be used several times\n"));

regards
Marcos

Re: Adding REPACK [concurrently]

От
Mihail Nikalayeu
Дата:
Hello!

Álvaro Herrera <alvherre@kurilemu.de>:
> So here's v22 with those and rebased to current sources.  Only the first
> two patches this time, which are the ones I would be glad to receive
> input on.

> get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
>                          Oid relid, bool rel_is_index)

Should we rename it to repack_context to be aligned with the calling side?

---------
'cmd' in

> static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
>                           MemoryContext permcxt);

but 'command' in

> get_tables_to_repack(RepackCommand command, bool usingindex,
>                 MemoryContext permcxt)

---------

> cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",

Maybe this could be changed to RepackCommandAsString.

-----------

if (cmd == REPACK_COMMAND_REPACK)
    pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
                          PROGRESS_REPACK_COMMAND_REPACK);
else if (cmd == REPACK_COMMAND_CLUSTER)
{
    pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
                          PROGRESS_CLUSTER_COMMAND_CLUSTER);
} else ....

The '{' and '}' look a little bit weird.

--------
Documentation of pg_repackdb contains a lot of "analyze" and even
"--analyze" parameter - but I can't see anything related in the code.


Best regards,
Mikhail.



Re: Adding REPACK [concurrently]

От
Robert Treat
Дата:
On Thu, Sep 25, 2025 at 2:12 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> So here's v22 with those and rebased to current sources.  Only the first
> two patches this time, which are the ones I would be glad to receive
> input on.
>

Here are a number of small issues I noticed. I don't know that they all
need addressing right now, but it seems worth asking the questions...

#1
"pg_repackdb --help" does not mention the --index option, although the
flag is accepted. I'm not sure if this is meant to match clusterdb,
but since we need the index option to invoke the clustering behavior,
I think it needs to be there.

#2
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer
--index=idx_last_name
pg_repackdb: repacking database "pagila"
INFO:  clustering "public.customer" using sequential scan and sort

[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer
pg_repackdb: repacking database "pagila"
INFO:  vacuuming "public.customer"

This was less confusing once I figured out we could pass the --index
option, but even with that it is a little confusing, I think mostly
because it looks like we are "vacuuming" the table, which in a world
of repack and vacuum (ie. no vacuum full) doesn't make sense. I think
the right thing to do here would be to modify it to be "repacking %s"
in both cases, with the "using sequential scan and sort" as the means
to understand which version of repack is being executed.

#3
pg_repackdb does not offer an --analyze option, which ISTM it should,
to match the REPACK command

#4
SQL level REPACK help shows:

where option can be one of:
    VERBOSE [ boolean ]
    ANALYSE | ANALYZE

but SQL level VACUUM does
    VERBOSE [ boolean ]
    ANALYZE [ boolean ]

These operate the same way, so I would expect it to match the language
in vacuum.

#5
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
pg_repackdb: repacking database "pagila"

In the above scenario, I am repacking without having previously
specified an index. At the SQL level this would throw an error, at the
command line it gives me a heart attack. :-)
It's actually not that bad, because we don't actually do anything, but
maybe we should throw an error?

#6
On the individual command pages (like sql-repack.html), I think there
should be more cross-linking, ie. repack should probably say "see also
cluster" and vice versa. Likely similarly with vacuum and repack.

#7
Is there some reason you chose to intermingle the repack regression
tests with the existing tests? I feel like it'd be easier to
differentiate potential regressions and new functionality if these
were separated.


Robert Treat
https://xzilla.net