Discussion: Adding REPACK [concurrently]
Hello,

Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it. This is coming from [1]. The ultimate goal is to have an in-core tool to allow concurrent table rewrite to get rid of bloat; right now, VACUUM FULL does that, but it's not concurrent. Users have resorted to using the pg_repack third-party tool, which is ancient and uses a weird internal implementation, as well as pg_squeeze, which uses logical decoding to capture changes that occur during the table rewrite. The patch submitted here, largely by Antonin Houska with some changes by me, is based on the pg_squeeze code which he authored, and first introduces a new command called REPACK to absorb both VACUUM FULL and CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms of REPACK to operate online using logical decoding.

Essentially, this first patch just reshuffles the CLUSTER code to create the REPACK command.

I made a few changes from Antonin's original at [2]. First, I modified the grammar to support "REPACK [tab] USING INDEX" without specifying the index name. With this change, all possibilities of the old commands are covered, which gives us the chance to flag them as obsolete. (This is good, because having VACUUM FULL do something completely different from regular VACUUM confuses users all the time; and on the other hand, having a command called CLUSTER which is at odds with what most people think of as a "database cluster" is also confusing.)

Here's a list of existing commands, and how to write them in the current patch's proposal for REPACK:

-- re-clusters all tables that have a clustered index set
CLUSTER                -> REPACK USING INDEX

-- clusters the given table using the given index
CLUSTER tab USING idx  -> REPACK tab USING INDEX idx

-- clusters this table using a clustered index; error if no index is clustered
CLUSTER tab            -> REPACK tab USING INDEX

-- vacuum-full all tables
VACUUM FULL            -> REPACK

-- vacuum-full the specified table
VACUUM FULL tab        -> REPACK tab

My other change to Antonin's patch is that I made REPACK USING INDEX set the 'indisclustered' flag on the index being used, so REPACK behaves identically to CLUSTER. We can discuss whether we really want this. For instance, we could add an option so that by default REPACK omits persisting the clustered index, and instead it only does that when you give it some special option, say something like "REPACK (persist_clustered_index=true) tab USING INDEX idx". Overall I'm not sure this is terribly interesting, since clustered indexes are not very useful for most users anyway.

I made a few other minor changes not worthy of individual mention, and there are a few others pending, such as updates to the pg_stat_progress_repack view infrastructure, as well as phasing out pg_stat_progress_cluster (maybe the latter would offer a subset of the former; not yet sure about this). Also, I'd like to work on adding a `repackdb` command for completeness.

On repackdb: I think it is going to be very similar to vacuumdb, mostly in that it is going to need to be able to run tasks in parallel; but there are things it doesn't have to deal with, such as analyze-in-stages, which I think is a large burden. I estimate about 1k LOC there, extremely similar to vacuumdb. Maybe it makes sense to share the source code and make the new executable a symlink instead, with some additional code to support the two different modes. Again, I'm not sure about this -- I like the idea, but I'd have to see the implementation.
I'll be rebasing the rest of Antonin's patch series afterwards, including the logical decoding changes necessary for CONCURRENTLY. In the meantime, if people want to review those, which would be very valuable, they can go back to branch master from around the time he submitted them and apply the old patches there.

[1] https://postgr.es/m/76278.1724760050@antos
[2] https://postgr.es/m/152010.1751307725@localhost

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Attachments
On Sat, Jul 26, 2025 at 5:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello,
>
> Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
> This is coming from [1]. The ultimate goal is to have an in-core tool
> to allow concurrent table rewrite to get rid of bloat; right now, VACUUM
> FULL does that, but it's not concurrent. Users have resorted to using
> the pg_repack third-party tool, which is ancient and uses a weird
> internal implementation, as well as pg_squeeze, which uses logical
> decoding to capture changes that occur during the table rewrite. The
> patch submitted here, largely by Antonin Houska with some changes by me,
> is based on the pg_squeeze code which he authored, and first
> introduces a new command called REPACK to absorb both VACUUM FULL and
> CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
> of REPACK to operate online using logical decoding.
>
> Essentially, this first patch just reshuffles the CLUSTER code to create
> the REPACK command.
>

Thanks for keeping this ball rolling.

>
> My other change to Antonin's patch is that I made REPACK USING INDEX set
> the 'indisclustered' flag on the index being used, so REPACK behaves
> identically to CLUSTER. We can discuss whether we really want this.
> For instance, we could add an option so that by default REPACK omits
> persisting the clustered index, and instead it only does that when you
> give it some special option, say something like
> "REPACK (persist_clustered_index=true) tab USING INDEX idx".
> Overall I'm not sure this is terribly interesting, since clustered
> indexes are not very useful for most users anyway.
>

I think I would lean towards having it work like CLUSTER (preserve the index), since that helps people making the transition, and it doesn't feel terribly useful to invent new syntax for a feature that I would agree isn't very useful for most people.

> I made a few other minor changes not worthy of individual mention, and
> there are a few others pending, such as updates to the
> pg_stat_progress_repack view infrastructure, as well as phasing out
> pg_stat_progress_cluster (maybe the latter would offer a subset of the
> former; not yet sure about this). Also, I'd like to work on adding a
> `repackdb` command for completeness.
>
> On repackdb: I think it is going to be very similar to vacuumdb, mostly in
> that it is going to need to be able to run tasks in parallel; but there
> are things it doesn't have to deal with, such as analyze-in-stages,
> which I think is a large burden. I estimate about 1k LOC there,
> extremely similar to vacuumdb. Maybe it makes sense to share the source
> code and make the new executable a symlink instead, with some additional
> code to support the two different modes. Again, I'm not sure about
> this -- I like the idea, but I'd have to see the implementation.
>
> I'll be rebasing the rest of Antonin's patch series afterwards,
> including the logical decoding changes necessary for CONCURRENTLY. In
> the meantime, if people want to review those, which would be very
> valuable, they can go back to branch master from around the time he
> submitted them and apply the old patches there.
>

For clarity, are you intending to commit this patch before having the other parts ready?
(If that sounds like an objection, it isn't.) After a first pass, I think there are some confusing bits in the new docs that could use straightening out, but those are likely going to overlap with changes once CONCURRENTLY is brought in, so it might make sense to hold off on them. Either way I definitely want to dive into this a bit deeper with some fresh eyes; there's a lot to digest... speaking of, for this bit in src/backend/commands/cluster.c:

+	switch (cmd)
+	{
+		case REPACK_COMMAND_REPACK:
+			return "REPACK";
+		case REPACK_COMMAND_VACUUMFULL:
+			return "VACUUM";
+		case REPACK_COMMAND_CLUSTER:
+			return "VACUUM";
+	}
+	return "???";

The last one should return "CLUSTER", no?

Robert Treat
https://xzilla.net
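For reference, the corrected version of the quoted hunk, with the last case returning the command name it actually represents (Álvaro confirms the fix downthread):

	switch (cmd)
	{
		case REPACK_COMMAND_REPACK:
			return "REPACK";
		case REPACK_COMMAND_VACUUMFULL:
			return "VACUUM";
		case REPACK_COMMAND_CLUSTER:
			return "CLUSTER";	/* was "VACUUM" in the posted patch */
	}
	return "???";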
On Sun, Jul 27, 2025 at 6:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello,
>
> Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it.
> This is coming from [1]. The ultimate goal is to have an in-core tool
> to allow concurrent table rewrite to get rid of bloat;

+1

> right now, VACUUM
> FULL does that, but it's not concurrent. Users have resorted to using
> the pg_repack third-party tool, which is ancient and uses a weird
> internal implementation, as well as pg_squeeze, which uses logical
> decoding to capture changes that occur during the table rewrite. The
> patch submitted here, largely by Antonin Houska with some changes by me,
> is based on the pg_squeeze code which he authored, and first
> introduces a new command called REPACK to absorb both VACUUM FULL and
> CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms
> of REPACK to operate online using logical decoding.

Does this mean REPACK CONCURRENTLY requires wal_level = logical, while plain REPACK (without CONCURRENTLY) works with any wal_level setting? If we eventually deprecate VACUUM FULL and CLUSTER, I think plain REPACK should still be allowed with wal_level = minimal or replica, so users with those settings can perform equivalent processing.

+	if (!cluster_is_permitted_for_relation(tableOid, userid,
+										   CLUSTER_COMMAND_CLUSTER))

As for the patch you attached, it seems to be an early WIP and might not be ready for review yet?? BTW, I got the following compilation failure on CLUSTER_COMMAND_CLUSTER, and probably the 'userid' above should be GetUserId().

-----------------
cluster.c:455:14: error: use of undeclared identifier 'CLUSTER_COMMAND_CLUSTER'
  455 |                         CLUSTER_COMMAND_CLUSTER))
      |                         ^
1 error generated.
-----------------

Regards,

--
Fujii Masao
On 2025-Jul-26, Robert Treat wrote:

> For clarity, are you intending to commit this patch before having the
> other parts ready? (If that sounds like an objection, it isn't.) After
> a first pass, I think there are some confusing bits in the new docs that
> could use straightening out, but those are likely going to overlap with
> changes once CONCURRENTLY is brought in, so it might make sense to hold
> off on them.

I'm aiming at getting 0001 committed during the September commitfest, and the CONCURRENTLY flag addition later in the pg19 cycle. But I'd rather have good-enough docs at every step of the way. They don't have to be *perfect* if we want to get everything in pg19, but I'd rather not leave anything openly confusing even transiently. That said, I did not review the docs this time around, so here they are, the same as they were in the previous post. But if you want to suggest changes for the docs in 0001, please do. Just don't get too carried away.

> speaking of, for this bit in src/backend/commands/cluster.c
>
> +	switch (cmd)
> +	{
> +		case REPACK_COMMAND_REPACK:
> +			return "REPACK";
> +		case REPACK_COMMAND_VACUUMFULL:
> +			return "VACUUM";
> +		case REPACK_COMMAND_CLUSTER:
> +			return "VACUUM";
> +	}
> +	return "???";
>
> The last one should return "CLUSTER", no?

Absolutely -- my blunder.

On 2025-Jul-27, Fujii Masao wrote:

> > The patch submitted here, largely by Antonin Houska with some
> > changes by me, is based on the pg_squeeze code which he
> > authored, and first introduces a new command called REPACK to absorb
> > both VACUUM FULL and CLUSTER, followed by addition of a CONCURRENTLY
> > flag to allow some forms of REPACK to operate online using logical
> > decoding.
>
> Does this mean REPACK CONCURRENTLY requires wal_level = logical, while
> plain REPACK (without CONCURRENTLY) works with any wal_level setting?
> If we eventually deprecate VACUUM FULL and CLUSTER, I think plain
> REPACK should still be allowed with wal_level = minimal or replica, so
> users with those settings can perform equivalent processing.

Absolutely. One of the later patches in the series, which I have not included yet, intends to implement the idea of transiently enabling wal_level=logical for the table being repacked concurrently, so that you can still use the concurrent mode if you have a non-logical-wal_level instance.

> +	if (!cluster_is_permitted_for_relation(tableOid, userid,
> +										   CLUSTER_COMMAND_CLUSTER))
>
> As for the patch you attached, it seems to be an early WIP and
> might not be ready for review yet?? BTW, I got the following
> compilation failure on CLUSTER_COMMAND_CLUSTER, and probably
> the 'userid' above should be GetUserId().

This was a silly merge mistake, caused by my squashing Antonin's 0004 (trivial code restructuring) into 0001 at the last minute and failing to "git add" the compile fixes before doing git-format-patch.

Here's v17. (I decided that calling my previous one "v1" after Antonin had gone all the way to v15 was stupid on my part.) The important part here is that I rebased Antonin's 0004, that is, the addition of the CONCURRENTLY flag, plus the 0005 regression tests. The only interesting change here is that I decided not to mess with the grammar by allowing an unparenthesized CONCURRENTLY keyword; if you want concurrent, you have to say "REPACK (CONCURRENTLY)". This is at odds with the way we use the keyword in other commands, but ISTM we don't _need_ to support that legacy syntax. Anyway, this is easy to put back afterwards, if enough people find it useful.
I've not reviewed 0003 in depth yet, just rebased it. But it works to the point that CI is happy with it. I've not yet included Antonin's 0006 and 0007.

TODO list for 0001:
- addition of src/bin/scripts/repackdb
- clean up the progress report infrastructure
- doc review

--
Álvaro Herrera               Breisgau, Deutschland — https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest "foo" someone someday shall type "supercalifragilisticexpialidocious" (5th Commandment for C programmers)
Attachments
On Fri, Aug 1, 2025 at 1:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> One of the later patches in the series, which I have not included yet,
> intends to implement the idea of transiently enabling wal_level=logical
> for the table being repacked concurrently, so that you can still use
> the concurrent mode if you have a non-logical-wal_level instance.

Sounds good to me!

> Here's v17.

I just tried the REPACK command and observed a few things:

When I repeatedly ran REPACK on the regression database while "make installcheck" was running, I got the following error:

  ERROR:  StartTransactionCommand: unexpected state STARTED

"REPACK (VERBOSE);" failed with the following error:

  ERROR:  syntax error at or near ";"

REPACK (CONCURRENTLY) ... USING INDEX failed with the following error, while the same command without CONCURRENTLY completed successfully:

  =# REPACK (CONCURRENTLY) parallel_vacuum_table using index regular_sized_index;
  ERROR:  cannot process relation "parallel_vacuum_table"
  HINT:  Relation "parallel_vacuum_table" has no identity index.

When I ran REPACK (CONCURRENTLY) on a table that's also a logical replication target, I saw the following log messages. Is this expected?

  =# REPACK (CONCURRENTLY) t;
  LOG:  logical decoding found consistent point at 1/00021F20
  DETAIL:  There are no running transactions.
  STATEMENT:  REPACK (CONCURRENTLY) t;

Regards,

--
Fujii Masao
Hello Álvaro,

Should we skip non-ready indexes in build_new_indexes?

Also, in the current implementation, concurrent mode is marked as non-MVCC-safe. From my point of view, this is a significant limitation for practical use. Should we consider an option to trade the non-MVCC issues for a short exclusive lock on ProcArrayLock plus cancellation of some transactions with older xmins?

Best regards,
Mikhail
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> I made a few changes from Antonin's original at [2]. First, I modified
> the grammar to support "REPACK [tab] USING INDEX" without specifying the
> index name. With this change, all possibilities of the old commands are
> covered,

...

> Here's a list of existing commands, and how to write them in the current
> patch's proposal for REPACK:
>
> -- re-clusters all tables that have a clustered index set
> CLUSTER                -> REPACK USING INDEX
>
> -- clusters the given table using the given index
> CLUSTER tab USING idx  -> REPACK tab USING INDEX idx
>
> -- clusters this table using a clustered index; error if no index is clustered
> CLUSTER tab            -> REPACK tab USING INDEX
>
> -- vacuum-full all tables
> VACUUM FULL            -> REPACK
>
> -- vacuum-full the specified table
> VACUUM FULL tab        -> REPACK tab

Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the options of VACUUM FULL. I found two items not supported by REPACK (but also not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's just mention that in the user documentation of REPACK?

(Besides that, VACUUM FULL accepts the TRUNCATE and INDEX_CLEANUP options, but I think these have no effect.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello!

One more thing - I think build_new_indexes and index_concurrently_create_copy are very close in semantics, so it might be a good idea to refactor them a bit.

I'm still concerned about MVCC-related issues. For multiple applications, this is a dealbreaker, because in some cases correctness is a higher priority than availability. Possible options:

1) Terminate connections with old snapshots. Add a flag to terminate all connections with snapshots during the ExclusiveLock period for the swap. From the application's perspective, this is not a big deal - it's similar to a primary switchover. We would also need to prevent new snapshots from being taken during the swap transaction, so a short exclusive lock on ProcArrayLock would also be required.

2) MVCC-safe two-phase approach (inspired by CREATE INDEX).
- copy the data from T1 to the new table T2.
- apply the log.
- take a table-exclusive lock on T1.
- apply the log again.
- instead of swapping, mark T2 as a kind of shadow table - any transaction applying changes to T1 must also apply them to T2, while reads still use T1 as the source of truth.
- commit (and record the transaction ID as XID1).
- at this point, all changes are applied to both tables with the same XIDs because of the "shadow table" mechanism.
- wait until older snapshots no longer treat XID1 as uncommitted.
- now the tables are identical from the MVCC perspective.
- take an exclusive lock on both T1 and T2.
- perform the swap and drop T1.
- commit.

This is more complex and would require implementing some sort of "shadow table" mechanism, so it might not be worth the effort. Option 1 feels more appealing to me. If others think this is a good idea, I might try implementing a proof of concept.

Best regards,
Mikhail
On 2025-Aug-09, Mihail Nikalayeu wrote:

> Hello!
>
> One more thing - I think build_new_indexes and
> index_concurrently_create_copy are very close in semantics, so it
> might be a good idea to refactor them a bit.
>
> I'm still concerned about MVCC-related issues. For multiple
> applications, this is a dealbreaker, because in some cases correctness
> is a higher priority than availability.

Please note that Antonin already implemented this. See his patches here:
https://www.postgresql.org/message-id/77690.1725610115%40antos

I proposed to leave this part out initially, which is why it hasn't been reposted. We can review and discuss after the initial patches are in. Because having an MVCC-safe mode has drawbacks, IMO we should make it optional.

But you're welcome to review that part specifically if you're so inclined, and offer feedback on it. (I suggest rewinding your checked-out tree to the master branch as of the time that patch was posted, for easy application. We can deal with a rebase later.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you!
(West Side Story, "Gee, Officer Krupke")
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> One more thing - I think build_new_indexes and
> index_concurrently_create_copy are very close in semantics, so it
> might be a good idea to refactor them a bit.

You're right. I think I even used the latter for reference when writing the former.

0002 in the attached series tries to fix that. build_new_indexes() (in 0004) is simpler now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachments
Antonin Houska <ah@cybertec.at> wrote:

> Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
>
> > One more thing - I think build_new_indexes and
> > index_concurrently_create_copy are very close in semantics, so it
> > might be a good idea to refactor them a bit.
>
> You're right. I think I even used the latter for reference when writing the
> former.
>
> 0002 in the attached series tries to fix that. build_new_indexes() (in 0004)
> is simpler now.

This is v18 again. Parts 0001 through 0004 are unchanged, however 0005 is added. It implements a new client application, pg_repackdb.

(If I posted 0005 alone, its regression tests would not work. I wonder if the cfbot handles the repeated occurrence of the 'v18-' prefix correctly.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachments
On 2025-Aug-15, Antonin Houska wrote:

> This is v18 again.

Thanks for this!

> Parts 0001 through 0004 are unchanged, however 0005 is added. It
> implements a new client application, pg_repackdb. (If I posted 0005
> alone, its regression tests would not work. I wonder if the cfbot
> handles the repeated occurrence of the 'v18-' prefix correctly.)

Yeah, the cfbot is just going to take the attachments from the latest email in the thread that has any, and assume they make up the whole patch. It wouldn't work to post just v18-0005 and assume that the bot is going to grab patches 0001 through 0004 from a previous email, if that's what you're thinking. In short, what you did is correct and necessary.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > I made a few changes from Antonin's original at [2]. First, I modified
> > the grammar to support "REPACK [tab] USING INDEX" without specifying the
> > index name. With this change, all possibilities of the old commands are
> > covered,
>
> ...
>
> > Here's a list of existing commands, and how to write them in the current
> > patch's proposal for REPACK:
> >
> > -- re-clusters all tables that have a clustered index set
> > CLUSTER                -> REPACK USING INDEX
> >
> > -- clusters the given table using the given index
> > CLUSTER tab USING idx  -> REPACK tab USING INDEX idx
> >
> > -- clusters this table using a clustered index; error if no index is clustered
> > CLUSTER tab            -> REPACK tab USING INDEX

In the v18 patch, the docs say that REPACK doesn't remember the index, but it seems we are still calling mark_index_clustered, so I think the above is true but we need to update the docs(?).

> > -- vacuum-full all tables
> > VACUUM FULL            -> REPACK
> >
> > -- vacuum-full the specified table
> > VACUUM FULL tab        -> REPACK tab
>
> Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the
> options of VACUUM FULL. I found two items not supported by REPACK (but also
> not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's
> just mention that in the user documentation of REPACK?

I would note that both pg_repack and pg_squeeze analyze by default, and running "vacuum full analyze" is the recommended behavior, so not having analyze included is a step backwards.

> (Besides that, VACUUM FULL accepts the TRUNCATE and INDEX_CLEANUP options,
> but I think these have no effect.)

Yeah, these seem safe to ignore.

Robert Treat
https://xzilla.net
On 2025-Aug-16, Robert Treat wrote:

> On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
> > Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the
> > options of VACUUM FULL. I found two items not supported by REPACK (but also
> > not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's
> > just mention that in the user documentation of REPACK?
>
> I would note that both pg_repack and pg_squeeze analyze by default,
> and running "vacuum full analyze" is the recommended behavior, so not
> having analyze included is a step backwards.

Makes sense to add ANALYZE as an option to REPACK, yeah.

So if I repack a single table with

  REPACK (ANALYZE) table USING INDEX;

then do you expect that this would first cluster the table under AccessExclusiveLock and then release the lock to do the analyze step, or would the analyze be done under the same lock? This is significant for a query that starts while REPACK is running, because if we release the AEL, then the query is planned when there are no stats for the table, which might be bad.

I think the time to run the analyze step should be considerably shorter than the time to run the repacking step, so running both together under the same lock should be okay.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)
On 2025-Aug-16, Robert Treat wrote:

> In the v18 patch, the docs say that REPACK doesn't remember the index,
> but it seems we are still calling mark_index_clustered, so I think the
> above is true but we need to update the docs(?).

Yes, the docs are obsolete on this point; I'm in the process of updating them. Thanks for pointing this out.

--
Álvaro Herrera               Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Victory is for those who dare to stand alone"
Hello,

Here's a second cut of the initial REPACK work. Antonin added an implementation of pg_repackdb, and there are also a couple of fixes for bugs that were reported in the thread. I also added support for the ANALYZE option, as noted by Robert Treat, though it only works if you specify a single non-partitioned table. Adding it for the multi-table case is likely easy, but I didn't try.

I purposefully do not include the CONCURRENTLY work yet -- I want to get this part committable-clean first, then we can continue work on the logical decoding work on top of that.

Note the choice of shell command name: though all the other programs in src/bin/scripts do not use the "pg_" prefix, this one does; we thought it made no sense to follow the old programs as precedent, because there seems to be a lament for the lack of a pg_ prefix in those, and we only keep them as they are because of their long history. This one has no history.

Still on pg_repackdb, the implementation here is to install a symlink called pg_repackdb which points to vacuumdb, and make the program behave differently when called in this way. The amount of additional code for this is relatively small, so I think this is a worthy technique -- assuming it works. If it doesn't, Antonin proposed a separate binary that just calls some functions from vacuumdb. Or maybe we could have a common source file that both utilities call.

I edited the docs a bit, limiting the exposure of CLUSTER and VACUUM FULL, and instead redirecting the user to the REPACK docs. In the REPACK docs I modified things for additional clarity.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"Even the greatest genius would not get very far if he wanted to owe everything to his own inner self." (Johann Wolfgang von Goethe)
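A minimal sketch of the dispatch-on-invocation-name technique described above, assuming PostgreSQL's existing get_progname() from src/port/path.c; the RunMode enum and detect_mode() are illustrative names, not the patch's actual code:

	#include <string.h>
	#include "postgres_fe.h"	/* frontend environment; declares get_progname() */

	typedef enum
	{
		MODE_VACUUMDB,			/* classic vacuumdb behavior */
		MODE_REPACKDB			/* issue REPACK commands instead */
	} RunMode;

	/*
	 * Decide which personality to run based on how the binary was invoked:
	 * "pg_repackdb" (the installed symlink) selects REPACK mode; anything
	 * else behaves as plain vacuumdb.
	 */
	static RunMode
	detect_mode(const char *argv0)
	{
		const char *progname = get_progname(argv0); /* strips dir and .exe */

		if (strcmp(progname, "pg_repackdb") == 0)
			return MODE_REPACKDB;
		return MODE_VACUUMDB;
	}

This is the same trick busybox-style multi-call binaries use; the rest of the code only needs to branch on the mode in a few places (option parsing and command generation).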
Attachments
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On 2025-Aug-16, Robert Treat wrote:
>
> > On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska <ah@cybertec.at> wrote:
> >
> > > Now that we want to cover CLUSTER/VACUUM FULL completely, I've checked the
> > > options of VACUUM FULL. I found two items not supported by REPACK (but also
> > > not supported by CLUSTER): ANALYZE and SKIP_DATABASE_STATS. Maybe let's
> > > just mention that in the user documentation of REPACK?
> >
> > I would note that both pg_repack and pg_squeeze analyze by default,
> > and running "vacuum full analyze" is the recommended behavior, so not
> > having analyze included is a step backwards.
>
> Makes sense to add ANALYZE as an option to REPACK, yeah.
>
> So if I repack a single table with
>
>   REPACK (ANALYZE) table USING INDEX;
>
> then do you expect that this would first cluster the table under
> AccessExclusiveLock and then release the lock to do the analyze step, or
> would the analyze be done under the same lock? This is significant for
> a query that starts while REPACK is running, because if we release the
> AEL, then the query is planned when there are no stats for the table,
> which might be bad.
>
> I think the time to run the analyze step should be considerably shorter
> than the time to run the repacking step, so running both together under
> the same lock should be okay.

AFAICS, VACUUM FULL first releases the AEL, then it analyzes the table. If users have not complained so far, I'd assume that vacuum_rel() (effectively cluster_rel() in the FULL case) does not change the stats that much.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Álvaro Herrera <alvherre@kurilemu.de> wrote:

> Still on pg_repackdb, the implementation here is to install a symlink
> called pg_repackdb which points to vacuumdb, and make the program behave
> differently when called in this way. The amount of additional code for
> this is relatively small, so I think this is a worthy technique --
> assuming it works. If it doesn't, Antonin proposed a separate binary
> that just calls some functions from vacuumdb. Or maybe we could have a
> common source file that both utilities call.

There's an issue with the symlink; maybe some meson expert can help. In particular, the CI on Windows ends up with the following error:

  ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

(The reason it does not happen on other platforms might be that the build is slower on Windows, and thus it's more prone to some specific race conditions.)

It appears that the 'point_to' argument of the install_symlink() function [1] is only a string rather than a "real target" [2]. That's likely the reason the function does not wait for the creation of the 'vacuumdb' executable.

I could not find another symlink of this kind in the tree. (AFAICS, the postmaster->postgres symlink had been removed before Meson was introduced.)

Does anyone happen to have a clue? Thanks.

[1] https://mesonbuild.com/Reference-manual_functions.html#install_symlink
[2] https://mesonbuild.com/Reference-manual_returned_tgt.html

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On 2025-Aug-20, Antonin Houska wrote:

> There's an issue with the symlink; maybe some meson expert can help. In
> particular, the CI on Windows ends up with the following error:
>
>   ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb

Hmm, that's not the problem I see in the CI run from the commitfest app:
https://cirrus-ci.com/task/5608274336153600

[19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
[19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
[19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0

The real problem here seems to be the empty long_options_repack array. I removed it and started a new run to see what happens. Running now:
https://cirrus-ci.com/build/4961902171783168

--
Álvaro Herrera               Breisgau, Deutschland — https://www.EnterpriseDB.com/
Álvaro Herrera <alvherre@kurilemu.de> wrote:

> On 2025-Aug-20, Antonin Houska wrote:
>
> > There's an issue with the symlink; maybe some meson expert can help. In
> > particular, the CI on Windows ends up with the following error:
> >
> >   ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
>
> Hmm, that's not the problem I see in the CI run from the commitfest app:
>
> https://cirrus-ci.com/task/5608274336153600

I was referring to the other build that you shared off-list (probably independent from the cfbot):

https://cirrus-ci.com/build/4726227505774592

> [19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
> [19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
> [19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
> [19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
>
> The real problem here seems to be the empty long_options_repack array.
> I removed it and started a new run to see what happens. Running now:
> https://cirrus-ci.com/build/4961902171783168

The symlink issue occurred on "Windows - Server 2019, MinGW64 - Meson", where the code compiled fine. The compilation failure mentioned above comes from "Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible that the symlink issue will occur there once the compilation is fixed.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi,

On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:
> Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> > On 2025-Aug-20, Antonin Houska wrote:
> >
> > > There's an issue with the symlink; maybe some meson expert can help. In
> > > particular, the CI on Windows ends up with the following error:
> > >
> > >   ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
> >
> > Hmm, that's not the problem I see in the CI run from the commitfest app:
> >
> > https://cirrus-ci.com/task/5608274336153600
>
> I was referring to the other build that you shared off-list (probably
> independent from the cfbot):
>
> https://cirrus-ci.com/build/4726227505774592
>
> > [19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
> > [19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
> > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
> > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
> >
> > The real problem here seems to be the empty long_options_repack array.
> > I removed it and started a new run to see what happens. Running now:
> > https://cirrus-ci.com/build/4961902171783168
>
> The symlink issue occurred on "Windows - Server 2019, MinGW64 - Meson", where
> the code compiled fine. The compilation failure mentioned above comes from
> "Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
> that the symlink issue will occur there once the compilation is fixed.

FWIW, I don't think it's particularly wise to rely on symlinks on Windows -- IIRC they will often not be enabled outside of development environments.

Greetings,

Andres Freund
Hello everyone!

Alvaro Herrera <alvherre@alvh.no-ip.org>:

> Please note that Antonin already implemented this. See his patches here:
> https://www.postgresql.org/message-id/77690.1725610115%40antos
> I proposed to leave this part out initially, which is why it hasn't been
> reposted. We can review and discuss after the initial patches are in.

I think it is worth pushing it at least in the same release cycle.

> But you're welcome to review that part specifically if you're so
> inclined, and offer feedback on it. (I suggest rewinding your
> checked-out tree to the master branch as of the time that patch was
> posted, for easy application. We can deal with a rebase later.)

I have rebased that on top of v18 (attached).

Also, I think I found an issue (or lost something during the rebase): we must preserve xmin/cmin during the initial copy to make sure that the data is going to be visible to the snapshots of concurrent changes applied later:

static void
reform_and_rewrite_tuple(......)
{
	.....

	/*
	 * It is also crucial to stamp the new record with the exact same xid and
	 * cid, because the tuple must be visible to the snapshot of the applied
	 * concurrent change later.
	 */
	CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
	TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);

	heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);

I'll try to polish that part a little bit.

> Because having an MVCC-safe mode has drawbacks, IMO we should make it
> optional.

Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Best regards,
Mikhail.
Attachments
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> Also, I think I found an issue (or lost something during the rebase): we
> must preserve xmin/cmin during the initial copy
> to make sure that the data is going to be visible to the snapshots of
> concurrent changes applied later:
>
> static void
> reform_and_rewrite_tuple(......)
> {
> 	.....
>
> 	/*
> 	 * It is also crucial to stamp the new record with the exact same xid and
> 	 * cid, because the tuple must be visible to the snapshot of the applied
> 	 * concurrent change later.
> 	 */
> 	CommandId	cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
> 	TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);
>
> 	heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);

When posting version 12 of the patch [1] I raised a concern that the MVCC safety is too expensive when it comes to logical decoding. Therefore, I abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we implement the MVCC safety, we simply rewrite the tuple like v12 did - that's the simplest way to preserve fields like xmin, cmin, ...

[1] https://www.postgresql.org/message-id/178741.1743514291%40localhost
[2] https://www.postgresql.org/message-id/97795.1744363522%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2025-08-20 16:22:41 +0200, Antonin Houska wrote:
> > Álvaro Herrera <alvherre@kurilemu.de> wrote:
> >
> > > On 2025-Aug-20, Antonin Houska wrote:
> > >
> > > > There's an issue with the symlink; maybe some meson expert can help. In
> > > > particular, the CI on Windows ends up with the following error:
> > > >
> > > >   ERROR: Tried to install symlink to missing file C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb
> > >
> > > Hmm, that's not the problem I see in the CI run from the commitfest app:
> > >
> > > https://cirrus-ci.com/task/5608274336153600
> >
> > I was referring to the other build that you shared off-list (probably
> > independent from the cfbot):
> >
> > https://cirrus-ci.com/build/4726227505774592
> >
> > > [19:11:00.642] FAILED: [code=2] src/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj
> > > [19:11:00.642] "cl" "-Isrc\bin\scripts\vacuumdb.exe.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "-Isrc/interfaces/libpq" "-I..\src\interfaces\libpq" "/MDd" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/Zc:preprocessor" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "/Fdsrc\bin\scripts\vacuumdb.exe.p\vacuumdb.c.pdb" /Fosrc/bin/scripts/vacuumdb.exe.p/vacuumdb.c.obj "/c" ../src/bin/scripts/vacuumdb.c
> > > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(186): error C2059: syntax error: '}'
> > > [19:11:00.642] ../src/bin/scripts/vacuumdb.c(197): warning C4034: sizeof returns 0
> > >
> > > The real problem here seems to be the empty long_options_repack array.
> > > I removed it and started a new run to see what happens. Running now:
> > > https://cirrus-ci.com/build/4961902171783168
> >
> > The symlink issue occurred on "Windows - Server 2019, MinGW64 - Meson", where
> > the code compiled fine. The compilation failure mentioned above comes from
> > "Windows - Server 2019, VS 2019 - Meson & ninja". I think it's still possible
> > that the symlink issue will occur there once the compilation is fixed.
>
> FWIW, I don't think it's particularly wise to rely on symlinks on Windows --
> IIRC they will often not be enabled outside of development environments.

OK, installing a copy of the same executable with a different name seems more reliable. At least that's how the postmaster->postgres link used to be handled, if I read the Makefile correctly. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi,

On 2025-08-21 20:14:14 +0200, Antonin Houska wrote:
> OK, installing a copy of the same executable with a different name seems more
> reliable. At least that's how the postmaster->postgres link used to be
> handled, if I read the Makefile correctly. Thanks.

I have not followed this thread, but I don't think the whole business of having a single executable with multiple names is worth doing. Just make it an option, instead of having multiple "executables".

Greetings,

Andres
On Tue, Aug 19, 2025 at 2:53 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Note the choice of shell command name: though all the other programs in
> src/bin/scripts do not use the "pg_" prefix, this one does; we thought
> it made no sense to follow the old programs as precedent, because there
> seems to be a lament for the lack of a pg_ prefix in those, and we only
> keep them as they are because of their long history. This one has no
> history.
>
> Still on pg_repackdb, the implementation here is to install a symlink
> called pg_repackdb which points to vacuumdb, and make the program behave
> differently when called in this way. The amount of additional code for
> this is relatively small, so I think this is a worthy technique --
> assuming it works. If it doesn't, Antonin proposed a separate binary
> that just calls some functions from vacuumdb. Or maybe we could have a
> common source file that both utilities call.

What's the plan for clusterdb? It seems like we'd ideally create a standalone pg_repackdb which replaces clusterdb and also allows us to remove the FULL options from vacuumdb.

Robert Treat
https://xzilla.net
On 2025-Aug-21, Robert Treat wrote:

> What's the plan for clusterdb? It seems like we'd ideally create a
> standalone pg_repackdb which replaces clusterdb and also allows us to
> remove the FULL options from vacuumdb.

I don't think we should remove clusterdb, to avoid breaking any scripts that work today. As you say, I would create the standalone pg_repackdb to do what we need it to do (namely: run the REPACK commands) and leave vacuumdb and clusterdb alone. Removing the obsolete commands and options can be done in a few years.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:
> On 2025-Aug-21, Robert Treat wrote:
>
> > What's the plan for clusterdb? It seems like we'd ideally create a
> > standalone pg_repackdb which replaces clusterdb and also allows us to
> > remove the FULL options from vacuumdb.
>
> I don't think we should remove clusterdb, to avoid breaking any scripts
> that work today. As you say, I would create the standalone pg_repackdb
> to do what we need it to do (namely: run the REPACK commands) and leave
> vacuumdb and clusterdb alone. Removing the obsolete commands and
> options can be done in a few years.

I would say that we need to plan the removal of these binaries (clusterdb and vacuumdb). We can start with a warning in clusterdb saying users should use pg_repackdb. In a few years, we can remove clusterdb. There were complaints about binary names without a pg_ prefix in the past [1].

I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb) to pg_repackdb. We can add a similar warning message saying users should use pg_repackdb if the symlink is used.

[1] https://www.postgresql.org/message-id/CAJgfmqXYYKXR%2BQUhEa3cq6pc8dV0Hu7QvOUccm7R0TkC%3DT-%2B%3DA%40mail.gmail.com

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Hi,

On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
> On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote:
> > On 2025-Aug-21, Robert Treat wrote:
> >> What's the plan for clusterdb? It seems like we'd ideally create a
> >> standalone pg_repackdb which replaces clusterdb and also allows us to
> >> remove the FULL options from vacuumdb.
> >
> > I don't think we should remove clusterdb, to avoid breaking any scripts
> > that work today. As you say, I would create the standalone pg_repackdb
> > to do what we need it to do (namely: run the REPACK commands) and leave
> > vacuumdb and clusterdb alone. Removing the obsolete commands and
> > options can be done in a few years.
>
> I would say that we need to plan the removal of these binaries (clusterdb and
> vacuumdb). We can start with a warning in clusterdb saying users should use
> pg_repackdb. In a few years, we can remove clusterdb. There were complaints
> about binary names without a pg_ prefix in the past [1].

Yeah.

> I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
> to pg_repackdb. We can add a similar warning message saying users should use
> pg_repackdb if the symlink is used.

Unless pg_repackdb has the same (or a superset of the) CLI and behaviour as vacuumdb (I haven't checked, but I doubt it?), I think replacing vacuumdb with a symlink to pg_repackdb will lead to much more breakage in existing scripts/automation than clusterdb, which I guess is used orders of magnitude less frequently than vacuumdb.

Michael
On 2025-08-23, Michael Banck wrote:
> On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
>> I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
>> to pg_repackdb. We can add a similar warning message saying users should use
>> pg_repackdb if the symlink is used.
>
> Unless pg_repackdb has the same (or a superset of the) CLI and behaviour as
> vacuumdb (I haven't checked, but I doubt it?), I think replacing vacuumdb
> with a symlink to pg_repackdb will lead to much more breakage in existing
> scripts/automation than clusterdb, which I guess is used orders of
> magnitude less frequently than vacuumdb.

Yeah, I completely disagree with the idea of getting rid of vacuumdb. We can, maybe, in a distant future, get rid of the --full option to vacuumdb. But the rest of the vacuumdb behavior must stay, I think, because REPACK is not VACUUM — it is only VACUUM FULL. And we want to make that distinction very clear.

We can also, in a few years, get rid of clusterdb. But I don't think we need to deprecate it just yet.

--
Álvaro Herrera
Hello, Antonin!

> When posting version 12 of the patch [1] I raised a concern that the MVCC
> safety is too expensive when it comes to logical decoding. Therefore, I
> abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we
> implement the MVCC safety, we simply rewrite the tuple like v12 did - that's
> the simplest way to preserve fields like xmin, cmin, ...

Thanks for the explanation.

I was looking into catalog-related logical decoding features, and it seems like they are clearly overkill for the repack case. We don't need CID tracking or even a snapshot for each commit if we're okay with passing xmin/xmax as arguments.

What do you think about the following approach for replaying:
* use the extracted XID as the value for xmin/xmax.
* use SnapshotSelf to find the tuple for update/delete operations.

SnapshotSelf seems like a good fit here:
* it sees the last "existing" version.
* any XID set as xmin/xmax in the repacked version is already committed - so each update/insert is effectively "committed" once written.
* it works with multiple updates of the same tuple within a single transaction - SnapshotSelf sees the last version.
* all updates are ordered and replayed sequentially - so the last version is always the one we want.

If I'm not missing anything, this looks like something worth including in the patch set. If so, I can try implementing a test version.

Best regards,
Mikhail
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> I was looking into catalog-related logical decoding features, and it
> seems like they are clearly overkill for the repack case.
> We don't need CID tracking or even a snapshot for each commit if we're
> okay with passing xmin/xmax as arguments.

I assume you are concerned with part 0005 of the v12 patch ("Preserve visibility information of the concurrent data changes."), aren't you?

> What do you think about the following approach for replaying:
> * use the extracted XID as the value for xmin/xmax.
> * use SnapshotSelf to find the tuple for update/delete operations.
>
> SnapshotSelf seems like a good fit here:
> * it sees the last "existing" version.
> * any XID set as xmin/xmax in the repacked version is already
> committed - so each update/insert is effectively "committed" once
> written.
> * it works with multiple updates of the same tuple within a single
> transaction - SnapshotSelf sees the last version.
> * all updates are ordered and replayed sequentially - so the last
> version is always the one we want.
>
> If I'm not missing anything, this looks like something worth including
> in the patch set.
> If so, I can try implementing a test version.

Not sure I understand all the details, but I don't think SnapshotSelf is the correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its 'snapshot' argument at all. Instead, it considers the set of running transactions as it is at the time the function is called.

One particular problem I imagine is replaying an UPDATE to a row that some later transaction will eventually delete, even though the transaction that ran the UPDATE obviously had to see it. When looking for the old version during the replay, HeapTupleSatisfiesMVCC() will find the old version as long as we pass the correct snapshot to it. However, at the time we're replaying the UPDATE in the new table, the tuple may have already been deleted from the old table, and the deleting transaction may already have committed. In such a case, HeapTupleSatisfiesSelf() will conclude the old version is invisible and we'll fail to replay the UPDATE.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi, Antonin!

> I assume you are concerned with part 0005 of the v12 patch
> ("Preserve visibility information of the concurrent data changes."),
> aren't you?

Yes, of course. I got the idea while trying to find a way to optimize it.

> Not sure I understand all the details, but I don't think SnapshotSelf is the
> correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
> 'snapshot' argument at all. Instead, it considers the set of running
> transactions as it is at the time the function is called.

Yes, and it is almost the same behavior as when a typical MVCC snapshot encounters a tuple created by its own transaction.

So, here is how it works in the non-MVCC-safe case (current patch behaviour):

1) we have a whole initial table snapshot with all xmin = repack XID
2) the applying transaction sees ALL the live (no xmax) tuples in it, because all tuples were created/deleted by the transaction itself
3) each update/delete during the replay selects the last existing tuple version, updates its xmax, and inserts a new one
4) so, there is no real MVCC involved - just find the latest version and create a new version
5) and it works correctly because all ordering issues were resolved by the locking mechanisms on the original table or by the reorder buffer

And here is how it maps to the MVCC-safe case (SnapshotSelf):

1) we have a whole initial table snapshot with all xmin copied from the original table. All such xmin are committed.
2) the applying transaction sees ALL the live (no xmax) tuples in it, because their xmin/xmax are committed and SnapshotSelf is happy with that
3) each update/delete during the replay selects the last existing tuple version, updates its xmax = original xid, and inserts a new one with xmin = original xid
4) --//--
5) --//--

> However, at the time we're replaying the UPDATE in the new table, the tuple
> may have already been deleted from the old table, and the deleting
> transaction may already have committed. In such a case,
> HeapTupleSatisfiesSelf() will conclude the old version is invisible and
> we'll fail to replay the UPDATE.

No, it will see it - because its xmax will be empty in the repacked version of the table.

From your question I feel you understood the concept - but feel free to ask for additional explanation or a diagram.

Best regards,
Mikhail.
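To make the proposed replay step concrete, here is an illustrative C sketch. find_tuple_by_identity(), set_xmax() and insert_with_xid() are hypothetical helpers (as are the ConcurrentChange fields), standing in for the heap internals the patch would actually use:

	/*
	 * Replay one decoded UPDATE into the new table.  Every xmin/xmax
	 * written here belongs to an already-committed transaction, which is
	 * what lets SnapshotSelf treat the stamped tuples as visible.
	 */
	static void
	apply_concurrent_update(Relation newheap, ConcurrentChange *change)
	{
		/* 1. Find the latest live version; it carries no xmax yet. */
		HeapTuple	oldtup = find_tuple_by_identity(newheap, change->key,
													SnapshotSelf);

		/* 2. Stamp it as deleted by the original (committed) XID. */
		set_xmax(oldtup, change->xid);

		/* 3. Insert the new version, preserving the original XID as xmin. */
		insert_with_xid(newheap, change->newtuple, change->xid);
	}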
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:

> > Not sure I understand all the details, but I don't think SnapshotSelf is the
> > correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its
> > 'snapshot' argument at all. Instead, it considers the set of running
> > transactions as it is at the time the function is called.
>
> Yes, and it is almost the same behavior as when a typical MVCC snapshot
> encounters a tuple created by its own transaction.
>
> So, here is how it works in the non-MVCC-safe case (current patch behaviour):
>
> 1) we have a whole initial table snapshot with all xmin = repack XID
> 2) the applying transaction sees ALL the live (no xmax) tuples in it,
> because all tuples were created/deleted by the transaction itself
> 3) each update/delete during the replay selects the last existing
> tuple version, updates its xmax, and inserts a new one
> 4) so, there is no real MVCC involved - just find the latest
> version and create a new version
> 5) and it works correctly because all ordering issues were resolved by
> the locking mechanisms on the original table or by the reorder buffer

ok

> And here is how it maps to the MVCC-safe case (SnapshotSelf):
>
> 1) we have a whole initial table snapshot with all xmin copied from
> the original table. All such xmin are committed.
> 2) the applying transaction sees ALL the live (no xmax) tuples in it,
> because their xmin/xmax are committed and SnapshotSelf is happy with that

How does HeapTupleSatisfiesSelf() recognize the status of any XID without using a snapshot? Do you mean by checking the commit log (TransactionIdDidCommit)?

> 3) each update/delete during the replay selects the last existing
> tuple version, updates its xmax = original xid, and inserts a new one
> with xmin = original xid
> 4) --//--
> 5) --//--

> > However, at the time we're replaying the UPDATE in the new table, the tuple
> > may have already been deleted from the old table, and the deleting
> > transaction may already have committed. In such a case,
> > HeapTupleSatisfiesSelf() will conclude the old version is invisible and
> > we'll fail to replay the UPDATE.
>
> No, it will see it - because its xmax will be empty in the repacked
> version of the table.

You're right, it'll be empty in the new table.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Sat, Aug 23, 2025 at 10:23 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> On 2025-08-23, Michael Banck wrote:
> > On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote:
> >
> >> I don't think we need to keep vacuumdb. Packagers can keep a symlink (vacuumdb)
> >> to pg_repackdb. We can add a similar warning message saying users should use
> >> pg_repackdb if the symlink is used.
> >
> > Unless pg_repackdb has the same (or a superset of the) CLI and behaviour as
> > vacuumdb (I haven't checked, but I doubt it?), I think replacing vacuumdb
> > with a symlink to pg_repackdb will lead to much more breakage in existing
> > scripts/automation than clusterdb, which I guess is used orders of
> > magnitude less frequently than vacuumdb.
>
> Yeah, I completely disagree with the idea of getting rid of vacuumdb. We can, maybe, in a distant future, get rid of the --full option to vacuumdb. But the rest of the vacuumdb behavior must stay, I think, because REPACK is not VACUUM — it is only VACUUM FULL. And we want to make that distinction very clear.

Or to put it the other way, VACUUM FULL is not really VACUUM either; it is really a form of "repack".

> We can also, in a few years, get rid of clusterdb. But I don't think we need to deprecate it just yet.

Yeah, ISTM the long-term goal should be two binaries, one of which manages clustering/repacking types of activities, and one which manages vacuum-type activities. I don't think that's different from what Alvaro is proposing; FWIW, my original question was about confirming that was the end goal, but also about trying to understand the coordination of when these changes would take place, because the changes to the code, the changes to the SQL commands and their docs, and the changes to the command-line tools seem to be moving at different cadences. That can be fine if it's on purpose, but maybe it needs to be tightened up if not; for example, the current patchset doesn't make any changes to clusterdb, which one might expect to emit a warning about being deprecated in favor of pg_repackdb, if not punt completely to pg_repackdb instead.

Robert Treat
https://xzilla.net
Hi, Antonin

> How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
> snapshot? Do you mean by checking the commit log (TransactionIdDidCommit)?

Yes, TransactionIdDidCommit.

Another option is to just invent a new snapshot type - SnapshotBelieveEverythingCommitted. For that particular case it should work, because all xmin/xmax values written into the new table are committed by design.
On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> 1) we have a whole initial table snapshot with all xmin copied from
>    the original table. All such xmin are committed.
> 2) the applying transaction sees ALL the self-alive (no xmax) tuples in it,
>    because their xmin/xmax are committed and SnapshotSelf is happy with that
> 3) each update/delete during the replay selects the last existing
>    tuple version, sets its xmax = original xid and inserts a new one
>    with xmin = original xid
> 4) --//--
> 5) --//--

Advancing the table's min xid to at least the repack XID is a pretty big feature, but the above scenario sounds like it would result in any non-modified pre-existing tuples ending up with their original xmin rather than the repack XID, which seems like it could lead to weird side effects. Maybe I am mis-thinking it, though?

Robert Treat
https://xzilla.net
Robert Treat <rob@xzilla.net> wrote:
> On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu
> <mihailnikalayeu@gmail.com> wrote:
> > 1) we have a whole initial table snapshot with all xmin copied from
> >    the original table. All such xmin are committed.
> > 2) the applying transaction sees ALL the self-alive (no xmax) tuples in it,
> >    because their xmin/xmax are committed and SnapshotSelf is happy with that
> > 3) each update/delete during the replay selects the last existing
> >    tuple version, sets its xmax = original xid and inserts a new one
> >    with xmin = original xid
> > 4) --//--
> > 5) --//--
>
> Advancing the table's min xid to at least the repack XID is a pretty big
> feature, but the above scenario sounds like it would result in any
> non-modified pre-existing tuples ending up with their original xmin
> rather than the repack XID, which seems like it could lead to weird side
> effects. Maybe I am mis-thinking it, though?

What we discuss here is how to keep the visibility information of tuples (xmin, xmax, ...) unchanged. Both CLUSTER and VACUUM FULL already do that. However, it's not trivial to ensure that REPACK with the CONCURRENTLY option does so as well.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> Hi, Antonin
>
> > How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a
> > snapshot? Do you mean by checking the commit log (TransactionIdDidCommit)?
>
> Yes, TransactionIdDidCommit.

I think the problem is that HeapTupleSatisfiesSelf() uses TransactionIdIsInProgress() instead of checking the snapshot:

    ...
    else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
        return false;
    else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
    ...

When decoding (and replaying) data changes, you deal with the database state as it was (far) in the past. However, TransactionIdIsInProgress() is not suitable for this purpose. And since CommitTransaction() updates the commit log before removing the transaction from the ProcArray, I can even imagine race conditions: if a transaction is committed and decoded fast enough, TransactionIdIsInProgress() might still return true. In such a case, HeapTupleSatisfiesSelf() returns false instead of calling TransactionIdDidCommit().

> Another option is to just invent a new
> snapshot type - SnapshotBelieveEverythingCommitted. For that
> particular case it should work, because all xmin/xmax values written into
> the new table are committed by design.

I'd prefer optimizing the logical decoding for REPACK CONCURRENTLY, and using the MVCC snapshots.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Antonin Houska <ah@cybertec.at>:
> I think the problem is that HeapTupleSatisfiesSelf() uses
> TransactionIdIsInProgress() instead of checking the snapshot:

Yes, some issues might be possible for SnapshotSelf.

A possible solution is to override TransactionIdIsCurrentTransactionId to return true (like you did with nParallelCurrentXids, but just returning true). IIUC, in that case all checks are going to behave the same way as in the v5 version.

> I'd prefer optimizing the logical decoding for REPACK CONCURRENTLY, and
> using the MVCC snapshots.

That is also possible, but it is much more complex and feels like overkill to me. We just need a way to find the latest version of a row in a world of all-committed transactions without any concurrent writers - I am pretty sure that can be achieved in a simpler and more efficient way.
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> Antonin Houska <ah@cybertec.at>:
> > I think the problem is that HeapTupleSatisfiesSelf() uses
> > TransactionIdIsInProgress() instead of checking the snapshot:
>
> Yes, some issues might be possible for SnapshotSelf.
> A possible solution is to override TransactionIdIsCurrentTransactionId
> to return true (like you did with nParallelCurrentXids, but just returning
> true). IIUC, in that case all checks are going to behave the same way as
> in the v5 version.

I assume you mean v12-0005. Yes, that modifies TransactionIdIsCurrentTransactionId(), so that the transaction being replayed recognizes whether it (or its subtransaction) performed a particular change itself.

Although it could work, I think it'd be confusing to consider the transactions being replayed as "current" from the point of view of the backend that executes REPACK CONCURRENTLY.

But the primary issue is that in v12-0005, TransactionIdIsCurrentTransactionId() gets the information on "current transactions" from snapshots - see the calls of SetRepackCurrentXids() before each scan. That's probably not what you want.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Antonin Houska <ah@cybertec.at>:
> Although it could work, I think it'd be confusing to consider the transactions
> being replayed as "current" from the point of view of the backend that
> executes REPACK CONCURRENTLY.

Just realized SnapshotDirty is the thing that fits the role - it respects not-yet-committed transactions, giving enough information to wait for them. It is already used in a similar pattern in check_exclusion_or_unique_constraint and RelationFindReplTupleByIndex.

So, it is easy to detect the race you described previously and retry, and there is no need to hack around TransactionIdIsCurrentTransactionId.

BTW, btree + SnapshotDirty has an issue [0], but that is a different story and happens only with concurrent updates, which are not present in the current scope.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwXGhH_qD6RGqPyEeKdmHgr-HpA-tASYdi5onP%2BRyP5TCw%40mail.gmail.com#77f6426ef2d282198f2d930d5334e3fa
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> Antonin Houska <ah@cybertec.at>:
>
> > Although it could work, I think it'd be confusing to consider the transactions
> > being replayed as "current" from the point of view of the backend that
> > executes REPACK CONCURRENTLY.
>
> Just realized SnapshotDirty is the thing that fits the role - it
> respects not-yet-committed transactions, giving enough information to
> wait for them.
> It is already used in a similar pattern in
> check_exclusion_or_unique_constraint and RelationFindReplTupleByIndex.
>
> So, it is easy to detect the race you described previously and retry,
> and there is no need to hack around
> TransactionIdIsCurrentTransactionId.

Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is visible? TransactionIdIsCurrentTransactionId() will not do w/o the modifications that you proposed earlier [1], and TransactionIdIsInProgress() is not suitable, as I explained in [2].

I understand your idea that there are no transaction aborts in the new table, which makes things simpler. I cannot judge if it's worth inventing a new kind of snapshot. Anyway, I think you'd then also need to hack HeapTupleSatisfiesUpdate(). Isn't that too invasive?

[1] https://www.postgresql.org/message-id/CADzfLwUqyOmpkLmciecBy4aBN1sohQVZ2Hgc6m-tjSUqDRHwyQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/24483.1756142534%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello, Antonin!

Antonin Houska <ah@cybertec.at>:
> Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is
> visible? TransactionIdIsCurrentTransactionId() will not do w/o the
> modifications that you proposed earlier [1], and TransactionIdIsInProgress()
> is not suitable, as I explained in [2].

HeapTupleSatisfiesDirty is designed to respect even not-yet-committed transactions, and it provides additional related information:

    else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
    {
        /*
         * Return the speculative token to caller.  Caller can worry about
         * xmax, since it requires a conclusively locked row version, and
         * a concurrent update to this tuple is a conflict of its
         * purposes.
         */
        if (HeapTupleHeaderIsSpeculative(tuple))
        {
            snapshot->speculativeToken =
                HeapTupleHeaderGetSpeculativeToken(tuple);

            Assert(snapshot->speculativeToken != 0);
        }

        snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
        /* XXX shouldn't we fall through to look at xmax? */
        return true;            /* in insertion by other */
    }

So, it returns true when TransactionIdIsInProgress is true. However, that alone is not sufficient to trust the result in the common case.

You may check check_exclusion_or_unique_constraint or RelationFindReplTupleByIndex for the pattern: if xmin is set in the snapshot (a special hack in SnapshotDirty to provide additional information from the check), we wait for the ongoing transaction (or one that is actually committed but not yet properly reflected in the proc array), and then retry the entire tuple search.

So, the race condition you explained in [2] will be resolved by a retry, and the changes to TransactionIdIsCurrentTransactionId described in [1] are not necessary.

> I understand your idea that there are no transaction aborts in the new table,
> which makes things simpler. I cannot judge if it's worth inventing a new kind
> of snapshot. Anyway, I think you'd then also need to hack
> HeapTupleSatisfiesUpdate(). Isn't that too invasive?

It seems that HeapTupleSatisfiesUpdate is also fine as it currently exists (we'll see the committed version after the retry).

The solution appears to be non-invasive:
* it uses an existing snapshot type
* it follows an existing usage pattern
* it leaves TransactionIdIsInProgress and HeapTupleSatisfiesUpdate unchanged

The main change is that the xmin/xmax values are forced from the arguments - but that seems unavoidable in any case.

I'll try to make some kind of prototype this weekend and cover the race condition you mentioned in the specs. Maybe some corner cases will appear.

By the way, there's one more optimization we could apply in both the MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED / HEAP_XMAX_COMMITTED bits in the new table:
* in the MVCC-safe approach, the transaction is already committed.
* in the non-MVCC-safe case, it isn't committed yet - but no one will examine those bits before it commits (though this approach does feel more fragile).

This could help avoid potential storms of full-page writes caused by SetHintBits after the table switch.

Best regards,
Mikhail.
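For illustration, here is a minimal sketch of the wait-and-retry pattern described above, modeled on RelationFindReplTupleByIndex() and check_exclusion_or_unique_constraint(). The find_latest_version() helper and the function name are hypothetical stand-ins, not code from the patch; only the dirty-snapshot/wait/retry shape is the point:

    #include "postgres.h"

    #include "access/transam.h"
    #include "executor/tuptable.h"
    #include "storage/lmgr.h"
    #include "utils/rel.h"
    #include "utils/snapmgr.h"

    static bool
    repack_find_tuple(Relation rel, Datum key, TupleTableSlot *outslot)
    {
        SnapshotData snap;
        TransactionId xwait;
        bool        found;

    retry:
        InitDirtySnapshot(snap);
        found = find_latest_version(rel, key, &snap, outslot);  /* hypothetical */

        /*
         * SnapshotDirty reports an in-progress inserter/deleter in snap.xmin
         * or snap.xmax.  This covers the race where a transaction has already
         * updated the commit log but is still listed in the proc array: wait
         * for it to finish, then redo the whole lookup.
         */
        xwait = TransactionIdIsValid(snap.xmin) ? snap.xmin : snap.xmax;
        if (found && TransactionIdIsValid(xwait))
        {
            XactLockTableWait(xwait, NULL, NULL, XLTW_None);
            goto retry;
        }

        return found;
    }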
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> Hello, Antonin!
>
> HeapTupleSatisfiesDirty is designed to respect even not-yet-committed
> transactions, and it provides additional related information:
>
>     else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
>     {
>         ...
>         snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
>         /* XXX shouldn't we fall through to look at xmax? */
>         return true;            /* in insertion by other */
>     }
>
> So, it returns true when TransactionIdIsInProgress is true.
> However, that alone is not sufficient to trust the result in the common case.
>
> You may check check_exclusion_or_unique_constraint or
> RelationFindReplTupleByIndex for the pattern:
> if xmin is set in the snapshot (a special hack in SnapshotDirty to
> provide additional information from the check), we wait for the
> ongoing transaction (or one that is actually committed but not yet
> properly reflected in the proc array), and then retry the entire tuple
> search.
>
> So, the race condition you explained in [2] will be resolved by a
> retry, and the changes to TransactionIdIsCurrentTransactionId described
> in [1] are not necessary.

I insist that this is a misuse of TransactionIdIsInProgress(). When dealing with logical decoding, only WAL should tell whether a particular transaction is still running. AFAICS this is how reorderbuffer.c works.

A new kind of snapshot seems like a (much) cleaner solution at the moment.

> I'll try to make some kind of prototype this weekend and cover the race
> condition you mentioned in the specs.
> Maybe some corner cases will appear.

No rush. First, the MVCC safety is not likely to be included in v19 [1]. Second, I think it's good to let others propose their ideas before writing code.

> By the way, there's one more optimization we could apply in both the
> MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED /
> HEAP_XMAX_COMMITTED bits in the new table:
> * in the MVCC-safe approach, the transaction is already committed.
> * in the non-MVCC-safe case, it isn't committed yet - but no one will
>   examine those bits before it commits (though this approach does feel
>   more fragile).
>
> This could help avoid potential storms of full-page writes caused by
> SetHintBits after the table switch.

Good idea, thanks.

[1] https://www.postgresql.org/message-id/202504040733.ysuy5gad55md%40alvherre.pgsql

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
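A sketch of that hint-bit idea, assuming the tuple's final xmin/xmax are already known when it is built for the new heap. The function name is invented for illustration, and a real implementation would also have to consider multixact and lock-only xmax values:

    #include "postgres.h"

    #include "access/htup_details.h"
    #include "access/transam.h"

    /*
     * Pre-set hint bits on a tuple destined for the new heap, so readers
     * never need to set them later (avoiding post-switch hint-bit updates
     * and, with checksums/wal_log_hints, the full-page images those can
     * trigger).  Only safe because every xmin/xmax written to the new
     * table is committed by design.
     */
    static void
    preset_hint_bits(HeapTupleHeader htup)
    {
        htup->t_infomask |= HEAP_XMIN_COMMITTED;

        if (TransactionIdIsValid(HeapTupleHeaderGetRawXmax(htup)))
            htup->t_infomask |= HEAP_XMAX_COMMITTED;
        else
            htup->t_infomask |= HEAP_XMAX_INVALID;
    }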
Antonin Houska <ah@cybertec.at>:
> I insist that this is a misuse of TransactionIdIsInProgress(). When dealing
> with logical decoding, only WAL should tell whether a particular transaction
> is still running. AFAICS this is how reorderbuffer.c works.

Hm... Maybe, but at the same time we already have SnapshotDirty used in that way, and it even deals with the same race... But I agree - a special kind of snapshot is a more accurate solution.

> A new kind of snapshot seems like a (much) cleaner solution at the moment.

Do you mean some kind of snapshot which only uses TransactionIdDidCommit/Abort, ignoring TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress? It actually behaves like SnapshotBelieveEverythingCommitted in that particular case, but TransactionIdDidCommit/Abort may be used as some kind of assert/error source to be sure everything is going as designed.

And, yes, for the new snapshot we need HeapTupleSatisfiesUpdate to be modified.

Also, to deal with that particular race we may just use XactLockTableWait(xid, NULL, NULL, XLTW_None) before starting the transaction replay.

> No rush. First, the MVCC safety is not likely to be included in v19 [1].

That worries me - it is not the behaviour someone expects from a database by default. At the very least, the warning should be much more visible and obvious. I think most users will expect the same guarantees as [CREATE|RE]INDEX CONCURRENTLY provides.
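The XactLockTableWait-before-replay idea might look roughly like this; apply_changes() is a hypothetical stand-in for the replay routine, and only the wait call is the point here:

    #include "postgres.h"

    #include "replication/reorderbuffer.h"
    #include "storage/lmgr.h"

    /*
     * Before applying a decoded transaction to the new heap, wait until its
     * XID has fully left the proc array.  The transaction is already
     * committed in WAL terms (we are replaying its changes), so this only
     * closes the window where TransactionIdIsInProgress() would still say
     * "running".
     */
    static void
    repack_apply_xact(ReorderBufferTXN *txn)
    {
        XactLockTableWait(txn->xid, NULL, NULL, XLTW_None);

        apply_changes(txn);     /* hypothetical replay of txn's changes */
    }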
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> > A new kind of snapshot seems like a (much) cleaner solution at the moment.
>
> Do you mean some kind of snapshot which only uses
> TransactionIdDidCommit/Abort, ignoring
> TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress?
> It actually behaves like SnapshotBelieveEverythingCommitted in that
> particular case, but TransactionIdDidCommit/Abort may be used as some
> kind of assert/error source to be sure everything is going as
> designed.

Given that there should be no (sub)transaction aborts in the new table, I think you only need to check that the tuple has a valid xmin and an invalid xmax. I think the XID should be in the commit log at the time the transaction is being replayed, so it should be legal to use TransactionIdDidCommit/Abort in Assert() statements. (And as long as REPACK CONCURRENTLY uses ShareUpdateExclusiveLock, which conflicts with VACUUM, pg_class.relfrozenxid for the given table should not advance during the processing, and therefore the replayed XIDs should not be truncated from the commit log while REPACK CONCURRENTLY is running.)

> And, yes, for the new snapshot we need HeapTupleSatisfiesUpdate to be
> modified.
>
> Also, to deal with that particular race we may just use
> XactLockTableWait(xid, NULL, NULL, XLTW_None) before starting the
> transaction replay.

Do you mean the race related to TransactionIdIsInProgress()? Not sure I understand, as you suggested above that you no longer need the function.

> > No rush. First, the MVCC safety is not likely to be included in v19 [1].
>
> That worries me - it is not the behaviour someone expects from a
> database by default. At the very least, the warning should be much more
> visible and obvious.
> I think most users will expect the same guarantees as [CREATE|RE]INDEX
> CONCURRENTLY provides.

It does not really worry me. The pg_squeeze extension is not MVCC-safe, and I remember there were only 1 or 2 related complaints throughout its existence. (pg_repack isn't MVCC-safe either, but I don't keep track of its issues.) Of course, the user documentation should warn about the problem, the way it does for other commands (typically ALTER TABLE).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
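A sketch of the check described above (the function name is invented, and a real version would also need to consider lock-only and multixact xmax values):

    #include "postgres.h"

    #include "access/htup_details.h"
    #include "access/transam.h"

    /*
     * In the new heap there are no aborted (sub)transactions, so "visible
     * to the replay" reduces to: valid xmin, no xmax.  The Asserts
     * double-check the invariant against the commit log, which is legal
     * here because the replayed XIDs cannot be truncated from it while
     * REPACK CONCURRENTLY holds its lock (see above).
     */
    static bool
    repack_tuple_is_latest(HeapTupleHeader htup)
    {
        TransactionId xmin = HeapTupleHeaderGetRawXmin(htup);
        TransactionId xmax = HeapTupleHeaderGetRawXmax(htup);

        Assert(TransactionIdIsValid(xmin));
        Assert(TransactionIdDidCommit(xmin));

        if (!TransactionIdIsValid(xmax) ||
            (htup->t_infomask & HEAP_XMAX_INVALID) != 0)
            return true;        /* no deleter: this is the latest version */

        Assert(TransactionIdDidCommit(xmax));
        return false;           /* deleted or updated away */
    }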
Antonin Houska <ah@cybertec.at>:
> Do you mean the race related to TransactionIdIsInProgress()? Not sure I
> understand, as you suggested above that you no longer need the function.

The "lightweight" approaches I see so far:
* XactLockTableWait before replay + SnapshotSelf (GetLatestSnapshot?)
* SnapshotDirty + retry logic
* SnapshotBelieveEverythingCommitted + modification of HeapTupleSatisfiesUpdate (because it is called by heap_update and looks at TransactionIdIsInProgress)

> It does not really worry me. The pg_squeeze extension is not MVCC-safe, and I
> remember there were only 1 or 2 related complaints throughout its
> existence. (pg_repack isn't MVCC-safe either, but I don't keep track of its
> issues.)

But pg_squeeze and pg_repack are extensions. If we are moving that mechanism into core, I'd expect some improvements over pg_squeeze. The MVCC-safety of REINDEX CONCURRENTLY makes it possible to run it on a regular basis as some kind of background job. It would be nice to have something like this for the heap.

I agree the initial approach is too invasive, complex and performance-heavy to push forward now. But any of the "lightweight" ones feels like a good candidate to be shipped with the feature itself - relatively easy and non-invasive.

Best regards,
Mikhail.
On 2025-Aug-21, Mihail Nikalayeu wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org>:
> > I proposed to leave this part out initially, which is why it hasn't been
> > reposted. We can review and discuss after the initial patches are in.
>
> I think it is worth pushing it at least in the same release cycle.

If others are motivated enough to certify it, maybe we can consider it. But I don't think I'm going to have time to get this part reviewed and committed in time for 19, so you might need another committer.

> > Because having an MVCC-safe mode has drawbacks, IMO we should make it
> > optional.
>
> Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?

Yes, exactly that.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Greatness is a transitory experience. It is never consistent. It depends
in large part on the myth-making human imagination" (Irulan)
Hello, Álvaro!

> If others are motivated enough to certify it, maybe we can consider it.
> But I don't think I'm going to have time to get this part reviewed and
> committed in time for 19, so you might need another committer.

I don't think it is realistic to involve another committer - it is just the well-known curse of all non-committers :)

> > > Because having an MVCC-safe mode has drawbacks, IMO we should make it
> > > optional.

As far as I can see, the proposed "lightweight" solutions don't introduce any drawbacks - unless something has been overlooked.

> > Do you mean some option for the command? Like REPACK (CONCURRENTLY, SAFE)?
>
> Yes, exactly that.

To be honest, that approach feels a little bit strange to me. I work in the database-consumer (not database-developer) industry, and 90% of DevOps engineers (or people in similar roles who deal with database maintenance these days) have no clue what MVCC is - and that is the industry standard nowadays. From my perspective, it is better to have a less performant but MVCC-safe approach by default, with some "more performance, less safety" flag for those who truly understand the trade-offs. All those kinds of safety and correctness are probably the main reason to use classic databases instead of storing data in S3/Elastic/Mongo/etc. these days.

In case of some incident related to this (in a large, well-known company), the typical takeaway for readers of tech blogs will simply be "some command in Postgres is broken". And maybe also "the database with a name starting with 'O' is not affected by that flaw".

Yes, some table-rewriting forms of ALTER TABLE are not MVCC-safe, but those typically block everything for hours - so they're avoided unless absolutely necessary, and usually performed during backend outages to prevent systems from getting stuck waiting on millions of locks. "CONCURRENTLY" is something that feels like a "working in the background, don't worry, do not stop your 100k RPS" thing, especially in Postgres because of CIC.

I also have personal experience debugging incidents caused by incorrect Postgres behavior - and it's absolute hell.

Sorry for the drama... I fully understand the resource limitations... Maybe Postgres 19 could introduce something like REPACK (CONCURRENTLY, UNSAFE) first, and later add a safer REPACK (CONCURRENTLY)?

Best regards,
Mikhail.
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> In case of some incident related to this (in a large, well-known
> company), the typical takeaway for readers of tech blogs will simply be
> "some command in Postgres is broken".

For _responsible_ users, the message will rather be that "some tech bloggers do not bother to read the user documentation".

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello,

Here's v19 of this patchset. This is mostly Antonin's v18.

I added a preparatory v19-0001 commit, which splits vacuumdb.c to create a new file, vacuuming.c (and its header file vacuuming.h). If you look at it under 'git show --color-moved=zebra' you should notice that most of it is just code movement; there are hardly any code changes.

v19-0002 has absorbed Antonin's v18-0005 (the pg_repackdb binary) together with the introduction of the REPACK command proper; but instead of using a symlink, I just created a separate pg_repackdb.c source file for it, and we compile that small new source file with vacuuming.c to create a regular binary. BTW, the meson.build changes look somewhat duplicative; maybe there's a less dumb way to go about this. (For instance, maybe just have libscripts.a include vacuuming.o, though it's not used by any of the other programs in that subdir.) I'm not wedded to the name "vacuuming.c"; happy to take suggestions.

After 0002, the pg_repackdb utility should be ready to take the place of clusterdb, and also of vacuumdb --full, with one gotcha: if you try to use pg_repackdb with an older server version, it will fail, claiming that REPACK is not supported. This is not ideal. Instead, we should make it run VACUUM FULL (or CLUSTER); that way, if you have a fleet including older servers, you can use the new utils there too.

All the logic for vacuumdb to select tables to operate on has been moved to vacuuming.c verbatim. This means the logic applies to pg_repackdb as well. As long as you stick to repacking a single table this is okay (read: it won't be used at all), but if you want to use parallel mode (say, to process multiple schemas), we might need to change it. For the same reason, I think we should add an option (--index[=indexname]) to select whether to use the USING INDEX clause or not, and optionally to indicate which index to use; right now there's no way to select which logic (cluster's or vacuum full's) to use.

Then v19-0003 through v19-0005 are Antonin's subsequent patches to add the CONCURRENTLY option; I have not reviewed these at all, so I'm including them here just for completeness. I also included v18-0006 as posted by Mihail previously, though I have little faith that we're going to include it in this release.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
"To think that the specter we see is illusory does not rob it of its terror;
it only adds the new terror of madness" (Perelandra, C.S. Lewis)
Attachments
- v19-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch
- v19-0002-Add-REPACK-command.patch
- v19-0003-Refactor-index_concurrently_create_copy-for-use-.patch
- v19-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch
- v19-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch
- v19-0006-Preserve-visibility-information-of-the-concurren.patch
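To make the shape of 0002 concrete: the separate pg_repackdb.c can stay tiny if the shared driver exposes a single entry point. A minimal sketch follows; the vacuuming.h API, the MODE_REPACK value, and the vacuuming_main() signature are assumptions about the patch, not its actual code:

    /* pg_repackdb.c — hypothetical thin wrapper over the shared driver */
    #include "postgres_fe.h"

    #include "vacuuming.h"      /* assumed to declare vacuuming_main() */

    int
    main(int argc, char *argv[])
    {
        /*
         * Run the common vacuumdb/pg_repackdb implementation in REPACK
         * mode; the real entry point may take different arguments.
         */
        return vacuuming_main(argc, argv, "pg_repackdb", MODE_REPACK);
    }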
Apparently I mismerged src/bin/scripts/meson.build. This v20 is identical to v19, except that that mistake has been corrected.

--
Álvaro Herrera        PostgreSQL Developer — https://www.EnterpriseDB.com/
"In the beginning there was UNIX, and UNIX spoke and said: "Hello world\n".
It did not say "Hello New Jersey\n", nor "Hello USA\n"."
Attachments
- v20-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch
- v20-0002-Add-REPACK-command.patch
- v20-0003-Refactor-index_concurrently_create_copy-for-use-.patch
- v20-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch
- v20-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch
- v20-0006-Preserve-visibility-information-of-the-concurren.patch
Hello!

I started an attempt to make a "lightweight" MVCC-safe prototype and got stuck on an "it is not working" issue. After some debugging, I realized that Antonin's variant (catalog-mode based) seems to be broken as well... And after a few more hours, I realized the non-MVCC one is broken too :)

Attached is a patch with a test to reproduce the issue, related to repack + concurrent modifications. It seems some updates may be lost.

I hope the patch logic is clear - but feel free to ask if not.

Best regards,
Mikhail.
Attachments
On 2025-Aug-31, Mihail Nikalayeu wrote:
> I started an attempt to make a "lightweight" MVCC-safe prototype and
> got stuck on an "it is not working" issue.
> After some debugging, I realized that Antonin's variant (catalog-mode
> based) seems to be broken as well...
>
> And after a few more hours, I realized the non-MVCC one is broken too :)

Ugh. Well, obviously we need to get this fixed if we want CONCURRENTLY at all :-)

Please don't post patches that aren't the commitfest item's main patch as attachments with a .patch extension. This confuses the CFbot into thinking your patch is the patch-of-record (which it isn't), and it reports that the patch fails CI. See here:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5117
(For the same reason, it isn't useful to number them as if they were part of the patch series.)

If you want to post secondary patches, please rename them to end in something like .txt or .nocfbot or whatever. See here:
https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches?

Thanks for your interest in this topic,

--
Álvaro Herrera        48°01'N 7°57'E — https://www.EnterpriseDB.com/
On Wed, Aug 27, 2025 at 10:22:24AM +0200, Mihail Nikalayeu wrote:
> That worries me - it is not the behaviour someone expects from a
> database by default. At the very least, the warning should be much more
> visible and obvious.
> I think most users will expect the same guarantees as [CREATE|RE]INDEX
> CONCURRENTLY provides.

Having a unified path for the handling of the waits and the locking sounds to me like a pretty good argument in favor of a basic implementation. In my experience, users do not really care about the time it takes to complete an operation involving CONCURRENTLY if we allow concurrent reads and writes in parallel with it. I have not looked at the proposal in detail, but before trying a more folkloric MVCC approach, relying on basics that we know have been working for some time seems like a good and sufficient initial step in terms of handling the waits and the locks with table AMs (aka heap or something else).

Just my 2c.
--
Michael
Attachments
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> Hello!
>
> I started an attempt to make a "lightweight" MVCC-safe prototype and
> got stuck on an "it is not working" issue.
> After some debugging, I realized that Antonin's variant (catalog-mode
> based) seems to be broken as well...
>
> And after a few more hours, I realized the non-MVCC one is broken too :)
>
> Attached is a patch with a test to reproduce the issue, related to
> repack + concurrent modifications.
> It seems some updates may be lost.
>
> I hope the patch logic is clear - but feel free to ask if not.

Are you sure the test is complete? I see no occurrence of the REPACK command in it.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello!

Antonin Houska <ah@cybertec.at>:
> Are you sure the test is complete? I see no occurrence of the REPACK
> command in it.

Oops, I sent an invalid file. The correct one is attached.
Attachments
Hello, Álvaro!

Alvaro Herrera <alvherre@alvh.no-ip.org>:
> If you want to post secondary patches, please rename them to end in
> something like .txt or .nocfbot or whatever. See here:
> https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches?

Sorry, I missed that. But note it is now possible to send ".patch" files without changing the extension [0]:

> It also ignores any files that start with "nocfbot".

[0]: https://discord.com/channels/1258108670710124574/1328362897189113867/1412021226528051250
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> Antonin Houska <ah@cybertec.at>:
> > Are you sure the test is complete? I see no occurrence of the REPACK
> > command in it.
>
> Oops, I sent an invalid file. The correct one is attached.

Thanks! The problem was that when removing the original "preserve visibility" patch v12-0005 [1] from the series, I forgot to change the value of the 'need_full_snapshot' argument of CreateInitDecodingContext(). v12 and earlier treated the repacked table like a system catalog, so it was OK to pass need_full_snapshot=false. However, it must be true now; otherwise, the snapshot created for the initial copy does not see commits of transactions that do not change regular catalogs.

The fix is as simple as

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index f481a3cec6d..7866ac01278 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -502,6 +502,7 @@ SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
 	StringInfo	buf = makeStringInfo();
 
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
+	Assert(builder->building_full_snapshot);
 
 	snap = SnapBuildBuildSnapshot(builder);

I'll apply it to the next version of the "Add CONCURRENTLY option to REPACK command" patch.

[1] https://www.postgresql.org/message-id/flat/CAFj8pRDK89FtY_yyGw7-MW-zTaHOCY4m6qfLRittdoPocz+dMQ@mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
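The caller-side counterpart of that fix is simply to create the decoding context with need_full_snapshot = true. A fragment as a sketch only: the XL_ROUTINE members below follow the pattern used by the in-core slot functions, but the actual call site in the patch may differ:

    ctx = CreateInitDecodingContext(plugin, NIL,
                                    true,   /* need_full_snapshot */
                                    InvalidXLogRecPtr,
                                    XL_ROUTINE(.page_read = read_local_xlog_page,
                                               .segment_open = wal_segment_open,
                                               .segment_close = wal_segment_close),
                                    NULL, NULL, NULL);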
Hello!

Antonin Houska <ah@cybertec.at>:
> I'll apply it to the next version of the "Add CONCURRENTLY option to
> REPACK command" patch.

I have added it to the v21 patchset. Also, I've updated the MVCC-safe patch:
* it uses the "XactLockTableWait before replay + SnapshotSelf" approach from [0]
* it includes a TAP test to ensure MVCC safety - not intended to be committed in its current form (too heavy)
* the documentation has been updated.

It's now much simpler and does not negatively impact performance. It is less aggressive in tuple freezing, but can be updated to match the non-MVCC-safe version if needed.

While stress-testing the MVCC-safe version with 007_repack_concurrently_mvcc.pl, I encountered some random crashes with logs like this:

2025-09-02 12:24:40.039 CEST client backend[261907] 007_repack_concurrently_mvcc.pl ERROR:  relcache reference 0x7715b9f394a8 is not owned by resource owner TopTransaction
2025-09-02 12:24:40.039 CEST client backend[261907] 007_repack_concurrently_mvcc.pl STATEMENT:  REPACK (CONCURRENTLY) tbl1 USING INDEX tbl1_pkey;
TRAP: failed Assert("rel->rd_refcnt > 0"), File: "../src/backend/utils/cache/relcache.c", Line: 6992, PID: 261907
postgres: CIC_test: nkey postgres [local] REPACK(ExceptionalCondition+0xbe)[0x5b7ac41d79f9]
postgres: CIC_test: nkey postgres [local] REPACK(+0x852d2e)[0x5b7ac41cbd2e]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aa4a6)[0x5b7ac42234a6]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aad3b)[0x5b7ac4223d3b]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aac69)[0x5b7ac4223c69]
postgres: CIC_test: nkey postgres [local] REPACK(ResourceOwnerRelease+0x32)[0x5b7ac4223c26]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f43bf)[0x5b7ac3b6d3bf]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f4dfa)[0x5b7ac3b6ddfa]
postgres: CIC_test: nkey postgres [local] REPACK(AbortCurrentTransaction+0xe)[0x5b7ac3b6dd6b]
postgres: CIC_test: nkey postgres [local] REPACK(PostgresMain+0x57d)[0x5b7ac3fd7238]
postgres: CIC_test: nkey postgres [local] REPACK(+0x654102)[0x5b7ac3fcd102]
postgres: CIC_test: nkey postgres [local] REPACK(postmaster_child_launch+0x191)[0x5b7ac3eceb7a]
postgres: CIC_test: nkey postgres [local] REPACK(+0x55c8c1)[0x5b7ac3ed58c1]
postgres: CIC_test: nkey postgres [local] REPACK(+0x559d1e)[0x5b7ac3ed2d1e]
postgres: CIC_test: nkey postgres [local] REPACK(PostmasterMain+0x168a)[0x5b7ac3ed25f8]
postgres: CIC_test: nkey postgres [local] REPACK(main+0x3a1)[0x5b7ac3da2bd6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7715b962a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7715b962a28b]

This time I was clever and first tried to reproduce the issue on the non-MVCC-safe version - and it is reproducible there. Just comment out \if :p_t1 != :p_t2 (and its internals, because they catch non-MVCC behaviour, which is expected without the 0006 patch) and set '--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=25000'. It takes about a minute on my PC to get the crash.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwXCTXNdxK-XGTKmObvT%3D_QnaCviwgrcGtG9chsj5sYzrg%40mail.gmail.com#6d6ccf34e6debcd386d735db593379d8

Best regards,
Mikhail.
Attachments
- v21-0006-Preserve-visibility-information-of-the-concurren.patch
- v21-0003-Refactor-index_concurrently_create_copy-for-use-.patch
- v21-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch
- v21-0002-Add-REPACK-command.patch
- v21-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch
- v21-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch
Mihail Nikalayeu <mihailnikalayeu@gmail.com> wrote:
> While stress-testing the MVCC-safe version with
> 007_repack_concurrently_mvcc.pl, I encountered some random crashes with
> logs like this:
>
> 2025-09-02 12:24:40.039 CEST client backend[261907]
> 007_repack_concurrently_mvcc.pl ERROR:  relcache reference
> 0x7715b9f394a8 is not owned by resource owner TopTransaction
> ...
> This time I was clever and first tried to reproduce the issue on
> the non-MVCC-safe version - and it is reproducible there.

Thanks again for the thorough testing! I think this should be fixed separately [1].

[1] https://www.postgresql.org/message-id/119497.1756892972%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hello,

Barring further commentary, I intend to get 0001 committed tomorrow, and 0002 some time later -- perhaps by the end of this week, or sometime next week.

Regards

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
After looking at this some more, I realized that 0001 had been written a bit too hastily and could use some more cleanup -- in particular, we don't need to export most of the function prototypes other than vacuuming_main() (and the trivial escape_quotes helper), so I made the other functions static.

Additionally, prepare_vacuum_command() needs the encoding in order to apply fmtIdEnc() to a given index name (for `pg_repackdb -t table --index=foobar`), so I changed it to take the PGconn instead of just the serverVersion.

I realized that it makes no sense for objfilter to be a global variable; it should instead live inside `main` and be passed as an argument where needed. (Heck, maybe it should be inside vacuumingOpts.) Lastly, it seemed like weird coding for the functions to sometimes exit(1) instead of returning a result code, so I made them return one and have the callers react appropriately. These are all fairly straightforward changes.

So here's v22 with those changes, rebased to current sources. Only the first two patches this time, which are the ones I would be glad to receive input on.

I also wonder if analyze_only and analyze_in_stages should be new values in RunMode rather than separate booleans ... I think that might make the code simpler. I didn't try, though.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
"The gods do not protect fools. Fools are protected
by more capable fools" (Louis Wu, Ringworld)
Attachments
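As an illustration of the PGconn change just described — a sketch only; prepare_repack_command() and its parameter list are invented, while fmtIdEnc() and PQclientEncoding() are the real APIs being referred to:

    #include "postgres_fe.h"

    #include "fe_utils/string_utils.h"
    #include "libpq-fe.h"
    #include "pqexpbuffer.h"

    /*
     * Build a REPACK command, quoting the index name with the connection's
     * client encoding.  This sketch assumes the table name arrives
     * pre-formatted, as in vacuumdb.
     */
    static void
    prepare_repack_command(PQExpBuffer sql, PGconn *conn,
                           const char *table, const char *indexname)
    {
        appendPQExpBuffer(sql, "REPACK %s", table);
        if (indexname != NULL)
            appendPQExpBuffer(sql, " USING INDEX %s",
                              fmtIdEnc(indexname, PQclientEncoding(conn)));
        appendPQExpBufferChar(sql, ';');
    }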
On Thu, Sep 25, 2025 at 3:12 PM, Álvaro Herrera <alvherre@kurilemu.de> wrote:
Some typos I've found in the usage output of pg_repackdb:
+ printf(_(" -n, --schema=SCHEMA repack tables in the specified schema(s) only\n"));
+ printf(_(" -N, --exclude-schema=SCHEMA do not repack tables in the specified schema(s)\n"));
+ printf(_(" -N, --exclude-schema=SCHEMA do not repack tables in the specified schema(s)\n"));
Both options can point to a single schema, so "(s)" should be removed:
"in the specified schema(s)" should be "in the specified schema"
The same occurs in this one, which should be "table", not "table(s)":
+ printf(_(" -t, --table='TABLE' repack specific table(s) only\n"));
regards
Marcos
On Thu, Sep 25, 2025 at 4:21 PM Marcos Pegoraro <marcos@f10.com.br> wrote:
> On Thu, Sep 25, 2025 at 3:12 PM, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> Some typos I've found in the usage output of pg_repackdb:
>
> + printf(_(" -n, --schema=SCHEMA repack tables in the specified schema(s) only\n"));
> + printf(_(" -N, --exclude-schema=SCHEMA do not repack tables in the specified schema(s)\n"));
>
> Both options can point to a single schema, so "(s)" should be removed:
> "in the specified schema(s)" should be "in the specified schema".
>
> The same occurs in this one, which should be "table", not "table(s)":
> + printf(_(" -t, --table='TABLE' repack specific table(s) only\n"));

This pattern is used because you can pass more than one argument; for example, something like

pg_repackdb -d pagila -v -n public -n legacy

While I agree that the wording is a little awkward (I'd prefer "repack tables only in the specified schema(s)"), this follows the same pattern as pg_dump and friends.

Robert Treat
https://xzilla.net
On Thu, Sep 25, 2025 at 6:31 PM, Robert Treat <rob@xzilla.net> wrote:
> This pattern is used because you can pass more than one argument; for
> example, something like

I know that.

> While I agree that the wording is a little awkward, this follows the same
> pattern as pg_dump and friends.

Well, I think pg_dump looks wrong too. Because if you explain in the docs that it's a single table or a single schema, why do you write it in the plural in the usage output?
+ Repack or analyze all tables in
+ <replaceable class="parameter">schema</replaceable> only. Multiple
+ schemas can be repacked by writing multiple <option>-n</option>
+ switches.
instead of
+ printf(_(" -n, --schema=SCHEMA repack tables in the specified schema(s) only\n"));
maybe this?
+ printf(_(" -n, --schema=SCHEMA repack tables in the specified schema, can be used several times\n"));
regards
Marcos
Hello!

Álvaro Herrera <alvherre@kurilemu.de>:
> So here's v22 with those changes, rebased to current sources. Only the
> first two patches this time, which are the ones I would be glad to
> receive input on.

> get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
>                                  Oid relid, bool rel_is_index)

Should we rename cluster_context to repack_context, to align it with the calling side?

---------

It is 'cmd' in

> static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
>                                   MemoryContext permcxt);

but 'command' in

> get_tables_to_repack(RepackCommand command, bool usingindex,
>                      MemoryContext permcxt)

---------

> cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",

This could be changed to use RepackCommandAsString.

---------

    if (cmd == REPACK_COMMAND_REPACK)
        pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
                                     PROGRESS_REPACK_COMMAND_REPACK);
    else if (cmd == REPACK_COMMAND_CLUSTER)
    {
        pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
                                     PROGRESS_CLUSTER_COMMAND_CLUSTER);
    }
    else
    ....

The '{' and '}' look a little bit weird here.

---------

The documentation of pg_repackdb contains a lot of "analyze" and even an "--analyze" parameter - but I can't see anything related in the code.

Best regards,
Mikhail.
On Thu, Sep 25, 2025 at 2:12 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> So here's v22 with those changes, rebased to current sources. Only the
> first two patches this time, which are the ones I would be glad to
> receive input on.

A number of small issues I noticed. I don't know that they all need addressing right now, but it seems worth asking the questions...

#1 "pg_repackdb --help" does not mention the --index option, although the flag is accepted. I'm not sure if this is meant to match clusterdb, but since we need the index option to invoke the clustering behavior, I think it needs to be there.

#2
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer --index=idx_last_name
pg_repackdb: repacking database "pagila"
INFO:  clustering "public.customer" using sequential scan and sort
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer
pg_repackdb: repacking database "pagila"
INFO:  vacuuming "public.customer"

This was less confusing once I figured out we could pass the --index option, but even with that it is a little confusing, I think mostly because it looks like we are "vacuuming" the table, which in a world of repack and vacuum (i.e., no VACUUM FULL) doesn't make sense. I think the right thing to do here would be to modify it to say "repacking %s" in both cases, with the "using sequential scan and sort" part as the means to understand which version of repack is being executed.

#3 pg_repackdb does not offer an --analyze option, which ISTM it should, to match the REPACK command.

#4 The SQL-level REPACK help shows:

where option can be one of:
    VERBOSE [ boolean ]
    ANALYSE | ANALYZE

but the SQL-level VACUUM help does:

    VERBOSE [ boolean ]
    ANALYZE [ boolean ]

These operate the same way, so I would expect REPACK to match the language in VACUUM.

#5
[xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
pg_repackdb: repacking database "pagila"

In the above scenario, I am repacking without having previously specified an index. At the SQL level this would throw an error; at the command line it gives me a heart attack. :-) It's actually not that bad, because we don't actually do anything, but maybe we should throw an error?

#6 On the individual command pages (like sql-repack.html), I think there should be more cross-linking, i.e. repack should probably say "see also cluster" and vice versa. Likely similarly with vacuum and repack.

#7 Is there some reason you chose to intermingle the repack regression tests with the existing tests? I feel like it'd be easier to differentiate potential regressions from new functionality if these were separated.

Robert Treat
https://xzilla.net