Discussion: rebuild big tables with pg_repack

rebuild big tables with pg_repack

From
ek ek
Date:
Hello everyone,
I’m going to rebuild a 900GB table using pg_repack. I’m hesitant to do such a large operation in one go.
Is there an ideal or recommended way to repack very large tables?

Re: rebuild big tables with pg_repack

From
Ron Johnson
Date:
On Fri, Nov 14, 2025 at 2:14 PM ek ek <livadidrive@gmail.com> wrote:
Hello everyone,
I’m going to rebuild a 900GB table using pg_repack. I’m hesitant to do such a large operation in one go.
Is there an ideal or recommended way to repack very large tables?

Everything in database maintenance is circumstantial.

The basics that I'd do are:
* Verify that you have enough free disk space for the new table, the new indices, and the WAL that will be generated.
* Do it during a low-activity window.
* Don't run a database backup at the same time.
* First execute with --dry-run.
* Consider the --no-order option.  That'll speed things up.
* And --no-analyze, though you'll have to manually ANALYZE immediately afterwards.
* (I'd probably disable autoanalyze on that table before the repack and then enable it after the manual ANALYZE.)
* The --jobs option speeds up index rebuilds.
* Run it from cron, and redirect both stdout and stderr to the same log file (a rough sketch of the whole sequence is below).
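
Not a recipe, just roughly what that sequence might look like; the database, table, paths, and schedule below are made up, so adapt everything to your environment:

    # Disable autovacuum on the table first.  Note: there is no per-table
    # "autoanalyze only" switch, so this turns off autovacuum for the table
    # as a whole until it is reset.
    psql -d mydb -c "ALTER TABLE public.big_table SET (autovacuum_enabled = false);"

    # Dry run first, then the real repack.
    pg_repack -d mydb -t public.big_table --dry-run
    pg_repack -d mydb -t public.big_table --no-order --no-analyze --jobs 4

    # Manual ANALYZE immediately afterwards, then re-enable autovacuum.
    psql -d mydb -c "ANALYZE public.big_table;"
    psql -d mydb -c "ALTER TABLE public.big_table RESET (autovacuum_enabled);"

If you wrap that in a script, the crontab entry could look something like:

    # 02:00 Saturday, stdout and stderr to the same log file
    0 2 * * 6 /usr/local/bin/repack_big_table.sh >> /var/log/repack_big_table.log 2>&1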

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: rebuild big tables with pg_repack

From
pg254kl@georgiou.vip
Date:

On 11/14/25 2:14 PM, ek ek - livadidrive at gmail.com wrote:
Hello everyone,
I’m going to rebuild a 900GB table using pg_repack. I’m hesitant to do such a large operation in one go.
Is there an ideal or recommended way to repack very large tables?

I recall, almost 20 years ago, when Bruce Momjian was educating us on PostgreSQL (we were on Sybase ASE but migrating to PostgreSQL), we discussed the table- and index-rebuilding mania.  I never forgot the "we have clients that haven't rebuilt anything for years and they run just fine", haha.  OK, sometimes it's warranted if you have very "hot" tables, but the point that stayed with me is that it's usually unnecessary.

But ok, you may have your reasons, so...

  1. Connect to your cluster from a box with good connectivity to it (e.g., for AWS RDS this means an EC2 instance in the same VPC)
  2. Definitely run pg_repack inside a tmux session
  3. Be safe and have at least 3 x pg_total_relation_size(table) free space
  4. Make sure you understand the -k (--no-kill-backend) and --wait-timeout options.  By default (no -k) pg_repack will wait on blocking backends and, on a busy table, eventually time out.  Decisions, decisions.  Definitely run when things are calm.
  5. If your default TOAST compression is not lz4, and this table uses TOAST, consider changing it to lz4 prior to this pg_repack.  lz4 is blazingly fast with low CPU cost.  (A rough command sketch covering items 2-5 follows this list.)
  6. I don't like running anything heavy-handed on production without practicing/testing first.  If things go bad and management asks "did you test this?", the answer should always be "yes".  If this is your first time pg_repacking such a table, make sure you first do it on a staging environment under heavy load during a performance run.  The cache churn can be an issue if your workload depends on a hot cache.  Again, definitely run during the calmest window, but test under load ;)
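
A rough sketch of the commands behind items 2-5; all names here are hypothetical (database "mydb", table "public.big_table", TOAST-heavy column "payload"), so substitute your own:

    # 3. How big is the table right now, including indexes and TOAST?
    psql -d mydb -c "SELECT pg_size_pretty(pg_total_relation_size('public.big_table'));"

    # 5. Switch the column's TOAST compression to lz4 before the repack
    #    (PostgreSQL 14+; alternatively adjust default_toast_compression).
    psql -d mydb -c "ALTER TABLE public.big_table ALTER COLUMN payload SET COMPRESSION lz4;"

    # 2./4. Start a tmux session, then run pg_repack inside it; keep the
    #       default behaviour of waiting on blocking backends, but be
    #       explicit about the wait timeout.
    tmux new-session -s repack
    # ...inside the tmux session:
    pg_repack -d mydb -t public.big_table --wait-timeout 300 --jobs 4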

At 900GB I'd start considering partitioning; it will make maintenance jobs more efficient, not to mention that the smaller indices should help with better use of your cache (if your partitioning strategy can segregate old, unused data from newer, actively used data).
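
If you do go down that road, a minimal sketch of declarative range partitioning (table and column names invented for illustration; the real split depends on your access pattern):

    psql -d mydb <<'SQL'
    -- Range-partition by a timestamp so old, rarely touched data sits in
    -- its own partitions.
    CREATE TABLE big_table_part (
        id         bigint NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE big_table_2024 PARTITION OF big_table_part
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    CREATE TABLE big_table_2025 PARTITION OF big_table_part
        FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

    -- The index is created on each partition, so each one stays small.
    CREATE INDEX ON big_table_part (created_at);
    SQL

VACUUM, ANALYZE, and index maintenance then touch much smaller objects per run.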

-- 
regards,
Kiriakos Georgiou