Discussion: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      19029
Logged by:          Thadeus Anand
Email address:      thadeus@rmkv.com
PostgreSQL version: 17.6
Operating system:   Ubuntu 24.04.2
Description:

We have seven servers distributed geographically. We publish about 62 tables through logical replication, with six subscribers. We were hugely affected by the memory allocation bug after upgrading to 17.5. We had to suspend the replication and resort to manually updating the tables.

So we updated to 17.6 almost immediately. Now the logical replication works perfectly, but we notice the replication slot folders increasing in size quite dramatically, even when there is no lag found on the subscribers. We also have a replica set up for streaming replication; that works fine, and its replication slot folder size is normal. So the issue is only with logical replication.

Our wal_keep_size is set to 0, and max_slot_wal_keep_size is set to 20GB.

I'm convinced this must be a bug, as it started happening only after the minor version upgrade, which was done three days ago. But if you have any other suggestion, please help. Restarting the subscriber does not help. Dropping the subscription clears the replication slot, but that isn't a solution. For now we are dropping and recreating the subscriptions every night, hoping we will get a patch/solution soon. If you need any further data, please ask and I will provide it.
On Sun, Aug 24, 2025 at 3:14 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 19029
> Logged by: Thadeus Anand
> Email address: thadeus@rmkv.com
> PostgreSQL version: 17.6
> Operating system: Ubuntu 24.04.2
> Description:
>
> We have seven servers distributed geographically. We publish about 62 tables
> through logical replication, with six subscribers. We were hugely affected
> by the memory allocation bug after upgrading to 17.5. We had to suspend the
> replication and resort to manually updating the tables.
>
> So we updated to 17.6 almost immediately. Now the logical replication works
> perfectly, but we notice the replication slot folders increasing in size
> quite dramatically, even when there is no lag found in the subscribers.
>
What do you mean by this? Do you mean that the restart_lsn of logical slots is not moving forward, or that the number of slots is growing in $PGDATA/pg_replslot, or something else? Specifically, I want to know what you mean by: 'but we notice the replication slot folders increasing in size'?
--
With Regards,
Amit Kapila.
Re: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
From:
Thadeus Anand
Date:
Hi,
The number of replication slots does not increase at all. But the size of each folder keeps increasing simultaneously.
When I restart a subscriber, the size of the respective slot resets, then immediately grows up to the size of the slots of other subscribers.
The table data is properly synchronized across the locations, and the status of the replication remains as 'streaming'. So, effectively, the logical replication works (which it wasn't in version 17.5), but the replication slot isn't resetting.
Thadeus Anand.
On Mon, Aug 25, 2025 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Aug 24, 2025 at 3:14 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 19029
> Logged by: Thadeus Anand
> Email address: thadeus@rmkv.com
> PostgreSQL version: 17.6
> Operating system: Ubuntu 24.04.2
> Description:
>
> We have seven servers distributed geographically. We publish about 62 tables
> through logical replication, with six subscribers. We were hugely affected
> by the memory allocation bug after upgrading to 17.5. We had to suspend the
> replication and resort to manually updating the tables.
>
> So we updated to 17.6 almost immediately. Now the logical replication works
> perfectly, but we notice the replication slot folders increasing in size
> quite dramatically, even when there is no lag found in the subscribers.
>
What do you mean by this? Do you mean that the restart_lsn of logical
slots is not moving forward, or that the number of slots is growing in
$PGDATA/pg_replslot, or something else? Specifically, I want to know
what you mean by: 'but we notice the replication slot folders
increasing in size'?
--
With Regards,
Amit Kapila.
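Checking whether a slot's restart_lsn is advancing is easy to do from SQL. A minimal sketch using the standard pg_replication_slots view and pg_wal_lsn_diff() (the ordering is just for readability):

```sql
-- How far behind the current WAL position is each slot's restart_lsn?
-- A restart_lsn that never advances means the slot pins old WAL
-- (and any spill files under pg_replslot/<slot>/).
SELECT slot_name,
       slot_type,
       restart_lsn,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

Running this a few minutes apart is more informative than a single reading: if retained_wal keeps growing while the subscribers report no lag, the slot is holding back something other than undelivered changes.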
On Mon, Aug 25, 2025 at 10:39 AM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> Hi,
>
> The number of replication slots does not increase at all. But the size of each folder keeps increasing simultaneously.
>
Can you share an example of what you mean by folder keeps increasing? Is the slot size on disk increasing?
> When I restart a subscriber, the size of the respective slot resets, then immediately grows up to the size of the slots of other subscribers.
>
To understand the problem you are facing, we need to see some real data on the size. Can you share the output of pg_replication_slots both before and after the folder's size increase?
> The table data is properly synchronized across the locations, and the status of the replication remains as 'streaming'. So, effectively, the logical replication works (which it wasn't in version 17.5), but the replication slot isn't resetting.
>
Is it possible to share a reproducer or some steps to reproduce the problem?
--
With Regards,
Amit Kapila.
Re: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
From:
Thadeus Anand
Date:
Hi,
> Can you share an example of what you mean by folder keeps
> increasing? Is the slot size on disk increasing?
The size on the disk keeps increasing. Last week, it went up to 35 GB per slot, so I had to put a 20GB limit on max_slot_wal_keep_size. After reaching the size limit, the subscriptions went inactive. Then I dropped the subscriptions and freed up the replication slots.
> To understand the problem you are facing, we need to see some real
> data on the size. Can you share the output of pg_replication_slots
> both before and after the folder's size increase?
I cannot share the real data right now because I have dropped the subscriptions and publications altogether. I can create them again tonight (I am in India) and share the details tomorrow.
> Is it possible to share a reproducer or some steps to reproduce the problem?
What I had was a straightforward logical replication setup. We had two publications and six subscribing servers. Please also note that this setup was working fine with PostgreSQL 15. During May we upgraded to 17.5, and ran into a bug which is described in the following section of the 17.6 release notes:
Avoid re-distributing cache invalidation messages from other transactions during logical replication (vignesh C)
Our previous round of minor releases included a bug fix to ensure that replication receiver processes would respond to cross-process cache invalidation messages, preventing them from using stale catalog data while performing replication updates. However, the fix unintentionally made them also redistribute those messages again, leading to an exponential increase in the number of invalidation messages, which would often end in a memory allocation failure. Fix by not redistributing received messages.
Due to the bug, we stopped using logical replication and started it again after updating to 17.6. Now we are having this issue.
Thadeus Anand.
On Mon, Aug 25, 2025 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 25, 2025 at 10:39 AM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> Hi,
>
> The number of replication slots does not increase at all. But the size of each folder keeps increasing simultaneously.
>
Can you share an example of what you mean by folder keeps
increasing? Is the slot size on disk increasing?
> When I restart a subscriber, the size of the respective slot resets, then immediately grows up to the size of the slots of other subscribers.
>
To understand the problem you are facing, we need to see some real
data on the size. Can you share the output of pg_replication_slots
both before and after the folder's size increase?
> The table data is properly synchronized across the locations, and the status of the replication remains as 'streaming'. So, effectively, the logical replication works (which it wasn't in version 17.5), but the replication slot isn't resetting.
>
Is it possible to share a reproducer or some steps to reproduce the problem?
--
With Regards,
Amit Kapila.
RE: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Thadeus,

> The size on the disk keeps increasing. Last week, it went up to 35 GB per slot, then I had to put a 20GB limit on max_slot_wal_keep_size. After reaching the size limit, the subscriptions went inactive. Then I dropped the subscriptions and freed up the replication slots.
>
Can you clarify which directory occupied the disk? Is it `pg_wal`? You said "up to 35 GB per slot", and 6 replication slots exist on your system, so did the disk usage increase by 210 GB in total?

> I cannot share the real data right now because I have dropped the subscriptions and publications altogether. I can create them again tonight (I am in India) and share the details tomorrow.
>
Thanks. If possible, can you share a script that emulates the system architecture and settings? It would be very helpful for fully understanding the shape of the system and the definitions.

> What I had was a straightforward logical replication setup. We had two publications and six subscribing servers. Please also note that this setup was working fine with PostgreSQL 15. During May we upgraded to 17.5, and ran into a bug which is described in the following release notes section of 17.6
>
So you ran both DDL and DML on the publisher side, right? Can you also provide the SQL commands you run on the system? It would be very helpful if all the SQL were written in executable scripts or something similar.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi,
> Can you clarify which directory occupied the disk? Is it `pg_wal`?
> You said "up to 35 GB per slot", and 6 replication slots exist on your system,
> so did the disk usage increase by 210 GB in total?
Under the pg_replslot folder, each replication slot's folder increased simultaneously. There were 12 folders (two publications, six subscribers), and the total size increase was about 420 GB.
> Thanks. If possible, can you share a script that emulates the system architecture and
> settings? It would be very helpful for fully understanding the shape of the system and
> the definitions.
I will share as much data as I can.
> So you ran both DDL and DML on the publisher side, right? Can you also
> provide the SQL commands you run on the system? It would be very helpful if all the SQL were
> written in executable scripts or something similar.
This is a complex ERP setup, and I may not be able to give you all the SQL from the publisher side. All 62 of these tables are either master tables or configuration tables, so only INSERTs, DELETEs and UPDATEs happen, nothing else.
Thank you for your help. I sincerely hope this is a real issue and not a silly configuration error on my end, effectively wasting your valuable time.
Thadeus Anand.
On Mon, Aug 25, 2025 at 1:35 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear Thadeus,
>
The size on the disk keeps increasing. Last week, it went up to 35 GB per slot,
then I had to put a 20GB limit on max_slot_wal_keep_size. After reaching the size limit,
the subscriptions went inactive. Then I dropped the subscriptions and freed up
the replication slots.
>
Can you clarify which directory occupied the disk? Is it `pg_wal`?
You said "up to 35 GB per slot", and 6 replication slots exist on your system,
so did the disk usage increase by 210 GB in total?
>
I cannot share the real data right now because I have dropped the subscriptions
and publications altogether. I can create them again tonight (I am in India) and
share the details tomorrow.
>
Thanks. If possible, can you share a script that emulates the system architecture and
settings? It would be very helpful for fully understanding the shape of the system and
the definitions.
>
What I had was a straightforward logical replication setup. We had two publications
and six subscribing servers. Please also note that this setup was working fine
with PostgreSQL 15. During May we upgraded to 17.5, and ran into a bug which is
described in the following release notes section of 17.6
>
So you ran both DDL and DML on the publisher side, right? Can you also
provide the SQL commands you run on the system? It would be very helpful if all the SQL were
written in executable scripts or something similar.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
RE: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Thadeus,

> Under the pg_replslot folder, each replication slot's folder increased
> simultaneously. There were 12 folders (two publications, six subscribers), and
> the total size increase was about 420 GB.

Oh, I misunderstood that as the number of WAL files increasing. Let me ask some more questions to diagnose your system.

Can you share the logical_decoding_work_mem on the publisher side? You can obtain it via:
```
SHOW logical_decoding_work_mem;
```
Also, when you succeed in reproducing the issue, can you run the command below to see the content of the directory? ${DATA_PUB} can be tuned based on your environment.
```
du -sh ${DATA_PUB}/pg_replslot/*/*
```
This will show which files are using so much disk.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi,
The logical_decoding_work_mem at the publisher is currently set at 1 GB.
I remember setting this a while ago as part of my struggle to get rid of the memory allocation issue.
Thadeus Anand.
On Mon, 25 Aug, 2025, 3:19 pm Hayato Kuroda (Fujitsu), <kuroda.hayato@fujitsu.com> wrote:
Dear Thadeus,
> Under the pg_replslot folder, each replication slot's folder increased
> simultaneously. There were 12 folders (two publications, six subscribers), and
> the total size increase was about 420 GB).
Oh, I misunderstood that as the number of WAL files increasing.
Let me ask some more questions to diagnose your system.
Can you share the logical_decoding_work_mem on the publisher side? You can obtain it via:
```
SHOW logical_decoding_work_mem;
```
Also, when you succeed in reproducing the issue, can you run the command below to see the
content of the directory? ${DATA_PUB} can be tuned based on your environment.
```
du -sh ${DATA_PUB}/pg_replslot/*/*
```
This will show which files are using so much disk.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
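For background on the setting Kuroda-san asks about: logical_decoding_work_mem caps how much memory a walsender may use to buffer a decoded transaction before spilling its changes to files under pg_replslot. A sketch of checking and adjusting it on the publisher (the 256MB value is purely illustrative, not a recommendation from this thread):

```sql
-- Current value on the publisher
SHOW logical_decoding_work_mem;

-- Adjust cluster-wide; applies after a configuration reload
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();
```

A larger value trades memory for fewer spill files, but it does not by itself explain spill files that are never cleaned up.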
On Mon, 25 Aug 2025 at 12:59, Thadeus Anand <thadeus@rmkv.com> wrote:
>
> Hi,
>
> > Can you share an example of what you mean by folder keeps
> > increasing? Is the slot size on disk increasing?
>
> The size on the disk keeps increasing. Last week, it went up to 35 GB per slot, then I had to put a 20GB limit on max_slot_wal_keep_size. After reaching the size limit, the subscriptions went inactive. Then I dropped the subscriptions and freed up the replication slots.
>
> > To understand the problem you are facing, we need to see some real
> > data on the size. Can you share the output of pg_replication_slots
> > both before and after the folder's size increase?
>
> I cannot share the real data right now because I have dropped the subscriptions and publications altogether. I can create them again tonight (I am in India) and share the details tomorrow.
>
> > Is it possible to share a reproducer or some steps to reproduce the problem?
>
> What I had was a straightforward logical replication setup. We had two publications and six subscribing servers. Please also note that this setup was working fine with PostgreSQL 15. During May we upgraded to 17.5, and ran into a bug which is described in the following release notes section of 17.6

Have you tested this on PG16 or any other PG17 versions where it works successfully? I'm trying to determine which specific version has this issue.

Regards,
Vignesh
Re: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
From:
Thadeus Anand
Date:
Hi,
In case my previous emails didn't make it clear: we started having this issue only after upgrading to 17.6.
Thadeus Anand.
On Mon, 25 Aug, 2025, 3:36 pm vignesh C, <vignesh21@gmail.com> wrote:
On Mon, 25 Aug 2025 at 12:59, Thadeus Anand <thadeus@rmkv.com> wrote:
>
> Hi,
>
> > Can you share an example of what you mean by folder keeps
> > increasing? Is the slot size on disk increasing?
>
> The size on the disk keeps increasing. Last week, it went up to 35 GB per slot, then I had to put a 20GB limit on max_slot_wal_keep_size. After reaching the size limit, the subscriptions went inactive. Then I dropped the subscriptions and freed up the replication slots.
>
> > To understand the problem you are facing, we need to see some real
> > data on the size. Can you share the output of pg_replication_slots
> > both before and after the folder's size increase?
>
> I cannot share the real data right now because I have dropped the subscriptions and publications altogether. I can create them again tonight (I am in India) and share the details tomorrow.
>
> > Is it possible to share a reproducer or some steps to reproduce the problem?
>
> What I had was a straightforward logical replication setup. We had two publications and six subscribing servers. Please also note that this setup was working fine with PostgreSQL 15. During May we upgraded to 17.5, and ran into a bug which is described in the following release notes section of 17.6
Have you tested this on PG16 or any other PG17 versions where it works
successfully? I’m trying to determine which specific version has this
issue.
Regards,
Vignesh
On Mon, Aug 25, 2025 at 3:34 PM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> The logical_decoding_work_mem at the publisher is currently set at 1 GB.
>
> I remember setting this a while ago as part of my struggle to get rid of the memory allocation issue.
>
Hmm, this means that your transactions have a large number of changes, which leads to spilling of changes. This makes Kuroda-San's other question more important, which is to show the size and contents of pg_replslot. I have checked the fixes done in 17.5 and 17.6, but I don't see any obvious change which could lead to such a problem. I could be missing something, which we can try to find with more information and probably if you can share a test.
--
With Regards,
Amit Kapila.
Hi,
In the meanwhile, I'm sharing my configuration file here, just in case you need to know any particular setting.
Again, thank you very much for taking the time to help.
Thadeus Anand.
On Mon, Aug 25, 2025 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 25, 2025 at 3:34 PM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> The logical_decoding_work_mem at the publisher is currently set at 1 GB.
>
> I remember setting this a while ago as part of my struggle to get rid of the memory allocation issue.
>
Hmm, this means that your transactions have a large number of
changes, which leads to spilling of changes. This makes Kuroda-San's
other question more important, which is to show the size and
contents of pg_replslot. I have checked the fixes done in 17.5
and 17.6, but I don't see any obvious change which could lead to such a
problem. I could be missing something, which we can try to find with
more information and probably if you can share a test.
--
With Regards,
Amit Kapila.
Attachments
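The spilling Amit describes can be observed directly: per-slot spill statistics are exposed in the pg_stat_replication_slots view (PostgreSQL 14 and later). A minimal sketch to see whether, and how much, each walsender is spilling to disk:

```sql
-- spill_txns / spill_count / spill_bytes grow when a decoded transaction
-- exceeds logical_decoding_work_mem and its changes are written to files
-- under pg_replslot/<slot>/.
SELECT slot_name,
       spill_txns,     -- transactions spilled at least once
       spill_count,    -- number of spill events
       pg_size_pretty(spill_bytes) AS spilled
FROM pg_stat_replication_slots
ORDER BY spill_bytes DESC;
```

These counters are cumulative (until reset via pg_stat_reset_replication_slot()), so comparing snapshots taken over time is more informative than a single reading.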
Hi,
We created the publications and subscriptions again last night after restarting the publisher and all the subscribers.
So far, everything seems to be working fine, and the replication slot folders remain at 4 KB each.
I did not change anything, and didn't do anything differently this time. In any case, I will monitor this closely and let you know if any malfunction occurs.
Thanks and regards,
Thadeus Anand.
On Mon, Aug 25, 2025 at 5:05 PM Thadeus Anand <thadeus@rmkv.com> wrote:
Hi,

In the meanwhile, I'm sharing my configuration file here, just in case you need to know any particular setting.

Again, thank you very much for taking the time to help.

Thadeus Anand.

On Mon, Aug 25, 2025 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 25, 2025 at 3:34 PM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> The logical_decoding_work_mem at the publisher is currently set at 1 GB.
>
> I remember setting this a while ago as part of my struggle to get rid of the memory allocation issue.
>
Hmm, this means that your transaction size has a large number of
changes which leads to spilling of changes. This makes another
question of Kuroda-San more important, which is to show the size and
contents of pg_replslot. I have tried to check the fixes done in 17.5
and 17.6 but I don't see any obvious change which could lead to such a
problem. I could be missing something which we can try to find with
more information and probably if you can share a test.
--
With Regards,
Amit Kapila.
On Tue, Aug 26, 2025 at 1:45 PM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> We created the publications and subscriptions again last night after restarting the publisher and all the subscribers.
>
> So far, everything seems to be working fine, and the replication slot folders remain at 4 KB each.
>
> I did not change anything, and didn't do anything differently this time. In any case, I will monitor this closely and let you know if any malfunction occurs.
>
Thanks for the update.
--
With Regards,
Amit Kapila.
Hi,
Since this afternoon, the replication slot folders have started to grow in size again. I have attached a couple of screenshots showing the size and some contents of a folder. Hope you find them useful.
I checked the database log at the publisher, and could find nothing suspicious around the time the spill files started appearing.
Thadeus Anand.
On Tue, Aug 26, 2025 at 4:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 26, 2025 at 1:45 PM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> We created the publications and subscriptions again last night after restarting the publisher and all the subscribers.
>
> So far, everything seems to be working fine, and the replication slot folders remain at 4 KB each.
>
> I did not change anything, and didn't do anything differently this time. In any case, I will monitor this closely and let you know if any malfunction occurs.
>
Thanks for the update.
--
With Regards,
Amit Kapila.
Attachments
Hi,
Here is the result of "select * from pg_replication_slots"
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | inactive_since | conflicting | invalidation_reason | failover | synced |
rmkv_crm_sub_twn | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,409 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_main_sub_twn | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,410 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
standby_slot | [NULL] | physical | [NULL] | [NULL] | FALSE | TRUE | 240,458 | [NULL] | [NULL] | 2EA/834D54B8 | [NULL] | reserved | 21,486,545,736 | FALSE | [NULL] | [NULL] | [NULL] | FALSE | FALSE |
rmkv_crm_sub_cbe | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,404 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_main_sub_cbe | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,406 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_crm_sub_tvl | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,408 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_main_sub_tvl | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,411 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_crm_sub_blr | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,405 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_main_sub_blr | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,407 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_crm_sub_vch | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,403 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_main_sub_vch | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,401 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_crm_sub_vdp | pgoutput | logical | 5 | postgres | FALSE | TRUE | 240,400 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/834D54B8 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
rmkv_main_sub_vdp | pgoutput | logical | 5 | postgres | FALSE | TRUE | 419,107 | [NULL] | 10861325 | 2E9/296801D0 | 2EA/1E788780 | reserved | 15,681,629,000 | FALSE | [NULL] | FALSE | [NULL] | FALSE | FALSE |
Thadeus Anand.
On Tue, Aug 26, 2025 at 5:44 PM Thadeus Anand <thadeus@rmkv.com> wrote:
Hi,

Since this afternoon, the replication slot folders have started to grow in size again. I have attached a couple of screenshots showing the size and some contents of a folder. Hope you find them useful.

I checked the database log at the publisher, and could find nothing suspicious around the time the spill files started appearing.

Thadeus Anand.

On Tue, Aug 26, 2025 at 4:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 26, 2025 at 1:45 PM Thadeus Anand <thadeus@rmkv.com> wrote:
>
> We created the publications and subscriptions again last night after restarting the publisher and all the subscribers.
>
> So far, everything seems to be working fine, and the replication slot folders remain at 4 KB each.
>
> I did not change anything, and didn't do anything differently this time. In any case, I will monitor this closely and let you know if any malfunction occurs.
>
Thanks for the update.
--
With Regards,
Amit Kapila.
Hi Thadeus Anand,

Recently, on our PostgreSQL 16.4 system, we encountered a similar issue of the pg_replslot folder size growing abnormally. The files in these folders are just spilled WAL changes due to long-running or complex transactions. There are commands to check the count of transactions a slot is holding that relate to this spill; if required, I can share them. So I suggest monitoring the transactions/subtransactions that are executing. In our case, one past transaction triggered the issue.
Not sure whether it's a bug that has to be addressed in 17.5.
On Tue, 26 Aug, 2025, 17:51 Thadeus Anand, <thadeus@rmkv.com> wrote:
Hi,

Here is the result of "select * from pg_replication_slots":
[quoted pg_replication_slots output and earlier messages trimmed]
Hi Nantha,
Thank you for your valuable input. So you mean it's really not a bug, but existing behavior.
I do not know or understand what a "spill" is. I will look it up. But the tables that are part of the publication are not updated as part of any huge transaction. They may be part of some other long-running procedures, though. If that can create such spills, we can look into them.
Can someone please advise on how to avoid such spills?
Thanks and regards,
Thadeus Anand.
On Wed, 27 Aug, 2025, 8:09 am Nantha kumar.T., <nanthad@gmail.com> wrote:

Hi Thadeus Anand,

Recently, on one of our PostgreSQL 16.4 systems, we encountered a similar kind of abnormal growth of the pg_replslot folder. The files in these folders are just spilled-over WAL changes caused by long-running or complex transactions. There are commands to check how many transactions a slot is holding in relation to this spill; if required, I can share them. So I suggest monitoring the transactions/subtransactions that are executing. In our case, one past transaction provoked the issue. I am not sure whether it is a bug that has to be addressed in 17.5.

On Tue, 26 Aug, 2025, 17:51 Thadeus Anand, <thadeus@rmkv.com> wrote:

Hi,

Here is the result of `select * from pg_replication_slots`:
[...]
Dear Thadeus,

Thanks for sharing the info. From your screenshot, I can see that there is a transaction 10861356 which modifies many tuples. Logical decoding has a mechanism to spill part of the changes to disk to avoid using a large amount of memory, and it seems to be in use here.

I think as a next step we can clarify which process starts the transaction. Can you run the query below when you reproduce the issue?

```
# SELECT * FROM pg_stat_activity WHERE backend_xid = '${XID}';
```

Files under pg_replslot/${slot_name} have the format `xid-${XID}-lsn-${LSN_UPPER}-${LSN_LOWER}.spill`, so ${XID} can be determined when it happens. In your attached case, it is 10861356.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Thank you Hayato.
I will check this out today. I'm glad this isn't a bug, but an issue that I can address myself. We can't afford to put the brakes on logical replication for another three months.
Thadeus Anand.
Dear Thadeus,

> I do not know or understand what a "spill" is.

As I wrote in a previous mail, logical decoding (and logical replication) sometimes writes changes to disk to avoid using too much memory, and the ".spill" file is such a fragment. This can happen when the total amount of decoded changes exceeds logical_decoding_work_mem.

> I will look it up. But the tables that are part of the
> publication are not updated as part of any huge transaction. They may be part of some other
> long running procedures

Actually, logical decoding decodes all changes even if the tables are not published, and suppresses the output of such changes at commit time. There was a proposal to skip decoding such changes, though it could not improve performance. [1]

[1]: https://commitfest.postgresql.org/patch/5585/

Best regards,
Hayato Kuroda
FUJITSU LIMITED
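[Editor's note: one way to see how much a slot is spilling, without digging through pg_replslot, is the cumulative per-slot statistics view available since PostgreSQL 14; a minimal sketch:]

```sql
-- spill_txns / spill_count / spill_bytes grow whenever decoded changes
-- exceed logical_decoding_work_mem and are written out as .spill files;
-- stream_* columns count changes sent with streaming = on/parallel instead.
SELECT slot_name, spill_txns, spill_count, spill_bytes,
       stream_txns, stream_bytes, total_bytes
FROM pg_stat_replication_slots;
```

If spill_bytes keeps climbing for a slot while the stream_* columns stay at zero, the slot is spilling large transactions to disk rather than streaming them.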
On Wed, Aug 27, 2025 at 9:09 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Thadeus,
>
> > I do not know or understand what a "spill" is.
>
> As I wrote in a previous mail, logical decoding (and logical replication)
> sometimes writes changes into the disk to avoid using too-much memory, and the
> ".spill" file is the fragment. This can happen when the total amount of decoded
> changes exceeds logical_decoding_work_mem.
>
> Actually, logical decoding decodes all changes even if tables are not published,
> and suppress outputting changes at commit phase.
>
One can avoid the generation of so many spill files for large transactions by using streaming=on or streaming=parallel while creating subscriptions.
--
With Regards,
Amit Kapila.
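[Editor's note: a concrete sketch of Amit's suggestion; the subscription and publication names and the connection string below are hypothetical:]

```sql
-- At creation time (streaming = parallel requires PostgreSQL 16 or later):
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher.example.com dbname=postgres'
    PUBLICATION my_pub
    WITH (streaming = parallel);

-- Or on an existing subscription:
ALTER SUBSCRIPTION my_sub SET (streaming = on);
```

With streaming enabled, large in-progress transactions are sent to the subscriber as they are decoded instead of being spilled to .spill files on the publisher.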
Hi,
Thank you everyone for your wonderful guidance. Really appreciate it.
I have a couple of doubts.
Can these spill files in the pg_replslot folder be deleted safely, if all the subscribers have caught up? (I noticed last night when I restarted my publisher, the spill files were removed automatically)
Can I use ALTER SUBSCRIPTION to set streaming on, with my existing subscriptions?
Thadeus.
Hi Thadeus Anand,

These spill-over files are held by the slots on the assumption that they are needed for the sequential apply of a transaction. Manual deletion may have an impact on the logical streaming. I too had that thought, but ours is a highly critical production system, so I didn't have the courage to proceed further.
Rather, ALTER PUBLICATION/SUBSCRIPTION would work/help.
If anyone has technical insights on this manual intervention, kindly share them; that will help in our case too.
Dear Thadeus, Nantha,

> Can these spill files in the pg_replslot folder be deleted safely, if all the subscribers have
> caught up?

Do not remove them manually. Spill files exist because the logical replication system considers them needed. I cannot know your system well, but in your first example there is a possibility that transaction 10861356 is not committed yet, so the walsender keeps decoding and spilling. Actually, it looks to me as if the subscribers have not caught up. Removing the files may cause unexpected behavior.

> (I noticed last night when I restarted my publisher, the spill files were removed
> automatically)

When the postmaster restarts, it checks every directory under pg_replslot and removes all .spill files. This is harmless because the spilled changes will be decoded again.

> Can I use ALTER SUBSCRIPTION to set streaming on, with my existing subscriptions?

Yes, you can. In more detail, after you run the ALTER SUBSCRIPTION command, the apply worker on the subscriber restarts and establishes the connection again with the different streaming parameter.

Note that pg_stat_activity (with the pid) is still needed to understand the issue correctly.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi Hayato san,
Thank you for your response. Thanks again to your colleagues Amit and Vignesh as well for spending their valuable time to help.
I have altered all my subscriptions to enable streaming. I just set it on to begin with, and did not set it to parallel. I will also watch pg_stat_activity and identify the large transactions that may be causing the spills.
Will update back if anything worthwhile comes up.
On a completely different note, is there any possibility for you developers at PostgreSQL to consider making the tables at the subscriber side read only, so that nobody makes changes to them accidentally? I am asking this because sometimes when it happens, the logical replication fails, and making the subscriber tables read only can reduce a lot of headache. Please consider.
Thadeus Anand.
Dear Thadeus,

> On a completely different note, is there any possibility for you developers at
> PostgreSQL to consider making the tables at the subscriber side read only, so that
> nobody makes changes to them accidentally? I am asking this because sometimes when
> it happens, the logical replication fails, and making the subscriber tables read
> only can reduce a lot of headache. Please consider.

How about using trigger functions? By default, data changes applied by the apply worker do not fire triggers [1]. So you can prohibit modifications by defining a trigger function that prohibits DML on the subscriber side.

The attached script sets up what I described. After running the script, you can insert tuples on the pub:

```
publisher=# INSERT INTO foo VALUES (generate_series(1, 10));
INSERT 0 10
```

And you can see them on the sub. However, you cannot modify tuples via UPDATE:

```
subscriber=# SELECT count(*) FROM foo ;
 count
-------
    10
(1 row)

subscriber=# UPDATE foo SET id = 11 WHERE id = 1;
ERROR:  changing data in foo is prohibit
CONTEXT:  PL/pgSQL function ban_foo() line 3 at RAISE
```

[1]: https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-DISABLE-ENABLE-TRIGGER

Best regards,
Hayato Kuroda
FUJITSU LIMITED
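[Editor's note: the attached script itself is not preserved in the archive. A minimal sketch of such a read-only trigger, consistent with the `ban_foo()` output quoted above (the actual attached script may differ):]

```sql
-- Sketch only: reconstruction, not the original attachment.
-- A trigger created in the default ENABLE mode fires for ordinary sessions
-- but is skipped by the logical replication apply worker, which runs with
-- session_replication_role = replica. Local DML is therefore rejected while
-- replicated changes still apply.
CREATE FUNCTION ban_foo() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'changing data in foo is prohibit';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_read_only
    BEFORE INSERT OR UPDATE OR DELETE ON foo
    FOR EACH ROW EXECUTE FUNCTION ban_foo();
```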
Attachments
That's a great tip. Will definitely make use of it. Thanks.
Thadeus.