Hi,
We have encountered a few instances where logical replication errors out in SaveSlotToPath() after creating the state.tmp file but before renaming it (due to ENOSPC, for example). Because state.tmp is not cleaned up in that case and is created with the O_EXCL flag, every later invocation of SaveSlotToPath() for this slot fails in OpenTransientFile() with EEXIST, which completely blocks persistence of the slot's metadata. The only explicit cleanup of state.tmp happens at server startup, as part of RestoreSlotFromDisk().
SaveSlotToPath() does not appear to rely on anything previously written to state.tmp, so O_EXCL seems unnecessary. The attached patch swaps O_EXCL for O_TRUNC, so that each call starts with an empty state.tmp to write into.
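To illustrate the difference outside of PostgreSQL, here is a minimal standalone sketch using plain POSIX open() rather than OpenTransientFile(). The helper name write_tmp() and the idea of calling it on a scratch file are made up for this example and are not part of the patch; the point is only that O_CREAT | O_EXCL fails with EEXIST once a leftover file exists, while O_CREAT | O_TRUNC reuses and empties it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: open "path" for writing with
 * O_CREAT plus the caller's extra flag (O_EXCL or O_TRUNC), write "len"
 * bytes of "data", and close the file.
 *
 * Returns 0 on success, otherwise the errno of the failing call.
 */
static int
write_tmp(const char *path, int extra_flags, const char *data, size_t len)
{
    int     fd;

    assert(path != NULL);

    fd = open(path, O_CREAT | O_WRONLY | extra_flags, 0600);
    if (fd < 0)
        return errno;           /* EEXIST here if O_EXCL hits a leftover file */

    errno = 0;
    if (write(fd, data, len) != (ssize_t) len)
    {
        /* a short write does not set errno; report it as ENOSPC-like EIO */
        int     save_errno = errno ? errno : EIO;

        close(fd);
        return save_errno;      /* the file is left behind, as in the report */
    }

    if (close(fd) != 0)
        return errno;
    return 0;
}
```

Calling write_tmp() twice on the same path with O_EXCL makes the second call return EEXIST, which mirrors the stuck slot. Calling it with O_TRUNC instead succeeds and leaves only the newly written bytes in the file.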
Thanks,
Kevin