Re: long-standing data loss bug in initial sync of logical replication
| From | Tomas Vondra |
|---|---|
| Subject | Re: long-standing data loss bug in initial sync of logical replication |
| Date | |
| Msg-id | c1e5ccd0-9681-4959-8c8a-ad4853064e98@enterprisedb.com |
| In response to | Re: long-standing data loss bug in initial sync of logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses | Re: long-standing data loss bug in initial sync of logical replication |
| List | pgsql-hackers |
On 6/25/24 07:04, Amit Kapila wrote:
> On Mon, Jun 24, 2024 at 8:06 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 6/24/24 12:54, Amit Kapila wrote:
>>> ...
>>>>
>>>>>> I'm not sure there are any cases where using SRE instead of AE would cause
>>>>>> problems for logical decoding, but it seems very hard to prove. I'd be very
>>>>>> surprised if just using SRE would not lead to corrupted cache contents in some
>>>>>> situations. The cases where a lower lock level is ok are ones where we just
>>>>>> don't care that the cache is coherent in that moment.
>>>>
>>>>> Are you saying it might break cases that are not corrupted now? How
>>>>> could obtaining a stronger lock have such effect?
>>>>
>>>> No, I mean that I don't know if using SRE instead of AE would have negative
>>>> consequences for logical decoding. I.e. whether, from a logical decoding POV,
>>>> it'd suffice to increase the lock level to just SRE instead of AE.
>>>>
>>>> Since I don't see how it'd be correct otherwise, it's kind of a moot question.
>>>>
>>>
>>> We lost track of this thread and the bug is still open. IIUC, the
>>> conclusion is to use SRE in OpenTableList() to fix the reported issue.
>>> Andres, Tomas, please let me know if my understanding is wrong,
>>> otherwise, let's proceed and fix this issue.
>>>
>>
>> It's in the commitfest [https://commitfest.postgresql.org/48/4766/] so I
>> don't think we 'lost track' of it, but it's true we haven't done much
>> progress recently.
>>
>
> Okay, thanks for pointing to the CF entry. Would you like to take care
> of this? Are you seeing anything more than the simple fix to use SRE
> in OpenTableList()?
>

I did not find a simpler fix than adding the SRE, and I think pretty
much any other fix is guaranteed to be more complex. I don't remember
all the details without relearning all the details, but IIRC the main
challenge for me was to convince myself it's a sufficient and reliable
fix (and not working simply by chance).

I won't have time to look into this anytime soon, so feel free to take
care of this and push the fix.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
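
Editorial note: the thread weighs two table-level lock modes, abbreviated SRE (taken here to mean SHARE ROW EXCLUSIVE) and AE (ACCESS EXCLUSIVE). As a rough aid for readers, the sketch below encodes the table-level lock conflict matrix from the PostgreSQL explicit-locking documentation and lists what each of the two modes would block. The names `MODES`, `CONFLICTS`, `conflicts`, and `blocked_by` are hypothetical helpers invented for this illustration; this is not PostgreSQL source code and it says nothing about what `OpenTableList()` actually does internally.

```python
# Illustrative sketch only: a Python model of PostgreSQL's table-level
# lock conflict matrix. All names here are hypothetical helpers.

MODES = [
    "ACCESS SHARE",
    "ROW SHARE",
    "ROW EXCLUSIVE",
    "SHARE UPDATE EXCLUSIVE",
    "SHARE",
    "SHARE ROW EXCLUSIVE",   # "SRE" in the thread (assumed expansion)
    "EXCLUSIVE",
    "ACCESS EXCLUSIVE",      # "AE" in the thread (assumed expansion)
]

# For each held mode, the set of requested modes that must wait.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                      "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
              "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                            "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                            "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": {"ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                  "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                  "ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE": set(MODES),
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if `requested` must wait while `held` is held."""
    return requested in CONFLICTS[held]


def blocked_by(held: str) -> list[str]:
    """List every mode that cannot be granted while `held` is held."""
    return [mode for mode in MODES if conflicts(held, mode)]


if __name__ == "__main__":
    for mode in ("SHARE ROW EXCLUSIVE", "ACCESS EXCLUSIVE"):
        print(f"{mode} blocks: {', '.join(blocked_by(mode))}")
```

Under this matrix, SHARE ROW EXCLUSIVE conflicts with ROW EXCLUSIVE (the mode taken by INSERT, UPDATE, and DELETE) but not with ACCESS SHARE (the mode taken by a plain SELECT), while ACCESS EXCLUSIVE conflicts with every mode. That is the practical difference between the two lock levels discussed in the quoted exchange.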